Stateful Viewers is an art and research project that simulates a visitor walking through a gallery. A vision–language model reflects on images one at a time, carrying forward memory, attention, and affect so that each encounter subtly shapes how subsequent images are perceived.
Rather than treating images as independent inputs, the system models viewing as a continuous, cumulative experience.
The project sits at the intersection of art practice and computational modeling of aesthetic experience.
Stateful Viewers draws on reception theory, phenomenology, and aesthetic psychology. It models a viewer’s perceptual stance prior to viewing, maintains a stable expressive voice across images, and treats emotional response as something that unfolds over time.
The system operationalizes qualitative theories of aesthetic experience within a generative framework, without reducing experience to numerical scores or fixed emotion labels.
Where affective computing often asks "what emotion is present?", Stateful Viewers asks "what is it like to encounter this image, having already encountered the previous ones?"
Each viewer has a Profile (perceptual dispositions), a Reflection Style (expressive voice), and an evolving Internal State (momentary inner condition). For each image, the model is conditioned on the image plus these viewer attributes, and outputs both a Reflection and an Updated Internal State. That updated state carries forward to the next image, creating a continuous experiential trajectory across encounters.
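As a rough illustration of the three viewer attributes, the sketch below holds each one as a mapping from dimension name to a qualitative description and assembles them into a conditioning text that could accompany an image. The class names, the `build_conditioning` helper, and the example dimension names are assumptions made for illustration; they are not the project's actual schema or prompt format.

```python
from dataclasses import dataclass


@dataclass
class ViewerProfile:
    # Stable perceptual dispositions: dimension name -> qualitative description.
    dimensions: dict


@dataclass
class ReflectionStyle:
    # Expressive voice: dimension name -> qualitative description.
    dimensions: dict


@dataclass
class InternalState:
    # Momentary inner condition; this is the part that changes between images.
    dimensions: dict


def build_conditioning(profile, style, state):
    """Render the three attributes as plain text (hypothetical layout)."""

    def section(title, dims):
        lines = [f"- {name}: {text}" for name, text in dims.items()]
        return "\n".join([f"{title}:"] + lines)

    return "\n\n".join([
        section("Profile", profile.dimensions),
        section("Reflection Style", style.dimensions),
        section("Internal State", state.dimensions),
    ])


if __name__ == "__main__":
    # The dimension names below are invented placeholders.
    print(build_conditioning(
        ViewerProfile({"attention": "lingers on edges and thresholds"}),
        ReflectionStyle({"rhythm": "short, halting sentences"}),
        InternalState({"energy": "a little tired, but curious"}),
    ))
```

Keeping the descriptions as free text rather than numbers is one way to stay close to the qualitative framing described above.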
The viewer profile describes stable perceptual and interpretive dispositions — how this person characteristically attends to, processes, and makes meaning from visual art. It is defined across seven dimensions.
Independent of the profile, the reflection style defines how experience is expressed in language — the texture, rhythm, and habits of inner speech. It is defined across seven dimensions.
Before encountering any images, the system generates an initial internal state: a momentary snapshot of how the viewer arrives at the gallery on this particular day. This state is expressed across seven qualitative dimensions that evolve over time.
Each reflection is generated from the viewer profile, reflection style, and current internal state. After each image, the internal state is updated for the next step — typically shifting incrementally, unless an image produces a stronger disruption.
The first image starts from the initial internal state; each subsequent image begins from the state carried forward from the previous reflection.
Because the same seven dimensions are maintained across steps, small variations can accumulate into meaningful experiential trajectories.
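The carry-forward described here can be sketched as a simple loop in which each step receives the state returned by the previous one. This is a minimal sketch under stated assumptions: `run_session` and the `reflect` callable are hypothetical names, and `stub_reflect` is a stand-in for the vision-language model that only records which images have been seen.

```python
def run_session(images, profile, style, initial_state, reflect):
    """Walk through images in order, threading the internal state forward.

    `reflect` is an assumed interface:
    (image, profile, style, state) -> (reflection_text, updated_state).
    """
    state = dict(initial_state)
    trajectory = [dict(state)]  # index 0 is the state on arrival
    reflections = []
    for image in images:
        reflection, state = reflect(image, profile, style, state)
        reflections.append(reflection)
        trajectory.append(dict(state))  # snapshot after each image
    return reflections, trajectory


def stub_reflect(image, profile, style, state):
    # Stand-in for the model call: the reflection mentions what was seen
    # before, and the updated state appends the current image to memory.
    seen = state.get("memory") or "nothing yet"
    updated = dict(state)
    updated["memory"] = (state.get("memory", "") + " " + image).strip()
    return f"Looking at {image} after: {seen}", updated


if __name__ == "__main__":
    refs, traj = run_session(
        ["a.jpg", "b.jpg"], {}, {}, {"memory": ""}, stub_reflect
    )
    for line in refs:
        print(line)
```

Because the state is the only thing passed between iterations, reordering the images changes every later reflection, which is the property the paragraph above relies on.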
Reflection sessions can be treated as experiential trajectories — ordered paths of internal state through a gallery. The system can generate short narrative summaries describing how the experience unfolds (e.g. gradual settling, oscillation, intensification, depletion).
Analysis remains qualitative and phenomenological, rather than reducing experience to valence/arousal axes or sentiment scores.
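One possible way to prepare a trajectory for narrative summarization is to lay out the ordered states as text and ask a language model to describe the arc in words. The helpers below are hypothetical and the prompt wording is an assumption; the sketch only shows that the input to such a summary can stay qualitative, with no scores attached.

```python
def format_trajectory(trajectory):
    """Render an ordered list of states (dicts of name -> description) as text."""
    lines = []
    for step, state in enumerate(trajectory):
        label = "arrival" if step == 0 else f"after image {step}"
        dims = "; ".join(f"{name}: {text}" for name, text in state.items())
        lines.append(f"[{label}] {dims}")
    return "\n".join(lines)


def summary_prompt(trajectory):
    """Build a request for a short narrative summary (assumed wording)."""
    return (
        "Describe in a few sentences how this viewer's experience unfolds "
        "(for example gradual settling, oscillation, intensification, "
        "depletion). Stay qualitative; do not assign scores.\n\n"
        + format_trajectory(trajectory)
    )


if __name__ == "__main__":
    # Invented example states, for illustration only.
    print(summary_prompt([
        {"energy": "restless"},
        {"energy": "beginning to settle"},
    ]))
```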
By modeling how perception evolves across a sequence of images, Stateful Viewers makes it possible to study, compare, and design trajectories of aesthetic experience — not just isolated reactions to individual works.
This opens a space for new forms of artistic experimentation, as well as computational investigations into how memory, attention, and affect shape interpretation over time.