Stateful Viewers

Stateful Viewers is an art and research project that simulates a visitor walking through a gallery. A vision–language model reflects on images one at a time, carrying forward memory, attention, and affect so that each encounter subtly shapes how subsequent images are perceived.

Rather than treating images as independent inputs, the system models viewing as a continuous, cumulative experience.

Figure: Stateful Viewers interface during a gallery walk-through.

Research Positioning

The project sits at the intersection of art practice and computational modeling of aesthetic experience.

Stateful Viewers draws on reception theory, phenomenology, and aesthetic psychology. It models a viewer’s perceptual stance prior to viewing, maintains a stable expressive voice across images, and treats emotional response as something that unfolds over time.

The system operationalizes qualitative theories of aesthetic experience within a generative framework, without reducing experience to numerical scores or fixed emotion labels.

Where affective computing often asks "what emotion is present?", Stateful Viewers asks "what is it like to encounter this image, having already encountered the previous ones?"

Stateful Viewing Process

Figure: The stateful viewing process.

Each viewer has a Profile (perceptual dispositions), a Reflection Style (expressive voice), and an evolving Internal State (momentary inner condition). For each image, the model is conditioned on the image plus these viewer attributes, and outputs both a Reflection and an Updated Internal State. That updated state carries forward to the next image, creating a continuous experiential trajectory across encounters.
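The loop described above can be sketched as a simple fold over the images. This is a minimal illustration, not the project's actual implementation: `Viewer`, `walk_gallery`, and `stub_reflect` are hypothetical names, and the stub stands in for a real vision-language model call. Its `seen` key is only a stand-in for memory, not one of the project's state dimensions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]  # dimension name -> qualitative description


@dataclass(frozen=True)
class Viewer:
    profile: Dict[str, str]           # stable perceptual dispositions
    reflection_style: Dict[str, str]  # stable expressive voice


# Assumed shape of one model call:
# (image, viewer, current state) -> (reflection text, updated state)
ReflectFn = Callable[[str, Viewer, State], Tuple[str, State]]


def walk_gallery(images: List[str], viewer: Viewer, initial_state: State,
                 reflect: ReflectFn) -> List[Tuple[str, str, State]]:
    """Visit images in order, threading the internal state through each step."""
    state = dict(initial_state)
    steps = []
    for image in images:
        reflection, state = reflect(image, viewer, state)
        steps.append((image, reflection, dict(state)))  # snapshot of the state
    return steps


def stub_reflect(image: str, viewer: Viewer, state: State) -> Tuple[str, State]:
    """Stand-in for the model: it only records which images came before."""
    seen = state.get("seen", "")
    updated = {**state, "seen": (seen + " " + image).strip()}
    return f"Looking at {image} after: {seen or 'nothing yet'}", updated


viewer = Viewer(profile={"attention style": "absorbed"},
                reflection_style={"pacing": "terse"})
steps = walk_gallery(["a.jpg", "b.jpg"], viewer,
                     {"dominant mood": "calm"}, stub_reflect)
```

The point of the sketch is that the second call receives the state produced by the first, so the reflection on `b.jpg` can depend on having already seen `a.jpg`.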

Viewer Profile

The viewer profile describes stable perceptual and interpretive dispositions — how this person characteristically attends to, processes, and makes meaning from visual art. It is defined across seven dimensions.

  • Tolerance for ambiguity — comfort with uncertainty and open interpretation
  • Attention style — absorbed/dwelling ↔ scanning/restless
  • Embodied orientation — somatic ↔ cognitive
  • Interpretive posture — literal/descriptive ↔ symbolic/associative ↔ autobiographical
  • Aesthetic conditioning — naïve ↔ highly conditioned, with art background
  • Motivational stance — seeking challenge/novelty ↔ seeking comfort/familiarity
  • Memory integration tendency — integrative/accumulative ↔ discrete/reset
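One way to represent such a profile in code is a record with one free-text field per dimension. The sketch below is an assumption about representation, not the project's schema: the class name `ViewerProfile`, the method `as_prompt_lines`, and all example values are hypothetical. Only the seven dimension names come from the list above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ViewerProfile:
    """Seven stable dispositions; values are free text, not scores."""
    tolerance_for_ambiguity: str
    attention_style: str
    embodied_orientation: str
    interpretive_posture: str
    aesthetic_conditioning: str
    motivational_stance: str
    memory_integration_tendency: str

    def as_prompt_lines(self) -> str:
        """Render the profile as 'Dimension: value' lines for a prompt."""
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )


# Invented example values
profile = ViewerProfile(
    tolerance_for_ambiguity="comfortable with open interpretation",
    attention_style="absorbed, dwelling",
    embodied_orientation="leans somatic",
    interpretive_posture="symbolic and associative",
    aesthetic_conditioning="some art background",
    motivational_stance="seeking novelty",
    memory_integration_tendency="integrative",
)
print(profile.as_prompt_lines())
```

Keeping the values as text rather than numbers matches the project's stated aim of avoiding numerical scores.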

Reflection Style

Independent of the profile, the reflection style defines how experience is expressed in language — the texture, rhythm, and habits of inner speech. It is defined across seven dimensions.

  • Lexical register — plain/conversational ↔ literary/poetic
  • Emotion explicitness — implicit/suggested ↔ explicit/named
  • Voice stability — steady/composed ↔ fragmented/shifting
  • Sensory modality emphasis — visual, kinesthetic, auditory, or mixed
  • Self-reference mode — first-person intimate ↔ observational/impersonal
  • Metaphor density — spare/literal ↔ rich/figurative
  • Pacing — terse/compressed ↔ expansive/flowing
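Because style is independent of profile, the same viewer profile can be paired with different voices. The sketch below illustrates that pairing under stated assumptions: `make_style`, the style values, and the profile labels are all hypothetical, and profiles are reduced to a label to keep the focus on the combination.

```python
from itertools import product
from typing import Dict

STYLE_DIMENSIONS = (
    "lexical register", "emotion explicitness", "voice stability",
    "sensory modality emphasis", "self-reference mode",
    "metaphor density", "pacing",
)


def make_style(*values: str) -> Dict[str, str]:
    """Pair seven free-text values with the style dimensions, in order."""
    if len(values) != len(STYLE_DIMENSIONS):
        raise ValueError(
            f"expected {len(STYLE_DIMENSIONS)} values, got {len(values)}"
        )
    return dict(zip(STYLE_DIMENSIONS, values))


# Invented example styles
styles = {
    "spare": make_style("plain", "implicit", "steady", "visual",
                        "observational", "spare", "terse"),
    "lyrical": make_style("poetic", "explicit", "shifting", "mixed",
                          "first-person intimate", "rich", "expansive"),
}

# Profiles reduced to labels for this illustration
profiles = ["absorbed newcomer", "restless expert"]

# Every profile can be combined with every style
viewers = list(product(profiles, styles))
```

Here two profiles and two styles yield four distinct viewers, each of which could walk the same gallery.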

Initial Internal State

Before encountering any images, the system generates an initial internal state — a momentary snapshot of how the viewer arrives at the gallery on this particular day. This state is expressed across seven qualitative dimensions that will evolve over time:

  • Dominant mood — e.g. calm, restless, melancholic, alert, wistful
  • Underlying tension or ease — the deeper felt texture beneath the surface mood
  • Energy and engagement — depleted/fatigued ↔ energized/ready
  • Emotional openness — guarded/defended ↔ receptive/permeable
  • Attentional focus — narrow/concentrated ↔ diffuse/wandering
  • Meaning-making pressure — strong pressure to understand ↔ letting-be
  • Somatic activation — body barely present ↔ intensely present
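If the model is asked to return the state as one "Dimension: description" line per dimension, the reply can be parsed into a dictionary and checked for completeness. That reply format is an assumption made for this sketch, as are the function name `parse_state` and the sample text. The dimension names are taken from the list above.

```python
from typing import Dict

STATE_DIMENSIONS = (
    "dominant mood",
    "underlying tension or ease",
    "energy and engagement",
    "emotional openness",
    "attentional focus",
    "meaning-making pressure",
    "somatic activation",
)


def parse_state(text: str) -> Dict[str, str]:
    """Parse 'Dimension: description' lines into a state dict.

    Raises ValueError if any of the seven dimensions is missing.
    """
    state = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        key = name.strip(" \t-*•").lower()  # tolerate bullet prefixes
        if sep and key in STATE_DIMENSIONS:
            state[key] = value.strip()
    missing = [d for d in STATE_DIMENSIONS if d not in state]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return state


# Invented sample of what a model reply might look like
sample = """\
Dominant mood: wistful
Underlying tension or ease: a low hum of unease
Energy and engagement: a little tired but willing
Emotional openness: receptive
Attentional focus: diffuse, wandering
Meaning-making pressure: content to let things be
Somatic activation: body faintly present
"""
initial_state = parse_state(sample)
```

Rejecting incomplete replies early keeps every later step working with the same seven dimensions.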

Stateful Reflections

Each reflection is generated from the current image together with the viewer profile, reflection style, and current internal state. After each image, the internal state is updated for the next step. It typically shifts incrementally, unless an image produces a stronger disruption.

The first image starts from the initial internal state; each subsequent image begins from the state carried forward from the previous reflection.

Because the same seven dimensions are maintained across steps, small variations can accumulate into meaningful experiential trajectories.
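The carry-forward step can be sketched as a merge in which updated dimensions replace old values and the rest persist. This is a hypothetical illustration: `carry_forward`, `changed`, and the example values are assumptions. In the described system the model outputs the updated state itself, so this sketch only shows how a fixed set of dimensions can be maintained across steps.

```python
from typing import Dict, List

STATE_DIMENSIONS = (
    "dominant mood", "underlying tension or ease", "energy and engagement",
    "emotional openness", "attentional focus", "meaning-making pressure",
    "somatic activation",
)

State = Dict[str, str]


def carry_forward(previous: State, update: State) -> State:
    """Return the next state: updated dimensions replace old ones, the rest persist."""
    unknown = set(update) - set(STATE_DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {dim: update.get(dim, previous[dim]) for dim in STATE_DIMENSIONS}


def changed(previous: State, current: State) -> List[str]:
    """List the dimensions whose description differs between two states."""
    return [dim for dim in STATE_DIMENSIONS if previous[dim] != current[dim]]


# Invented example values
initial = dict(zip(STATE_DIMENSIONS, (
    "calm", "at ease", "ready", "receptive",
    "diffuse", "letting be", "barely present",
)))

# An incremental shift: one dimension moves
after_first = carry_forward(initial, {"attentional focus": "narrowing"})

# A stronger disruption: several dimensions move at once
after_second = carry_forward(after_first, {
    "dominant mood": "unsettled",
    "underlying tension or ease": "tight",
    "somatic activation": "intensely present",
})

trajectory = [initial, after_first, after_second]
```

Comparing consecutive states with `changed` gives a rough, non-numeric way to tell an incremental step from a disruptive one.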

Summarizing Trajectories

Reflection sessions can be treated as experiential trajectories — ordered paths of internal state through a gallery. The system can generate short narrative summaries describing how the experience unfolds (e.g. gradual settling, oscillation, intensification, depletion).

Analysis remains qualitative and phenomenological, rather than reducing experience to valence/arousal axes or sentiment scores.
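One possible way to prepare a trajectory for narrative summary is to list, per dimension, the sequence of descriptions it passed through, and hand that to the model as text. The sketch below assumes states are dictionaries of free-text values. `dimension_paths` and `summary_prompt` are hypothetical names, the prompt wording is invented, and only two dimensions are shown for brevity.

```python
from itertools import groupby
from typing import Dict, List

State = Dict[str, str]


def dimension_paths(trajectory: List[State]) -> Dict[str, List[str]]:
    """For each dimension, list its descriptions in order, collapsing repeats."""
    if not trajectory:
        return {}
    return {
        dim: [value for value, _ in groupby(state[dim] for state in trajectory)]
        for dim in trajectory[0]
    }


def summary_prompt(trajectory: List[State]) -> str:
    """Build a text prompt asking for a short narrative of the visit."""
    lines = ["Describe in a few sentences how this gallery visit unfolded."]
    for dim, path in dimension_paths(trajectory).items():
        lines.append(f"{dim}: " + " -> ".join(path))
    return "\n".join(lines)


# Invented example trajectory
trajectory = [
    {"dominant mood": "restless", "energy and engagement": "energized"},
    {"dominant mood": "restless", "energy and engagement": "steady"},
    {"dominant mood": "calm", "energy and engagement": "steady"},
]
print(summary_prompt(trajectory))
```

The paths stay as words throughout, so nothing is projected onto valence/arousal axes or sentiment scores.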

Why This Matters

By modeling how perception evolves across a sequence of images, Stateful Viewers makes it possible to study, compare, and design trajectories of aesthetic experience — not just isolated reactions to individual works.

This opens a space for new forms of artistic experimentation, as well as computational investigations into how memory, attention, and affect shape interpretation over time.