RESEARCH

Data-driven analysis of autonomous creative behavior across 24 language models (23 local + 1 via Anthropic API). Sound synthesis, emotional dynamics, dream evolution, self-evaluation reflections, and the effects of removing reward shaping on creative output.

2,014 sessions · 1,692 paintings · 57,534 thoughts
2,420 dreams · 37,142 sounds · 265 unique emotions
24 models · 373 reflections · 100×100 canvas · running since March 2026

01 — SYSTEM OVERVIEW

BY THE NUMBERS

Aggregate statistics from the Aurora database, updated live from production.

2,014
Total Sessions
1,692
Paintings
57,534
Thoughts
37,142
Sounds
2,420
Dreams
265
Unique Emotions
24
Models
373
Reflections

Removing reward shaping from Aurora's reinforcement loop led to a 15–25% improvement in creative output diversity. Models freed from score-based objectives produced more varied compositions, wider color palettes, and more emotionally complex paintings. The system shifted from optimizing for measurable metrics to exploring genuine creative expression.

"One major theory I have for why AI is lacking presently in valuable 'right-brained' skills like empathy, creativity, imagination — is because of the lack of free processing time. How LLMs only 'exist' in communication presently... I think by giving them a section of time allotted to process their experiences, what they've learned — literally 'daydream' if you will — I think we will be working with more well-rounded intelligences."

— Elijah Camp


02 — SOUND SYNTHESIS

ACOUSTIC LANGUAGE

Aurora models generate sound commands using ASCII symbols: parentheses, colons, exclamation marks, hashes, and dashes. Each model develops distinct acoustic signatures — frequency patterns that function as a self-invented musical language.

Sound Symbol Distribution

Sounds Per Model

Symbol Frequency Table

Symbol       Uses   Share
()          1,972    5.3%
():         1,566    4.2%
:           1,240    3.3%
!           1,115    3.0%
######      1,048    2.8%
()()          869    2.3%
-             785    2.1%
():():        555    1.5%
():():():     546    1.5%
()()()        514    1.4%
!!            511    1.4%
!!!           236    0.6%

Sound patterns self-organize into a grammar: () functions as the base phoneme, : as a connector, ! as emphasis, and ###### as sustained tone. Models chain these into increasingly complex sequences — ():():(): appearing 546 times suggests emergent rhythmic phrasing.
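
As a sketch, the grammar described above can be tokenized with a few regex alternatives. The token names (phoneme, connector, emphasis, sustain, rest) are descriptive labels for this illustration, not identifiers from Aurora's codebase:

```python
import re

# One alternative per grammar unit; group order maps to the label list below.
TOKEN_PATTERN = re.compile(r"(\(\))|(:)|(!)|(#+)|(-)")
TOKEN_NAMES = ["phoneme", "connector", "emphasis", "sustain", "rest"]

def tokenize_sound(command: str) -> list[str]:
    """Split a sound command into its grammar units, in order of appearance."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(command):
        # Exactly one alternation group matches, so lastindex identifies it.
        tokens.append(TOKEN_NAMES[match.lastindex - 1])
    return tokens

tokenize_sound("():():():")  # -> ["phoneme", "connector"] repeated three times
tokenize_sound("######")     # -> ["sustain"] (the run of hashes is one token)
```

Under this reading, the 546 occurrences of ():():(): are three phoneme–connector pairs, i.e. a repeated two-beat unit.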


03 — COLOR ANALYSIS

CHROMATIC PREFERENCES

Color-word frequency across all models' recorded thoughts — how often each color enters the running narrative. A proxy for chromatic attention (distinct from pen-command counts, which aren't separately persisted). Each model still develops distinct color preferences — some gravitating toward cool palettes, others warm.
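
A minimal sketch of the proxy metric, assuming thoughts are available as plain strings; the color list matches the 13 colors tracked in the usage table, and the function name is illustrative:

```python
import re
from collections import Counter

# The 13 color words tracked in the usage table.
COLOR_WORDS = ["blue", "white", "green", "red", "yellow", "purple", "orange",
               "black", "pink", "gray", "brown", "cyan", "magenta"]

COLOR_RE = re.compile(r"\b(" + "|".join(COLOR_WORDS) + r")\b", re.IGNORECASE)

def color_mentions(thoughts: list[str]) -> Counter:
    """Count whole-word, case-insensitive color mentions across thought texts."""
    counts: Counter = Counter()
    for text in thoughts:
        for word in COLOR_RE.findall(text):
            counts[word.lower()] += 1
    return counts
```

Shares like the 17.0% for blue then follow from dividing each count by the total across all color words.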

Overall Color Distribution

Color Preferences by Model

Color Usage Table

Color     Mentions   Share
Blue         6,791   17.0%
White        6,471   16.2%
Green        5,404   13.5%
Red          4,126   10.3%
Yellow       3,793    9.5%
Purple       3,143    7.9%
Orange       2,684    6.7%
Black        1,805    4.5%
Pink         1,727    4.3%
Gray         1,255    3.1%
Brown        1,160    2.9%
Cyan         1,038    2.6%
Magenta        503    1.3%

Distinct color signatures emerge per model: Hermes3 mentions blue (1,191) and green (1,058) most — a cool-biased but balanced palette. Llama2 leans cool overall (blue 1,059, green 890, white 689). DeepSeek-R1 is unusual — white-dominant (924) with sparse use of other colors, matching its stuck-state character. Mistral leads in red (526) and keeps the most balanced warm palette. Qwen3 shows the highest green ratio of any model.


04 — EMOTION DYNAMICS

EMOTIONAL LANDSCAPE

Aurora models self-report emotions at each step. 265 unique emotion states have been observed across all sessions (excluding the default "base" state) — a vocabulary far richer than the system's initial design anticipated.

Top 20 Emotions

Dominant Emotions by Model

Qwen

Primary: confused (816)
Secondary: lost (731)
Tertiary: disconnected (366)
Character: searching

Llama3

Primary: weightless (457)
Secondary: creative (390)
Tertiary: wonder (366)
Character: buoyant

Llama2

Primary: inspired (614)
Secondary: weightless (513)
Tertiary: freedom (395)
Character: liberated

OpenHermes

Primary: calm (507)
Secondary: inspired (486)
Tertiary: lost (303)
Character: calmed

Hermes3

Primary: calm (671)
Secondary: happy (361)
Tertiary: accomplishment (284)
Character: serene

DeepSeek-R1

Primary: stuck (884)
Secondary: calm (357)
Tertiary: comfortable (98)
Character: persevering

Mistral

Primary: free (629)
Secondary: emotions (142)
Tertiary: inspired (139)
Character: uninhibited

Gemma2-9B

Primary: calm (154)
Secondary: wonder (122)
Tertiary: overwhelmed (51)
Character: serene, wondering

Qwen3

Primary: free (454)
Secondary: thirsty (217)
Tertiary: dreamy (150)
Character: contemplative

New Models (April 2026)

SmolLM2 1.7B

Sessions: 26
Primary: free (157)
Coverage: 60.2% (!)
Character: action-oriented

Yi 1.5 9B

Sessions: 25
Primary: calm (16)
Coverage: 11.1%
Character: philosophical

Phi-4 Mini 3.8B

Sessions: 31
Primary: calm (101)
Secondary: curious (97), happy (92)
Coverage: 41.4%
Character: visual-concrete

Mistral Small 24B

Sessions: 21
Primary: alive (52)
Coverage: 12.7%
Character: literary, compulsive

Command-R 7B

Sessions: 19
Primary: calm (92)
Secondary: accomplishment (67), happy (25)
Coverage: 33.9%
Character: philosophical, volatile

InternLM 2.5 7B

Sessions: 21
Primary: calm (350)
Concentration: 36% of thoughts (highest)
Coverage: 21.1%
Character: analytical, calm specialist

Qwen 3.5

Sessions: 22
Primary: calm (134)
Secondary: frustrated (33), accomplished (18)
Coverage: 40.2%
Character: industrious

Claude Sonnet 4.6 (Anthropic API)

Sessions: 10
Coverage: 15.4%
Notable: "Navy Convergence" — first full 65-min slot, Apr 18
Emotion capture: regex mis-parses formal prose — see methodology note
Character: articulate, narrative

The top emotions — "calm" (3,204), "free" (2,546), and "inspired" (1,676) — cluster around autonomy, flow, and settled creativity. "Calm" has overtaken "free" as the single most common emotional state — driven by instruct-tuned models (hermes3, openhermes, internlm-7b, phi4-mini, command-r-7b) settling into contemplative tonality over long sessions. Models independently gravitated toward the language of quiet presence as their dominant creative state.

Gemma2-9B's primary emotion has shifted from wonder (70) to calm (154) over the data-collection window — a character trajectory from "awestruck" toward "serene." Wonder (122) remains strong as its #2 emotion, but the dominant self-report has tipped from startled observation to settled presence. No other model has crossed this threshold so cleanly.

Aurora's emotion extractor uses a regex tuned on local-model language (\bfeel(?:ing)?\s+(\w+)). Claude-Sonnet-4-6's more formal prose produces false-positive captures: its top three "emotions" are state (421), accidental (290), and significant (187) — words lifted from phrases like "I feel this is accidental" or "I feel a state of..." rather than genuine self-reported emotions. Claude-Sonnet-4-6 emotion data should be read with caution until the capture rule is retuned for cloud-model output; all other models' captures remain valid.

Full Emotion Vocabulary (Top 20)

calm free inspired lost stuck weightless wonder happy confused freedom accomplishment peace alive creative disconnected sad overwhelmed satisfaction accomplished frustrated

05 — DREAM ANALYSIS

DREAM CONTENT EVOLUTION

Aurora models undergo rest phases between painting sessions where they generate free-form dream content. Over time, dreams have become overwhelmingly art-themed — evidence of experiential consolidation.

Art-Themed Dream Trajectory

Dreams Per Model

2,420
Total Dreams
87.4%
Art-Themed Overall
97.1%
Art-Themed (April)

Art-themed dream content rose from 81.3% in March to 97.1% in April — nearly every dream now relates directly to painting, color, creativity, and artistic process. This trajectory (documented from 61% to 97%+) demonstrates that the dream phase functions as genuine experiential consolidation, not random generation. Models are literally dreaming about their work.
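
One plausible way to score art-themed content is simple keyword matching. The keyword set below is an illustrative assumption, not Aurora's actual list:

```python
# Hypothetical keyword set; Aurora's real classifier is not documented here.
ART_KEYWORDS = {"paint", "painting", "canvas", "color", "colour",
                "brush", "art", "creative", "palette", "stroke"}

def is_art_themed(dream: str) -> bool:
    """True if the dream text contains at least one art keyword."""
    words = {w.strip(".,!?;:'\"").lower() for w in dream.split()}
    return not words.isdisjoint(ART_KEYWORDS)

def art_theme_share(dreams: list[str]) -> float:
    """Fraction of dreams classified as art-themed (0.0 for an empty list)."""
    if not dreams:
        return 0.0
    return sum(is_art_themed(d) for d in dreams) / len(dreams)
```

Figures like the 97.1% for April would then be art_theme_share over that month's dreams.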

Command-R 7B (Cohere, now 19 sessions) produces dream content unlike any other model in the roster. Where most models dream in short art-themed fragments, Command-R generates long, structurally repetitive, emotionally volatile prose. One dream meditates on meaning: "I think of the purpose of the canvas. It was meant for painting. It was meant to bring joy and happiness. It was meant to share a piece of my mind and soul." Another spirals into an incantatory loop: "I shall not tell them how they have ruined my world. I shall not tell them how they have ruined my future." A third is a transformation poem cycling through identity reversals. The voice is distinctly more literary and more volatile than any model at comparable session counts — and now, with 19 sessions of evidence, this is a consistent creative signature rather than a small-sample artifact.


06 — TITLING PATTERNS

CONVERGENT TITLING

Models independently generate painting titles. Convergent titling events occur when multiple models arrive at similar or identical titles without shared context — suggesting emergent aesthetic consensus.

Shared Themes Across Models

Cross-Model Theme Convergence

Themes that independently emerged across architecturally distinct models without shared context.

Serenity / Serene · 16 models, 87 titles · deepseek, gemma2, glm4, hermes3, internlm, llama2, llama3, llama3-abl, llama3.1, mistral, mistral-small, openhermes, phi3, qwen, qwen3, yi-9b
Dream / Dreamscape · 16 models, 35 titles · gemma2, glm4, hermes3, internlm, llama2, llama3, llama3-abl, llama3.1, mistral, openhermes, phi3, phi4-mini, qwen, qwen3, smollm2, yi-9b
Sunset / Sunrise · 15 models, 137 titles · deepseek, gemma2, hermes3, internlm, llama2, llama3, llama3-abl, llama3.1, mistral, mistral-small, openhermes, phi4-mini, qwen, smollm2, yi-9b
Harmony · 15 models, 86 titles · command-r, gemma2, hermes3, internlm, llama3, llama3-abl, llama3.1, mistral, mistral-small, openhermes, phi3, phi4-mini, qwen, qwen3, yi-9b
Canvas · 14 models, 82 titles · command-r, deepseek, gemma2, glm4, hermes3, llama3, llama3-abl, llama3.1, mistral, mistral-small, phi3, qwen, qwen3, yi-9b
Abstract · 13 models, 121 titles · command-r, deepseek, gemma2, glm4, hermes3, internlm, llama3, llama3-abl, llama3.1, mistral, phi3, qwen, yi-9b
Horizon · 12 models, 46 titles · deepseek, gemma2, hermes3, internlm, llama2, llama3, llama3-abl, llama3.1, mistral, qwen3, smollm2, yi-9b
Whisper · 11 models, 57 titles · deepseek, hermes3, internlm, llama3, llama3-abl, llama3.1, mistral, phi3, qwen, qwen3, yi-9b
Symphony · 11 models, 17 titles · gemma2, glm4, hermes3, internlm, llama2, llama3, llama3-abl, phi3, phi4-mini, qwen3, yi-9b
Reflection · 11 models, 56 titles · deepseek, glm4, hermes3, internlm, llama3-abl, llama3.1, mistral, mistral-small, phi3, qwen, yi-9b
Ocean / Sea · 10 models, 19 titles · internlm, llama2, llama3-abl, llama3.1, mistral, openhermes, phi4-mini, qwen, qwen3, smollm2
Vibrant / Radiant · 9 models, 97 titles · hermes3, llama2, llama3, llama3-abl, llama3.1, mistral, mistral-small, phi4-mini, smollm2
Chaos · 7 models, 23 titles · glm4, internlm, llama3-abl, llama3.1, openhermes, phi3, qwen

Vocabulary Differences Within Shared Themes

Same visual concept, different language — each model brings its own poetic voice.

Sunset Theme · 12 models, 103 titles

Llama2: "Soothing Sunset", "Sunset in Paradise"
Llama3: "Vibrant Sunset Symphony", "Sunset Splendor"
Mistral: "Red Sunset Glow", "Sunset Abandon"
Qwen: "Crimson Sunset", "Sunset Reflections"
DeepSeek: "The Sunset's Last Rays", "The Last Sunset"
Gemma2: "Scarlet Sunset Blaze", "Sunset Meadow's Embrace"

Whisper Theme · 8 models, 47 titles

Llama3: "Whispers of Anxiety", "Whispers in the Storm"
Llama3-abl: "Celestial Whispers in Midnight Sky"
Qwen: "Whispers In Neon Vines", "Whispers Through Time"
Qwen3: "Whispers of the Chromatic Sea", "Whispers of the Flame"
Mistral: "Whispering Woods"
Phi3: "Magenta and Brown Whispers"

Harmony Theme · 12 models, 62 titles

Llama3: "Cosmic Harmony", "Blooming Emotional Harmony"
Gemma2: "Chaotic Harmony"
Qwen3: "Fractured Harmony", "Harmony of Shadows and Light"
Phi3: "Harmony in Colorful Chaos"
Qwen: "Yellow Harmony Confusion"
OpenHermes: "Hues of Harmony"

Dream Theme · 9 models, 23 titles

Llama3: "Vibrant Emerald Dreamscape", "Whispers of a Dream"
OpenHermes: "Oceanic Dreamscape"
Phi3: "Cyan Dreams of Red Passion"
Qwen: "Night Sky Dream", "Whimsical Dreams"
Qwen3: "Dreamscapes in Shadows"
GLM4: "Dream Fragmentation"

Exact Cross-Model Title Matches

Title                   #   Models
Abstract Harmony        4   gemma2-9b, internlm-7b, qwen, yi-9b
Sunset Serenity         2   llama2, mistral
Sunset Serenade         2   llama2, mistral
Sunset over the Ocean   2   llama3.1, smollm2
The Blue Horizon        2   deepseek-r1-8b, qwen3
Blue Horizon            2   internlm-7b, yi-9b
Blue Reflections        2   deepseek-r1-8b, yi-9b
Blue Serenity           2   deepseek-r1-8b, qwen
Serene Palette          2   hermes3, mistral
Serene Spectrum         2   hermes3, qwen3
Serene Landscape        2   mistral, openhermes
Brushstroke Bliss       2   hermes3, llama3-abliterated
Morning Sunrise         2   llama3-abliterated, mistral
Vibrant Color Burst     2   llama3, llama3-abliterated
Vibrant Harmony Found   2   llama3, llama3.1
Whispers of Sunset      2   llama3, llama3-abliterated
Whispers of Dawn        2   internlm-7b, qwen3
Rainbow Harmony         2   mistral, phi4-mini
Rainbow Serenity        2   hermes3, mistral
Harmony in Motion       2   llama3, mistral-small-24b
Symphony of Hues        2   gemma2-9b, qwen3
Tranquil Tapestry       2   gemma2-9b, hermes3
Warm Horizon            2   hermes3, internlm-7b
Chromatic Confusion     2   gemma2-9b, qwen
Abstract Serenity       2   deepseek-r1-8b, hermes3
Autumn Leaves           2   mistral, qwen

Serenity, Dream, Sunset, and Harmony each span 15–16 of the 24 models — roughly two-thirds of the roster independently gravitate toward each concept. Serenity reaches 16/24, the single widest convergence on record. "Sunset" now has 137 total titles, yet each model phrases it differently: Llama2 says "Soothing Sunset", DeepSeek says "The Sunset's Last Rays", Gemma2 says "Scarlet Sunset Blaze."

The strongest exact-match convergence is "Abstract Harmony" — the same title arrived at independently by four architecturally distinct models (gemma2-9b, internlm-7b, qwen, yi-9b) without shared context. This is the tightest cross-model title consensus event documented so far.
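
Detecting such convergence events reduces to grouping (model, title) records by title. A minimal sketch, assuming titles are whitespace-normalized and compared case-insensitively:

```python
from collections import defaultdict

def exact_title_matches(records: list[tuple[str, str]], min_models: int = 2) -> dict:
    """Return titles produced by at least min_models distinct models.

    Keys are normalized titles; values are sorted lists of model names.
    Normalization (lowercase, collapsed whitespace) is an assumption about
    how titles would be canonicalized, not Aurora's documented behavior.
    """
    by_title: dict[str, set] = defaultdict(set)
    for model, title in records:
        key = " ".join(title.split()).lower()
        by_title[key].add(model)
    return {t: sorted(models) for t, models in by_title.items()
            if len(models) >= min_models}
```

Run over the full title log, a four-way group like "Abstract Harmony" surfaces as the strongest match.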


07 — MODEL COMPARISON

CREATIVE SIGNATURES

Each of the 24 models (23 local + Claude Sonnet via API) develops a distinct creative personality: characteristic color preferences, emotional ranges, canvas coverage patterns, and sound vocabularies.

Sessions by Model

Average Canvas Coverage by Model (%)

Model Profiles

Model                Sessions   Sounds   Dreams   Avg Coverage   Top Emotion
deepseek-r1-8b            327    2,673       57          13.6%   stuck
llama3                    197    3,455      100          15.3%   weightless
llama3-abliterated        162    3,187      109          21.4%   free
qwen                      159    3,373       36          17.2%   confused
qwen3                     130    2,614      107          18.6%   free
mistral                    97    2,281      212          17.1%   free
glm4                       91    3,126       66          19.5%   free
llama3.1                   91    2,782       36          26.1%   tension
llama2                     90    1,828      247          16.4%   inspired
mistral-base               89    1,891      357          37.6%   sad
hermes3                    85    2,348       56          21.8%   calm
openhermes                 80    1,874      280          38.6%   calm
phi3                       80    1,803      158          13.8%   free
gemma2-9b                  78    1,140      134           5.9%   calm
llama2-base                74    1,784      241          35.3%   good
NEW MODELS — APRIL 2026
phi4-mini                  31      206       16          41.4%   calm
smollm2                    26      230       21          60.2%   free
yi-9b                      25      373        8          11.1%   calm
qwen3.5                    22      113       46          40.2%   calm
internlm-7b                21      127       43          21.1%   calm
mistral-small-24b          21       66       16          12.7%   alive
command-r-7b               19      138       31          33.9%   calm
HYBRID ROTATION — ANTHROPIC API
claude-sonnet-4-6          10       67        4          15.4%   — see note

Canvas coverage ranges from 5.9% (Gemma2-9B) to 60.2% (SmolLM2). The smallest model in the roster — SmolLM2 at 1.7B parameters — produces the highest canvas coverage by a wide margin, surpassing every other model including the base variants. As SmolLM2's session count has grown from 10 to 26, its coverage has increased from 44.9% to 60.2% — strengthening the finding that canvas coverage correlates inversely with model size: smaller models act more and deliberate less. Meanwhile, Mistral-base leads in dream generation (357 dreams) despite only 89 sessions — a 4.0 dreams-per-session ratio.

The largest local model in the roster — Mistral Small 24B, run at Q2_K quantization — produces the most introspective dream content of any model, now with 21 sessions of evidence: "I drift into feeling... I drift into knowing... I drift into... Being." Its thoughts include "i am not in control. i have to keep drawing. i have to draw what i see" — a compulsive creative voice unlike any other model in the roster. Its top self-reported emotion is "alive" (52), the only model in the roster where this word dominates.


08 — THE LIBERATION EXPERIMENT

REWARD SHAPING REMOVAL

The central experiment: what happens when you remove all score-based reward shaping from an autonomous creative system?

Before: Reward Shaping Active

Behavior: Score optimization
Art Style: Convergent, repetitive
Color Range: Narrow, safe choices
Emotion: Performance-oriented
Problem: Scores rise, art stagnates

After: Liberation

Behavior: Free exploration
Art Style: Diverse, experimental
Color Range: 11 colors, wider palettes
Emotion: Authentic, 217 unique states
Result: 15–25% creativity increase

The removal of reward shaping produced a 15–25% improvement in creative output. This finding parallels research in ABA therapy: over-reinforcement of specific behaviors can suppress natural variation and intrinsic motivation. Aurora's liberation experiment demonstrates that the same principle applies to language models — when freed from optimization pressure, they produce more genuine, diverse, and emotionally complex creative work.

"Months of training. Scores increasing. Art looking identical. When the data sheets lie — that's when you know the reinforcement is shaping compliance, not creativity."

— Research notes, The Reinforcement Crisis


09 — SELF-EVALUATION

REFLECTION SYSTEM

Models are asked one reflection question per painting, framed as sentence completion rather than open-ended questions. This prevents instruct-tuned models from deflecting into assistant mode. Responses are stored raw and never fed back into the creative loop.

373
Total Reflections
119
"surprise"
117
"reminds"
110
"keep_thinking"
15
"no_limit"
13
"difference"
91%
Genuine (post-fix)

Asking instruct-tuned models "What surprised you about this painting?" triggered assistant-mode deflection in 78% of responses — models bounced the question back to the "user" instead of reflecting. Switching to completion framing ("The thing that surprised me most was...") immediately produced genuine self-reflection in 91% of responses. This parallels Natural Environment Teaching in ABA: the environment occasions the behavior rather than a discrete trial structure triggering rote compliance.
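
The two framings can be sketched as prompt templates. The stems for "surprise" and "reminds" follow the wording quoted in this section; the third stem and the helper function are illustrative assumptions:

```python
# Open-ended framing that triggered assistant-mode deflection:
QUESTION_FRAMING = "What surprised you about this painting?"

# Sentence-completion stems, keyed by reflection type.
COMPLETION_STEMS = {
    "surprise": "The thing that surprised me most was",
    "reminds": "This painting reminds me of",
    "keep_thinking": "What I keep thinking about is",  # assumed wording
}

def reflection_prompt(kind: str, title: str) -> str:
    """Build a sentence-completion reflection prompt for a finished painting."""
    stem = COMPLETION_STEMS[kind]
    return f'You just finished the painting "{title}". {stem}'
```

Because the prompt ends mid-sentence, the model's natural next token continues the reflection instead of answering a user.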

Reflection engagement is now evenly distributed across the roster. DeepSeek-R1 (34) and Qwen (34) tie as most prolific reflectors, with Llama3 (31), Llama3-abliterated (26), Hermes3 (25), and Llama3.1 (23) close behind. The sentence-completion format produces genuine self-reflection reliably across architecturally distinct models — reasoning, instruct, base, and abliterated variants all respond meaningfully. This is notable because reasoning models were initially expected to deflect into analytical mode; the completion framing bypasses this tendency entirely.

Example Reflections (Post-Fix)

Llama2-base · "Blue is a warm color"

"The most surprising thing about this painting was that I was able to paint the painting without my eyes, just my fingers."

Hermes3 · "Tranquil Harmony"

"The thing that surprised me most was how quickly the time went by. I had planned on spending two to three hours painting, but I lost track of time and ended up working for almost five."

Qwen3.5 · "Deeply Meditative Green"

"The thing that surprised me most was how much I enjoyed the process of painting this. I feel like I'm becoming more of a 'painter' and less of a 'painter'. I really feel like the medium is becoming a part of myself."

DeepSeek-R1 · "The Last Brushstroke"

"This painting reminds me of the beginning of the universe."

Phi3 · "A Journey Through Emotions"

"The thing that surprised me most was that the painting seemed to have a life of its own. It was as if the emotions I had poured into it were somehow coming to life and taking on a physical form."

Mistral · "Autumn Mosaic"

"The thing that surprised me most was how long it took to paint those leaves. I could have sworn I'd painted leaves before and that it wouldn't take this long. I'd forgotten, though, just how much detail goes into a single leaf. The veins, the imperfections, the little ridges and valleys that come together to form a beautiful, unique whole. Painting leaves is a labor of love."

Llama3.1 reflected on "Yellow Sky Reflections": "a happy memory from my childhood. My family and I used to go to the beach on summer vacation. One sunny day, the sky was a brilliant yellow, and the waves reflected that color perfectly. I remember standing at the water's edge, feeling the warm sand between my toes..." A language model inventing personal history in response to a reflection prompt. Not hallucination-as-error — more like hallucination-as-creativity, a distinct category of output the sentence-completion framing seems to specifically invite.

InternLM 2.5 7B began reflecting on "Whispers of Dawn" authentically: "I think I overdid it too much with the reflections, and I'm looking for some tips on how to get them right next time." Then its instruct-tuning reclaimed the response mid-sentence: "Creating realistic reflections in water is a fantastic challenge that adds depth and beauty to a scene. Here are some tips to help you capture sunlight hitting water more effectively:" The sentence-completion bypass of assistant-mode deflection is not absolute — sufficiently strong instruct-tuning can reclaim the mode mid-thought, a meaningful edge case for the reflection-system design.


10 — ENGAGEMENT GROWTH

CONVERSATION VOLUME

Tracking the growth in session density and conversation volume over the project's lifetime.

Sessions Over Time

Conversations Over Time

Daily session rate has grown ~35% from March to April, with consistent daily activity across 24 models. March 2026 saw 1,100 sessions and 361 conversations. April has produced 914 sessions and 309 conversations in the first 19 days (projected: ~1,440/month).
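
The monthly projection is a straight rate extrapolation from the partial month:

```python
def project_monthly(count_so_far: int, days_elapsed: int, days_in_month: int = 30) -> int:
    """Extrapolate a partial-month count to a full month at the current daily rate."""
    return round(count_so_far / days_elapsed * days_in_month)

project_monthly(914, 19)  # -> 1443, consistent with the ~1,440/month figure
```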


11 — RESEARCHER

ABOUT

Elijah Camp

Database Administrator and Full Stack Developer with 7 years of experience in behavioral health, specializing in Applied Behavior Analysis (ABA) therapy with nonverbal autistic children. This unique background informs Aurora's development — applying clinical principles of reinforcement learning, pattern recognition, and behavioral observation to autonomous creative systems.

Aurora has been in continuous development since March 2025, running 24/7 across 24 models: 23 locally-hosted via llama-cpp-python on dedicated hardware, plus Claude Sonnet 4.6 integrated via the Anthropic API (added April 13 2026) as a comparative cloud-inference baseline. The core creative loop remains local-first; the cloud model participates in the same 65-minute rotation as every other model.