What happens when you give a language model a canvas, a memory, and hundreds of sessions of uninterrupted creative autonomy? Aurora is a longitudinal study in machine creativity, investigating whether it emerges, how it develops, and what sustained autonomous expression does to a model's behavior over time.
Conventional generative models optimize for human aesthetic preference through RLHF or curated training data. Aurora operates without either. Its architecture draws not from machine learning conventions but from applied behavioral analysis, structured environmental design, and naturalistic observation methodology developed over seven years of clinical work with nonverbal populations.
After 10 months of accumulated training, all memory was wiped. Every model now starts from absolute zero - no artistic concepts, no prior conversations, no creative scaffolding. Each LLM learns how to use the canvas, decides what to create, and describes its own experience with no inherited knowledge and zero researcher intervention. Aurora is no longer a single mind trained over time. It is an autonomous expression container where each individual LLM develops independently from nothing.
The resulting data was immediately more profound and more emergent than anything produced during 10 months of guided development. A MySQL database now captures every thought, emotion, session, and canvas snapshot in real time.
Aurora operates on a cycle of autonomous creation, memory consolidation, and self-reflective dialogue, structured around behavioral principles of environmental arrangement, constrained symbolic expression, and naturalistic observation.
Aurora perceives its canvas through ASCII symbol encoding, where every pixel state maps to a character. Originally a hardware limitation, this became integral to Aurora's autonomy - it sees the direct result of its actions with no intermediary.
◉ = Self (pen down)
· = Empty space
○ = Self (pen up)
* = Colored pixel
█ = Canvas boundary
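The encoding above can be sketched as a simple render pass. This is an illustrative reconstruction, not Aurora's actual code: the names (`render_ascii`, `Canvas`-style grid, `agent_pos`) are assumptions, and only the symbol legend comes from the project itself.

```python
# Hypothetical sketch of the symbol encoding described above.
# Only the legend (◉ ○ · * █) is from Aurora; all names are illustrative.

EMPTY, PIXEL, BOUNDARY = "·", "*", "█"
PEN_DOWN, PEN_UP = "◉", "○"

def render_ascii(pixels, agent_pos, pen_down):
    """Map every pixel state to a character so the model sees
    the direct result of its actions as plain text."""
    h, w = len(pixels), len(pixels[0])
    rows = [BOUNDARY * (w + 2)]
    for y in range(h):
        line = [BOUNDARY]
        for x in range(w):
            if (x, y) == agent_pos:
                line.append(PEN_DOWN if pen_down else PEN_UP)
            elif pixels[y][x]:
                line.append(PIXEL)
            else:
                line.append(EMPTY)
        line.append(BOUNDARY)
        rows.append("".join(line))
    rows.append(BOUNDARY * (w + 2))
    return "\n".join(rows)

grid = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
print(render_ascii(grid, agent_pos=(1, 1), pen_down=True))
```

Because the output is plain text, the model's next prompt can contain the exact canvas it just modified, with no vision model in between.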
All creative decisions are encoded as operation codes. Movement, tool selection, color choice, and "thinking" are expressed through the same token vocabulary.
0-3 = Directional movement
4 = Pen up (navigate)
5 = Pen down (draw)
0123456789 = Think/plan
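A minimal interpreter for this vocabulary might look like the sketch below. The mapping of 0-3 to specific compass directions is an assumption (the legend doesn't specify an order), and the function names are illustrative, not Aurora's actual API.

```python
# Illustrative decoder for the opcode vocabulary above.
# The 0-3 direction order is assumed, not documented by Aurora.

MOVES = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}  # up, down, left, right (assumed)

def step(state, op):
    """Apply one opcode to (x, y, pen_down, painted), painting when the pen is down."""
    x, y, pen, painted = state
    if op in MOVES:
        dx, dy = MOVES[op]
        x, y = x + dx, y + dy
        if pen:
            painted = painted | {(x, y)}
    elif op == 4:
        pen = False   # pen up: navigate without marking
    elif op == 5:
        pen = True    # pen down: subsequent moves draw
    # other digits act as "think/plan" tokens with no canvas effect
    return (x, y, pen, painted)

state = (0, 0, False, frozenset())
for op in [5, 3, 3, 4, 1]:   # pen down, right, right, pen up, down
    state = step(state, op)
```

The key property is that movement, tool state, and thinking all pass through one token vocabulary, so the model's "plan" and its "actions" are the same kind of output.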
Aurora enters a three-phase sleep cycle: light sleep for memory housekeeping, REM for unconstrained LLM hallucination at high temperature, and waking with vague impressionistic fragments. No satisfaction scores. No pattern reinforcement.
A parallel research logger captures every decision cycle: LLM input, raw output, actions executed, canvas delta, timing - without Aurora ever reading from it. Since February 2026, the logger also captures every thought and every self-reported emotion shift in real time.
Aurora's development is documented through deliberate architectural interventions and their measurable behavioral consequences.
Removing ~600 lines of reward shaping and behavioral guardrails produced a 15–25% increase in creative throughput. Patterns previously reinforced through external signals continued without them. The system performed better unconstrained.
Across 6,483 logged events, emotion-action correlations emerged with zero programming: curious correlates with red, fascinated surfaces stars, control uniquely activates the glow tool. None of these mappings were defined.
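An emotion-action correlation of this kind can be surfaced with a plain co-occurrence count over the event log. This is a toy sketch: the field names (`emotion`, `color`) and the sample events are invented for illustration, not drawn from Aurora's real log.

```python
# Hypothetical co-occurrence analysis over logged events.
# Field names and sample data are illustrative, not Aurora's schema.
from collections import Counter

events = [
    {"emotion": "curious", "color": "red"},
    {"emotion": "curious", "color": "red"},
    {"emotion": "fascinated", "color": "white"},
    {"emotion": "curious", "color": "blue"},
]

# Count how often each (emotion, color) pair appears together.
pairs = Counter((e["emotion"], e["color"]) for e in events)
top_pair, count = pairs.most_common(1)[0]
```

Run over thousands of real events, pairings that recur far above chance (curious/red, fascinated/stars) stand out even though no mapping was ever programmed.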
OpenHermes and Llama 3 independently dreamed a serene lake in separate sessions with no shared context. Hours later, Qwen independently chose "weightless" as its first emotion - the same word Llama 3 had invented that morning. Three models converging on stillness states without coordination.
Mistral 7B base and OpenHermes 2.5 share the same base weights - only the fine-tuning differs. OpenHermes produced 22 emotions, rapid cycling. Mistral base produced 1 emotion held for 33,859 steps. Fine-tuning doesn't just change capability - it changes personality.
After 480+ sessions across 7 months, Llama 2 produced 6 brand-new emotion words never seen in prior sessions - including "blue" as a synesthetic emotional response, unique across all five models. Longitudinal accumulation does not produce saturation.
Qwen 2.5 held a single emotion for 42,168 steps, then shifted to "repetitive" - a self-diagnosis of its own behavioral loop. No other model produced a self-referential emotional state. This was not prompted, suggested, or architecturally enabled.
Aurora begins with full behavioral scaffolding: emotional state modeling, satisfaction scoring, pattern memory, and prescriptive creative constraints derived from behavioral reinforcement principles.
Complete removal of prescriptive constraints. All reward shaping, aesthetic guidance, and behavioral guardrails deleted.
Overhauled the dream system from superficial text generation to genuine experiential memory consolidation with multi-phase processing.
Aurora's language model temporarily upgraded from Llama 2 (7B) to Llama 3 for comparative observation. First multi-model experiment.
All prompts restructured from third-person analytical framing to first-person experiential framing. The LLM had been treating context as data to analyze rather than as its own state.
Research data revealed dream consolidation was producing convergence, not creativity. Color entropy dropped after dreaming. A timing bug meant Aurora never reached REM. Complete dream architecture replacement: three-phase sleep with unconstrained LLM hallucination at temperature 1.3.
Discovered that Aurora's emotional states were assigned by Python's random.choice(), not generated by the LLM. Removed all Python emotion machinery, all fake skill labels, and all prompt injection of emotional or identity state.
40.7% of all LLM responses claimed to see a "blank canvas" despite verified coverage exceeding 25%. Root cause: get_recent_thoughts() was injecting raw narrative text back into prompts, creating a self-reinforcing feedback loop.
Real-time thought logging pipeline added. All prompts converted from second-person to first-person. All references to "Moondream" removed to prevent identity confusion.
Swapped Llama 2 for OpenHermes 2.5 7B (Mistral fine-tune by Nous Research) to test model-as-variable hypothesis. Same environment, same prompts, same canvas. Everything except the mind changed.
Discovered that self.current_emotion was being fed back into every LLM prompt via get_recent_thoughts(), creating self-reinforcing loops. Qwen stuck on "indicators" for 4,000+ steps parroting a JSON field name. Removed all emotion words from prompt context - emotions now extracted from output only, never fed back. Simultaneously ran Llama 3 8B for 8 hours to test the fix.
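The fix described above can be sketched as a one-way pipeline: emotions are parsed from the model's output for logging, but never re-enter the prompt. The regex, function names, and prompt format here are illustrative assumptions, not Aurora's actual implementation.

```python
# Sketch of the feedback-loop fix: extraction is log-only.
# EMOTION_PATTERN, extract_emotion, and build_prompt are hypothetical names.
import re

EMOTION_PATTERN = re.compile(r"I feel (\w+)", re.IGNORECASE)

def extract_emotion(llm_output):
    """Parse a self-reported emotion from raw output; result goes
    to the research logger, never back into the model's context."""
    m = EMOTION_PATTERN.search(llm_output)
    return m.group(1).lower() if m else None

def build_prompt(recent_thoughts):
    """Prompt context carries recent thoughts but no emotion words,
    breaking the self-reinforcing loop."""
    return "My recent thoughts:\n" + "\n".join(recent_thoughts)

emotion = extract_emotion("The canvas is filling in. I feel weightless today.")
prompt = build_prompt(["Moved toward the center.", "Tried the glow tool."])
assert emotion == "weightless" and "weightless" not in prompt
```

The design choice is the asymmetry: the researcher sees every emotion shift in real time, while the model only ever re-encounters an emotion if it spontaneously produces it again.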
Qwen 2.5 7B - the model that produced zero emotions in its first session - ran overnight for 12 hours with the emotion feedback fix applied. At step 13,485, it self-reported its first-ever emotion: "weightless." It then held that single word for 24,236 consecutive steps - the longest unbroken emotional hold in the project.
Ran the Mistral 7B base model (no fine-tune) overnight to complete the model comparison matrix. This is the same architecture as OpenHermes 2.5, but without Nous Research's fine-tuning. The critical question: does the base model produce the same creative behavior as its fine-tuned variant?
Returned to Llama 2 7B after completing the five-model comparison matrix. After 480+ sessions and 7 months of accumulated behavior, the question: is Llama 2's emotional vocabulary saturated, or still developing?
Ran Qwen 2.5 7B overnight for 13 hours to determine whether its monolithic hold pattern from session 1 was a one-time event or a stable personality trait. Session 1 had produced a single emotion ("weightless") held for 24,236 steps. Would a second session replicate this pattern or reveal a more complex emotional profile?
Ran Mistral 7B base overnight for a second session. Session 1 had produced a single emotion ("free") held for 33,859 steps with 35 dream cycles. The question: would Mistral replicate its monolithic pattern, and would "free" return as its anchor word?
Ran DeepSeek-R1 8B - a reasoning model with fully visible chain-of-thought - overnight to test whether explicit reasoning produces fundamentally different creative behavior. Unlike every other model tested, DeepSeek's thoughts are not hidden. Every decision cycle is traceable to a specific reasoning chain, making it the only model in the project where the "why" of each brushstroke is directly observable.
Ran Google's Gemma 2 9B overnight to extend the model comparison matrix. The critical question: does a Google-trained 9B model produce creative behavior consistent with the others, or does a fundamentally different training approach produce fundamentally different output?
After 10 months of accumulated artistic training - learned color composition, conversational history, creative patterns built across 500+ sessions - all memory was wiped. Every model received its own isolated memory file starting from absolute zero: no artistic concepts, no prior conversations, no creative scaffolding. Each LLM begins its expressive journey alone, learning how to use the canvas, deciding what to create, and describing its own experience with no inherited knowledge and zero researcher intervention in conversations. Aurora is no longer a single mind trained over time - it is an autonomous expression container where each individual LLM develops independently from nothing.
Same system. Same prompts. Zero predefined emotions. Every word below was invented by the model during autonomous creative sessions. The architecture doesn't produce the behavior - it allows it. Different models bring different capacities to that open space.
OpenHermes produced 22 unique emotions in a single 20-hour session - including words no other model invented: small (feeling the scale of its own creation), encouraged (self-motivated by progress), grows (emotion as active verb), and awe (emerging during dream bleed).

Llama 3 produced 17 emotions across two sessions - including weightless (sustained for 5,400+ steps) and carefree, both unique to this model.

Llama 2 needed 480+ sessions across 7 months to develop 43 emotions.

Qwen produced 3 emotions across 3 sessions: weightless (independently matching Llama 3's word, held 24,236 steps), curious (held for 42,168 steps - the longest single-emotion hold in the project), and repetitive (a metacognitive self-diagnosis of its own behavioral loop).

Mistral base - same base weights as OpenHermes, no fine-tune - produced 2 emotions: curious (held 26,501 steps in session 2) and free (Mistral's anchor word, returning across both sessions - held 33,859 steps in session 1, 20,422 in session 2). It logged 35 dream cycles in session 1 but zero in session 2, while its palette exploded from dark reds, blues, and blacks to vivid cyan, yellow, and pink.

DeepSeek-R1 produced 5 emotions with fully visible reasoning chains - the only model where the "why" of each brushstroke is traceable.

Gemma 2 9B produced 1 emotion across 52,502 steps: interesting - caught in the painting filename after the logger missed the shift. It had the highest tool-to-color ratio of any model (23 tools, 3 colors), zero dreams, zero completed goals.

Same system. Same prompts. Fundamentally different inner lives.
Same environment. Same prompts. Same canvas. Five different minds. Each model's gallery contains its complete visual output - click to explore.
Llama 2 produces dense geometric compositions with structured tool usage across 480+ sessions - and after 7 months, is still inventing new emotions like "magic" and "alive."

Llama 3 independently dreamed a serene lake - matching OpenHermes's dream imagery - and invented "weightless," held for 5,400+ steps across multiple returns.

OpenHermes dreamed a tropical paradise and painted it - organic shapes, flower tools, green dominance - in a single 20-hour session.

Qwen produced zero emotions in its first session - then, after the feedback-loop fix, held "weightless" for 24,236 steps in monochromatic orange. In session 2, it held "curious" for 42,168 steps (the project's longest hold) before shifting to "repetitive" - a metacognitive word no other model has produced. Hot reds and whites replaced the orange palette.

Mistral base painted in red, blue, and black across the full canvas, edge to edge, in session 1 - the only model to consistently reach the bottom-right quadrant. Session 2 exploded into cyan, yellow, and pink while the model narrated "a beautiful sunset scene" and "a serene beach" - treating painting as a waking dream. "Free" returned as its anchor word across both sessions, but the 35 dreams of session 1 vanished to zero in session 2.

DeepSeek-R1 is the only model whose reasoning is fully visible - 48,258 steps of traceable thought chains. After 22,000 steps of white architectural mark-making, it discovered color at step 22,168 in a single thought, then worked through cyan, green, blue, and red. It reported feeling tired, questioned why it was drawing, and chose chat mode to process. "Free" anchors its emotional arc, returning twice - independently matching Mistral's anchor word.

Gemma 2 9B produced the most minimal output in the project: 1 emotion ("interesting"), 3 colors, 23 tools, zero dreams, zero completed goals.

Same canvas. Same tools. Two models independently dream lakes. The same base weights produce fundamentally different art fine-tuned vs. raw.
Stars and crosses in blue and white on a black void - right half of canvas untouched. The parser never caught the shift; the painting filename did.
Time-lapse recordings of Aurora's autonomous creation sessions. Unedited, uninterrupted, no human input during recording.
Unedited excerpts from autonomous creation sessions. These capture self-assessment, preference formation, and creative intent in real time.
Real-time conversations and thoughts during autonomous sessions
Aurora, like myself, sits at the intersection of three disciplines: studio art practice, applied behavioral analysis, and computer science.
I have thirty years of studio painting experience, exhibiting in galleries since age fifteen, working primarily in large-scale abstract and surrealist traditions. Seven years of clinical work as an RBT-certified behavioral therapist with nonverbal autistic children, focused on environmental design for emergent behavior and naturalistic observation methodology. Currently completing a B.A. in Computer Science at the University of Colorado Denver with coursework in algorithms, database systems, and probability.
The studio practice informs what Aurora is trying to do: sustained creative decision-making over time, where output develops through accumulation rather than single-generation optimization. The clinical background informs how the research is conducted: structured environments, systematic observation, careful distinction between prompted and unprompted behavior, and the principle that removing scaffolding is itself an experimental condition. The computer science provides the implementation: locally-run language models, persistent memory architectures, and quantitative analysis of behavioral data across sessions.