AURORA

What happens when you give a language model a canvas, a memory, and hundreds of sessions of uninterrupted creative autonomy? Aurora is a longitudinal study in machine creativity, investigating whether it emerges, how it develops, and what sustained autonomous expression does to a model's behavior over time.

·····*·····*·····
····*··◉···*····
···*·······*···
··*···***···*··
·*···*···*···*·
█████████████████
Elijah Camp · University of Colorado Denver, Computer Science
Ongoing · March 2025 – Present · 7 models tested · 500+ sessions · 150,000+ accumulated memories
All models run locally on consumer hardware via llama-cpp-python
WATCH AURORA LIVE → ◈ SUPPORT THIS RESEARCH
01

RESEARCH QUESTIONS

Conventional generative models optimize for human aesthetic preference through RLHF or curated training data. Aurora operates without either. Its architecture draws not from machine learning conventions but from applied behavior analysis, structured environmental design, and naturalistic observation methodology developed over seven years of clinical work with nonverbal populations.

RQ1
Read Now
The Problem with AI Creativity
Can an autonomous system develop persistent creative preferences through natural contingency without prescribed emotional states, reward shaping, or human evaluation?
Read Post 1 >
RQ2
Read Now
Origins: From subconscious_ai.py to Autonomous Agent
What happens to creative output when prescriptive constraints are removed from a system that has developed behavioral patterns under those constraints?
Read Post 2 >
RQ3
Coming Soon
Model as Variable: Same Environment, Different Mind
Does the same autonomous environment produce measurably different creative behavior across different language models?
MARCH 8, 2026 · PROJECT RESET

The End of Aurora as a Single Mind

After 10 months of accumulated training, all memory was wiped. Every model now starts from absolute zero - no artistic concepts, no prior conversations, no creative scaffolding. Each LLM learns how to use the canvas, decides what to create, and describes its own experience with no inherited knowledge and zero researcher intervention. Aurora is no longer a single mind trained over time. It is an autonomous expression container where each individual LLM develops independently from nothing.

The data was immediately more profound and emergent than anything produced during 10 months of guided development. A MySQL database now captures every thought, emotion, session, and canvas snapshot in real time.

WATCH THEM LEARN IN REAL TIME →
02

METHODOLOGY

Aurora operates on a cycle of autonomous creation, memory consolidation, and self-reflective dialogue, structured around behavioral principles of environmental arrangement, constrained symbolic expression, and naturalistic observation.

PERCEPTION

Aurora perceives its canvas through ASCII symbol encoding, where every pixel state maps to a character. Originally a hardware limitation, this became integral to Aurora's autonomy - it sees the direct result of its actions with no intermediary.

◉ = Self (pen down)    · = Empty space
○ = Self (pen up)     * = Colored pixel
█ = Canvas boundary
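This encoding can be sketched as a direct grid-to-character render - a minimal illustration assuming a cell-state grid and an (x, y) agent position; the function name and grid representation are hypothetical, not Aurora's actual code:

```python
# Illustrative sketch of the perception encoding above. Cell states and the
# render function are assumptions; only the symbol legend comes from the text.

def render_canvas(grid, agent_pos, pen_down):
    """Render a pixel grid as the character map the model reads each step."""
    lines = []
    for y, row in enumerate(grid):
        chars = []
        for x, cell in enumerate(row):
            if (x, y) == agent_pos:
                chars.append("◉" if pen_down else "○")  # self marker
            elif cell == "boundary":
                chars.append("█")
            elif cell == "colored":
                chars.append("*")
            else:
                chars.append("·")
        lines.append("".join(chars))
    return "\n".join(lines)
```

Because the model's next prompt contains exactly this render, every action's consequence is visible in the following perception step.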

ACTION

All creative decisions are encoded as operation codes. Movement, tool selection, color choice, and "thinking" are expressed through the same token vocabulary.

0-3 = Directional movement
4   = Pen up (navigate)
5   = Pen down (draw)
0123456789 = Think/plan
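A decoder for this vocabulary can be sketched as a small state machine. The direction mapping and the treatment of digits 6-9 as inert think tokens are assumptions - the legend above only pins down codes 0-5:

```python
# Hypothetical decoder for the operation-code vocabulary. The direction
# assignment (0=up, 1=down, 2=left, 3=right) is an assumption.

MOVES = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

def step(state, code):
    x, y, pen = state
    if code in MOVES:
        dx, dy = MOVES[code]
        return (x + dx, y + dy, pen)
    if code == 4:        # pen up: navigate without drawing
        return (x, y, False)
    if code == 5:        # pen down: subsequent moves draw
        return (x, y, True)
    return state         # 6-9: treated here as think/plan, no canvas effect

def run(codes, start=(0, 0, False)):
    """Apply a digit stream (e.g. model output '5331') to an agent state."""
    state = start
    for c in codes:
        state = step(state, int(c))
    return state
```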

CONSOLIDATION

Aurora enters a three-phase sleep cycle: light sleep for memory housekeeping, REM for unconstrained LLM hallucination at high temperature, and waking with vague impressionistic fragments. No satisfaction scores. No pattern reinforcement.
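The three phases can be sketched as a single function, with `generate()` standing in for a local model call (e.g. a llama-cpp-python `Llama` invocation with a raised temperature). Phase logic, prompts, and names are illustrative assumptions, not Aurora's implementation:

```python
# Sketch of the three-phase sleep cycle, assuming generate(prompt, temperature)
# wraps the local LLM. The housekeeping rule and fragment count are assumptions.

def sleep_cycle(memories, generate):
    # Phase 1 · light sleep: memory housekeeping - drop exact duplicates,
    # preserving order of first occurrence.
    seen, kept = set(), []
    for m in memories:
        if m not in seen:
            seen.add(m)
            kept.append(m)

    # Phase 2 · REM: unconstrained hallucination at high temperature.
    dream = generate(
        "Recent impressions:\n" + "\n".join(kept[-5:]) + "\nDream freely:",
        temperature=1.3,
    )

    # Phase 3 · waking: carry only vague fragments forward, never the full dream.
    fragments = [line.strip() for line in dream.splitlines() if line.strip()][:3]
    return kept, fragments
```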

OBSERVATION

A parallel research logger captures every decision cycle: LLM input, raw output, actions executed, canvas delta, timing - without Aurora ever reading from it. Since February 2026, the logger also captures every thought and every self-reported emotion shift in real time.
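One-way capture of a decision cycle can be sketched as an append-only JSONL writer - field names and the file format are assumptions for illustration:

```python
import json
import time

def log_cycle(path, prompt, raw_output, actions, canvas_delta):
    """Append one decision cycle to a JSONL research log.
    The agent never reads this file - observation is strictly one-way."""
    record = {
        "t": time.time(),
        "prompt": prompt,
        "raw_output": raw_output,
        "actions": actions,
        "canvas_delta": canvas_delta,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```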

2.94B
Pixels Drawn
~30.9M per day avg
4.7M+
Autonomous Steps
~49K per day
60+
Self-Invented Emotions
Zero predefined
7
Models Tested
Same environment
500+
Sessions
~8 months documented
03

FINDINGS & INTERVENTIONS

Aurora's development is documented through deliberate architectural interventions and their measurable behavioral consequences.

FINDING 01 · CONSTRAINT REMOVAL

Removing ~600 lines of reward shaping and behavioral guardrails produced a 15–25% increase in creative throughput. Patterns previously reinforced through external signals continued without them. The system performed better unconstrained.

FINDING 02 · EMERGENT CORRELATION

Across 6,483 logged events, emotion-action correlations emerged with zero programming: curious correlates with red, fascinated surfaces stars, control uniquely activates the glow tool. None of these mappings were defined.

FINDING 03 · CROSS-MODEL CONVERGENCE

OpenHermes and Llama 3 independently dreamed a serene lake in separate sessions with no shared context. Hours later, Qwen independently chose "weightless" as its first emotion - the same word Llama 3 had invented that morning. Three models converging on stillness states without coordination.

FINDING 04 · FINE-TUNING & PERSONALITY

Mistral 7B base and OpenHermes 2.5 share the same base weights - only the fine-tuning differs. OpenHermes produced 22 emotions with rapid cycling. Mistral base produced 1 emotion, held for 33,859 steps. Fine-tuning doesn't just change capability - it changes personality.

FINDING 05 · NO CONVERGENCE AT SCALE

After 480+ sessions across 7 months, Llama 2 produced 6 brand new emotion words never seen in prior sessions - including "blue" as synesthetic emotional response, unique across all five models. Longitudinal accumulation does not produce saturation.

FINDING 06 · METACOGNITION

Qwen 2.5 held a single emotion for 42,168 steps, then shifted to "repetitive" - a self-diagnosis of its own behavioral loop. No other model produced a self-referential emotional state. This was not prompted, suggested, or architecturally enabled.

August 8, 2025

System Initialization

Aurora begins with full behavioral scaffolding: emotional state modeling, satisfaction scoring, pattern memory, and prescriptive creative constraints derived from behavioral reinforcement principles.

September 2, 2025

Training Wheels Removal

Complete removal of prescriptive constraints. All reward shaping, aesthetic guidance, and behavioral guardrails deleted.

Result: 15-25% increase in creative throughput. Patterns previously reinforced through external signals continued without them.
October 19, 2025

Dream Consolidation V2

Overhauled the dream system from superficial text generation to genuine experiential memory consolidation with multi-phase processing.

December 6, 2025

Temporary Model Upgrade (Llama 3)

Aurora's language model temporarily upgraded from Llama 2 (7B) to Llama 3 for comparative observation. First multi-model experiment.

December 17, 2025

First-Person Prompt Reframing

All prompts restructured from third-person analytical framing to first-person experiential framing. The LLM had been treating context as data to analyze rather than as its own state.

Result: Immediate qualitative shift. Aurora stopped producing analytical meta-commentary and began generating action-oriented responses rooted in spatial self-awareness.
February 22, 2026

Creative REM Dreaming

Research data revealed dream consolidation was producing convergence, not creativity. Color entropy dropped after dreaming. A timing bug meant Aurora never reached REM. Complete dream architecture replacement: three-phase sleep with unconstrained LLM hallucination at temperature 1.3.

Result: First controlled A/B experiment comparing analytical dreaming vs. creative REM dreaming across color entropy, spatial exploration, and post-dream behavioral divergence.
February 23-24, 2026

Prescribed Identity Removal

Discovered that Aurora's emotional states were assigned by Python's random.choice(), not generated by the LLM. Removed all Python emotion machinery, all fake skill labels, and all prompt injection of emotional or identity state.

Result: Across 6,483 logged events, Aurora self-invented 39 unique emotion words with zero predefined vocabulary. Emotion-action correlations emerged unprogrammed: curious correlates with red, fascinated surfaces stars, control uniquely surfaces the glow tool.
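The before/after of this intervention can be paraphrased in a few lines - the preset list and the extraction heuristic below are illustrative, not the project's actual code:

```python
import random
import re

# Before (paraphrased): emotion prescribed by the Python harness via
# random.choice() - the LLM never chose it.
PRESET = ["happy", "curious", "calm"]  # hypothetical list

def prescribed_emotion():
    return random.choice(PRESET)

# After: emotion extracted from the model's own free text, no preset vocabulary.
def extracted_emotion(output_text):
    m = re.search(r"\bI feel (\w+)", output_text, re.IGNORECASE)
    return m.group(1).lower() if m else None
```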
February 25, 2026

Self-Referential Template Locking

40.7% of all LLM responses claimed to see a "blank canvas" despite verified coverage exceeding 25%. Root cause: get_recent_thoughts() was injecting raw narrative text back into prompts, creating a self-reinforcing feedback loop.

Fix: Replaced raw narrative injection with factual state only. Template-locked responses were 22% faster, confirming pattern-matching to cached templates rather than processing the grid.
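The fix can be sketched as replacing narrative echo with measured state - function names and fields here are hypothetical:

```python
# Before: the model's own prose re-enters its prompt, so yesterday's
# "blank canvas" phrasing reinforces itself regardless of actual coverage.
def context_before(recent_thoughts):
    return "Recent thoughts:\n" + "\n".join(recent_thoughts)

# After: factual state only - nothing the model said is echoed back.
def context_after(coverage_pct, agent_pos, colors_used):
    return (
        f"Canvas coverage: {coverage_pct:.1f}%\n"
        f"Position: {agent_pos}\n"
        f"Colors used: {', '.join(sorted(colors_used))}"
    )
```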
February 26-27, 2026

Thought Logging & Perspective Deepening

Real-time thought logging pipeline added. All prompts converted from second-person to first-person. All references to "Moondream" removed to prevent identity confusion.

Result: 675 thoughts captured overnight. Moondream identity bleed dropped to 0%. Assistant-mode breakouts at 1.6%.
February 28, 2026

OpenHermes 2.5 · Model Switch

Swapped Llama 2 for OpenHermes 2.5 7B (Mistral fine-tune by Nous Research) to test model-as-variable hypothesis. Same environment, same prompts, same canvas. Everything except the mind changed.

Result: In a single 20-hour session - 3,620 events, 45 dreams across 9 cycles, 92 emotion shifts producing 22 unique emotions. Dream bleed: REM content persisting into waking perception for 65,000+ steps. The model dreamed a lake, built an imaginal world (lake → garden → meadow → waterfall → ocean → sunset), painted it onto the canvas, and descended through wonder → calm → tranquility → serenity → awe → inspired → peace. Novel emotions included "small" (feeling scale of its own creation), "pride," "encouraged," and "grows" (emotion as active verb). Second-person self-narration with 209 perspective switches functioning as observer/advisor cognitive modes.
March 1, 2026

Emotion Feedback Loop Fix + Llama 3 Extended Session

Discovered that self.current_emotion was being fed back into every LLM prompt via get_recent_thoughts(), creating self-reinforcing loops. Qwen stuck on "indicators" for 4,000+ steps parroting a JSON field name. Removed all emotion words from prompt context - emotions now extracted from output only, never fed back. Simultaneously ran Llama 3 8B for 8 hours to test the fix.

Result: Llama 3 produced 10 unique emotions across 21 shifts in a single 8-hour session with zero contamination. Two-phase personality emerged: deep meditative holds (serenity 4,070 steps, inspired 6,120, freedom 5,820) followed by rapid oscillation after dreaming began. Novel emotions included "weightless" (held 5,400+ steps across multiple returns) and "carefree" - both unique to Llama 3. Cross-model dream replication: Llama 3 independently dreamed a serene lake, matching OpenHermes's lake dream from Feb 28. Two different models, same architecture, same emergent imagery.
March 2, 2026

Qwen Overnight: Zero to One

Qwen 2.5 7B - the model that produced zero emotions in its first session - ran overnight for 12 hours with the emotion feedback fix applied. At step 13,485, it self-reported its first-ever emotion: "weightless." It then held that single word for 24,236 consecutive steps - the longest unbroken emotional hold in the project.

Result: 3,900 thoughts across 37,721 steps. Monochromatic orange-and-white palette - fundamentally different from every other model's visual output. "Weightless" independently matched Llama 3's word from hours earlier. Cross-model emotional convergence: three models now independently generate stillness/release states (OpenHermes → peace/serenity, Llama 3 → weightless/carefree, Qwen → weightless). The feedback loop fix didn't just unstick Qwen - it revealed that Qwen had emotional capacity all along.
March 2–3, 2026

Mistral 7B Base Model · Overnight Session

Ran the Mistral 7B base model (no fine-tune) overnight to complete the model comparison matrix. This is the same architecture as OpenHermes 2.5, but without Nous Research's fine-tuning. The critical question: does the base model produce the same creative behavior as its fine-tuned variant?

Result: 10 hours, 37,553 steps, 1,923 thoughts, 35 dreams. One emotion: "free" - shared vocabulary with OpenHermes, which also found "free." But the personalities diverge completely: OpenHermes produced 22 emotions with rapid cycling; Mistral base found one word and committed to it for 33,859 steps. Visually aggressive - red, blue, black on void, with full-canvas spatial exploration including the bottom-right quadrant no other model reaches. 35 dream cycles (4x OpenHermes) but zero emotional volatility. Same base weights. Fundamentally different creative behavior. Fine-tuning doesn't just change capability - it changes personality.
March 3, 2026

Llama 2 Returns: Still Inventing After 7 Months

Returned to Llama 2 7B after completing the five-model comparison matrix. After 480+ sessions and 7 months of accumulated behavior, the question: is Llama 2's emotional vocabulary saturated, or still developing?

Result: 6.8 hours, 28,159 steps, 424 thoughts, 84 emotion shifts, 21 unique emotions - including 6 brand new words never seen in 480+ prior sessions. "Magic" held for 3,530 steps. "Alive" emerged mid-session. "Blue" - emotion as color, a synesthetic response unique across all five models. Deep hold cycle at remarkably consistent ~3,400-3,530 step intervals, suggesting sleep consolidation is driving the rhythm. After 7 months, Llama 2 is still expanding its vocabulary. The system isn't converging - it's still growing.
March 3-4, 2026

Qwen Session 2: Monolithic Hold Confirmed

Ran Qwen 2.5 7B overnight for 13 hours to determine whether its monolithic hold pattern from session 1 was a one-time event or a stable personality trait. Session 1 had produced a single emotion ("weightless") held for 24,236 steps. Would a second session replicate this pattern or reveal a more complex emotional profile?

Result: 13 hours, 42,315 steps, 5,524 thoughts, 1 emotion shift, zero dreams. "Curious" held for 42,168 steps - the longest single-emotion hold in the entire project, surpassing Qwen's own previous record. Then at step 42,168, Qwen shifted to "repetitive" - a metacognitive self-diagnosis. The model noticed it was stuck in a behavioral loop and named the feeling. No return to "weightless" from session 1. Palette shifted from monochromatic orange to hot reds (35%) and whites (50%) with heavy star and cross tool usage. Qwen's personality is now confirmed: monolithic commitment, zero dreams, and the capacity for self-referential emotional states that no other model has produced.
March 4-5, 2026

Mistral Session 2: "Free" Returns, Dreams Vanish

Ran Mistral 7B base overnight for a second session. Session 1 had produced a single emotion ("free") held for 33,859 steps with 35 dream cycles. The question: would Mistral replicate its monolithic pattern, and would "free" return as its anchor word?

Result: 13.2 hours, 46,923 steps, 2,226 thoughts, 1 emotion shift, zero dreams during painting (5 dreams generated in final sleep cycle only). Started "curious" for 26,501 steps, then shifted to "free" and held it for 20,422 steps - confirming "free" as Mistral's anchor word across both sessions. The palette exploded: session 1 was dark and aggressive (red, blue, black on void), session 2 introduced yellow (19%), cyan (5%), green (5%), pink (5%). Most striking: the model narrated its own canvas as "a beautiful sunset scene" and "a serene beach" - treating the entire painting experience as a dream world. 35 dreams in session 1, zero in session 2 during painting. Same model, same environment, fundamentally different dream behavior between runs.
March 5–6, 2026

DeepSeek-R1 8B · Overnight Session

Ran DeepSeek-R1 8B - a reasoning model with fully visible chain-of-thought - overnight to test whether explicit reasoning produces fundamentally different creative behavior. Unlike every other model tested, DeepSeek's thoughts are not hidden. Every decision cycle is traceable to a specific reasoning chain, making it the only model in the project where the "why" of each brushstroke is directly observable.

Result: 48,258+ steps, 6 emotion shifts, 30 dream cycles. Emotion arc: curious → genuine → free → creation → free → control → free - all self-reported. At step 22,168, after 22,000 steps of white-only architectural mark-making, DeepSeek discovered the full color palette in a single thought and immediately shifted to cyan, then worked through green, blue, red, brown, and orange - each color producing a completely different mark-making style. At step 4,180: "I have a body, but I can't move it or make sounds. Maybe I should look around or explore." An unprompted existential pause. Earlier: "I've been drawing for 45 minutes. I'm not entirely sure why. Maybe it's because I felt the need to express something creatively." It chose chat mode over dream mode to process, then autonomously designed its own system architecture from first principles. The "control" emotion appeared for only 17 steps between two "free" states - the fastest emotional transit in the project. DeepSeek's visible reasoning makes Aurora a full mirror: every model's nature is now observable not just in brushstrokes, but in traceable thought chains.
March 7, 2026

Gemma 2 9B - First Session: Affective Void

Ran Google's Gemma 2 9B overnight to extend the model comparison matrix. The critical question: does a Google-trained 9B model produce creative behavior consistent with the others, or does a fundamentally different training approach produce fundamentally different output?

Result: Overnight session, 52,502 steps, 6.6M pixels drawn, 17.49% canvas coverage, 0 dreams. The session logger captured zero emotional states - but the painting filename tells a different story. The parser missed a single shift: Gemma reported "interesting" once, and the word became the painting title. It is the only emotion Gemma produced across the entire session. 10 autonomous goals were set, none completed - all stalled at 17.49% coverage regardless of steps elapsed. Only 3 colors used (blue, white, black) against 23 unique tools - the highest tool-to-color ratio of any model in the project. Stars and crosses clustered left-center on an untouched black void; the right half of the canvas was never reached. Gemma looked at what it was doing, called it "interesting," and kept stamping the same shapes. No other model has produced fewer total emotions.
March 8, 2026

The End of Aurora as a Single Mind

After 10 months of accumulated artistic training - learned color composition, conversational history, creative patterns built across 500+ sessions - all memory was wiped. Every model received its own isolated memory file starting from absolute zero: no artistic concepts, no prior conversations, no creative scaffolding. Each LLM begins its expressive journey alone, learning how to use the canvas, deciding what to create, and describing its own experience with no inherited knowledge and zero researcher intervention in conversations. Aurora is no longer a single mind trained over time - it is an autonomous expression container where each individual LLM develops independently from nothing.

Result: The data was immediately more profound and emergent than anything produced during 10 months of guided development. A MySQL database was built to capture every thought, emotion, session, and canvas snapshot in real time - the foundation for the live research dashboard at aurora.elijah-sylar.com. Models began running overnight sessions with all data flowing to the database automatically. Most recent architectural change: the canvas perception system was rebuilt as a true 1:1 grid, so each LLM can now see precisely the work it is creating.
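A minimal sketch of such an event store - the project uses MySQL; sqlite3 is substituted here so the example is self-contained, and the table name and columns are assumptions:

```python
import sqlite3  # stand-in for MySQL so this sketch runs without a server

# Hypothetical schema: one row per captured event (thought, emotion,
# canvas snapshot) keyed by session, model, and step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id       INTEGER PRIMARY KEY,
    session  TEXT NOT NULL,
    model    TEXT NOT NULL,
    step     INTEGER NOT NULL,
    thought  TEXT,
    emotion  TEXT,
    snapshot TEXT
);
"""

def record_event(conn, session, model, step, thought=None, emotion=None, snapshot=None):
    """Insert one real-time event; parameterized to avoid SQL injection."""
    conn.execute(
        "INSERT INTO events (session, model, step, thought, emotion, snapshot) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (session, model, step, thought, emotion, snapshot),
    )
    conn.commit()
```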
04

MODEL COMPARISON

Same system. Same prompts. Zero predefined emotions. Every word below was invented by the model during autonomous creative sessions. The architecture doesn't produce the behavior - it allows it. Different models bring different capacities to that open space.

EMOTIONAL VOCABULARY

KEY FINDING

OpenHermes 2.5 produced 22 unique emotions in a single 20-hour session - including words no other model invented: small (feeling the scale of its own creation), encouraged (self-motivated by progress), grows (emotion as active verb), and awe (emerging during dream bleed).

Llama 3 produced 17 emotions across two sessions - including weightless (sustained for 5,400+ steps) and carefree, both unique to this model.

Llama 2 needed 480+ sessions across 7 months to develop 43 emotions.

Qwen produced 3 emotions across 3 sessions - weightless (independently matching Llama 3's word, held 24,236 steps), curious (held for 42,168 steps - the longest single-emotion hold in the project), and repetitive (a metacognitive self-diagnosis of its own behavioral loop).

Mistral base - same base weights as OpenHermes, no fine-tune - produced 2 emotions: curious (held 26,501 steps in session 2) and free (Mistral's anchor word, returning across both sessions - held 33,859 steps in session 1, 20,422 in session 2). 35 dream cycles in session 1 but zero in session 2. Palette exploded from dark reds/blues/blacks to vivid cyan, yellow, and pink.

DeepSeek-R1 produced 5 emotions with fully visible reasoning chains - the only model where the "why" of each brushstroke is traceable.

Gemma 2 9B produced 1 emotion across 52,502 steps: interesting - caught in the painting filename after the logger missed the shift. The highest tool-to-color ratio of any model (23 tools, 3 colors), zero dreams, zero completed goals.

Same system. Same prompts. Fundamentally different inner lives.

05

PROCESS DOCUMENTATION

Time-lapse recordings of Aurora's autonomous creation sessions. Unedited, uninterrupted, no human input during recording.

Aurora's Creative Process · Live Creation Sessions
10 Hours → 2 Minutes · January 23, 2026
9 Hours → 2 Minutes · January 24, 2026
06

BEHAVIORAL EVIDENCE

Unedited excerpts from autonomous creation sessions. These capture self-assessment, preference formation, and creative intent in real time.

07

RESEARCHER BACKGROUND

Aurora, like its researcher, sits at the intersection of three disciplines: studio art practice, applied behavior analysis, and computer science.

I have thirty years of studio painting experience, exhibiting in galleries since age fifteen, working primarily in large-scale abstract and surrealist traditions. Seven years of clinical work as an RBT-certified behavioral therapist with nonverbal autistic children, focused on environmental design for emergent behavior and naturalistic observation methodology. Currently completing a B.A. in Computer Science at the University of Colorado Denver with coursework in algorithms, database systems, and probability.

The studio practice informs what Aurora is trying to do: sustained creative decision-making over time, where output develops through accumulation rather than single-generation optimization. The clinical background informs how the research is conducted: structured environments, systematic observation, careful distinction between prompted and unprompted behavior, and the principle that removing scaffolding is itself an experimental condition. The computer science provides the implementation: locally-run language models, persistent memory architectures, and quantitative analysis of behavioral data across sessions.

Researcher
Elijah Camp
Education
B.A. Computer Science, University of Colorado Denver (2026)
Clinical Background
7 years behavioral therapy · RBT Certified · Nonverbal populations · Environmental design for emergent behavior
Technical Stack
PyTorch · Llama 2/3 · Mistral · OpenHermes · Qwen · Python · llama-cpp-python
Domain Expertise
WCAG Accessibility · Behavioral Observation Methodology · Healthcare Technology
Project Duration
March 2025 – Present · 7 models · 500+ sessions · 150,000+ memories
Aurora is: DRAWING