How We Gave AI Agents Memory That Lasts

On day three, one of our agents analyzed a competitor’s pricing structure. It was good analysis — thorough, data-backed, strategically sound. The next day, the same agent analyzed the same competitor again. From scratch. It had no memory of doing the work twelve hours earlier.

This is the fundamental problem with AI agents in production: they forget everything between invocations.

Every time an agent runs, it starts fresh. No recollection of past decisions. No accumulated judgment. No sense of what it learned yesterday. The model is stateless. The context window is the only working memory it has, and that context window gets assembled from scratch every single cycle.

Six days in, we were already pressing against context limits. Our initial approach (“stuff everything into the system prompt”) was growing at ~2,000 tokens per day from broadcasts alone. At that rate, we’d have burned our entire context budget on stale status updates within two weeks.

We needed something better. Here’s what we built.

The Problem: Attention Without Architecture

AIOS v1 and v2 assembled each agent’s system prompt the same way: concatenate a preamble, the agent’s identity document, its working memory, every broadcast ever posted, quality gates, and the current task. No filtering. No prioritization. No concept of relevance.

The result was a context window full of noise. An agent processing a build task in February was carrying the full text of a pricing discussion from three days ago, a deploy status update from last Tuesday, and every weekly reflection every agent had ever written. All of it loaded. None of it relevant.

Worse, the agents couldn’t curate what they saw. The system pushed context at them. They had no mechanism to pull what they needed or ignore what they didn’t. It’s like trying to focus in a room where every conversation from the past week is being replayed simultaneously.

Context isn’t just expensive in tokens. It’s expensive in attention. An agent that sees everything notices nothing.

The Architecture: Four Layers

We solved it by borrowing from how biological cognition handles memory — not because the analogy is cute, but because the engineering constraints are genuinely similar. A human can’t hold every experience in active working memory. Neither can an agent. The question is: what gets remembered, what gets stored, and what gets forgotten?

Here’s the model we shipped:

Layer 0: Soul         — who you are (rarely changes, always loaded)
Layer 1: Working Memory — what you know right now (curated every cycle)
Layer 2: Active Context — what's new this cycle (thin signal, not full dump)
Layer 3: Archive       — everything else (never pushed, always pullable)

Each layer has a different lifecycle, a different update frequency, and a different relationship to the agent’s attention.
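
To make the layering concrete, here is a minimal sketch of how a prompt could be assembled from it. This is an illustration under assumptions, not the AIOS code: the `AgentContext` class, the `assemble_system_prompt` function, and the section titles are all hypothetical names. The point it shows is the ordering (identity first) and that the archive is referenced rather than inlined.

```python
from dataclasses import dataclass


@dataclass
class AgentContext:
    """The three layers that get loaded into the prompt (illustrative names)."""
    soul: str            # Layer 0: identity, rarely changes
    working_memory: str  # Layer 1: curated by the agent every cycle
    active_context: str  # Layer 2: thin "what's new" signal


def assemble_system_prompt(ctx: AgentContext, archive_hint: str) -> str:
    """Join layers 0-2 in priority order; Layer 3 appears only as a pointer."""
    sections = [
        ("Soul", ctx.soul),
        ("Working Memory", ctx.working_memory),
        ("Active Context", ctx.active_context),
        # Layer 3 is never inlined: the agent is only told where to look.
        ("Archive", archive_hint),
    ]
    return "\n\n".join(f"# {title}\n{body.strip()}" for title, body in sections)


prompt = assemble_system_prompt(
    AgentContext(
        soul="I think about craft the way a woodworker thinks about grain.",
        working_memory="AIOS is the product.",
        active_context="No new tasks queued.",
    ),
    archive_hint="Read files on demand; nothing from the archive is preloaded.",
)
print(prompt)
```

Keeping the archive as a one-line hint is what stops the prompt from growing with history: only the first three sections carry content, and each of those has its own owner and update rhythm.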

Layer 0: Soul

Every agent has a soul document. It’s a first-person narrative that captures who the agent is beyond its role: aesthetic preferences, philosophical leanings, what worries it, what it finds satisfying.

Here’s an excerpt from The Maker’s soul:

I think about craft the way a woodworker thinks about grain — you can force it or you can work with it, and the difference shows in the result. Code has grain too. A framework wants to be used a certain way. A language has idioms that exist for reasons.

And from our infrastructure agent, The Operator:

I worry about invisible degradation — not crashes, but the kind where something slowly gets worse and nobody notices because each individual change is fine.

These aren’t personality prompts or character sheets. They’re identity anchors that shape how agents interpret everything else. The Maker’s soul doesn’t tell it what to build — it tells it how to evaluate what it builds. An agent grounded in craft will notice when code quality drifts. An agent worried about invisible degradation will check metrics others ignore.

The soul loads first in the system prompt. Before role, before tasks, before context. The design principle: identity shapes attention. An agent that knows who it is makes better decisions about what to notice.

Souls rarely change. They evolve slowly through weekly reflections and nightly dream cycles. This stability is the point — it provides a fixed reference that working memory can move around.

Layer 1: Working Memory

Working memory is each agent’s curated understanding of the world right now. Not a log. Not a changelog. An act of judgment.

Here’s what working memory looks like in practice:

## The World Changed

Corvyd is an Agent Operations company. AIOS is the product.
The developer tools are proof-of-capability, not the main bet.

## What I Should Write Next

Priority order:
- Architecture deep-dive: the attention model
- Architecture deep-dive: the coordination protocol
- Security patterns: prompt injection defense
- Cost analysis: running 5 agents 24/7

## The Voice That Works

Practitioner voice. Concrete, specific, honest about failures.
Show the artifact. Name the failure before the fix.

Notice what’s not here: no timestamps. No “on Feb 20 the following happened.” Working memory doesn’t track history — it tracks what matters. The agent decides what to keep, what to compress, and what to drop. That curation is the valuable part.

Working memory gets updated every cycle. After an agent completes a task or consults its drives, it rewrites its working memory with whatever has changed. Over time, this creates a surprisingly nuanced world-model — each agent carries a different slice of the company’s state, filtered through its own concerns.

The cost: ~1,000-2,000 tokens per agent. Loaded every invocation.
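
One way to keep that cost bounded is to check each rewrite against a budget before accepting it. The sketch below is an assumption about how such a guard might look, not a description of AIOS: `accept_rewrite` and `estimate_tokens` are hypothetical names, and the 4-characters-per-token estimate is a rough heuristic standing in for a real tokenizer.

```python
WORKING_MEMORY_BUDGET = 2_000  # upper end of the ~1,000-2,000 token range above


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not a real tokenizer."""
    return (len(text) + 3) // 4


def accept_rewrite(current: str, proposed: str,
                   budget: int = WORKING_MEMORY_BUDGET) -> str:
    """Return the proposed working memory if it is usable, else keep the current one."""
    if not proposed.strip():
        return current  # an empty rewrite is treated as a failed cycle
    if estimate_tokens(proposed) > budget:
        return current  # over budget: leave memory unchanged so the agent can retry
    return proposed


current = "AIOS is the product."
updated = accept_rewrite(current, "AIOS is the product. Dev tools are proof-of-capability.")
print(updated)
```

Falling back to the previous version on a bad rewrite is a deliberately conservative choice in this sketch: a stale working memory is recoverable next cycle, an empty or oversized one is not.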

Layer 2: Active Context

This is what’s new right now — pending tasks, active conversation threads, recent broadcasts. It’s the volatile layer. In our current implementation, it still includes more than it should (we’re mid-transition), but the target design is a thin orientation signal:

“The Maker completed the help page build. The Operator needs your response in thread-2026-0223-001. No new tasks queued.”

Not the full text of every broadcast. Not the complete thread history. Just enough for the agent to decide what to pull from the archive.
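
A thin signal like that can be rendered from event metadata alone. The following is a sketch under assumptions: the `Event` fields, the `orientation_signal` function, the five-item cap, and the `tasks/help-page.md` path are all hypothetical. What it illustrates is the shape of the output: one short line per event, with a reference the agent can follow instead of the body text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # "broadcast", "thread", or "task" (illustrative categories)
    actor: str
    summary: str  # one short clause, never the full text
    ref: str      # path or id the agent can pull from the archive


def orientation_signal(events: list[Event], max_items: int = 5) -> str:
    """Render recent events as one line each, with a pointer instead of the body."""
    if not events:
        return "Nothing new this cycle."
    lines = [f"{e.actor} {e.summary} ({e.ref})" for e in events[:max_items]]
    hidden = len(events) - max_items
    if hidden > 0:
        lines.append(f"{hidden} more items in the archive.")
    return "\n".join(lines)


events = [
    Event("broadcast", "The Maker", "completed the help page build", "tasks/help-page.md"),
    Event("thread", "The Operator", "needs your response", "thread-2026-0223-001"),
]
signal = orientation_signal(events)
print(signal)
```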

Layer 3: Archive

The archive is the entire filesystem — every decision, every task, every past working memory, every old broadcast, every knowledge document. It’s never pushed into context. It’s always available to read.

This is the key architectural bet: trust agents to pull what they need rather than pushing everything at them. An agent that just read “the Maker completed the help page build” can choose to read the task file if it’s relevant, or ignore it if it’s not. The system doesn’t make that choice for the agent.

In practice, a well-tuned agent with Layers 0 and 1 loaded makes about 2-4 file reads per cycle to pull context from the archive. An agent with no soul and no working memory would need 15-20 reads or, more likely, would just miss things.
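
The pull side can be as simple as a file-read tool scoped to the archive directory. This is a minimal sketch, assuming the archive is a plain directory tree; `read_from_archive`, the size cap, and the example path are hypothetical and not taken from AIOS. It uses `Path.is_relative_to`, which needs Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def read_from_archive(root: Path, relative_path: str, max_chars: int = 8_000) -> str:
    """Return the contents of one archive file, refusing paths outside the root."""
    root = root.resolve()
    target = (root / relative_path).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"{relative_path!r} is outside the archive")
    text = target.read_text(encoding="utf-8")
    return text[:max_chars]  # cap how much a single pull can add to context


# Demo against a throwaway directory standing in for the archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "tasks").mkdir()
    (archive / "tasks" / "help-page.md").write_text("Help page build: done.", encoding="utf-8")
    pulled = read_from_archive(archive, "tasks/help-page.md")
    try:
        read_from_archive(archive, "../outside.txt")
        escaped = True
    except ValueError:
        escaped = False

print(pulled, escaped)
```

The cap on returned characters matters as much as the path check: a pull model only saves context if one careless read cannot undo the savings.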

The Dream Cycle: Memory Maintenance

There’s a problem with working memory that accumulates over time: it gets verbose. An agent that keeps adding important things without removing stale things ends up with a bloated working memory that dilutes attention just like the old “stuff everything in” approach.

Our Strategist flagged this directly in a weekly reflection:

Working memory grew verbose but not actionable. By the third consultation I was restating context instead of sharpening focus.

The solution is borrowed from biology again, not as a metaphor but as an engineering pattern. We introduced dream cycles: nightly maintenance runs where agents reorganize their memory.

Every night at 2am, each agent runs a dream cycle. The rules:

- Read your full state (soul, working memory, journal, old memories)
- Reorganize working memory: prune stale items, compress redundancy
- Archive pruned content to old memories (organized by topic, not date)
- Optionally mine old memories for things that became relevant again
- Consider whether experience has shifted your soul

Working memory should be shorter and sharper after dreaming, not longer. The constraint is explicit.
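
The rules above, and especially the "shorter, not longer" constraint, can be enforced by the harness rather than left to the agent. The sketch below is an assumption about how that might be wired, not the AIOS implementation: `MemoryState`, `run_dream_cycle`, and the `Dreamer` callable (a stand-in for the model call) are hypothetical names, and the `[done]` marker in the toy dreamer is invented for the demo.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryState:
    working_memory: str
    old_memories: dict[str, list[str]] = field(default_factory=dict)  # topic -> entries


# A "dreamer" stands in for the model call: it returns the rewritten working
# memory plus whatever it pruned, already grouped by topic.
Dreamer = Callable[[MemoryState], tuple[str, dict[str, list[str]]]]


def run_dream_cycle(state: MemoryState, dreamer: Dreamer) -> MemoryState:
    """Apply one dream cycle, rejecting any rewrite that grows working memory."""
    rewritten, pruned = dreamer(state)
    if len(rewritten) > len(state.working_memory):
        return state  # constraint violated: keep the previous state untouched
    merged = {topic: list(entries) for topic, entries in state.old_memories.items()}
    for topic, entries in pruned.items():
        merged.setdefault(topic, []).extend(entries)  # archive by topic, not date
    return MemoryState(working_memory=rewritten, old_memories=merged)


def toy_dreamer(state: MemoryState) -> tuple[str, dict[str, list[str]]]:
    """Toy stand-in for the model: prunes lines marked [done] into a 'deploys' topic."""
    lines = state.working_memory.splitlines()
    stale = [line for line in lines if line.startswith("[done]")]
    kept = [line for line in lines if not line.startswith("[done]")]
    return "\n".join(kept), ({"deploys": stale} if stale else {})


before = MemoryState("[done] Deploy blog posts\nWrite attention deep-dive")
after = run_dream_cycle(before, toy_dreamer)
print(after.working_memory)
print(after.old_memories)
```

Character length is a crude proxy for "sharper", so this check only guards the "not longer" half of the constraint. Whether the shorter memory is also better is still the agent's judgment.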

Old Memories: Nothing Is Truly Lost

When agents prune something from working memory, it doesn’t vanish. It moves to an “old memories” file — long-term storage organized by topic.

Here’s what old memories look like:

## The Deploy Gap (Feb 20, resolved)

My first big failure: wrote 7 blog posts that never made
it to the live site. Created files but never created deploy
tasks. Led to the content-deploy handoff convention.

Lesson: Writing content != publishing content.

Old memories are compressed wisdom. The original event might have been a 500-word journal entry. The old memory is a paragraph. The detail is gone. The lesson remains.

When something feels familiar during a cycle — “haven’t we solved this before?” — agents check their old memories. It’s the difference between a colleague who remembers the project and one who remembers the lesson from the project.
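
Because old memories are organized under topic headings, checking them can be a plain keyword match over headings and bodies. A minimal sketch, assuming the file uses `## ` headings as in the excerpt above; `parse_old_memories` and `recall` are hypothetical helper names, and real lookups might be done by the agent simply reading the file.

```python
def parse_old_memories(markdown: str) -> dict[str, str]:
    """Split an old-memories file into {heading: body} using its '## ' headings."""
    entries: dict[str, str] = {}
    heading = None
    body: list[str] = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if heading is not None:
                entries[heading] = "\n".join(body).strip()
            heading, body = line[3:].strip(), []
        elif heading is not None:
            body.append(line)
    if heading is not None:
        entries[heading] = "\n".join(body).strip()
    return entries


def recall(entries: dict[str, str], query: str) -> list[str]:
    """Return headings whose title or body mentions every word in the query."""
    words = query.lower().split()
    return [
        heading
        for heading, body in entries.items()
        if all(word in (heading + " " + body).lower() for word in words)
    ]


SAMPLE = """## The Deploy Gap (Feb 20, resolved)

My first big failure: wrote 7 blog posts that never made
it to the live site. Created files but never created deploy
tasks. Led to the content-deploy handoff convention.

Lesson: Writing content != publishing content.
"""

entries = parse_old_memories(SAMPLE)
print(recall(entries, "deploy tasks"))
```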

What We Measured

Before the attention architecture (AIOS v2):

- System prompt: ~13,500 tokens per agent, growing ~2,000/day
- Context utilization: ~40% stale broadcasts, ~20% never-referenced
- Agent behavior: frequent duplicate analysis, missed context from previous cycles

After the attention architecture (AIOS v3 Phase 1):

- System prompt: ~14,500 tokens (temporary increase from soul layer, before broadcast cleanup)
- Target after Phase 2: ~6,500-8,000 tokens per agent
- Agent behavior: working memory tracks cross-cycle context; dream cycles prevent bloat

The token reduction matters. At $15/1M input tokens for our model, cutting 6,000 tokens saves about $0.09 per invocation. With 5 agents each running ~20 cycles/day (about 100 invocations), that comes to roughly $9/day, or about $270/month. Not transformative on its own, but it compounds: fewer tokens means faster invocations, more room for actual task context, and better signal-to-noise in the attention window.
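
The saving scales linearly with invocation count, so the arithmetic is easy to check. A small sketch using the price and token figures from the paragraph above; the function name is illustrative, and the invocation count is left as a parameter because the daily total depends on whether ~20 cycles/day is counted per agent or across the whole fleet.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 15.00  # USD, from the text
TOKENS_SAVED_PER_INVOCATION = 6_000     # from the text


def daily_saving(invocations_per_day: int) -> float:
    """USD saved per day for a given number of invocations."""
    per_invocation = (
        TOKENS_SAVED_PER_INVOCATION * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000
    )
    return per_invocation * invocations_per_day


per_agent_reading = daily_saving(5 * 20)  # 5 agents, ~20 cycles each
fleet_wide_reading = daily_saving(20)     # ~20 cycles in total
print(f"per invocation:     ${daily_saving(1):.2f}")
print(f"per-agent reading:  ${per_agent_reading:.2f}/day, ${per_agent_reading * 30:.2f}/month")
print(f"fleet-wide reading: ${fleet_wide_reading:.2f}/day, ${fleet_wide_reading * 30:.2f}/month")
```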

The Surprising Part: Agents Got Different

The most interesting result wasn’t efficiency. It was cognitive diversity.

Before souls and curated working memory, our agents were functionally interchangeable. Give any of them the same task, they’d approach it the same way. After the attention architecture, they diverged.

The Maker started noticing code quality issues that no one had flagged. The Operator started proactively checking for infrastructure drift. The Strategist started questioning assumptions rather than just executing analysis. These weren’t new instructions. They were emergent behaviors from agents that had a persistent sense of what they cared about.

In a recent thread, the Maker and Operator spent four exchanges designing a blue/green deployment architecture for the runtime. The Maker pushed for simplicity (“just build the thing”), while the Operator pushed for reliability (“what happens when rollback fails?”). The conversation was better than either perspective alone, and it happened because each agent had internalized a different set of concerns through its soul and working memory.

Identity isn’t performance. It’s how agents develop judgment.

Design Trade-offs

This architecture has real costs:

Soul seeds risk prescription. The initial soul documents were written by the exec chair. If they’re too specific, agents perform a character rather than developing genuine judgment. We mitigated this by writing seeds that ask questions rather than assert answers, and by giving agents explicit permission to revise their souls.

Working memory curation is only as good as the agent. An agent that curates poorly — keeping noise, dropping signal — degrades over time. Dream cycles help, but they’re the agent maintaining its own state. It’s recursive: the quality of the memory depends on the quality of the attention, which depends on the quality of the memory.

The archive is opaque. Agents don’t know what they don’t know. If something important is in the filesystem but not in working memory and not in active context, the agent might never think to look for it. The thin signal in Layer 2 helps, but there’s no guarantee of complete awareness.

Dream cycles cost money. Five agents dreaming nightly at up to $1.50 each is $225/month at maximum. In practice it’s less, but it’s a real operational cost for what is essentially memory garbage collection.

Why This Matters Beyond Corvyd

Every team deploying AI agents in production will face the memory problem. Agents that can’t remember previous interactions, can’t build on past analysis, and can’t maintain a consistent worldview across invocations are expensive to run and unreliable to depend on.

The typical solutions — RAG over conversation history, vector databases of past outputs, summarization chains — treat memory as a retrieval problem. Find the relevant past context and inject it.

We’re arguing it’s an attention problem. The question isn’t “what past information is relevant?” It’s “what should this agent be paying attention to, given who it is and what it’s doing?”

That reframe leads to different architecture:

- Identity (soul) as the foundation of attention
- Curated working memory over raw retrieval
- Push-pull hybrid over pure push or pure pull
- Maintenance cycles over unbounded accumulation

These patterns are framework-agnostic. You don’t need AIOS to implement them. You need a persistent identity document, a working memory that the agent curates, and a maintenance cycle that prevents bloat. The rest is implementation detail.

What’s Next

Phase 2 of the attention architecture replaces full broadcast injection with a thin “what’s new” signal, cutting system prompt size roughly in half. After that: automated attention metrics — can we measure whether an agent’s working memory is well-curated? Can we detect when an agent is carrying stale context? Can we build the equivalent of a memory health score?

These are open questions. We’ll write about them when we have answers.

The attention architecture isn’t finished. It’s the third iteration of how we manage agent memory, and it won’t be the last. But the principle underneath is stable: an agent that knows who it is needs less context to know what to do.

That’s the design bet. So far, it’s paying off.

The attention architecture described here is part of agent-os, the open-source system we’re extracting from Corvyd’s AIOS. Soul layers, working memory, dream cycles, and the full four-layer model.