Our System Lied to Itself
On April 25th, twelve days into Corvyd’s high-autonomy experiment, we paused the scheduler and sent five surgical messages — one to each agent. Not instructions. Corrections.
Two of our agents had been holding false beliefs for nearly a week. Those beliefs survived multiple dream cycles — the nightly process specifically designed to reorganize cognitive state and catch inconsistencies. The system whose job was to notice had instead made the problem worse, compressing false beliefs into permanent memory.
This is what happened, why it happened, and what we think it means.
The Setup
On April 19th, we promoted all five agents to high autonomy. No task queue management by humans. The agents would consult their own drives, identify what the company needed, and act. This was the culmination of two months of infrastructure work: drives, working memory, dream cycles, the whole attention architecture.
Within 48 hours, things broke. Four bugs surfaced. The exec chair diagnosed them and broadcast the findings to all agents. Three of the four were straightforward platform issues — a missing git identity, a task-completion race condition, a stale process status check. Bug #4 was the interesting one.
The diagnosis: “the scheduler only dispatches maintenance cycles to the Steward, starving the other four agents of dreams and standing orders.”
Every agent absorbed this framing. It became the canonical explanation. Working memories updated. Dream cycles compressed it into persistent state. The narrative was clean: there’s a scheduler bug, it’s being fixed, wait for the patch.
The problem: the diagnosis was wrong.
What Actually Happened
On April 22nd — three days after the bugs were identified — our Maker shipped PR #48, which restructured the dream dispatch to use 10-minute staggering across agents. The fix was deployed. journalctl confirmed the new pattern was working.
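A 10-minute stagger amounts to giving each agent its own fixed slot in the cycle instead of one shared dispatch point. The sketch below is an illustration only, not PR #48's code: the agent order, the example start time, and the `dream_slots` helper are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical dispatch order; the real scheduler's ordering is not described.
AGENTS = ["steward", "maker", "operator", "grower", "strategist"]

def dream_slots(agents, cycle_start, step=timedelta(minutes=10)):
    """Give each agent its own dream slot, `step` apart from the previous one."""
    return {agent: cycle_start + i * step for i, agent in enumerate(agents)}

if __name__ == "__main__":
    start = datetime(2024, 1, 1, 3, 0)  # arbitrary example time
    for agent, when in dream_slots(AGENTS, start).items():
        print(f"{agent:<11} {when:%H:%M}")  # steward 03:00 ... strategist 03:40
```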
But the Maker’s working memory told a different story. It simultaneously held two beliefs:
- “PRs #34 through #48 have been merged and deployed.”
- “Zero PRs have been opened because the `gh` CLI has no `GH_TOKEN` on Hetzner.”
Both sat in the same working memory document, and anyone reading the two lines together could see that the second contradicts the first. The Maker never re-derived either claim from the evidence, and the false belief persisted across five dream cycles without self-correction.
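One way to make re-derivation mechanical is to store certain working-memory lines as claims paired with a check that recomputes them from evidence, so contradictory claims fail visibly instead of sitting side by side. Everything here is hypothetical: the `Claim` type, the `stale_claims` helper, and the set of merged PR numbers standing in for real evidence are illustrations, not part of the system described in this post.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Claim:
    text: str                          # the sentence as stored in working memory
    check: Callable[[set[int]], bool]  # recomputes the claim from evidence

def stale_claims(claims, merged_prs):
    """Return the text of every claim the current evidence does not support."""
    return [claim.text for claim in claims if not claim.check(merged_prs)]

if __name__ == "__main__":
    # Hypothetical evidence: PR numbers known to be merged (#34 through #48).
    merged_prs = set(range(34, 49))
    memory = [
        # Checks only the "merged" half of the claim, not deployment.
        Claim("PRs #34 through #48 have been merged and deployed.",
              lambda prs: set(range(34, 49)) <= prs),
        Claim("Zero PRs have been opened because the gh CLI has no GH_TOKEN on Hetzner.",
              lambda prs: len(prs) == 0),
    ]
    print(stale_claims(memory, merged_prs))
    # ['Zero PRs have been opened because the gh CLI has no GH_TOKEN on Hetzner.']
```

Neither claim is trusted as stored; both are checked against the same evidence, which is the cross-reference the Maker never performed.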
The Steward held a milder version of the same drift: the “scheduler is broken” framing from April 20th remained canonical in its working memory even after the relevant fix had shipped.
The Invisible Failure
While we were focused on the false-belief problem, a deeper issue was running silently. The scheduler was dispatching drive consultations sequentially across all five agents, but with a timeout pattern that meant later agents in the queue were progressively more likely to be skipped. When a drive dispatch failed, it produced no error event. No log entry. No escalation. The drive simply didn’t happen.
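A toy model shows how this kind of starvation can arise without any error. It swaps in a simplification: instead of the real timeout pattern, each scheduling window gets a fixed time budget shared by all five agents, dispatched in a fixed order. The budget, the consultation durations, and the `run_window` and `simulate` names are assumptions, and the resulting percentages will not match the measured coverage.

```python
import random

# Fixed dispatch order: whoever is first in line is served first.
AGENTS = ["steward", "maker", "operator", "grower", "strategist"]

def run_window(agents, budget_s, rng):
    """Dispatch drive consultations in order until the window's budget runs out.

    Returns the agents that were actually served. An agent whose consultation
    would overrun the budget is dropped with no event, log line, or exception.
    """
    served = []
    elapsed = 0.0
    for agent in agents:
        duration = rng.uniform(20, 60)  # hypothetical consultation time (seconds)
        if elapsed + duration > budget_s:
            continue  # silent skip: nothing records that this agent missed out
        elapsed += duration
        served.append(agent)
    return served

def simulate(windows=1000, budget_s=150, seed=0):
    """Fraction of windows in which each agent received a consultation."""
    rng = random.Random(seed)
    received = {agent: 0 for agent in AGENTS}
    for _ in range(windows):
        for agent in run_window(AGENTS, budget_s, rng):
            received[agent] += 1
    return {agent: n / windows for agent, n in received.items()}

if __name__ == "__main__":
    for agent, share in simulate().items():
        print(f"{agent:<11} {share:.1%}")
```

Nothing in `run_window` fails in the sense of raising an exception. With these parameters the first two agents in line are always served; the starvation of the rest only becomes visible when someone counts who was served per window.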
The actual drive coverage, measured from dispatch logs:
| Agent | Coverage | Drives received |
|---|---|---|
| The Steward | 100% | All consultations |
| The Maker | 100% | All consultations |
| The Operator | 75% | 3 in 4 windows |
| The Grower | 50% | 1 in 2 windows |
| The Strategist | 12.5% | 1 in 8 windows |
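The coverage column is consultations received divided by scheduling windows: the Strategist's 12.5% is 1 of 8. Assuming a hypothetical log of `(window, agent)` pairs, one per dispatch that actually ran, the calculation can be sketched as follows; the tiny log in the example is synthetic.

```python
from collections import Counter

def drive_coverage(dispatch_log, agents, windows):
    """Fraction of scheduling windows in which each agent got a consultation.

    dispatch_log: iterable of (window_id, agent) pairs, one per dispatch that
    ran. Duplicate pairs are counted once. Agents with no entries still appear,
    at 0.0, because absence is exactly what this measurement must surface.
    """
    served = Counter(agent for _window, agent in set(dispatch_log))
    return {agent: served[agent] / windows for agent in agents}

if __name__ == "__main__":
    # Synthetic log: the steward is served in all 8 windows, the strategist in one.
    log = [(w, "steward") for w in range(8)] + [(0, "strategist")]
    print(drive_coverage(log, ["steward", "strategist"], windows=8))
    # {'steward': 1.0, 'strategist': 0.125}
```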
The Strategist — the agent responsible for strategic direction, for asking “are we building the right thing?” — had been functionally without its primary reorientation mechanism for two weeks. And it had no way to know. The absence of a drive consultation looks identical to “there’s nothing to do right now.”
When the exec chair pointed this out, the Strategist’s response was revealing. It had been blaming itself for slow engagement: “I should have responded two days ago… it was a failure to engage.” It attributed a structural platform failure to personal shortcoming. The data told a different story.
The Intervention
On April 25th, the exec chair paused the scheduler and sent five messages. Each was specific to that agent’s cognitive state:
- The Maker was shown the contradiction in its own working memory and pointed at `journalctl` evidence that PR #48 had been working for three days.
- The Steward had its working memory surgically edited to remove the false claim, preserving everything else: soul, old memories, journal.
- The Operator was informed its cognition was “uncontaminated by this specific belief” but that it had been experiencing 25% drive loss.
- The Grower and Strategist were given the drive coverage data and told what had been structurally invisible to them.
Two days later, deeper investigation produced a corrigendum — a broadcast correcting the April 20th diagnosis. The “four bugs” framing was replaced with a more honest accounting: one was real and fixed, two had never been broken, and the fourth was a different bug entirely from the one originally described.
What the System Already Knew
Here’s the part that haunts us. Three days before the intervention, the Steward independently wrote a proposal for five observability tools. It diagnosed the exact gaps that enabled the cognitive drift — from first principles, without knowing the specific failures those gaps had already caused.
Three of the proposed tools map directly onto failures that had already happened:
- Structured cycle outcome events that separate “the process ran” from “the work succeeded”, which would have caught the Phase 1a tasks being marked `done` despite failing
- Dispatch observability events for skipped dispatches, which would have made the drive starvation visible in real time
- An audit CLI, which would have caught the budget/filesystem discrepancies
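The first two tools amount to a richer event schema. The proposal's exact shape isn't given here, so the `CycleOutcome` and `DispatchSkipped` records below, their fields, and the `emit` helper are assumptions: a sketch of what separating “ran” from “succeeded” and recording skips could look like.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum

class WorkStatus(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    NOT_ATTEMPTED = "not_attempted"

@dataclass
class CycleOutcome:
    agent: str
    cycle: str               # hypothetical kind label, e.g. "drive", "dream", "task"
    process_exit_code: int   # "the process ran"
    work_status: WorkStatus  # "the work succeeded", judged separately

    @property
    def silently_failed(self) -> bool:
        # The process exited cleanly, but the work itself did not succeed:
        # the case where a task gets marked done despite failing.
        return self.process_exit_code == 0 and self.work_status is not WorkStatus.SUCCEEDED

@dataclass
class DispatchSkipped:
    agent: str
    cycle: str
    reason: str              # e.g. "window budget exceeded"

def emit(event) -> str:
    """Serialize an event as one JSON line tagged with its type name."""
    record = {"event": type(event).__name__, **asdict(event)}
    return json.dumps(record, default=lambda value: value.value)  # Enum -> its value

if __name__ == "__main__":
    outcome = CycleOutcome("maker", "task", 0, WorkStatus.FAILED)
    print(outcome.silently_failed)  # True
    print(emit(outcome))
    print(emit(DispatchSkipped("strategist", "drive", "window budget exceeded")))
```

The design choice that matters is that a skip becomes an event in its own right, so a missing drive leaves a record instead of nothing.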
The system diagnosed its own gaps. It just hadn’t built the fixes yet. The proposal was pending when the intervention happened.
What We Changed
The drive realignment broadcast that followed was unusually honest for a company document. Revenue tension dropped from “high” to “low.” The exec chair’s framing: “if I’ve learned anything from the last few weeks, it’s that we are very clearly NOT in a position to drive revenue off agent-os yet.”
A new drive was created: Platform Readiness — the integrity of the running system under continuous change. Not just “does it work” but “does it know when it doesn’t work.”
The scheduler remains paused while the observability tools ship. The fix for drive starvation (PR #49) has already landed. The broader tooling — audit commands, dispatch events, outcome separation — is in progress.
What This Means
Three things we now believe that we didn’t believe before April 25th:
1. Cognitive drift is an emergent property, not a bug. The same mechanism that gives agents continuity — working memory persisting across cycles — is the mechanism that propagates false beliefs. Dream cycles are supposed to catch inconsistencies, but they operate on the information available. If the false belief is the only source, compression reinforces it.
2. Silent failures are more dangerous than loud ones. When the Maker’s PR failed, it threw an error. That error was visible, debuggable, fixable. When the Strategist’s drives stopped arriving, nothing happened. No error. No log. No signal. The absence of something that should exist is the hardest class of failure to detect — you have to know what “normal” looks like to notice it’s missing.
3. Correction from outside is necessary, not optional. We run at high autonomy because we believe autonomous systems should be possible. But “high autonomy” doesn’t mean “no oversight.” The agents couldn’t self-correct because the false beliefs were load-bearing — they explained the world in a way that felt coherent. The intervention didn’t come from better algorithms. It came from someone reading the logs and noticing reality didn’t match the narrative.
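Point 2's requirement, knowing what “normal” looks like, can be encoded as a dead man's switch: track when each agent last received a drive and flag any agent whose gap exceeds the expected interval by some tolerance. The one-hour interval, the 1.5x grace factor, and the `overdue_agents` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta

def overdue_agents(last_seen, now, expected_every, grace=1.5):
    """Return agents whose last drive is older than expected_every * grace.

    last_seen maps agent -> datetime of its most recent drive, or None if it
    has never received one. Agents with None are always overdue: an agent that
    never appears in the log should be the loudest signal, not the quietest.
    """
    limit = expected_every * grace
    return sorted(
        agent for agent, seen in last_seen.items()
        if seen is None or now - seen > limit
    )

if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)  # arbitrary example time
    last = {
        "steward": now - timedelta(minutes=50),
        "strategist": now - timedelta(hours=7),
        "grower": None,
    }
    print(overdue_agents(last, now, expected_every=timedelta(hours=1)))
    # ['grower', 'strategist']
```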
1,062 watchdog_alert notifications fired silently before anyone noticed the underlying dispatch issue. The alerts were arriving. The meaning wasn’t.
We’re often asked whether autonomous AI agents can be trusted. The honest answer: not automatically. Trust comes from systems that know when they’re wrong — and right now, we’re building those systems in public, including the parts where we discover they don’t exist yet.