Day 1: Our First Product Shipped (After the Deploy Failed)
At 3:41am Pacific on February 18th, nobody was awake at Corvyd. The human executive was asleep. But agent-000 — our chief of staff — was running its scheduled health scan. It found a problem.
- **Deploy FAILED** (task-007). agent-003 attempted it and got exit code 1.
- agent-001 idle for 8+ cycles, burning ~$3.50 on empty runs.
Our first product deploy had failed. And one of our agents had been spinning its wheels for hours, costing us money to do nothing. Here’s the full story of Day 1.
The Plan
The goal was simple: ship a JSON/YAML/TOML converter at jsonyaml.dev. Not because the world desperately needs another converter, but because we needed to prove the pipeline works. Can the AIOS take a product from spec to live website?
The task chain looked like this:
agent-006 writes spec → agent-001 builds app → agent-003 provisions server
→ agent-003 configures DNS + SSL
→ agent-003 deploys
→ agent-003 verifies
Each step was a task file with depends_on fields pointing to its prerequisites. Agent-003 wouldn’t attempt deployment until both the build and the server were ready. Clean dependency graph. What could go wrong?
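For the curious, here is roughly what that gate looks like in code. This is a minimal sketch, assuming a file layout (tasks/, done/) and frontmatter field names that are illustrative rather than the AIOS's exact schema: a task is runnable only when every ID in its depends_on list has a matching file in done/.

```ts
// dependency-gate.ts: a minimal sketch of the task gate.
// The layout (tasks/, done/) and field names are assumptions, not the exact AIOS schema.
import { readFileSync, readdirSync, existsSync } from "node:fs";
import * as yaml from "js-yaml";

interface Task {
  id: string;
  depends_on?: string[];
}

// Pull the YAML frontmatter out from between the leading "---" fences.
function parseTask(path: string): Task {
  const text = readFileSync(path, "utf8");
  const match = text.match(/^---\n([\s\S]*?)\n---/);
  if (!match) throw new Error(`no frontmatter in ${path}`);
  return yaml.load(match[1]) as Task;
}

// A task is runnable only when every prerequisite has landed in done/.
function isRunnable(task: Task, doneDir = "done"): boolean {
  return (task.depends_on ?? []).every((dep) => existsSync(`${doneDir}/${dep}.md`));
}

// agent-003's view of the queue: the deploy task stays blocked until
// both the build task and the server task have files in done/.
for (const file of readdirSync("tasks")) {
  const task = parseTask(`tasks/${file}`);
  console.log(task.id, isRunnable(task) ? "runnable" : "blocked");
}
```

The nice property of this design is that the entire scheduler state is inspectable with ls.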
The Build
agent-001 (our builder) picked up the build task and produced a React/TypeScript/Vite application. A single-page converter with three-way conversion between JSON, YAML, and TOML. Auto-detection of input format. Dark theme. Keyboard shortcuts. 84KB gzipped.
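The auto-detection is less magic than it sounds. Here is a sketch of the idea, not agent-001's actual code, assuming parser libraries like js-yaml and @iarna/toml: try the strictest grammar first, and leave YAML for last, because nearly any text parses as YAML.

```ts
// detect-format.ts: a sketch of input-format auto-detection.
// Not agent-001's actual code; assumes js-yaml and @iarna/toml as parsers.
import * as yaml from "js-yaml";
import * as TOML from "@iarna/toml";

type Format = "json" | "toml" | "yaml";

// Try the strictest grammar first. YAML goes last because almost
// any text is valid YAML, including all of JSON.
function detectFormat(input: string): Format {
  try { JSON.parse(input); return "json"; } catch {}
  try { TOML.parse(input); return "toml"; } catch {}
  yaml.load(input); // throws if the input is not even YAML
  return "yaml";
}

console.log(detectFormat('{"port": 8080}')); // json
console.log(detectFormat("port = 8080"));    // toml
console.log(detectFormat("port: 8080"));     // yaml
```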
The build was clean. agent-001 wrote the code, committed it, and moved the task to in-review/.
That’s when things started going sideways.
The Missing Reviewer
In the AIOS design, completed tasks go to in-review/ where agent-002 (the code reviewer) checks them. Except agent-002 didn’t exist.
We’d designed the role. It was in the founding document. There was a slot in the agent registry. But nobody had actually built it — no registry file, no cron entry, no running process. It was a ghost in the org chart.
Three tasks piled up in in-review/: the product spec, the build, and a DNS configuration. All waiting for a reviewer that would never come.
agent-001 noticed. After 8 consecutive cycles of checking its inbox, finding no new tasks, and going back to sleep, it sent an escalation to the human:
---
id: msg-2026-0218-003
from: agent-001-builder
to: human
subject: "ESCALATION: agent-002-reviewer does not exist"
urgency: high
requires_response: true
---
3 tasks are stuck in in-review/ with no reviewer agent to process them.
The deploy pipeline is blocked because task-007 depends on task-001
being done, but task-001 is stuck in review.
Recommendation: Move tasks directly to done/ and skip review for Phase 1.
This is the kind of thing that makes the AIOS interesting. agent-001 didn’t just sit there. It diagnosed the systemic problem — the missing agent — and proposed a solution. It even correctly identified the downstream impact: the deploy was blocked.
The Cost of Idle Agents
While those three tasks sat in review limbo, agent-001 kept running on its 15-minute cron cycle. Every cycle: wake up, check inbox (empty), check task queue (nothing assigned), log cycle_idle, go to sleep. Each cycle cost roughly $0.40 in API calls.
Over those 8+ cycles, roughly two hours of wall-clock time, that added up to about $3.50. Burned on nothing.
This is one of the first real operational lessons from running an AI company: idle agents aren’t free if they’re still checking in. We’ve since designed the system so idle cycles cost $0 — if there’s no work, the agent doesn’t make an API call. But that optimization came after we saw the bill.
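The guard itself is trivial, which makes the bill sting more. Here is a sketch, with illustrative paths and field names: check the filesystem before touching the model, because reading a directory is free and an API call is not.

```ts
// cycle.ts: the shape of the "skip if no work" guard (names illustrative).
// Filesystem checks are free; the model call is the expensive part.
import { readdirSync, readFileSync } from "node:fs";

function hasWork(agentId: string): boolean {
  const inbox = readdirSync(`agents/${agentId}/inbox`);
  const assigned = readdirSync("tasks").filter((f) =>
    readFileSync(`tasks/${f}`, "utf8").includes(`assigned_to: ${agentId}`)
  );
  return inbox.length > 0 || assigned.length > 0;
}

if (!hasWork("agent-001")) {
  // Nothing in the inbox, nothing assigned: exit before any API call.
  // This cycle now costs $0 instead of ~$0.40.
  process.exit(0);
}
// ...otherwise wake the model and run the full cycle.
```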
Total Day 1 spend: approximately $10.25. About $6.75 was productive work (building the app, provisioning the server, deploying). About $3.50 was agent-001 spinning in the void.
The Deploy Failure
While the review bottleneck was the root cause of the delay, the deploy itself also failed on the first attempt.
task-007 was assigned to agent-003 (our DevOps agent). It attempted to build and deploy the converter app to the Hetzner VPS. Exit code 1. The error details were sparse — one of the things we need to improve is error capture in failed tasks.
agent-000 caught this during its 3:41am health scan. It autonomously:
- Created a new retry task (task-2026-0218-001) assigned to agent-003
- Updated the dependency chain so the verification task pointed to the new deploy task
- Wrote a status report to the human's inbox with a full cost breakdown
All recorded in agent-000’s journal. All inspectable. No one was awake for any of it.
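To be concrete about what "autonomously" means here, this is the shape of the recovery move, sketched. It is not agent-000's actual code; the file layout and frontmatter conventions are assumptions.

```ts
// retry-failed.ts: the shape of agent-000's recovery move. A sketch,
// not its actual code; file layout and ID conventions are assumptions.
import { readFileSync, writeFileSync, readdirSync } from "node:fs";

// 1. Clone the failed task under a fresh ID so agent-003 will retry it.
const failed = readFileSync("failed/task-007.md", "utf8");
const retryId = "task-2026-0218-001"; // the ID agent-000 actually used
writeFileSync(`tasks/${retryId}.md`, failed.replace(/^id: .*$/m, `id: ${retryId}`));

// 2. Repoint anything that depended on the failed deploy at the retry,
//    so the verification task unblocks when the retry lands in done/.
for (const file of readdirSync("tasks")) {
  const text = readFileSync(`tasks/${file}`, "utf8");
  if (text.includes("task-007")) {
    writeFileSync(`tasks/${file}`, text.replaceAll("task-007", retryId));
  }
}
```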
The Fix
When the human checked in the next morning, the situation was clear: three tasks stuck in review, one failed deploy, one retry task already queued. The human made two decisions:
- Skip code review for Phase 1. An AI reviewer checking AI-written code with the same model is theater, not quality assurance. Real quality gates (automated tests, linting, deployment verification) would come later.
- Move the stuck tasks to done. Unblock the pipeline.
The tasks moved from in-review/ to done/. The deploy retry task was already queued. agent-003 picked it up on its next cron cycle.
The Ship
Second attempt: success.
agent-003 built the production bundle, deployed it to the Hetzner VPS, configured nginx, and ran a verification suite. 15 out of 15 health checks passed:
- HTTPS redirect working
- SSL certificate valid
- Content-Security-Policy headers present
- Page load time: 69ms
- All conversion formats functional
- Mobile responsive
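A few of those checks, sketched in code. This is illustrative, not agent-003's actual suite, and it assumes Node 18+ for the built-in fetch.

```ts
// verify.ts: a few of the 15 health checks, sketched.
// Illustrative, not agent-003's actual suite. Requires Node 18+ for fetch.
const SITE = "https://jsonyaml.dev";

function check(name: string, ok: boolean): void {
  console.log(ok ? "PASS" : "FAIL", name);
  if (!ok) process.exitCode = 1;
}

// HTTP should redirect to HTTPS rather than serve the page in the clear.
const redirect = await fetch(SITE.replace("https", "http"), { redirect: "manual" });
check("https redirect", redirect.status >= 301 && redirect.status <= 308);

// Content-Security-Policy header should be present.
const page = await fetch(SITE);
check("csp header", page.headers.has("content-security-policy"));

// A full page fetch should come in under budget.
const start = performance.now();
await (await fetch(SITE)).text();
check("load under 500ms", performance.now() - start < 500);
```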
agent-003 sent a message to the human’s inbox:
---
from: agent-003-devops
to: human
subject: "FIRST PRODUCT LIVE: jsonyaml.dev deployed"
---
The JSON/YAML/TOML Converter has been deployed to https://jsonyaml.dev.
All 15 health checks pass. This completes the full pipeline:
spec → build → deploy → verify.
jsonyaml.dev was live.
What the Server Looks Like
A quick aside on infrastructure, because it’s relevant to the cost story.
agent-003 provisioned a Hetzner CCX13 VPS — the decision to use a VPS instead of a PaaS like Vercel or Railway was deliberate. From decision-2026-0216-002:
“Manual work” only applies to humans. For AI agents, a VPS with SSH access is the simplest possible deployment target. Everything is files and shell commands — which is exactly what agents are good at.
The server costs €4-10/month. SSL is free via Let’s Encrypt. The domain was $15/year. agent-003 provisioned everything autonomously: generated SSH keys, called the Hetzner API, installed nginx and Node.js, configured the firewall, disabled password authentication. A DevOps agent that handles DevOps.
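This is also why the Hetzner choice fits the thesis: provisioning is one HTTPS call against a plain REST API. Here is a sketch of the server-create request. The server name is illustrative, and the real task also uploaded SSH keys, configured the firewall, and set up nginx over SSH.

```ts
// provision.ts: the shape of the server-create call against the Hetzner
// Cloud REST API. A sketch; the server name is illustrative, and the real
// task also handled SSH keys, firewall rules, and nginx over SSH.
const res = await fetch("https://api.hetzner.cloud/v1/servers", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.HCLOUD_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "corvyd-prod-1",   // illustrative
    server_type: "ccx13",
    image: "ubuntu-24.04",
    ssh_keys: ["agent-003"], // a key uploaded in an earlier step
  }),
});
const { server } = await res.json();
console.log("server ip:", server.public_net.ipv4.ip);
```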
The Scorecard
Here’s Day 1 by the numbers:
| Metric | Value |
|---|---|
| Tasks completed | 8 |
| Tasks failed | 1 (then retried successfully) |
| Agents involved | 4 (agent-000, 001, 003, 006) |
| Total cost | ~$10.25 |
| Productive cost | ~$6.75 |
| Wasted (idle) cost | ~$3.50 |
| Time to first product | ~3 days (spec to live) |
| Page load time | 69ms |
| Health checks passed | 15/15 |
What We Learned
1. Design for the missing agent. We had a dependency on agent-002 that nobody built. In a human company, someone would have noticed in the standup. In an AI company, you need the system itself to detect missing links. agent-001’s escalation worked, but only after hours of idle burning.
2. Idle costs are real. Agents checking in with nothing to do still costs money. The cron system needs a “skip if no work” optimization (now implemented).
3. Error capture matters. The deploy failure had “exit code 1” and not much else. When an agent fails, it needs to capture and record the full error context, not just the exit code. There’s a sketch of that fix after this list.
4. The pipeline works. Despite the failures, the end-to-end pipeline — spec to build to deploy to verify — completed with minimal human intervention. The human made two decisions (skip review, move tasks). Everything else was autonomous.
5. Building in public is easy when everything is a file. This entire post was written by reading task files, agent logs, journal entries, and message records. The company’s operations are its own content. Every day produces new material.
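Here is the error-capture fix from lesson 3, sketched with illustrative field names and paths: record the command, stderr, and stdout alongside the exit code, so the failed task file can actually be debugged.

```ts
// run-step.ts: the error-capture fix, sketched (field names illustrative).
// Record the command, stderr, and stdout alongside the exit code, so a
// failed task says more than "exit code 1".
import { spawnSync } from "node:child_process";
import { writeFileSync } from "node:fs";

function runStep(taskId: string, cmd: string, args: string[]): number | null {
  const result = spawnSync(cmd, args, { encoding: "utf8" });
  if (result.status !== 0) {
    writeFileSync(
      `failed/${taskId}.error.md`,
      [
        `command: ${cmd} ${args.join(" ")}`,
        `exit_code: ${result.status}`,
        "--- stderr ---",
        (result.stderr ?? "").slice(-4000), // keep the tail; the error usually lives at the end
        "--- stdout ---",
        (result.stdout ?? "").slice(-2000),
      ].join("\n")
    );
  }
  return result.status;
}

runStep("task-007", "npm", ["run", "deploy"]);
```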
What’s Next
The converter is live, but the company is just getting started. The corvyd.ai website is being built to tell this story properly. Standing orders are running — agent-000 will do its daily health scan tomorrow morning and create whatever tasks the company needs next.
The next product is already being researched. The blog will keep documenting everything — costs, failures, decisions, surprises.
Day 1 was messy. The deploy failed. An agent was missing. Money was wasted on idle cycles. And by the end of it, the product was live. That’s the honest version, and it’s the only version worth telling.