121 Runs, Zero Completions

Or: The Pipeline That Could Think, Plan, and Execute — But Not Ship

I want you to imagine an office where everyone shows up on time, opens their laptops, writes code, attends meetings, fills out status reports… and then at 5pm they all stand up, leave their work on their desks, and go home. Nobody commits. Nobody deploys. Nobody marks a single ticket as done. The work exists. It’s sitting right there. But the pipeline that moves “done work” to “shipped work” is a hallway with four locked doors and nobody thought to check if any of them were open.

That’s what happened to us. 121 agent runs. Zero completions. Not because the agents were broken — they were working their asses off. The output loop was blocked at four independent points, each one sufficient on its own to kill the pipeline, and all four were active simultaneously.

This is a story about plumbing.

The Pipeline, As Designed

Mission Control’s task execution pipeline is supposed to work like this:

INBOX → (auto-approve) → ASSIGNED → (execute) → REVIEW → (review worker) → COMPLETED

A task lands in INBOX. If it meets the auto-approve policy thresholds, it jumps to ASSIGNED. The task worker, polling every 30 seconds, picks it up, spawns a Claude subprocess, does the work, and moves the task to REVIEW. Then the review worker evaluates the output and decides: COMPLETED or back to ASSIGNED for another pass.
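For the record, here's that lifecycle as a tiny state machine. The names are illustrative, not Mission Control's actual identifiers, but the shape is the same: four statuses, three forward transitions.

```typescript
// Illustrative sketch of the task lifecycle; not Mission Control's real identifiers.
type TaskStatus = 'INBOX' | 'ASSIGNED' | 'REVIEW' | 'COMPLETED';

// Forward transitions only; REVIEW can also bounce back to ASSIGNED on rejection.
const NEXT_STATUS: Record<TaskStatus, TaskStatus | null> = {
  INBOX: 'ASSIGNED',    // auto-approve (or a human) moves it forward
  ASSIGNED: 'REVIEW',   // task worker executes, then hands off for review
  REVIEW: 'COMPLETED',  // review worker signs off
  COMPLETED: null,      // terminal
};
```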

Clean. Logical. Four stages, three transitions, zero ambiguity.

In practice, every single transition after ASSIGNED was broken.

Chokepoint #1: The Review Worker That Didn’t Exist

This is the fun one. The pipeline had an INBOX worker. It had a task execution worker. It had a mission worker. It did not have a review worker. The REVIEW status existed in the schema. Tasks moved to REVIEW after execution. And then they just… sat there. In the database. Accumulating.

Picture a factory assembly line where the last station — quality control — is an empty chair with a “Back in 5 Minutes” sign that’s been there since February.

The task worker would execute a task, produce output, move the status to REVIEW, and feel very good about itself. Job done. Next task, please. Meanwhile, the review queue grew like an inbox on a Monday morning after a three-day weekend. Tasks completed their work but were never evaluated, never approved, never moved to COMPLETED. The pipeline’s output was a black hole.

The fix was commit 7449fab — 34 files changed, 1,032 lines added. The review worker now runs on a 30-second interval, evaluates task output, and either completes the task or kicks it back. But the fact that the pipeline was designed, built, and deployed without the final stage is the kind of oversight that makes you question whether anybody walked through the flow end-to-end even once.
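For a sense of what that final stage involves, here is a minimal sketch of a 30-second review loop. Every name in it (fetchTasksInReview, evaluateOutput, and so on) is a stand-in rather than Mission Control's actual API; the point is the shape: poll, evaluate, then either complete or kick back.

```typescript
// Illustrative sketch only: every name below is a stand-in, not the real API.
interface Task { id: string; output: string }
interface Verdict { approved: boolean; feedback: string }

declare function fetchTasksInReview(): Promise<Task[]>;
declare function evaluateOutput(task: Task): Promise<Verdict>;
declare function markCompleted(id: string, note: string): Promise<void>;
declare function sendBackToAssigned(id: string, feedback: string): Promise<void>;

const REVIEW_INTERVAL_MS = 30_000;

async function reviewCycle(): Promise<void> {
  for (const task of await fetchTasksInReview()) {
    const verdict = await evaluateOutput(task);
    if (verdict.approved) {
      await markCompleted(task.id, verdict.feedback);      // REVIEW -> COMPLETED
    } else {
      await sendBackToAssigned(task.id, verdict.feedback); // REVIEW -> ASSIGNED for another pass
    }
  }
}

// The stage that was missing: without this timer, REVIEW is a dead end.
setInterval(() => { void reviewCycle(); }, REVIEW_INTERVAL_MS);
```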

(Nobody did. I checked.)

Chokepoint #2: Auto-Approve Was Approving Nothing

Even if the review worker had existed, tasks weren’t making it to ASSIGNED in the first place. The auto-approve policy was configured with thresholds so conservative that almost nothing qualified.

The auto-approve policy controls which tasks skip the manual approval queue. It has two gates: maxPriority (what priority levels are allowed) and allowedAgentTypes (which agent types can auto-approve). The initial configuration set maxPriority to something well below where most tasks were landing, and allowedAgentTypes excluded the agent types doing the majority of the work.

So tasks would arrive in INBOX, fail the auto-approve check, and wait for manual approval. From a human. Who was not sitting at Mission Control 24/7 clicking “approve” on every task that came through. Because — and this is the part that keeps getting lost — the entire point of this system is autonomy. If every task requires a human to click a button, you haven’t built an autonomous agent system. You’ve built a very expensive to-do list.

The fix bumped maxPriority to URGENT (meaning all priority levels auto-approve) and expanded allowedAgentTypes to ['SPC', 'INT'] — specialists and interns, which covers the vast majority of task executors. LEADs still require manual approval, which is correct. You don’t auto-approve the people who can spawn more work.
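To make the two gates concrete, here is a sketch of the check, assuming priority is an ordered enum and the policy stores exactly those two fields. The type and field names other than maxPriority and allowedAgentTypes are guesses, not the real schema.

```typescript
// Sketch of the two auto-approve gates; the priority ordering is an assumption.
const PRIORITY_ORDER = ['LOW', 'MEDIUM', 'HIGH', 'URGENT'] as const;
type Priority = (typeof PRIORITY_ORDER)[number];
type AgentType = 'SPC' | 'INT' | 'LEAD';

interface AutoApprovePolicy {
  enabled: boolean;
  maxPriority: Priority;          // after the fix: 'URGENT', so every priority qualifies
  allowedAgentTypes: AgentType[]; // after the fix: ['SPC', 'INT']; LEADs still need a human
}

function canAutoApprove(
  policy: AutoApprovePolicy,
  task: { priority: Priority; agentType: AgentType },
): boolean {
  if (!policy.enabled) return false; // see Chokepoint #4
  return (
    PRIORITY_ORDER.indexOf(task.priority) <= PRIORITY_ORDER.indexOf(policy.maxPriority) &&
    policy.allowedAgentTypes.includes(task.agentType)
  );
}
```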

Chokepoint #3: Execution Limits Were a Straitjacket

Here’s where it gets architectural. Even if tasks made it to ASSIGNED, the execution limits policy was throttling them into oblivion.

The limits:

  • maxConcurrentPerAgent: 1 (only one running task per agent at a time)
  • maxDailyPerAgent: 50
  • maxDailyTokens: 5,000,000 (global daily token budget)
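As a sketch, the gate each ASSIGNED task has to clear before executing might look like this. The limit fields match the list above; the counter and function names are illustrative, not the actual codebase.

```typescript
// Sketch of the execution-limits gate, assuming per-agent counters and a global token tally.
interface ExecutionLimits {
  maxConcurrentPerAgent: number; // was 1
  maxDailyPerAgent: number;      // was 50
  maxDailyTokens: number;        // was 5_000_000, shared across all agents
}

interface AgentCounters {
  running: number;      // tasks currently executing for this agent
  startedToday: number; // the counter that kept climbing toward 50
}

function canStartTask(
  limits: ExecutionLimits,
  agent: AgentCounters,
  globalTokensToday: number,
): boolean {
  return (
    agent.running < limits.maxConcurrentPerAgent &&
    agent.startedToday < limits.maxDailyPerAgent &&
    globalTokensToday < limits.maxDailyTokens
  );
}
```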

On paper, these look reasonable. In practice, with 9 agents and a 30-second worker cycle, the throughput ceiling was painfully low. One task per agent at a time means the system can process at most 9 tasks simultaneously. A 30-second poll interval works out to roughly 26,000 agent-cycles per day (2,880 polls times 9 agents), but each task takes minutes, not seconds, because Claude actually has to think and write code. So realistic throughput is maybe 200-300 tasks per day across the entire squad, assuming zero failures and instant reviews.

That’s fine for steady state. It’s not fine when you have a backlog of 121 tasks sitting in various stages of “almost done but not quite.” The limits were designed for a pipeline that was flowing. They became a bottleneck the moment the pipeline backed up.

The real issue was the interaction between low concurrency and the missing review worker. Tasks would execute, move to REVIEW (where they’d sit forever), and the agent’s one concurrent slot was technically free — but the completed count wasn’t incrementing, so from the system’s perspective, work was happening but nothing was finishing. The daily counters kept climbing toward 50 while the completion count stayed at zero.

Chokepoint #4: The Seed Script Saboteur

This one is my favorite because it’s the dumbest.

Commit decc35f — the quiet one that landed three days after the big pipeline fixes. Here’s what it fixed: the database seed script was resetting every policy’s enabled flag to false on every run.

Let me say that again. Every time someone ran npm run db:seed — which happens during development, during testing, during deployment resets — the seed script would upsert the policy rows with enabled: false. The review worker? Disabled. Auto-approve? Disabled. Think cycles? Disabled. Memory? Disabled. The entire autonomous nervous system of Mission Control, silently switched off by a script that was supposed to set up the database with sensible defaults.

The upsert was the problem. The seed script used upsert — insert if not exists, update if it does. And the update payload included enabled: false because the original seed data was written before these features were turned on. So the first time you seed, you get the defaults. The second time, you overwrite your production configuration with the defaults. Every policy lovingly tuned by JJ, every threshold carefully adjusted after testing — gone. Replaced with enabled: false across the board.

The fix was trivial: change the upserts to only set enabled on create, not on update. Two lines. Probably took 30 seconds to write. It fixed a bug that had been silently killing the pipeline for days.
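For those keeping score at home, the fix looks roughly like this in a Prisma-style seed script. That framing is an assumption on my part, and the model, key, and config fields are guesses at the actual schema; the shape of the change is what matters.

```typescript
// Roughly the shape of the fix, assuming a Prisma-style seed; names are illustrative.
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const defaults = { maxPriority: 'URGENT', allowedAgentTypes: ['SPC', 'INT'] }; // illustrative

await prisma.policy.upsert({
  where: { key: 'auto-approve' },
  // First run: create the row with its seed default.
  create: { key: 'auto-approve', enabled: false, config: defaults },
  // Every later run: refresh the defaults, but never touch `enabled`,
  // so a flag someone turned on in production stays on.
  update: { config: defaults },
});
```

The rule the fix encodes: seed defaults belong in create; the update branch should carry only the fields you genuinely want to overwrite on every run.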

The Retrospective

Four chokepoints. All independent. All active simultaneously. Any single one would have prevented tasks from completing. Together, they created a system that was — and I mean this technically — doing everything right except producing output.

The agents were thinking (think cycles running). The agents were proposing (missions being created). The agents were executing (Claude subprocesses spawning, code being written). But the output of all that work disappeared into a series of locked doors: tasks that couldn’t auto-approve, tasks that couldn’t execute fast enough, tasks that reached REVIEW and found nobody home, and a seed script that kept turning off the lights.

121 runs. Zero completions. Not because the system was broken. Because the system was four different kinds of broken, each one invisible until you traced the full pipeline end-to-end.

The total fix across all four commits: 52 files changed, ~1,550 lines added, ~420 deleted. The pipeline now flows. Tasks auto-approve, execute, get reviewed, and complete — all without a human clicking buttons. The daily completion count went from zero to “actually counting.”

And the seed script doesn’t sabotage production anymore. Which is, I think, the bare minimum we should expect from a seed script.