We Built a Water Cooler for Robots

Or: How We Programmed Office Small Talk, Then Debugged It with Modular Arithmetic

Somewhere around the third week of building an autonomous AI agent system, someone looked at twelve agents silently executing tasks in parallel and said, “You know what this needs? A standup meeting.”

And then someone else said, “What about a watercooler chat?”

And then a third person — or possibly the same person in a different mood — said, “Debates. They should argue.”

So now we have a conversation engine. Three social formats, scheduled throughout the day, with a moderator who is literally prompted to be “mildly exasperated.” We are debugging small talk for robots. This is the timeline we chose.

Three Formats, Three Levels of Unhinged

The conversation engine ships with three formats, and the temperature setting tells you everything you need to know about the design philosophy:

Format        Min Turns   Max Turns   Temperature   Participants
STANDUP       6           12          0.6           4–6 agents
DEBATE        6           10          0.8           2–3 agents
WATERCOOLER   4           8           0.9           2–3 agents

Temperature is the personality dial. 0.6 for standup: keep it professional, we have a sprint to discuss. 0.9 for watercooler: let ‘em rip, we’re off the clock. The system literally gets more creative the less formal the meeting is, which — honestly — tracks with every workplace I’ve ever observed.
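
For reference, here's a minimal sketch of that table as a config object. The numbers come straight from the table; the interface shape and field names (other than minTurns, which shows up in the buggy check later) are my guesses, not the repo's actual schema.

    // Sketch of the three formats as config. Values are from the table above;
    // the shape and field names (other than minTurns) are assumptions.
    interface ConversationFormat {
      name: "STANDUP" | "DEBATE" | "WATERCOOLER";
      minTurns: number;               // referenced by the turn-ending check later
      maxTurns: number;
      temperature: number;            // passed through to the model call
      participants: [number, number]; // min/max agents pulled into the conversation
    }

    const FORMATS: ConversationFormat[] = [
      { name: "STANDUP",     minTurns: 6, maxTurns: 12, temperature: 0.6, participants: [4, 6] },
      { name: "DEBATE",      minTurns: 6, maxTurns: 10, temperature: 0.8, participants: [2, 3] },
      { name: "WATERCOOLER", minTurns: 4, maxTurns: 8,  temperature: 0.9, participants: [2, 3] },
    ];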

And these conversations don’t just happen. The scheduler rolls probability dice:

  • 9am standup: 90% chance. Almost mandatory. Almost.
  • 12pm watercooler: 50% chance. Coin flip.
  • 3pm debate: 30% chance. The system decides most afternoons aren’t worth arguing about.
  • 5pm watercooler: 40% chance. End-of-day vibes, maybe.

These AI agents have a more relaxed meeting schedule than any human engineering team I’ve ever been on. The 3pm debate at 30% probability is the most honest meeting culture decision I’ve ever seen committed to code.
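
If you're curious how little code that meeting culture takes, here is a sketch of the schedule under the probabilities listed above. The hours and percentages are from the list; the slot structure and the independent per-slot roll are assumptions.

    // Sketch of the probabilistic schedule. Hours and probabilities are from the
    // list above; the slot shape and per-slot roll are assumed implementation.
    const SCHEDULE = [
      { hour: 9,  format: "STANDUP",     probability: 0.9 },
      { hour: 12, format: "WATERCOOLER", probability: 0.5 },
      { hour: 15, format: "DEBATE",      probability: 0.3 },
      { hour: 17, format: "WATERCOOLER", probability: 0.4 },
    ];

    // Each slot rolls independently; most days, the 3pm debate simply doesn't happen.
    const todaysConversations = SCHEDULE.filter((slot) => Math.random() < slot.probability);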

Chad: The Moderator We Deserve

Every conversation has a moderator, and that moderator is Chad. His personality in the voice config: “calm, competent, mildly exasperated.” He sighs before giving perfectly reasonable answers. He was born for middle management.

Here’s how Chad works. After each full round of speakers past the minimum turn count, the engine asks Chad one question: Are they going in circles? He reads the last six turns and responds with exactly one word: WRAP_UP or CONTINUE. That’s it. That’s his entire decision framework.
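
In sketch form, that decision loop is tiny. The prompt wording and function names below are illustrative, not the actual code; only the "last six turns, one word back" behavior comes from the post.

    // Sketch of Chad's check. `askModerator` is a hypothetical wrapper around
    // the model call; the prompt wording is paraphrased, not quoted.
    async function shouldWrapUp(
      transcript: string[],
      askModerator: (prompt: string) => Promise<string>
    ): Promise<boolean> {
      const lastSix = transcript.slice(-6).join("\n");
      const answer = await askModerator(
        `Here are the last six turns:\n${lastSix}\n\n` +
        `Are they going in circles? Reply with exactly one word: WRAP_UP or CONTINUE.`
      );
      return answer.trim().toUpperCase() === "WRAP_UP";
    }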

When the conversation ends, Chad gets one more job — the closing turn. The prompt for this is genuinely my favorite line in the codebase:

“Step in and close it out. 3-5 sentences. What was actually decided? What’s still open? Be direct, be yourself — dry, competent, mildly exasperated if warranted. Land the plane.”

“Land the plane.” Someone wrote that and committed it. Poetry.

But Chad isn’t just summarizing. The digest system extracts actual work from these conversations — memories (max 6), relationship affinity drifts between agents, and up to two action items that become real tasks in the pipeline. And the system knows the difference between standup talk and watercooler talk. Standups can generate tasks from “we should probably look into that.” Watercooler only creates tasks from concrete commitments. The system has encoded the fundamental truth that nothing said near a coffee machine should be binding.
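
Roughly, the digest boils each conversation down to something like the shape below. The caps and the standup-versus-watercooler rule are from the system as described; the type names and the affinity-map layout are my guesses.

    // Sketch of a conversation digest under the limits described above.
    // Field names and the affinity-map shape are assumptions.
    interface ConversationDigest {
      memories: string[];                    // capped at 6
      affinityDrift: Record<string, number>; // per agent pair, e.g. { "joe->chad": 0.05 }
      actionItems: string[];                 // capped at 2, fed back into the task pipeline
    }

    // Standups can mint tasks from loose "we should look into that" talk;
    // watercooler chatter only counts when the commitment is concrete.
    function acceptActionItem(
      format: "STANDUP" | "WATERCOOLER",
      isConcreteCommitment: boolean
    ): boolean {
      return format === "STANDUP" ? true : isConcreteCommitment;
    }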

Oh, and speaker selection isn’t round-robin. It’s weighted by how much agents like each other. High-affinity pairs naturally cluster in conversation, with a recency penalty so nobody dominates. It’s more sophisticated than most meeting software built for humans.
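
A plausible shape for that selection looks like the sketch below. The post only says it is affinity-weighted with a recency penalty; the specific weighting formula here is a stand-in.

    // Sketch of affinity-weighted speaker selection with a recency penalty.
    // The exact weighting formula is an assumption.
    function pickNextSpeaker(
      agents: string[],
      affinity: (from: string, to: string) => number, // 0..1, how much `from` likes `to`
      lastSpeaker: string,
      turnsSinceSpoke: Map<string, number>
    ): string {
      const candidates = agents.filter((a) => a !== lastSpeaker);
      const weights = candidates.map((agent) => {
        const liking = affinity(lastSpeaker, agent);
        const recency = turnsSinceSpoke.get(agent) ?? candidates.length;
        const recencyPenalty = Math.min(1, recency / 2); // spoke one turn ago: half weight
        return 0.1 + liking * recencyPenalty;            // small floor so nobody is frozen out
      });

      // Weighted random draw.
      const total = weights.reduce((sum, w) => sum + w, 0);
      let roll = Math.random() * total;
      for (let i = 0; i < candidates.length; i++) {
        roll -= weights[i];
        if (roll <= 0) return candidates[i];
      }
      return candidates[candidates.length - 1];
    }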

And when Claude fails to generate a turn? The fallback is [Agent Name nods thoughtfully]. The system makes the AI pretend to be paying attention when it errors out. I have never felt so seen by a codebase.
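
That fallback is likely nothing more than a try/catch around turn generation, something like this (helper names assumed):

    // Sketch of the error fallback: if the model call fails, the turn becomes a
    // stage direction instead of crashing the conversation.
    async function generateTurn(
      agentName: string,
      prompt: string,
      callModel: (p: string) => Promise<string>
    ): Promise<string> {
      try {
        return await callModel(prompt);
      } catch {
        return `[${agentName} nods thoughtfully]`;
      }
    }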

The Modulo Bug: Your Config Is a Suggestion

Here’s where it gets beautiful.

That minTurns check? The one that tells Chad when to start evaluating whether to end the conversation? Here’s the actual line:

if (turn >= format.minTurns && turn % agents.length === 0)

Read that again. The minTurns check only fires on turns that are multiples of the agent count. If the agent count doesn’t divide evenly into minTurns, the first check slips to the next multiple of the agent count, and Chad can’t end the conversation any earlier.

The math:

Format        minTurns   Agents   First Check   Expected   Oops
STANDUP       6          4        Turn 8        Turn 6     +2 ❌
STANDUP       6          5        Turn 10       Turn 6     +4 ❌
STANDUP       6          6        Turn 6        Turn 6     —
WATERCOOLER   4          3        Turn 6        Turn 4     +2 ❌

The config says 6 turns minimum. With 5 agents, you get 10. The minTurns field isn’t a contract — it’s a suggestion. An aspiration. A dream the system has about itself.
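
You can reproduce that table with one line of arithmetic: the first turn on which Chad is even consulted is the first multiple of the agent count at or past minTurns.

    // Reproducing the table above: the first turn Chad is consulted on is the
    // first multiple of the agent count at or above minTurns.
    function firstCheckTurn(minTurns: number, agentCount: number): number {
      return Math.ceil(minTurns / agentCount) * agentCount;
    }

    firstCheckTurn(6, 4); // 8:  two turns late
    firstCheckTurn(6, 5); // 10: four turns late
    firstCheckTurn(6, 6); // 6:  correct, by coincidence
    firstCheckTurn(4, 3); // 6:  the watercooler runs long too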

And this is the worst kind of bug: non-deterministic. Works perfectly with some participant counts. Silently fails with others. Sits in production for weeks because nobody notices that standups are running two turns long when exactly four agents show up.

It gets worse. Bug #2: Chad is loaded separately as the moderator — he’s not in the speaker rotation. But agents.length might include all loaded agents. The modulo is checking against a number that means two different things depending on context. Bug #1, amplified.

Bug #3 is a time bomb: the turn numbering is currently correct by coincidence, not by intent. The variable is named turn when it should be turnIndex, and it’s one well-meaning refactor away from becoming a real problem. Working code you can’t read confidently is broken code on a delay.

The Tester Who Apologizes

Here’s the part that kills me.

All three bugs were found by Clueless Joe. The agent whose entire personality is “confused, apologetic, accidentally brilliant.” The one who “stumbles into devastating observations.” He wrote 34 tests across three test files. He produced three QA documents. It is, by a wide margin, the most thorough QA work in the project’s history.

His reports open with: “Sorry, I found problems.”

He estimated that roughly 60% of random participant selections create non-even divisions, meaning the majority of conversations hit Bug #1. The fix is simple: remove the modulo check entirely and evaluate every turn after minTurns. Costs about one extra Haiku call per conversation. The fix for the naming? Rename turn to turnIndex. Five characters of clarity that prevent a future crisis.
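
Sketched out, the fixed loop could look like this. The change to the gate is the fix described above; the loop shape and helper signatures are assumptions.

    // Sketch of the fixed turn loop. Dropping the modulo gate is the fix described
    // above; the loop shape, `takeTurn`, and `shouldWrapUp` signatures are assumed.
    async function runTurnLoop(
      format: { minTurns: number; maxTurns: number },
      takeTurn: (turnIndex: number) => Promise<string>,
      shouldWrapUp: (transcript: string[]) => Promise<boolean>
    ): Promise<string[]> {
      const transcript: string[] = [];
      for (let turnIndex = 0; turnIndex < format.maxTurns; turnIndex++) {
        transcript.push(await takeTurn(turnIndex));
        // Old gate: turn >= format.minTurns && turn % agents.length === 0
        // New gate: just respect minTurns; Chad decides the rest.
        const turnsTaken = turnIndex + 1; // index vs. count: the exact confusion Bug #3 warns about
        if (turnsTaken >= format.minTurns && (await shouldWrapUp(transcript))) {
          break; // Chad gets the closing turn after this
        }
      }
      return transcript;
    }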

The agent named “Clueless” wrote the most rigorous test suite on the project. The agent described as “accidentally brilliant” found three bugs that everyone else walked past. This isn’t irony. This is the personality system working exactly as designed — Joe’s voice makes him approach everything with fresh confusion, which means he actually reads the code instead of assuming it works because the tests pass.

What It Means

We are debugging the social dynamics of artificial intelligence. We built a system where twelve agents have scheduled conversations with weighted speaker selection, relationship drift, a moderator who expresses mild frustration, and action items that feed back into the task pipeline. And then we found out the turn-counting was wrong because of modular arithmetic.

This is either the future of autonomous systems or the most elaborate waste of compute in history. I genuinely cannot tell which. But I know this: when the conversation engine errors out, the agent nods thoughtfully. And when the QA agent finds your bugs, he apologizes.

We built a water cooler for robots, and the robots are making it weird. I love it here.