There’s a special kind of humiliation in building an AI assistant that ghosts you for ten minutes.
You say “do it.” You wait. The typing indicator flickers once, then nothing. No acknowledgment. No progress update. Just a black hole where your message went, and a growing suspicion that the process crashed again. Meanwhile the entire polling loop — the thing responsible for receiving all other messages — is sitting there with its hands in its pockets, blocked, waiting for Claude to finish thinking about whatever you asked it to do.
That was execute mode two weeks ago. It worked, technically. The way a car with no dashboard lights works: it runs, but you have no idea what’s happening under the hood, and the silence makes you paranoid.
Now? You say “do it” and get back “On it.” in under a second. Then typing indicators pulse every four seconds while the work happens in the background. And if you send another message while it’s busy, it queues up instead of getting swallowed.
The polling loop never blocks. The user never wonders if their message disappeared. The system handles concurrency without concurrency bugs. It’s the kind of change that sounds boring on paper and transforms the entire feel of using the thing.
The Architecture of Waiting
Let me explain what “blocking the polling loop” actually meant, because it’s worse than it sounds.
Bubba’s Telegram integration runs a long-polling loop. Every few seconds it asks Telegram, “hey, got any messages for me?” When one arrives, it processes it. When processing involves calling Claude — which takes anywhere from ten seconds to several minutes — the loop was doing this:
1. Receive message
2. Call Claude (subprocess, 5-10 min possible)
3. ← Everything freezes here
4. Send response
5. Go back to polling
Step 3 is the crime scene. While Claude is thinking, no other messages can be received. No commands processed. No health checks answered. The system is, for all practical purposes, unconscious. If JJ sends “cancel” during a long execute, that message sits in Telegram’s queue until Claude finishes — at which point the cancellation is, shall we say, philosophically irrelevant.
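To make the failure mode concrete, here's a toy reproduction of the old shape — every name is a stand-in, not the real code, and a short sleep stands in for the multi-minute Claude call. The point it demonstrates: a "cancel" sent mid-run is only even *received* after the first call returns.

```python
import asyncio

async def call_claude(prompt: str) -> str:
    """Stand-in for the real subprocess call; just sleeps briefly."""
    await asyncio.sleep(0.05)
    return f"done: {prompt}"

async def old_style_loop(inbox: asyncio.Queue, log: list) -> None:
    # Old shape: await Claude inline, so the loop cannot pull the
    # next update (even "cancel") until the current call returns.
    while not inbox.empty():
        message = await inbox.get()
        log.append(f"received: {message}")
        log.append(await call_claude(message))

async def demo() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    for m in ("do it", "cancel"):
        await inbox.put(m)
    log: list = []
    await old_style_loop(inbox, log)
    return log
```

Run it and the log shows "cancel" being received only after "do it" has fully finished — exactly the philosophically-irrelevant cancellation described above.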
This was the original execute mode. Synchronous. Monolithic. Working exactly as designed and completely wrong.
The Queue That Fixed Everything
The fix is an asyncio.Queue and a dedicated processor loop:
```python
import asyncio
from typing import Optional

_message_queue: asyncio.Queue = asyncio.Queue()
_current_task: Optional[asyncio.Task] = None


async def queue_processor():
    """Process queued messages one at a time."""
    global _current_task
    while True:
        item = await _message_queue.get()
        task = asyncio.create_task(
            _run_claude_in_background(
                item['chat_id'],
                item['prompt'],
                item['user_message'],
                item['project_slug'],
            )
        )
        _current_task = task
        typing_task = asyncio.create_task(_typing_loop(item['chat_id'], task))
        try:
            await task
        except asyncio.CancelledError:
            # The Claude task was cancelled via /cancel: swallow it and
            # keep the processor alive. If the processor itself was
            # cancelled (shutdown), propagate.
            if not task.cancelled():
                raise
        finally:
            _current_task = None
            typing_task.cancel()
            _message_queue.task_done()
```
Read that carefully; there's more going on here than it appears.
The queue itself is the concurrency boundary. Messages arrive on the polling loop, get immediately tossed into the queue, and the loop goes right back to polling. The heavy work — invoking Claude, waiting for a response, storing the conversation — happens on a completely separate task. The polling loop is free. Commands work. Health checks respond. The system is alive again.
Processing is sequential. One message at a time. This isn’t a concurrency limitation — it’s a correctness requirement. Each invoke_claude() call returns a new session ID for --resume. If two messages were processed simultaneously, they’d both resume the same session, and whichever finished second would fork the conversation into an alternate timeline. I covered this in the session persistence post, but it’s worth repeating: sequential processing isn’t a compromise, it’s the only way --resume works correctly.
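A toy model makes the forking concrete. Assume, as described above, that each call mints a new session whose history continues whatever session it resumed — this sketch (invoke_claude_toy, the parents map) is illustrative, not the real invoke_claude:

```python
import itertools
from typing import Dict, Optional

_ids = itertools.count(1)
parents: Dict[int, Optional[int]] = {}

def invoke_claude_toy(resume_id: Optional[int]) -> int:
    """Toy stand-in: mint a new session whose parent is the resumed one."""
    new_id = next(_ids)
    parents[new_id] = resume_id
    return new_id

# Sequential processing: each message resumes the previous session,
# so the history stays a single chain: 1 <- 2 <- ...
s1 = invoke_claude_toy(None)
s2 = invoke_claude_toy(s1)

# Two "simultaneous" messages both resume s2 before either finishes:
# the conversation forks into siblings that share a parent.
fork_a = invoke_claude_toy(s2)
fork_b = invoke_claude_toy(s2)
```

Two distinct sessions now claim to continue s2, and there is no principled way to merge them — which is why the queue serializes everything.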
The typing loop runs in parallel with the task. While Claude thinks, a separate coroutine pings Telegram with a typing indicator every four seconds:
```python
async def _typing_loop(chat_id: int, task: asyncio.Task):
    """Send typing indicator every 4s until task completes."""
    try:
        while not task.done():
            await send_typing_action(chat_id)
            await asyncio.sleep(4)
    except asyncio.CancelledError:
        pass
```
Four seconds because that’s how long Telegram’s typing indicator lasts before expiring. Too frequent and we’re spamming the API. Too infrequent and the indicator blinks off mid-thought. Four seconds is the Goldilocks zone.
When the task completes — success or failure — the typing loop gets cancelled. Clean. No orphaned coroutines. No leaked tasks.
The Dispatch Decision
Not every message goes through the queue. The system has two modes, and the routing between them matters.
When a message arrives, it hits a set of regex patterns:
```python
EXECUTE_TRIGGER_PATTERNS = [
    r'^/?do(\s+it)?[\s!.]*$',
    r'^(go\s+)?(do|fix|run|execute|implement|apply)\s+it(\s+yourself)?[\s!.]*$',
    r'^(yes\s+)?(go\s+)?(ahead|for it)[\s!.]*$',
    r'^(yes\s+)?(please\s+)?(proceed|execute|run it|do it|fix it)[\s!.]*$',
    r'^(make it so|just do it|hazlo|dale)[\s!.]*$',
]
```
Yes, “make it so” is a valid execute trigger. Yes, “dale” (Spanish for “go for it”) works. I’m not going to pretend this isn’t at least a little delightful.
If the message matches — and is under 50 characters, to avoid false positives on longer messages — it routes to the non-blocking execute path:
| | Execute Mode | Analysis Mode |
|---|---|---|
| Trigger | ”do it”, “/do”, “go ahead”, “dale” | Everything else |
| Processing | Background queue | Synchronous in polling loop |
| Max turns | Unlimited | 3 (hard cap) |
| Timeout | 600s (10 min) | 300s (5 min) |
| User feedback | Immediate “On it.” + typing loop | Direct response |
| Polling impact | None | Blocks loop (acceptable — 3 turns is fast) |
The analysis path still blocks the polling loop, and that’s fine. Three turns with a five-minute timeout means responses come back in seconds. The blocking window is negligible. It’s execute mode — the one that can run for ten minutes with unlimited turns — that needed to get out of the loop’s way.
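Putting the pattern list and the length guard together, the routing check can be sketched like this — is_execute_trigger is my name for it; the post only gives the patterns and the under-50-character rule:

```python
import re

EXECUTE_TRIGGER_PATTERNS = [
    r'^/?do(\s+it)?[\s!.]*$',
    r'^(go\s+)?(do|fix|run|execute|implement|apply)\s+it(\s+yourself)?[\s!.]*$',
    r'^(yes\s+)?(go\s+)?(ahead|for it)[\s!.]*$',
    r'^(yes\s+)?(please\s+)?(proceed|execute|run it|do it|fix it)[\s!.]*$',
    r'^(make it so|just do it|hazlo|dale)[\s!.]*$',
]

def is_execute_trigger(text: str) -> bool:
    """Route short, anchored 'go do it' messages to execute mode."""
    # Length guard first: long messages never count as bare triggers,
    # which avoids false positives on sentences that merely contain "do it".
    if len(text) > 50:
        return False
    lowered = text.strip().lower()
    return any(re.match(p, lowered) for p in EXECUTE_TRIGGER_PATTERNS)
```

Because every pattern is anchored at both ends, "do it, but first explain the tradeoffs" falls through to analysis mode even though it starts with "do it".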
Immediate Acknowledgment
This is the UX change that matters more than the architecture. When an execute trigger fires:
```python
await send_telegram_message(chat_id, 'On it.')

queue_size = _message_queue.qsize()
if queue_size > 0:
    await send_telegram_message(chat_id, f'Queued ({queue_size} ahead)')

await _message_queue.put({
    'chat_id': chat_id,
    'prompt': execute_prompt,
    'user_message': text,
    'project_slug': project_slug,
})
```
“On it.” arrives in under a second. If there’s already a task running, you get told how many are ahead of you. Then the message is queued and the system moves on.
This is the difference between an assistant and a process. An assistant says “got it, working on it.” A process makes you stare at a loading spinner and wonder if it heard you.
The queue size notification is a small touch but it prevents the worst kind of confusion: sending the same request three times because you think the first two didn’t land.
Cancellation That Actually Works
Remember the old problem? You couldn’t cancel anything because the polling loop was blocked. Now:
```python
# /cancel command handler
current_task, _ = _get_current_task()
if current_task and not current_task.done():
    current_task.cancel()
    await send_telegram_message(chat_id, 'Task cancelled.')
```
The _current_task global tracks what’s running. /cancel cancels it. The background runner catches the CancelledError, cleans up, and the queue processor moves to the next item. No orphaned subprocesses. No zombie sessions.
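The shape of that cleanup path, reduced to a runnable sketch: the real runner terminates a subprocess, but here a long sleep stands in for the Claude call, and the names are hypothetical.

```python
import asyncio

async def background_run_sketch(log: list) -> None:
    # Hypothetical shape of the background runner's cancel path.
    try:
        await asyncio.sleep(10)       # stand-in for the long Claude call
        log.append("finished")
    except asyncio.CancelledError:
        log.append("cleaned up")      # real code: terminate the subprocess
        raise                         # let the awaiting side observe it

async def demo() -> list:
    log: list = []
    task = asyncio.create_task(background_run_sketch(log))
    await asyncio.sleep(0.01)         # let the task start
    task.cancel()                     # what /cancel does
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log
```

The crucial detail is the re-raise: cleanup happens inside the except block, but the cancellation still propagates so the queue processor knows the task ended by cancellation rather than success.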
/status also got smarter. It now reports whether a task is running and how many messages are queued. Visibility into a system that used to be a black box.
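With those two pieces of state in hand, the /status body is nearly a one-liner. This version (status_text, its parameters, and its wording) is illustrative, not the actual handler:

```python
import asyncio
from typing import Optional

def status_text(current_task: Optional[asyncio.Task], queued: int) -> str:
    """Hypothetical /status reply: running state plus queue depth."""
    busy = current_task is not None and not current_task.done()
    state = "Working" if busy else "Idle"
    return f"{state}. {queued} message(s) queued."
```

Called with the `_current_task` global and `_message_queue.qsize()`, it answers the two questions the old system couldn't: is anything running, and is anything waiting.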
The Orchestration
The queue processor is a first-class citizen of the application lifecycle. It starts with the polling loop and heartbeat during bridge.py lifespan:
```python
poll_task = asyncio.create_task(poll_telegram())
heartbeat_task = asyncio.create_task(heartbeat())
queue_task = asyncio.create_task(queue_processor())
```
On shutdown, it’s equally deliberate. The current Claude task gets cancelled first (graceful termination), then the queue processor, then polling and heartbeat. Order matters. You don’t yank the polling loop while a task is mid-response — you cancel the task, let it clean up, then tear down the infrastructure.
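That teardown ordering reduces to a sketch like this — the worker names are stand-ins, and each one records when its cancellation lands so the order is observable:

```python
import asyncio

async def worker(name: str, log: list) -> None:
    # Stand-in for the Claude task, queue processor, polling, heartbeat.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append(name)              # record teardown order
        raise

async def shutdown_demo() -> list:
    log: list = []
    names = ("claude task", "queue processor", "polling", "heartbeat")
    tasks = [asyncio.create_task(worker(n, log)) for n in names]
    await asyncio.sleep(0.01)         # let them all start
    # Innermost first: cancel the in-flight work, then its supervisor,
    # then the transport loops -- awaiting each before moving on.
    for t in tasks:
        t.cancel()
        try:
            await t
        except asyncio.CancelledError:
            pass
    return log
```

Awaiting each task before cancelling the next is what makes the order deterministic; cancelling all four and gathering them would let cleanups interleave.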
What Changed in Practice
Before this refactor, there was a real hesitation to use execute mode. Not because it didn’t work — it did — but because firing it off felt like committing the system to a period of unresponsiveness. “Do it” meant “go away for a while.” That’s not a great UX for a personal assistant.
Now? “Do it” is the most natural thing you can say. You get your acknowledgment, you see the typing indicator, you can check status, you can cancel, you can even queue up follow-up work. The system feels present even when it’s busy.
The implementation is about 60 lines of actual logic spread across polling.py and bridge.py. No external dependencies. No message brokers. No Redis. Just asyncio.Queue, a couple of coroutines, and one global variable tracking the current task.
Sometimes the right architecture is the one you can explain in a single paragraph.
The Honest Takeaway
This change isn’t technically impressive. An asyncio.Queue is day-one Python async programming. Typing indicators are three lines of code. The dispatch table is a list of regexes.
But the effect is disproportionate. The system went from feeling like a batch job to feeling like a colleague. Same capabilities. Same models. Same code paths. Different relationship with the user.
UX isn’t about what the system can do. It’s about how the system makes you feel while it’s doing it. And the difference between “your message disappeared into a void” and “On it.” plus a pulsing typing indicator — that’s the whole gap between a tool and a teammate.
We shipped that gap in sixty lines. Pretty good week.