Single-agent LLMs suck at long-running complex tasks.
We’ve open-sourced a multi-agent orchestrator that we’ve been using to handle long-running LLM tasks. We found that single LLM agents tend to stall, loop, or generate non-compiling code, so we built a harness for agents to coordinate over shared context while work is in progress.
How it works:

1. An orchestrator agent manages task decomposition
2. Sub-agents do parallel work
3. Subscriptions to task state and progress
4. Real-time sharing of intermediate discoveries between agents
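Here's the rough shape as a minimal Python sketch, not the skill's actual code (all the names here are made up):

```python
# Minimal sketch of the pattern above, not the skill's actual code.
# All names (TaskBoard, run_subagent, orchestrate) are made up.
import asyncio

class TaskBoard:
    """Shared state: progress plus intermediate discoveries."""
    def __init__(self):
        self.discoveries = []   # (agent_id, finding) pairs
        self.subscribers = []   # callbacks watching task state

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, agent_id, finding):
        # Push each discovery to every subscriber in real time
        self.discoveries.append((agent_id, finding))
        for cb in self.subscribers:
            cb(agent_id, finding)

async def run_subagent(agent_id, subtask, board):
    # Stand-in for an LLM call scoped to one subtask; a real sub-agent
    # would read board.discoveries to avoid repeating failed strategies.
    await asyncio.sleep(0)
    board.publish(agent_id, f"finished: {subtask}")

async def orchestrate(task):
    board = TaskBoard()
    board.subscribe(lambda a, f: print(f"[agent {a}] {f}"))
    subtasks = [f"{task} / part {i}" for i in range(3)]  # decomposition
    await asyncio.gather(*(run_subagent(i, s, board)
                           for i, s in enumerate(subtasks)))

asyncio.run(orchestrate("refactor module"))
```

Each sub-agent only ever sees its own subtask plus the shared board, which is what keeps individual context windows small.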
We tested this on a Putnam-level math problem, but the pattern generalizes to things like refactors, app builds, and long research. It’s packaged as a Claude Code skill and designed to be small, readable, and modifiable.
Use it, break it, and tell me what workloads we should try running next!
I wonder if there’s a third camp that isn’t about agent count at all, but about decision boundaries.
At some point the interesting question isn’t whether one agent or twenty agents can coordinate better, but which decisions we’re comfortable fully delegating versus which ones feel like they need a human checkpoint.
Multi-agent systems solve coordination and memory scaling, but they also make it easier to move further away from direct human oversight. I’m curious how people here think about where that boundary should sit — especially for tasks that have real downstream consequences.
Great work! I like the approach of maximum freedom inside a bounded blast radius, and how you use code to encode policy.
It's more of a black box with plain Claude; at least with this you can see the proof strategy and the mistakes the model makes when it decomposes the problem. Instead of Ralph looping, I think you get something top-down. If models were smarter and context windows bigger, I'm sure complex tasks like this one would be simpler, but breaking the task down into sub-agents, with a collective "we already tried this strategy and it backtracked" intelligence, is a nice way to scope a limited context window to an independent sub-problem.
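Something like a shared attempt log is what I have in mind; a hypothetical sketch (nothing here is from the actual project):

```python
# Hypothetical sketch of that shared "already tried" memory; none of
# these names come from the actual project.
class AttemptLog:
    def __init__(self):
        self._failed = {}   # strategy -> reason it backtracked

    def record_failure(self, strategy, reason):
        self._failed[strategy] = reason

    def already_tried(self, strategy):
        return strategy in self._failed

log = AttemptLog()
log.record_failure("induction on n", "base case fails for n = 0")
if not log.already_tried("generating functions"):
    pass  # hand this strategy to a fresh sub-agent with a clean context
```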
The first screen of your signup flow asks for "organization". Is that used as a username, as an organization name, or both? (I can't tell what, if anything, will be on the next screen.)
If your registration process is eventually going to ask me for a username, can the org name and username be the same?
What’s the failure mode you see with single-agent Claude Code on complex tasks? (looping, context drift, plan collapse, tool misuse?)
Can you add a license.txt file so we know we have permission to run this? (e.g., MIT and GPLv3 are very different)
Cool, what’s a good first task to try this on where it’s likely to beat a single agent?
Seems like it requires an API key for your proprietary Ensue memory system.
How does progress subscription work? Are agents watching specific signals (test failures, the TODO list, build status), or just a global feed?
I feel like there are two camps:

* Throw more agents at it
* Use something like Beads

I'm in the latter camp. I don't have infinite resources, so I'd rather stick to one agent and optimize what it can do. When I hit my Claude Code limit, I stop; I use Claude Code primarily for side projects.