A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave determinis...

amangsingh • today at 11:10 AM • 22 replies • view on HN

A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming; frustration regexes, context sanitizers, tool-retry loops, and state rollbacks just to stop the agent from drifting or silently breaking things.

The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.

Replies

ttcbj • today at 1:20 PM

I find it really strange that there is so much negative commentary on the _code_, but so little commentary on the core architecture.

My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).

Overall, when I see this I think they are focused on the right issues, and I think their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking - we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them. That is where the secret sauce lives.

➕ show 4 replies

pancsta • today at 8:16 PM

You need state oriented programming to handle that. I know, because I made one. The keyword is „unpredictability”. Embrace nondeterminism.

sunir • today at 12:04 PM

It’s not surprising. There has been quite a bit of industrial research in how to manage mere apes to be deterministic with huge software control systems, and they are an unruly bunch I assure you.

➕ show 1 reply

comboy • today at 12:23 PM

It's hard to tell how much it says about difficulty of harnessing vs how much it says about difficulty of maintaining a clean and not bloated codebase when coding with AI.

➕ show 2 replies

chrismarlow9 • today at 2:03 PM

We propped the entire economy up on it. Just look at the s&p top 10. Actually even top 50 holdings.

If it doesn't deliver on the promise we have bigger problems than "oh no the code is insecure". We went from "I think this will work" to "this has to work because if it doesn't we have one of those 'you owe the bank a billion dollars' situations"

➕ show 1 reply

cheesecompiler • today at 6:06 PM

There seem to be multiple mechanisms compensating for imperfect, lossy memory. "Dreaming" is another band-aid on inability to reliably store memory without loss of precision. How lossy is this pruning process?

It's one thing to give Claude a narrow task with clear parameters, and another to watch errors or incorrect assumptions snowball as you have a more complex conversation or open-ended task.

nicoburns • today at 1:05 PM

Kinda depends how much of it is vibe coded. It could easily be 5x larger than it needs to be just because the LLM felt like it if they've not been careful.

➕ show 2 replies

pred_ • today at 3:56 PM

The time is ripe for deterministic AI; incidentally, this was also released today: https://itsid.cloud/ - presumably will be useful for anyone who wants to quickly recreate an open source Python package or other copyrighted work to change its license.

➕ show 3 replies

whycombagator • today at 1:22 PM

> Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos

Can you expand on this?

My experience is they require excessive steering but do not “break”

➕ show 1 reply

xp84 • today at 3:50 PM

Indeed. In some ways, this is just kind of an extrapolation of the overall trend toward extreme bloat that we’ve seen in the past 15 years, just accelerated because LLMs code a lot faster. I’m pretty accustomed to dealing with Web application code bases that are 6-10 years old, where the hacks have piled up on top of other hacks, piled on top of early, tough-to-reverse bad decisions and assumptions, and nobody has had time to go back and do major refactors. This just seems like more of the same, except now you can create a 10 year-old hack-filled code base in three hours.

➕ show 1 reply

tracyhenry • today at 2:50 PM

> they break at large enterprise repos.

I don't know where you get this. you should ask folks at Meta. They are probably the biggest and happiest users of CC

➕ show 1 reply

bogdanoff_2 • today at 12:11 PM

What do you mean by "actually governing the agents at the system level", and how is it different from "herding cats"?

➕ show 1 reply

marcuscog • today at 2:20 PM

I think these folks are attempting to build systems with IAM, entity states, business rules: all built over two foundational DSLs - https://typmo.com

ap99 • today at 3:50 PM

So this is more like an art than science - and Claude Code happens to be the best at this messy art (imo).

mbesto • today at 3:22 PM

Thousands of developers are using Claude Code successfully (I think?).

So what specifically is the gripe? If it works, it works right?

bwfan123 • today at 2:38 PM

brute-forcing pattern-matching at scale. These are brittle systems with enormous duct-taping to hold everything together. workarounds on workarounds.

p-e-w • today at 12:04 PM

> A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare.

Considering what the entire system ends up being capable of, 500k lines is about 0.001% of what I would have expected something like that to require 10 years ago.

You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.

It boggles the mind, really.

➕ show 5 replies

jessai202699 • today at 4:08 PM

[dead]

quantumquantara • today at 1:11 PM

[dead]

dolomo • today at 12:10 PM

[flagged]

➕ show 3 replies

ramesh31 • today at 12:40 PM

>A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

Is that the case? I'm pretty sure Claude Code is one of the most massively successful pieces of software made in the last decade. I don't know how that proves your point. Will this codebase become unmanageable eventually? Maybe, but literally every agent harness out there is just copying their lead at this point.

➕ show 1 reply

alt Hacker News

Replies