I’ve come to the realization that these kind of systems don’t work, and that a human in the loop is ...

stingraycharles • today at 9:28 AM • 3 replies • view on HN

I’ve come to the realization that these kind of systems don’t work, and that a human in the loop is crucial for task planning; the LLM’s role being to identify issues, communicate the design / architecture, etc before it’s handed off, otherwise the LLM always ends up doing not entirely the correct thing.

How is this part tackled when all that you have is GH issues? Doesn’t this work only for the most trivial issues?

Replies

vidarh • today at 6:02 PM

I've come to the opposite conclusions: The big limitation of systems like this is starting and ending with human involvement at the same level, instead of directing at a higher level. You end up quibbling over detail the agents can handle themselves with sufficient guardrails and process, instead of setting higher level requirements and reviewing higher level decisions and outcomes, and dealing with exceptions.

You can afford a lot of extra guardrails and process to ensure sufficient quality when the result is a system that gets improved autonomously 24/7.

I'm on my way home from a client, and meanwhile another project has spent the last 10 hours improving with no involvement from me. I spent a few minutes reviewing things this morning, after it's spent the whole night improving unattended.

➕ show 1 reply

mshark • today at 10:25 AM

Had the same realization which inspired eforge (shameless plug) https://github.com/eforge-build/eforge - planning stays in the developer’s control with all engineering (agent orchestration) handed off to eforge. This has been working well for a solo or siloed developer (me) that is free to plan independently. Allows the developer to confidently stay in the planning plane while eforge handles the rest using a methodology that in my experience works well. Of course, garbage in garbage out - thorough human planning (AI assisted, not autonomous) is key.

➕ show 1 reply

jawiggins • today at 3:22 PM

Maybe - I do think as the model get better they'll be able to handle more and more difficult tasks. And yet, even if they can only solve the simplest issues now, why not let them so you can focus on the more important things?

alt Hacker News

Replies