Hacker News

overgard, today at 12:56 AM (1 reply)

My feeling is that the code it generates is locally ok, but globally kind of bad. What I mean is, in a diff it looks ok. But when you start comparing it to the surrounding code, there's a pretty big lack of coherency and it'll happily march down a very bad architectural path.

In fairness, this is true of many human developers too, but they're generally not doing it at 1,000 miles per hour, and they theoretically get better at working with your codebase over time. LLMs will always get worse as your codebase grows, and I just watched a video about how AGENTS.md actually usually results in worse outcomes, so it's not like you can just start treating MD files as memory and hope it works out.


Replies

jihadjihad, today at 2:49 AM

> But when you start comparing it to the surrounding code, there's a pretty big lack of coherency and it'll happily march down a very bad architectural path.

I had an idea about this earlier this week, but haven’t had a chance to try it. Since the agent can now “see” the whole stack, or at least most of it, by having access to the repos, there’s less and less reason to think it can’t take the whole stack into account when proposing a change.

The idea is that it’s like grep: you can call grep by itself, but when a match is found you only see one line per match, not any surrounding context. But that’s what the -A and -B flags are for!
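To make the grep analogy concrete, here is a minimal Python sketch of what the -B and -A flags do: return each matching line plus a window of lines before and after it. The function name `grep_with_context` and the sample `source` lines are made up for illustration; this is not how grep itself is implemented.

```python
import re

def grep_with_context(lines, pattern, before=0, after=0):
    """Return (line_number, text) pairs for matching lines plus context,
    roughly like `grep -n -B <before> -A <after>`."""
    regex = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            # Widen the window around the match, clamped to the file bounds.
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            keep.update(range(start, end))
    # Line numbers are 1-based, as grep -n prints them.
    return [(i + 1, lines[i]) for i in sorted(keep)]

# Hypothetical file contents, for illustration only.
source = [
    "def load(path):",
    "    data = read(path)",
    "    return parse(data)",
    "",
    "def main():",
    "    print(load('x'))",
]

for number, text in grep_with_context(source, r"parse", before=1, after=1):
    print(number, text)
```

With `before=0, after=0` you get only the matching line, which is the "local view" problem; widening the window is the cheap fix.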

So you could tell the agent that if its proposed solution lies at layer N of the system, it needs to consider at least layers N-1 (dependencies) and N+1 (consumers) to prevent the local optimum problem you mentioned.

The model should avoid writing a pretty solution in the application layer that conceals and does not address a deeper issue below, and it should keep whatever contract it has with higher-level consumers in good standing.
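A rough sketch of how the N-1 / N+1 rule could be turned into an instruction for the agent, assuming you already have some dependency map of your modules. Everything here is hypothetical: the `DEPENDS_ON` map, the module names, and the wording of the prompt. I haven't tested whether an agent actually behaves better with it.

```python
from collections import defaultdict

# Hypothetical dependency map: each module lists the modules it calls into.
DEPENDS_ON = {
    "api_handlers": ["order_service"],
    "order_service": ["db_client", "pricing"],
    "db_client": [],
    "pricing": [],
}

def neighborhood(depends_on, target):
    """Return the N-1 (dependencies) and N+1 (consumers) neighbors of target."""
    consumers = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            consumers[dep].add(module)
    return {
        "dependencies": sorted(depends_on.get(target, [])),
        "consumers": sorted(consumers.get(target, set())),
    }

def build_instruction(depends_on, target):
    """Turn the neighborhood into a plain-text instruction for the agent."""
    ctx = neighborhood(depends_on, target)
    deps = ", ".join(ctx["dependencies"]) or "none"
    users = ", ".join(ctx["consumers"]) or "none"
    return (
        f"You are changing {target}. Before proposing a diff, read its "
        f"dependencies ({deps}) and its consumers ({users}). Fix root causes "
        f"in the dependencies instead of hiding them, and keep the contract "
        f"with the consumers intact."
    )

print(build_instruction(DEPENDS_ON, "order_service"))
```

The point of computing consumers by inverting the map is that the N+1 side is the one an agent is least likely to look at on its own, since nothing in the file being edited points to it.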

Anyway, I haven’t tried that yet, but hope to next week. Maybe someone else has done something similar and (in)validated it, not sure!