An LLM can modify the code, rebuild, restart the next iteration, bring it up to a known state, and run tests against that state before you've even finished typing. It can do this over and over while you sleep. With the proper agentic loop it can even inject code into a running application, test it, and unload it before injecting the next iteration. But there will be much less need for that kind of workflow: LLMs will probably just run in loops, standing up entire containers or Kubernetes pods with the latest changes, testing them, and tearing them down again to make room for the next iteration.
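Concretely, the loop is something like this rough Python sketch. It assumes a local `docker` CLI, a Dockerfile in the repo, and `pytest` baked into the image; `llm_propose_patch`, `IMAGE_TAG`, and `TEST_CMD` are hypothetical placeholders, not any real agent framework's API:

```python
import subprocess

IMAGE_TAG = "agent-iter"      # hypothetical tag reused for each trial build
TEST_CMD = ["pytest", "-q"]   # assumed test entry point inside the image


def llm_propose_patch(feedback: str) -> None:
    """Placeholder for the model call that edits the working tree.

    A real loop would send `feedback` (the last build or test output)
    to a model API and apply the returned diff. Stubbed out here.
    """
    raise NotImplementedError("wire up your model API here")


def run(cmd: list[str]) -> subprocess.CompletedProcess:
    # Capture output so failures can be fed back to the model.
    return subprocess.run(cmd, capture_output=True, text=True)


def iterate(max_tries: int = 10) -> bool:
    feedback = ""
    for attempt in range(max_tries):
        if attempt > 0:
            llm_propose_patch(feedback)  # model edits code based on last failure
        # Stand up a fresh image with the latest changes...
        build = run(["docker", "build", "-t", IMAGE_TAG, "."])
        if build.returncode != 0:
            feedback = build.stderr
            continue
        # ...run the tests against that known state...
        test = run(["docker", "run", "--rm", IMAGE_TAG, *TEST_CMD])
        if test.returncode == 0:
            return True  # green: keep this iteration
        feedback = test.stdout + test.stderr
        # ...and tear down: --rm already removed the container, and the
        # next `docker build` simply replaces the image tag.
    return False


if __name__ == "__main__":
    print("tests passed" if iterate() else "gave up after max tries")
```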
As for hallucinations, I believe those are like version 0 of the thing we call lateral thinking and creativity when humans manifest it. Hallucinations can be controlled and corrected for. And again—you really need to spend some time with the paid version of a frontier model because it is fundamentally different from what you've been conditioned to expect from generative AI. It is now analyzing and reasoning about code and coming back with good solutions to the problems you pose it.
Ah, so I need to pay hundreds of dollars and use the "frontier" model, which is a perpetually moving goalpost and a BS excuse. Last month Opus 4.5 was the frontier and you just had to use it; now it's 4.6, and none of them so far have produced anything consistently good.
It is NOT reasoning about code. It's a glorified autocomplete that wastes energy. Attributing "reasoning" to it is anthropomorphization.
And calling hallucinations "lateral thinking" is a fucking stretch.
"Let's use tool `foo` with flag `-b`" even if the man page doesn't even mention said flag.
Sure, they might be able to spin up numerous iterations of containers, test them, and burn resources... but that is literally a thousand monkeys smashing their heads on typewriters to crank out 4chan posts.