I think the real value here isn’t “planning vs not planning,” it’s forcing the model to surface its assumptions before they harden into code.
LLMs don’t usually fail at syntax. They fail at invisible assumptions about architecture, constraints, invariants, etc. A written plan becomes a debugging surface for those assumptions.
It also helps to describe the full use-case flow in the instructions, so you can verify the LLM won't go off and do something stupid on its own.
Except that merely surfacing them changes their behavior, like how you add that one printf() call and now your heisenbug is suddenly nonexistent
> LLMs don’t usually fail at syntax?
Really? My experience has been that it's incredibly easy to get them stuck in a loop on a hallucinated API and burn through credits before I've even noticed what they've done. I have a small Rust project that stores stuff on disk and wanted to add an S3 backend to it; Claude Code burned through my $20 in about 30 minutes, looping on a very simple syntax issue without any awareness of what it was doing.
Sub-agents also help a lot in that regard: have one agent do the planning, an implementation agent write the code, and another do the review. Clear responsibilities help a lot.
There's also the blue team / red team pattern, which works well.
The idea is always the same: help the LLM reason properly with fewer, clearer instructions.
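The planner / implementer / reviewer split above can be sketched as a simple pipeline. This is a hypothetical illustration, not any particular framework's API: `call_llm` is a stand-in for whatever model call you use, and the system prompts are made up to show how narrow each agent's responsibility can be.

```python
# Hypothetical sketch of the planner / implementer / reviewer split.
# call_llm is a stub standing in for a real model API; each "agent" is
# just the same model invoked with a narrow, role-specific system prompt.

def call_llm(system: str, user: str) -> str:
    # Stub for illustration only; swap in a real API call here.
    role = system.split(":")[0]
    return f"[{role}] response to: {user[:60]}"

def run_pipeline(task: str) -> str:
    # 1. Planning agent: produces a plan, never code.
    plan = call_llm("planner: produce a numbered step-by-step plan only", task)
    # 2. Implementation agent: sees only the plan, writes the code.
    code = call_llm("implementer: write code following the given plan", plan)
    # 3. Review agent: critiques the code against the plan.
    review = call_llm(
        "reviewer: critique the code against the plan",
        f"plan:\n{plan}\n\ncode:\n{code}",
    )
    return review

print(run_pipeline("add an S3 backend to the storage layer"))
```

Because each agent only sees what its role needs, the review step can catch the implementer drifting from the plan, which is exactly where the invisible-assumption failures tend to show up.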