Partly why I’m surprised there isn’t more focus on coding harnesses that lean on strong typing, testing, and quasi-formal verification.
If you could funnel generated code through something like that, the ability to generate vast amounts of it becomes a lot more commercially useful.
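
To make the "funnel it through checks" idea concrete, here's a rough sketch of a gate that only accepts generated code once it parses and passes a test. Everything in it is made up for illustration: `CANDIDATES` stands in for model output, and the `add` function and both checks are hypothetical. A real harness would add a type checker and run the code in a sandbox.

```python
from typing import Callable, Iterable, Optional

# A check returns an error message, or None if the candidate passes.
Check = Callable[[str], Optional[str]]


def syntax_check(source: str) -> Optional[str]:
    """Reject code that does not even parse."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    return None


def behaviour_check(source: str) -> Optional[str]:
    """Run the candidate and test the hypothetical `add` function it must define."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: trusted input; sandbox this in practice
        if namespace["add"](2, 3) != 5:
            return "add(2, 3) did not return 5"
    except Exception as exc:
        return f"runtime failure: {exc!r}"
    return None


def first_accepted(candidates: Iterable[str], checks: list[Check]) -> Optional[str]:
    """Return the first candidate that passes every check, or None if none do."""
    for source in candidates:
        if all(check(source) is None for check in checks):
            return source
    return None


# Stand-ins for model output: one that does not parse, one that is wrong, one correct.
CANDIDATES = [
    "def add(a, b) return a + b",
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b",
]

accepted = first_accepted(CANDIDATES, [syntax_check, behaviour_check])
```

The point is that the volume of generated code stops mattering once only checked code gets through: the first two candidates are dropped and `accepted` ends up being the third.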
100%
This is what explains the difference in using apps like Claude Code versus almost any other harness/wrapper.
And the model can be the same, but if the harness sucks then the usefulness of the harness+model tanks.
It's like harness * model = usefulness.
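
A toy version of that formula, just to show why a product behaves differently from a sum. The 0-to-1 scores below are invented numbers, not measurements of any real harness or model.

```python
def usefulness(harness: float, model: float) -> float:
    """Toy multiplicative model: both scores are assumed to lie in [0, 1]."""
    return harness * model


# Illustrative, made-up scores.
strong_model_weak_harness = usefulness(harness=0.2, model=0.9)
decent_model_good_harness = usefulness(harness=0.9, model=0.7)

# With a product, a weak harness caps the result however strong the model is.
weak_harness_loses = strong_model_weak_harness < decent_model_good_harness
```

Under this toy model, a harness score near zero drags the whole thing near zero, which is the "usefulness tanks" effect.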