You don't magically get better results by spending 10x more on a model. If your prompt is crap and harness is crap, you get crap results, regardless of model. And if you run into limits, you aren't working at all.
Buying the most expensive circular saw doesn't get you the best woodworking, but it is the most expensive woodworking.
Not really true. Remember the prompt engineering craze a few years ago with crazy complex prompt composers (langchain) that don’t need to exist any more because the underlying model got so much better at understanding what the humans are actually asking for?