I disagree, even though I'd love for it to be different. With models like Opus, I can give it a good architecture and expect good results. For many of the less expensive models, that isn't the case: they make mistakes, you need to over-specify, they get stuck in loops, and so on. By the time you get down to models you can realistically run locally, it becomes so frustrating that I'd rather write the code myself.
At what point will local inference catch up to today’s cloud inference? Will it ever? If it doesn’t, does that imply a certain dead-end for the LLM inference industry?