What if the harnesses and agent loops get sufficiently better, though? CC already uses Haiku for code-base grepping and similar tasks; don't you see a local commodity model being "good enough" for the 80% case when paired with better harnesses and tool calls?
Honest question: I'm very interested in this, but too casual a user as of now to know any better.
I think the main issue is, as the other commenter also alluded to, the parameter discrepancy. I know Mixture of Experts models are popular specifically because they save a lot of compute and memory, but if your initial answer space is two orders of magnitude smaller on a local machine than on the frontier cloud models, the initial answer isn't going to be as good to begin with, and that knowledge gap only widens as the conversation continues.

I don't know how to close that parameter gap without hardware. There's only so much optimisation you can do; at the end of the day, parameterised knowledge takes up some minimum number of bits that you can't excise without the actual knowledge and intelligence suffering.
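To make the hardware point concrete, here's a back-of-envelope sketch of the memory needed just to hold model weights at different scales. The parameter counts and quantisation levels are illustrative assumptions, not measurements of any specific model:

```python
def weight_memory_gib(params_billions: float, bits_per_param: float) -> float:
    """Approximate GiB to store the weights alone.

    Ignores KV cache, activations, and runtime overhead, so real
    requirements are higher.
    """
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 2**30

# Hypothetical scales: a small local model, a large local model,
# and a frontier-scale model (exact frontier sizes are undisclosed).
for label, params in [("7B", 7), ("70B", 70), ("~1T", 1000)]:
    for bits in (16, 4):
        print(f"{label:>5} @ {bits:>2}-bit: {weight_memory_gib(params, bits):8.1f} GiB")
```

Even aggressively quantised to 4 bits, a 70B model needs on the order of 30+ GiB just for weights, and anything frontier-scale is far beyond consumer hardware; that's the floor the harness can't optimise away.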
The vast majority of average users don't use LLMs for coding, and for those other purposes, low-parameter local LLMs are a far cry from SOTA models.