I wonder if the future in ~5 years is almost all local models? High-end computers and GPUs can already run decent models, just not SOTA ones. Five years is enough time to ramp up memory production, for consumers to level up their hardware, and for models to be optimized down to lower-end hardware while still being really good.
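For context on "decent locally", a minimal sketch of what that looks like today, assuming Hugging Face transformers and a consumer GPU with enough VRAM (the specific model name is just an illustrative choice, not an endorsement):

    # Minimal sketch: local inference with a mid-size open-weights model.
    # Assumes `pip install transformers accelerate torch`;
    # Qwen/Qwen2.5-7B-Instruct is just an example.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-7B-Instruct",  # ~7B params: decent, not SOTA
        device_map="auto",                 # place weights on GPU/CPU automatically
    )

    out = pipe("Explain why local inference matters.", max_new_tokens=128)
    print(out[0]["generated_text"])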
Plus a long queue of yet-undiscovered architectural improvements
I'm hoping so. What's amazing is that with local models you don't suffer from what I call "usage anxiety": I find myself saving my Claude usage for hypothetical, more important tasks that might come up, or constantly tweaking prompts and doing some of the work manually to spare token usage.
Having this power locally means you can play around and experiment more without worries. It sounds like a wonderful future.
From what I've read, a lot of manufacturers are bailing on consumer lines to focus on enterprise. Not great.
Even without leveling up hardware, 5 years is a loooong time to squeeze the juice out of lower-end model capability. Although in this specific niche we do seem to be leaning on Qwen a lot.
Open-source and local models will always lag heavily behind the frontier.
Who pays for a free model? GPU training isn't free!
I remember people early on saying 100B+ models would be running on your phone by around now. They were completely wrong, and I don't think that's ever really going to change.
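The back-of-envelope math hasn't moved either. A rough sketch (my assumptions: 4-bit weights, ignoring KV cache and runtime overhead):

    # Memory needed just to hold the weights of a 100B-param model at 4-bit.
    params = 100e9          # 100B parameters
    bytes_per_param = 0.5   # 4-bit quantization
    print(params * bytes_per_param / 1e9)  # ~50 GB for the weights alone

    # Flagship phones ship with roughly 8-16 GB of RAM total, so even an
    # aggressively quantized 100B model is several times over budget.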
People will always want the fastest, best, easiest setup.
"Good enough" massively changes when your marketing team is managing k8s clusters with frontier systems in the near future.