Are there any indications that this will be possible? Consumer hardware will continue getting better but I can't see 512GB RAM in a MacBook Pro any time soon. I'm hoping linear attention techniques plus MoE will make breakthroughs in size/compression and throughput.
Certainly not any time soon, but I have faith it'll happen one day.
Well, we're probably not going to be running frontier models anytime soon, but I think the general assumption is that smaller models will continue to improve until they're good enough that frontier models aren't needed.
There's also potential for augmentation through tools, harnesses, and RAG to boost how well they work without tons of parameters.