If you look at a graph of consumer GPU power and of model capability per billion parameters over time, it seems inevitable that within the next few years a "good enough" model will run on entry-level hardware.
Of course there will always be larger flagship models, but if you can count on decent on-device inference, it materially changes what you can build.
It also massively changes the economics of the frontier models. In a lot of cases, you really don't need a general-purpose intelligence model anyway.