Users do not already own $80k of hardware, they are not going to buy $80k of hardware for worse performance than paying $100/month, and models keep growing in size while memory keeps going up in price.
> paying $100/month
There will never be a sustainable monthly subscription for LLM tokens. The economics aren't there.
Local tokens will always be cheaper.
You said you need $80k in hardware for "good performance". I'm saying a local AI inference workflow can be a lot more flexible about performance than that, and can get by with hardware that is vastly cheaper and much closer to what the user already owns.
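To make the cost argument concrete, here's a back-of-envelope breakeven sketch. The $80k figure and $100/month fee come from the thread; the $2,000 consumer box is a hypothetical illustration of the "hardware the user already owns" end of the range, not a quoted price.

```python
def breakeven_months(hardware_cost: float, monthly_fee: float) -> float:
    """Months of subscription fees an up-front hardware purchase
    must displace before it pays for itself (ignores electricity,
    depreciation, and the cloud service's performance edge)."""
    return hardware_cost / monthly_fee

# The $80k rig from the thread vs. a $100/month subscription:
print(breakeven_months(80_000, 100))  # 800.0 months, roughly 66 years
# A hypothetical $2,000 consumer GPU box:
print(breakeven_months(2_000, 100))   # 20.0 months
```

The point of the sketch is that the breakeven math only works at the cheap end of the hardware range, which is exactly why the flexibility-about-performance argument matters.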