One thing I’ve noticed with local models is that people tolerate a lot more trial-and-error behavior. When a hosted model wastes tokens it feels expensive, but when a local model loops a bit it just feels like it’s “thinking.”
If models like Qwen can get good enough for local coding tasks, the real shift might be economic rather than purely one of capability.
Wasted tokens are actually preferable for local models: I need the GPU mainframe in my bedroom to heat the room, since I live in a third world country with unreliable heating (Switzerland).