It's gotten significantly better with the advent of local/offline MoE (mixture-of-experts) models (e.g. qwen3.5:35b-a3b, qwen3:30b-a3b, gpt-oss:20b-3.6b), which only activate a few billion parameters per token and so offer a good balance of response speed and output quality.
'Dense' models of yesteryear (e.g. llama:70b, gemma2/3:27b) are significantly slower by comparison, since every parameter is active for every token; as a result, your hardware spends a lot more time 'maxed out' for a given prompt.
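The speed gap follows from decode being roughly memory-bandwidth bound: tokens/sec scales with bandwidth divided by the bytes of *active* parameters read per token. A rough sketch of that back-of-envelope math (the bandwidth and quantization figures below are illustrative assumptions, not benchmarks of any specific model):

```python
def est_tokens_per_sec(active_params_b: float, bytes_per_param: float,
                       bandwidth_gbs: float) -> float:
    """Estimate decode throughput for a memory-bandwidth-bound model.

    active_params_b: parameters read per token, in billions
                     (total params for dense; active subset for MoE)
    bytes_per_param: e.g. 0.5 for 4-bit quantization
    bandwidth_gbs:   usable memory bandwidth in GB/s
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# Hypothetical machine: ~100 GB/s usable bandwidth, 4-bit quantization.
moe   = est_tokens_per_sec(3.0, 0.5, 100)   # ~3B active params (MoE-style)
dense = est_tokens_per_sec(70.0, 0.5, 100)  # 70B params, all active (dense)
print(f"MoE: ~{moe:.0f} tok/s, dense: ~{dense:.1f} tok/s")
```

The ratio is what matters: with ~3B active vs 70B total, the MoE model decodes roughly 20x faster on the same hardware, which is why the dense model keeps the machine pegged so much longer per prompt.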