is local llm inference on modern macbook pros comfortable yet? when i played with it a year or so ago, it worked reasonably well but definitely produced uncomfortable levels of heat.
(regarding mlx, there were toolkits built on mlx that supported qlora fine tuning and inference, though those also produced a bunch of heat)
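for reference, an mlx-lm inference call looks roughly like the following. sketch only: the 4-bit model repo name is an illustrative guess, not a specific recommendation.

    # minimal mlx-lm inference sketch; assumes "pip install mlx-lm" on an apple-silicon mac
    # the model repo name is illustrative, not a specific recommendation
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")

    messages = [{"role": "user", "content": "hello from my macbook"}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # verbose=True prints tokens/sec, useful for judging speed (and how long the fans spin)
    text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
    print(text)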
It's gotten significantly better with the advent of local/offline MoE models (e.g. qwen3:30b-a3b, or gpt-oss:20b with ~3.6b active parameters), which offer a good balance of response speed and output quality.
'Dense' models of yesteryear (e.g. llama3:70b, gemma2/3:27b) tend to be significantly slower by comparison, so your hardware spends a lot more time 'maxed out' on any given prompt.
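If you want to gauge it on your own machine, one quick check is to hit a MoE model through the Ollama Python client and look at the decode speed it reports. Rough sketch only (assumes the ollama Python package, a running Ollama server, and that the model tag has already been pulled):

    import ollama

    # assumes the model was pulled beforehand, e.g. via: ollama pull qwen3:30b-a3b
    resp = ollama.chat(
        model="qwen3:30b-a3b",
        messages=[{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}],
    )
    print(resp["message"]["content"])

    # eval_count / eval_duration (nanoseconds) come back in the API response,
    # so you can eyeball decode speed directly
    tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"~{tokens_per_sec:.1f} tokens/sec")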