
thewebguyd • last Monday at 5:57 PM • 3 replies

I'd go for at least 32GB. It'll fit in 24GB, but that leaves you little to no room for context, and that's at 4-bit quantization.

If you want to run unquantized, you definitely need 128GB.
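For anyone wanting to sanity-check numbers like these, a rough back-of-the-envelope sketch: weight memory is roughly parameter count times bits per weight, divided by 8. The parameter count below is a hypothetical placeholder (the thread doesn't state one), "unquantized" is assumed to mean 16-bit weights, and real quantized formats carry some extra overhead for scales. Context (KV cache) and runtime overhead come on top of this.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, chosen only for illustration (not from the thread).
N_PARAMS = 35e9

for label, bits in [("4-bit", 4), ("8-bit", 8), ("16-bit (assumed unquantized)", 16)]:
    # Weights only: KV cache and runtime overhead are not included.
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Whatever is left between that figure and your total RAM is what you have for context, the OS, and everything else.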

Catloafdev • last Monday at 6:01 PM

Nobody runs unquantized; there's literally no reason to. Q8 is the largest anyone actually runs on consumer hardware for inference.


bitexploder • last Monday at 6:43 PM

It also comes down to inference speed, not "can I run this". 8-bit quant is quite a bit slower on an M5 Pro.
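A common rule of thumb (not something stated in this thread) is that token generation on a dense model is memory-bandwidth-bound: each generated token reads roughly all the weights once, so an upper bound on tokens per second is bandwidth divided by weight size. A minimal sketch under that assumption, where both the bandwidth and the parameter count are hypothetical placeholders rather than specs of any particular chip or model:

```python
def max_decode_tokens_per_s(n_params: float, bits_per_weight: float,
                            bandwidth_gb_per_s: float) -> float:
    """Rough upper bound on decode speed for a dense, bandwidth-bound model."""
    weight_gb = n_params * bits_per_weight / 8 / 1e9  # weights read once per token
    return bandwidth_gb_per_s / weight_gb


# Hypothetical values for illustration only.
N_PARAMS = 35e9
BANDWIDTH_GB_PER_S = 300.0

for bits in (4, 8):
    tps = max_decode_tokens_per_s(N_PARAMS, bits, BANDWIDTH_GB_PER_S)
    print(f"{bits}-bit: at most ~{tps:.1f} tokens/s")
```

Under this model, doubling the bits per weight halves the ceiling on decode speed, which is consistent with 8-bit feeling noticeably slower than 4-bit. Mixture-of-experts models only read their active parameters per token, so the estimate would not apply to them directly.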

gchamonlive • last Monday at 6:42 PM

[dead]
