Which takes a $20k Thunderbolt cluster of two 512GB RAM Mac Studio Ultras to run at full quality…

reilly3000 • last Wednesday at 10:11 PM • 7 replies

Replies

0xbadcafebee • yesterday at 12:46 AM

Most benchmarks show very little improvement from "full quality" over a quantized lower-bit model. You can shrink the model to a fraction of its "full" size and get 92-95% of the same performance, with less VRAM use.
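
As a rough back-of-the-envelope sketch of the size reduction, weight memory scales roughly linearly with bits per weight. The parameter count below is hypothetical and chosen only for illustration, and the estimate ignores KV cache, activations, and the small overhead of quantization scales:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, not taken from any specific model.
N_PARAMS = 1e12

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(N_PARAMS, bits):,.0f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the kind of difference that decides whether a model fits on one machine or needs a cluster.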

deaux • yesterday at 2:59 AM

And that's at unusable speeds - it takes about triple that amount to run it decently fast at int4.

Now as the other replies say, you should very likely run a quantized version anyway.

polynomial • yesterday at 6:23 AM

Depending on what your usage requirements are, Mac Minis running UMA over RDMA are becoming a feasible option. At roughly 1/10 of the cost you're getting much, much more than 1/10 the performance. (YMMV)

https://buildai.substack.com/i/181542049/the-mac-mini-moment

bigyabai • yesterday at 12:23 AM

"Full quality" being a relative assessment, here. You're still deeply compute-constrained; that machine would crawl at longer contexts.

PlatoIsADisease • last Wednesday at 11:46 PM

[flagged]

teaearlgraycold • last Wednesday at 10:13 PM

Which, while expensive, is dirt cheap compared to a comparable NVIDIA or AMD system.
