Hacker News

reilly3000 last Friday at 10:51 PM

dang I wish I could share md tables.

Here’s a text edition: For $50k the inference hardware market forces a trade-off between capacity and throughput:

* Apple M3 Ultra Cluster ($50k): Maximizes capacity (3TB). It is the only option in this price class capable of running 3T+ parameter models (e.g., Kimi k2), albeit at low speeds (~15 t/s).

* NVIDIA RTX 6000 Workstation ($50k): Maximizes throughput (>80 t/s). It is superior for training and inference but is hard-capped at 384GB VRAM, restricting model size to <400B parameters.

To achieve both high capacity (3TB) and high throughput (>100 t/s) requires a ~$270,000 NVIDIA GH200 cluster and data center infrastructure. The Apple cluster provides 87% of that capacity for 18% of the cost.
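The comparison above can be checked with a quick back-of-envelope script. Costs, capacities, and speeds are the comment's own estimates, and the GH200 cluster capacity (~3.45TB) is back-solved from the "87% of that capacity" figure, so treat all numbers as assumptions rather than benchmarks:

```python
# Rough comparison of the inference options discussed above.
# All figures are the comment's estimates; GH200 capacity is assumed.
options = {
    "Apple M3 Ultra cluster": {"cost": 50_000,  "capacity_gb": 3000, "tps": 15},
    "RTX 6000 workstation":   {"cost": 50_000,  "capacity_gb": 384,  "tps": 80},
    "NVIDIA GH200 cluster":   {"cost": 270_000, "capacity_gb": 3450, "tps": 100},
}

baseline = options["NVIDIA GH200 cluster"]
for name, o in options.items():
    cap_pct = 100 * o["capacity_gb"] / baseline["capacity_gb"]
    cost_pct = 100 * o["cost"] / baseline["cost"]
    print(f"{name:24s} {o['capacity_gb']:>5d} GB, {o['tps']:>3d} t/s "
          f"-> {cap_pct:.1f}% of GH200 capacity at {cost_pct:.1f}% of the cost")
```

The Apple line comes out to roughly 87% of the assumed GH200 capacity at about 18.5% of the cost, matching the ratios quoted above.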


Replies

mechagodzilla last Friday at 10:59 PM

You can keep scaling down! I spent $2k on an old dual-socket xeon workstation with 768GB of RAM - I can run Deepseek-R1 at ~1-2 tokens/sec.
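That throughput is roughly what a memory-bandwidth-bound estimate predicts: each generated token streams the active weights through RAM once, so tokens/sec ≈ effective bandwidth / bytes read per token. A minimal sketch, where every number is an assumption (DeepSeek-R1 is MoE with ~37B active parameters, an 8-bit quantization, and ~70GB/s effective bandwidth on an older dual-socket Xeon):

```python
# Back-of-envelope decode-speed estimate for CPU inference.
# Decode is usually memory-bandwidth-bound: each token reads the
# active weights once. All numbers are assumptions, not measurements.
active_params = 37e9      # ~37B active params per token (MoE)
bytes_per_param = 1.0     # assuming 8-bit quantized weights
eff_bandwidth = 70e9      # assumed effective bytes/sec on an old dual-socket Xeon

bytes_per_token = active_params * bytes_per_param
tps = eff_bandwidth / bytes_per_token
print(f"~{tps:.1f} tokens/sec")  # ~1.9 tokens/sec, in the reported 1-2 t/s range
```

Halving the bandwidth or doubling the weight precision halves the estimate, which is why quantization level matters so much on these boxes.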

icedchai last Friday at 11:19 PM

For $50K, you could buy 25 Framework desktop motherboards (128GB VRAM each w/Strix Halo, so over 3TB total). Not sure how you'd cluster all of them, but it might be fun to try. ;)

3abiton last Friday at 11:45 PM

What's the math on the $50k NVIDIA cluster? My understanding is these things cost ~$8k each, so you can get at least 5 for $40k; that's around half a TB.

That being said, for inference Macs still remain the best, and the M5 Ultra will be an even better value with its improved prompt processing (PP) speed.

FuckButtons last Friday at 11:38 PM

Are you factoring in the comment above about the as-yet-unimplemented parallel speed-up? For on-prem inference without any kind of ASIC, this seems like quite a bargain, relatively speaking.

conradev last Friday at 11:45 PM

Apple deploys LPDDR5X for the energy efficiency and cost (lower is better), whereas NVIDIA will always prefer GDDR and HBM for performance and cost (higher is better).

dsrtslnd23 last Saturday at 1:27 PM

What about a GB300 workstation with 784GB of unified memory?

yieldcrv last Saturday at 4:18 AM

15 t/s is way too slow for anything but chatting (call and response), and you don't need a 3T-parameter model for that.

Wake me up when the situation improves
