
voidspark | last Saturday at 7:17 PM | 2 replies

Not for inference. The M3 Ultra runs big LLMs twice as fast as an RTX 5090.

https://creativestrategies.com/mac-studio-m3-ultra-ai-workst...

The RTX 5090 has only 32 GB of VRAM. The M3 Ultra has up to 512 GB of unified memory with 819 GB/s of bandwidth, so it can run models that will not fit on an RTX card at all.

EDIT: The benchmark may not be properly utilizing the 5090, but the M3 Ultra is far more capable at LLM inference than an entry-level RTX card.
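A back-of-envelope sketch of why capacity and bandwidth dominate here: decode is memory-bound, so tokens/sec is roughly bandwidth divided by the bytes read per token (about the model's weight size for a dense model). The 70B/4-bit figures below are my own illustrative numbers, not from the linked benchmark.

```python
def fits(model_gb: float, vram_gb: float) -> bool:
    """Does the quantized model (plus ~10% headroom for KV cache etc.) fit?"""
    return model_gb * 1.1 <= vram_gb

def decode_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed when every weight is read once per token."""
    return bandwidth_gb_s / model_gb

# Illustrative: a 70B dense model at 4-bit is ~35 GB of weights.
model_gb = 35
print(fits(model_gb, 32))                      # RTX 5090 (32 GB): False
print(fits(model_gb, 512))                     # M3 Ultra (512 GB): True
print(round(decode_tok_s(819, model_gb), 1))   # ~23.4 tok/s bandwidth ceiling
```

The same model that simply cannot load on the 32 GB card decodes at a respectable bandwidth-bound rate on the 819 GB/s Mac.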


Replies

Spooky23 | last Saturday at 7:58 PM

My little $599 Mac Mini does inference about 15-20% slower than a 5070 in my kids’ gaming rig. They cost about the same, and I got a free computer.

Nvidia makes an incredible product, but Apple's different market-segmentation strategy might make it a real player in the long run.

balnazzar | last Saturday at 11:03 PM

It can run models that cannot fit on TEN RTX 5090s. Yes, it runs DeepSeek V3/R1, quantized at 4 bits, at an honest 18-19 tok/s, and that is a model you cannot fit into ten 5090s.
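The arithmetic behind that claim checks out. The parameter counts below are the commonly cited DeepSeek V3 figures (671B total, ~37B active per token); treat them as approximate.

```python
total_params = 671e9               # total parameters (Mixture-of-Experts model)
active_params = 37e9               # parameters activated per token
bytes_per_param = 0.5              # 4-bit quantization

weights_gb = total_params * bytes_per_param / 1e9
ten_5090s_gb = 10 * 32             # 320 GB of pooled VRAM

print(weights_gb)                  # 335.5 GB of weights alone, before KV cache
print(weights_gb > ten_5090s_gb)   # True: over budget even on ten 5090s

# Rough decode ceiling on the M3 Ultra: MoE reads only active experts per token.
active_gb = active_params * bytes_per_param / 1e9   # ~18.5 GB read per token
print(round(819 / active_gb, 1))   # ~44.3 tok/s theoretical upper bound
```

The observed 18-19 tok/s sits comfortably below that ~44 tok/s bandwidth ceiling, which is consistent with a real-world MoE deployment that never hits the theoretical bound.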
