Hacker News

behnamoh last Saturday at 7:22 PM

> LLM performance is twice as fast as RTX 5090

Your tests are wrong. You used MLX on the Mac Studio (which is optimized for Apple Silicon), but you didn't use vLLM on the 5090. There's no way a machine with half the memory bandwidth of a 5090 delivers twice the tok/s.
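The bandwidth argument can be made concrete with a back-of-the-envelope calculation. This is a rough sketch, not a benchmark. It assumes single-stream (batch size 1) decoding is memory-bandwidth-bound, so every generated token streams all model weights from memory once. That gives a ceiling of roughly tok/s ≤ bandwidth / weight size. The bandwidth constants are assumed approximate published peak figures, and the 8B/4-bit model is hypothetical.

```python
# Rough ceiling on single-stream decode speed for a memory-bandwidth-bound LLM.
# Assumption: each generated token reads all model weights from memory once,
# so tok/s <= bandwidth / weight_bytes. KV-cache reads, compute limits and
# framework overheads are ignored, so real numbers will be lower.


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def decode_tok_s_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens per second at batch size 1."""
    return bandwidth_gb_s / weights_gb


# Assumed approximate peak memory bandwidths, in GB/s.
M3_ULTRA_GB_S = 819.0
RTX_5090_GB_S = 1792.0

# Hypothetical model: 8B parameters quantized to 4 bits per weight.
size = weight_gb(8, 4)
mac = decode_tok_s_upper_bound(M3_ULTRA_GB_S, size)
gpu = decode_tok_s_upper_bound(RTX_5090_GB_S, size)

print(f"weights: {size:.1f} GB")
print(f"M3 Ultra ceiling: {mac:.0f} tok/s")
print(f"RTX 5090 ceiling: {gpu:.0f} tok/s")
print(f"ratio (5090 / M3 Ultra): {gpu / mac:.2f}")
```

Under these assumptions the ratio depends only on the two bandwidth figures, not on the model. As long as the model fits on both machines, the 5090's ceiling comes out around 2x higher, not 2x lower.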


Replies

seanmcdirmid last Saturday at 7:23 PM

Unless it’s a large model that doesn’t fit in the 5090’s memory, but then that’s no longer a $4k Mac Studio, I think.
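Whether a model "fits in the 5090" is mostly a question of weight size versus the card's 32 GB of VRAM. The sketch below counts weights only, plus a fixed headroom for KV cache and runtime overhead. The 2 GB headroom is a hypothetical placeholder, not a measured value, and the model sizes are illustrative. Models that fail this check would need offloading or a machine with more memory, such as a higher-RAM Mac Studio configuration.

```python
# Does a model's weight footprint fit in GPU memory?
# Assumption: weights dominate memory use; KV cache, activations and runtime
# overhead are approximated by a fixed, hypothetical headroom value.

RTX_5090_VRAM_GB = 32.0  # RTX 5090 memory capacity in GB


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = RTX_5090_VRAM_GB, headroom_gb: float = 2.0) -> bool:
    """True if weights plus the assumed headroom fit within vram_gb."""
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


# Illustrative (params in billions, bits per weight) combinations.
for params, bits in [(8, 4), (32, 4), (70, 4), (70, 8)]:
    size = weight_gb(params, bits)
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{params}B @ {bits}-bit: {size:.1f} GB of weights, {verdict} in 32 GB")
```

A 70B model at 4 bits already needs about 35 GB for weights alone, which is where a large unified-memory machine starts to have an advantage regardless of bandwidth.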

voidspark last Saturday at 7:33 PM

Yeah, that's probably wrong. But the M3 Ultra is good enough for local inference in any case.