Hacker News

behnamoh 05/03/2025

> LLM performance is twice as fast as RTX 5090

Your tests are wrong. You used MLX for the Mac Studio (optimized for Apple Silicon) but you didn't use vLLM for the 5090. There's no way a machine with half the memory bandwidth of the 5090 delivers twice the tok/s.
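A rough way to sanity-check the bandwidth argument: single-stream decoding is usually memory-bound, so tok/s is capped at roughly memory bandwidth divided by the bytes of weights read per token. The sketch below assumes the published peak figures (about 1792 GB/s for the RTX 5090, about 819 GB/s for the M3 Ultra) and a hypothetical 20 GB model; the function name and the model size are illustrative, not taken from the benchmark being discussed.

```python
def decode_tok_per_s_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on single-stream decode speed.

    Assumes every generated token reads all model weights once from memory,
    so the ceiling is bandwidth / model size. Real throughput is lower.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Assumed peak memory bandwidths in GB/s (published spec figures, not measurements).
RTX_5090_BW_GB_S = 1792.0
M3_ULTRA_BW_GB_S = 819.0

# Hypothetical model size in GB, chosen only for illustration.
MODEL_SIZE_GB = 20.0

bound_5090 = decode_tok_per_s_upper_bound(RTX_5090_BW_GB_S, MODEL_SIZE_GB)
bound_m3 = decode_tok_per_s_upper_bound(M3_ULTRA_BW_GB_S, MODEL_SIZE_GB)

print(f"RTX 5090 ceiling: {bound_5090:.1f} tok/s")
print(f"M3 Ultra ceiling: {bound_m3:.1f} tok/s")
print(f"M3 Ultra / RTX 5090: {bound_m3 / bound_5090:.2f}x")
```

Under these assumptions the ratio is independent of model size and comes out below 0.5x for the Mac, which is the opposite of a 2x win. The bound only holds when the whole model fits in device memory on both machines.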


Replies

seanmcdirmid 05/03/2025

Unless it’s a large model that doesn’t fit in the 5090, but that’s no longer a $4k Mac Studio, I think.

voidspark 05/03/2025

Yeah that's probably wrong. But the M3 Ultra is good enough for local inferencing, in any case.