> LLM performance is twice as fast as RTX 5090 your tests are wrong. you used MLX for Mac Studi...

behnamoh • 05/03/2025 • 2 replies

> LLM performance is twice as fast as RTX 5090

Your tests are wrong. You used MLX for the Mac Studio (optimized for Apple Silicon), but you didn't use vLLM for the 5090. There's no way a machine with half the memory bandwidth of the 5090 delivers twice the tok/s.
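
A rough sanity check on the bandwidth argument, as a sketch only. It assumes single-stream decoding of a dense model, where each generated token reads every weight once, so tok/s is capped at roughly memory bandwidth divided by model size. The bandwidth figures are the published peak specs as I recall them (about 819 GB/s for the M3 Ultra, about 1792 GB/s for the RTX 5090), and the 16 GB model size is a hypothetical value chosen so the model fits on both machines. None of these numbers come from the benchmark being discussed.

```python
def decode_tok_per_s_upper_bound(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed.

    Assumes every weight is read once per generated token and ignores
    KV-cache traffic, batching, and compute limits.
    """
    return bandwidth_gb_s / model_gb


# Assumed peak memory bandwidths (vendor spec sheets, not measured).
M3_ULTRA_GB_S = 819.0
RTX_5090_GB_S = 1792.0

# Hypothetical model footprint that fits in both machines' memory.
MODEL_GB = 16.0

mac = decode_tok_per_s_upper_bound(M3_ULTRA_GB_S, MODEL_GB)
gpu = decode_tok_per_s_upper_bound(RTX_5090_GB_S, MODEL_GB)
ratio = mac / gpu  # independent of MODEL_GB when the model fits on both

print(f"Mac ceiling: {mac:.1f} tok/s, 5090 ceiling: {gpu:.1f} tok/s, ratio: {ratio:.2f}")
```

Under these assumptions the ratio stays below 0.5 for any model that fits on both machines, so a 2x result in the Mac's favor would need another explanation, such as a mismatched software stack or a model that spills out of the 5090's VRAM.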

Replies

seanmcdirmid • 05/03/2025

Unless it’s a large model that doesn’t fit in the 5090, but then that’s no longer a $4k Mac Studio, I think.


voidspark • 05/03/2025

Yeah that's probably wrong. But the M3 Ultra is good enough for local inferencing, in any case.
