> LLM performance is twice as fast as RTX 5090
Your tests are wrong. You used MLX on the Mac Studio (optimized for Apple Silicon) but didn't use vLLM on the 5090. There's no way a machine with half the memory bandwidth of a 5090 delivers twice the tok/s.
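Back-of-the-envelope version of the bandwidth argument, as a rough sketch. It assumes single-stream decoding of a dense model is memory-bound, so each generated token reads all the weights once and the tok/s ceiling is roughly bandwidth divided by model size. The bandwidth numbers are published peak specs as I remember them (about 819 GB/s for the M3 Ultra, about 1792 GB/s for the 5090), and the 16 GB model size is a made-up example, not something from the benchmark. MoE models, batching, and compute limits all change the picture.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on batch-1 decode speed for a dense model.

    Assumes decoding is memory-bound: each token requires streaming
    all model weights from memory once.
    """
    return bandwidth_gb_s / model_gb


# Assumed peak memory bandwidths (GB/s); not taken from the benchmark.
M3_ULTRA_GB_S = 819.0
RTX_5090_GB_S = 1792.0

# Hypothetical weight footprint (GB), small enough to fit on both machines.
MODEL_GB = 16.0

m3_ceiling = decode_ceiling_tok_s(M3_ULTRA_GB_S, MODEL_GB)
gpu_ceiling = decode_ceiling_tok_s(RTX_5090_GB_S, MODEL_GB)

print(f"M3 Ultra ceiling: {m3_ceiling:.1f} tok/s")
print(f"RTX 5090 ceiling: {gpu_ceiling:.1f} tok/s")
print(f"5090 / M3 Ultra:  {gpu_ceiling / m3_ceiling:.2f}x")
```

The ratio doesn't depend on the model size you pick, as long as the model fits in memory on both. Real numbers land below these ceilings, but a 2x win for the lower-bandwidth machine would mean the 5090 run was badly under-optimized.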
Yeah, that's probably wrong. But the M3 Ultra is good enough for local inference in any case.
Unless it’s a large model that doesn’t fit in the 5090, but that’s no longer a $4k Mac Studio, I think.