The M3 Ultra has a big GPU with 819 GB/s of memory bandwidth.
LLM performance is twice as fast as RTX 5090
https://creativestrategies.com/mac-studio-m3-ultra-ai-workst...
> LLM performance is twice as fast as RTX 5090
Your tests are wrong. You used MLX (optimized for Apple Silicon) on the Mac Studio, but you didn't use vLLM on the 5090. There's no way a machine with half the 5090's memory bandwidth delivers twice the tok/s.
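To put rough numbers on the bandwidth argument, here's a back-of-the-envelope sketch. It assumes single-stream decoding of a dense model is memory-bound, so each generated token reads all the weights once and tok/s can't exceed bandwidth divided by model size. The 819 GB/s figure is from the parent comment. The 1792 GB/s figure for the 5090 is my assumption from the published spec, and the 16 GB model size is a hypothetical picked so the model fits on both machines.

```python
def max_decode_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough ceiling on batch-size-1 decode speed for a dense model.

    Assumes every generated token streams all weights from memory once,
    and ignores KV-cache reads, compute time, and kernel overhead.
    """
    return bandwidth_gb_s / model_gb


M3_ULTRA_GB_S = 819.0   # from the parent comment
RTX_5090_GB_S = 1792.0  # assumption: published spec, not measured here
MODEL_GB = 16.0         # hypothetical model size that fits in both machines

m3 = max_decode_tok_per_s(M3_ULTRA_GB_S, MODEL_GB)
rtx = max_decode_tok_per_s(RTX_5090_GB_S, MODEL_GB)

print(f"M3 Ultra ceiling: {m3:.1f} tok/s")
print(f"RTX 5090 ceiling: {rtx:.1f} tok/s")
print(f"M3 Ultra / RTX 5090: {m3 / rtx:.2f}x")
```

The model size cancels out of the ratio, so under these assumptions the M3 Ultra's ceiling is a bit under half of the 5090's for any model that fits in both. A measured 2x in the other direction points at the software stack or test setup, not the hardware.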