dzr0001 • yesterday at 7:42 PM • 0 replies

My token throughput is much better with vLLM-mlx on my M2 Ultra than with llama.cpp. It might be worth a try.
