> My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B.
This seems high. At which quantization? Using LM Studio or something else?
Note: Darkbloom seems to run everything on Q8 MLX.
Ah good point, this is using Q4, and the figure is total throughput across all streams, benchmarked while serving with llama.cpp.