Not sure where 40 tokens per second is coming from. I’ve seen 95-100 tokens per second on an M5 Max with 128GB running Gemma 4 31B. I’ve done experiments where that setup was faster than Claude Opus 4.5 on the same prompts.
Can you share your configuration, please?