I re-ran the benchmarks several times to make sure the comparison was fair; my earlier runs were skewed by a different 30B model that I had forgotten was loaded in the background.
LM Studio is an easy way to use both mlx and llama.cpp.
anemll [0]: ~9.3 tok/sec
mlx [1]: ~50 tok/sec
gguf (llama.cpp b5219) [2]: ~41 tok/sec
[0] https://huggingface.co/anemll/anemll-DeepSeekR1-8B-ctx1024_0...
[1] https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Lla...
[2] (8-bit) https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-...
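
If anyone wants to reproduce the mlx number, here's a minimal sketch using the mlx-lm Python package. It's not my exact setup: the repo id assumes the truncated link [1] points at the usual mlx-community 8B distill, and the prompt and max_tokens are arbitrary.

    # pip install mlx-lm
    from mlx_lm import load, generate

    # Assumed full repo id for the truncated link [1].
    model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Llama-8B")

    # verbose=True prints prompt and generation speed in tokens/sec,
    # which is where a number like ~50 tok/sec comes from.
    generate(model, tokenizer, prompt="Why is the sky blue?",
             max_tokens=256, verbose=True)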
Thank you very much.