Can you add a recent build of llama.cpp (arm64) to the results pool? I'm really interested in comparing mlx to llama.cpp, but setting up mlx seems too difficult for me to do by myself.
I ran them again several times to make sure the results were fair. My previous runs also had a different 30B model loaded in the background that I forgot about.
LM Studio is an easy way to use both mlx and llama.cpp.
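If you'd rather script the comparison than click through the UI, LM Studio's local server speaks the OpenAI API (by default at http://localhost:1234/v1). Here's a rough sketch for timing generation client-side, assuming the `requests` package; the model name below is a placeholder, so check what GET /v1/models actually reports:

    import time
    import requests

    BASE = "http://localhost:1234/v1"  # LM Studio's default server address

    t0 = time.time()
    resp = requests.post(
        f"{BASE}/chat/completions",
        json={
            # Placeholder name; list the real ones via GET /v1/models.
            "model": "deepseek-r1-distill-llama-8b",
            "messages": [{"role": "user", "content": "Explain KV caching in two sentences."}],
            "max_tokens": 256,
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.time() - t0

    usage = resp.json()["usage"]
    # Wall-clock timing includes prompt processing, so this slightly
    # understates pure generation speed.
    print(f'{usage["completion_tokens"] / elapsed:.1f} tok/sec')

Load the mlx build of the model, run it, then swap in the gguf build and run it again; the request is identical either way.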
anemll [0]: ~9.3 tok/sec
mlx [1]: ~50 tok/sec
gguf (llama.cpp b5219) [2]: ~41 tok/sec
[0] https://huggingface.co/anemll/anemll-DeepSeekR1-8B-ctx1024_0...
[1] https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Lla...
[2] (8bit) https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-...
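For anyone who wants to reproduce the mlx number without LM Studio, mlx-lm is a pip install away. A minimal sketch, assuming `pip install mlx-lm` on an Apple Silicon Mac; the repo id is a guess at the truncated link in [1], so double-check it on Hugging Face first:

    import time
    from mlx_lm import load, generate

    # Repo id guessed from [1]; verify the exact name before running.
    model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Llama-8B")
    prompt = "Explain KV caching in two sentences."

    t0 = time.time()
    text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
    elapsed = time.time() - t0

    n = len(tokenizer.encode(text))
    print(f"mlx: {n / elapsed:.1f} tok/sec ({n} tokens in {elapsed:.1f}s)")

For the gguf side, llama.cpp ships a llama-bench binary that reports prompt-processing and generation tok/s directly, which is less noisy than hand timing a single run.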