DeepSeek v4 Flash on MLX at 1M context reportedly decodes at 20 t/s on a Mac Studio M3 Ultra with 512 GB of RAM.
Just because you read it on a GitHub repo doesn't make it true, and it also doesn't account for CPU temps and the inevitable thermal throttling you'll hit.
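For a rough sanity check on claims like this: decode on Apple Silicon is usually memory-bandwidth-bound, so you can estimate tokens/sec from bandwidth divided by bytes read per token. The sketch below uses the M3 Ultra's published ~819 GB/s peak bandwidth, but the active parameter count, quantization width, and efficiency factor are all assumptions for illustration, since this model's specs aren't given in the thread.

```python
# Back-of-envelope decode estimate: t/s ≈ usable bandwidth / bytes per token.
# Model numbers below are ASSUMPTIONS, not published DeepSeek v4 Flash specs.

bandwidth_gb_s = 819      # M3 Ultra peak memory bandwidth (GB/s)
efficiency = 0.7          # fraction of peak sustained during decode (assumed)
active_params = 20e9      # hypothetical active parameters per token (MoE)
bytes_per_param = 0.55    # ~4.4 bits/weight, e.g. 4-bit quant plus overhead (assumed)

bytes_per_token = active_params * bytes_per_param
tokens_per_sec = bandwidth_gb_s * 1e9 * efficiency / bytes_per_token
print(f"~{tokens_per_sec:.0f} t/s upper bound")
```

Note this ignores KV-cache reads, which at 1M context add a substantial per-token cost and would pull the real number well below this ceiling, so a claimed 20 t/s isn't physically implausible, just unverified.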
What is everyone running DeepSeek v4 Flash with?!
It’s currently unsupported in llama.cpp, and vLLM doesn’t support GPU+CPU MoE offload, so unless all of you have an array of DGX Sparks in your bedroom, what’s the secret sauce?!