Hacker News

doctorpangloss yesterday at 2:40 AM

DeepSeek v4 Flash on MLX at 1M context runs at 20 t/s decode on a Mac Studio M3 Ultra with 512 GB of RAM.


Replies

alfiedotwtf yesterday at 6:37 AM

What is everyone running DeepSeek v4 Flash with?!

It’s currently unsupported in llama.cpp, and vLLM doesn’t support GPU+CPU MoE, so unless all of you have an array of DGX Sparks in your bedroom, what’s the secret sauce?!

dakolli yesterday at 3:05 AM

Just because you read it on a GitHub repo doesn't make it true. It also doesn't take into account CPU temps and the inevitable throttling you'll encounter.
