You can run it today with MLX if you have a 256 GB or 512 GB Mac Studio. No "antirez" fork needed.
It isn't that large a model, and the compressed KV implementation is not that complicated.
The problem is that they released the model in a quantized format that is more complex than it appears, and people make a lot of mistakes working with it. It is quantization-aware trained, so you can't "just" dequantize it to higher precision and quantize it back down.
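A toy sketch of why that round trip is lossy (this is not the model's actual format; the scales and bit width here are made up for illustration). A QAT model was trained against one specific quantization grid, so dequantizing and requantizing against a freshly computed scale silently moves the weights:

```python
# Toy illustration: dequantize-then-requantize is not a no-op
# when the new quantization grid differs from the one used in training.

def quantize(x, scale, bits=4):
    # symmetric signed quantization to `bits` bits
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax - 1, min(qmax, round(x / scale)))

def dequantize(q, scale):
    return q * scale

# weight as stored, at the scale the QAT forward pass was trained with
scale_a = 0.1
w_q = quantize(0.37, scale_a)           # stored 4-bit integer
w = dequantize(w_q, scale_a)            # 0.4 -- what the model actually learned to use

# naive "upscale then requantize" with a freshly computed scale
scale_b = 0.07
w_rt = dequantize(quantize(w, scale_b), scale_b)

print(w, w_rt)  # the round-tripped weight no longer matches
```

Multiply that drift across billions of weights and the model the QAT process optimized is no longer the model you're running.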
vLLM runs dsv4 flash fine right now.
DGX Sparks can't really run it correctly right now with released vLLM, but there are PRs; it's just a matter of time. You would need 3 of them, and they would still be only about half as fast as the Mac Studio.
So the punchline is: this is why the 512 GB Mac Studio is such a hot commodity right now.
Unfortunately I didn't get a Mac with big RAM back when it was cheap, and I'd personally rather focus on moving away from Apple and going Linux full-time at work and at home (currently a MacBook laptop connected to my big rig, though it's not that big compared to the AI people in here).
If you have a 256 GB or 512 GB Mac Studio, the real game is to run multiple sessions in parallel to make the best use of your limited memory bandwidth. You'd have plenty of spare RAM for that, given how small the KV cache is even at max context.
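The "multiple sessions in parallel" idea is just firing concurrent requests at whatever local server you run (mlx_lm.server, vLLM's OpenAI-compatible endpoint, etc.). A minimal sketch of the pattern, where `complete` is a hypothetical stand-in for your actual HTTP call:

```python
# Sketch: batch prompts and run them concurrently so the server can
# process several sequences per pass instead of one at a time.
import concurrent.futures

def complete(prompt: str) -> str:
    # Stand-in for a real client call, e.g. a POST to your local
    # OpenAI-compatible endpoint (URL and model name are up to your setup).
    return f"echo: {prompt}"

prompts = [f"task {i}" for i in range(8)]

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(complete, prompts))

print(results[0])
```

Since decode is memory-bandwidth-bound, batching several sequences amortizes each weight read across all of them, which is where the extra RAM actually pays off.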