And unless I'm mistaken, the repo is about running it with 2-bit quantization.
This is probably far from the raw intelligence provided by cloud providers.
Still, this shines more light on local LLMs for agentic workflows.
It runs both the q2 quant and the original model (4-bit routed experts), at roughly the same speed. The q2 quants are better than you might expect: they work extremely well, for a few reasons. For the full model you need a Mac with 256GB of RAM.
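The memory requirement follows from simple arithmetic: weight footprint is roughly parameters × bits-per-weight / 8 bytes. A minimal sketch, using a hypothetical 400B-parameter count purely as a placeholder (the actual model's size isn't stated here):

```python
def quantized_size_gb(n_params: float, bits: float) -> float:
    """Approximate weight footprint in GB: params * bits / 8 bytes per param."""
    return n_params * bits / 8 / 1e9

# Hypothetical 400B-parameter MoE (placeholder, not the actual model)
params = 400e9
print(quantized_size_gb(params, 4))  # 200.0 GB at 4-bit -> fits a 256GB Mac
print(quantized_size_gb(params, 2))  # 100.0 GB at 2-bit
```

This also suggests why speed is similar in both cases: on Apple Silicon, decoding is memory-bandwidth-bound, and with an MoE only the activated experts are read per token, so the gap between 2-bit and 4-bit narrows in practice.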