Hacker News

walrus01 · today at 5:38 AM

People thinking of self-hosting Kimi K2.6 had better be prepared for how big it is.

The Q8_K_XL quantization, for instance, is around 600 GB on disk. I would bet about 700 GB of VRAM is needed to run it.

Quantizations lower than Q8 are probably worthless for quality.

Or 2.05 TB on disk for the full-precision GGUF.

https://huggingface.co/unsloth/Kimi-K2.6-GGUF
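The disk and VRAM figures above follow from simple arithmetic on parameter count and bits per weight. A rough sketch, assuming roughly 1 trillion total parameters (the published size of the Kimi K2 family) and illustrative effective bits-per-weight values (the fractional figures account for quantization scale metadata; they are assumptions, not official numbers):

```python
def weight_memory_gb(n_params: float, bits_per_weight: float,
                     overhead: float = 1.05) -> float:
    """Approximate memory for model weights alone.

    Ignores KV cache, activations, and runtime buffers, so real VRAM
    needs are higher; `overhead` is a small fudge factor for metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9 * overhead

N = 1.0e12  # ~1T total parameters (assumption based on the K2 family)
for label, bpw in [("BF16 full precision", 16.0),
                   ("~Q8 with scales", 8.5),
                   ("~INT4 with scales", 4.8)]:
    print(f"{label:>20}: ~{weight_memory_gb(N, bpw):,.0f} GB")
```

BF16 at 16 bits/weight lands around 2 TB, consistent with the 2.05 TB GGUF above, and the quantized figures land in the same ballpark as the sizes on the Hugging Face page.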

If you can afford the hardware to run Kimi K2.6 at any decent speed for more than one simultaneous user, you probably have a whole team of people on staff who are already very familiar with how to benchmark it against Claude, GPT-5.5, etc.


Replies

adrian_b · today at 10:48 AM

While most people would not be able to run Kimi K2.6 fast enough for interactive chat, for a coding assistant low speed matters much less, especially since many tasks can be batched so that they all progress during a single pass over the weights.
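The batching point can be made concrete: decode on large models is typically memory-bandwidth-bound, so each step costs roughly one pass over the (active) weights regardless of how many sequences share it. A back-of-the-envelope sketch, where the bandwidth and active-expert-weight figures are purely illustrative assumptions, not Kimi specifics:

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          active_weights_gb: float,
                          batch_size: int = 1) -> float:
    """Upper bound on decode throughput when memory-bandwidth-bound.

    One decode step streams the active weights from memory once; every
    sequence in the batch gets a token from that same pass, so aggregate
    throughput scales ~linearly with batch size (until compute-bound).
    """
    steps_per_sec = bandwidth_gb_s / active_weights_gb
    return steps_per_sec * batch_size

BW = 800.0   # GB/s, assumed aggregate memory bandwidth
ACTIVE = 32.0  # GB, assumed active (MoE-routed) weights per token
print(f"1 user : ~{decode_tokens_per_sec(BW, ACTIVE, 1):.0f} tok/s total")
print(f"8 tasks: ~{decode_tokens_per_sec(BW, ACTIVE, 8):.0f} tok/s total")
```

Each individual task still runs at the slow single-stream rate, but total work done per hour grows almost linearly with the number of batched tasks, which is why slow hardware hurts batch coding workloads far less than chat.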

If you run it on your own hardware, you can run it 24/7 without worrying about token prices or hitting subscription limits, so you can likely get more work done even on much slower hardware. Customizing an open-source harness can also be far more efficient than something like Claude Code.

For any serious application, you might be more limited by your ability to review the code than by hardware speed.

zozbot234 · today at 5:45 AM

Kimi is a natively quantized model; the lossless full-precision release is 595 GB. Your own link mentions that.
