WhiteDawn • yesterday at 6:50 PM • 2 replies

Once someone generates an MTP layer for 26B A4B 4 QAT, I'll be singing from the hills with my 5-year-old GPU.

Replies

Models:

- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...

- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...

Per the README in Unsloth's file list, llama.cpp has a PR in progress to support the gemma4 drafters: https://github.com/ggml-org/llama.cpp/pull/23398. Note that the PR submitter didn't see much speedup with 26B (it seems typical that MoE models don't benefit much from MTP).
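For a rough feel of why the gain can shrink on MoE, here is a back-of-the-envelope sketch using the standard speculative-decoding estimate for expected tokens per verification pass. It is not taken from the PR. The acceptance rate, draft length, and cost ratios are made-up illustrative inputs, and `verify_cost` is a hypothetical knob for the commonly given explanation that verifying several tokens at once on an MoE model touches more experts than a single decode step does.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_rate; the target model always contributes one token itself.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost: float, verify_cost: float = 1.0) -> float:
    """Estimated speedup over plain decoding.

    draft_cost and verify_cost are relative to one ordinary decode step of
    the target model (both are assumptions, not measurements).
    """
    tokens = expected_tokens_per_step(accept_rate, draft_len)
    step_cost = draft_len * draft_cost + verify_cost
    return tokens / step_cost


if __name__ == "__main__":
    # Illustrative numbers only: same drafter, different verification cost.
    dense_like = estimated_speedup(0.7, 3, draft_cost=0.1, verify_cost=1.0)
    moe_like = estimated_speedup(0.7, 3, draft_cost=0.1, verify_cost=2.0)
    print(f"verify cost 1.0x: {dense_like:.2f}x")
    print(f"verify cost 2.0x: {moe_like:.2f}x")
```

With the same drafter and acceptance rate, doubling the verification cost removes most of the estimated gain. Real numbers depend on the hardware, the quantization, and how the experts are routed, so treat this as intuition rather than a prediction.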

dist-epoch • yesterday at 7:00 PM

Google already did

https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...
