
WhiteDawn, yesterday at 6:50 PM

Once someone generates an MTP layer for 26B A4B 4 QAT, I'll be singing from the hills with my 5-year-old GPU.


Replies

pfheatwole, yesterday at 11:05 PM

Models:

- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...

- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...

Note the README in the Unsloth file list: llama.cpp has a PR in progress to support the gemma4 drafters: https://github.com/ggml-org/llama.cpp/pull/23398

Also note that the PR submitter didn't see much speedup with 26B. It seems typical that MoE models don't benefit much from MTP.
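
For a rough feel of why the speedup can shrink, here is a back-of-the-envelope sketch in Python. It is not taken from the PR: the function names, the independent-acceptance assumption, and every cost number are hypothetical. It uses the usual speculative-decoding estimate that, if each drafted token is accepted independently with probability a and k tokens are drafted, one verification pass yields (1 - a^(k+1)) / (1 - a) tokens on average. The assumed reason for the MoE gap is a commonly suggested one: verifying several tokens at once routes them to different experts, so the pass reads more weights than a single-token step.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Average tokens produced per verification pass.

    Assumes each drafted token is accepted independently with
    probability `accept_rate` (a simplification).
    """
    if accept_rate >= 1.0:
        # Every drafted token accepted, plus the one token the target adds.
        return draft_len + 1.0
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost: float, verify_cost: float) -> float:
    """Estimated speedup over plain one-token-at-a-time decoding.

    Costs are relative to one ordinary decode step of the target model
    (which costs 1.0). `draft_cost` is per drafted token; `verify_cost`
    is the cost of one batched verification pass.
    """
    tokens = expected_tokens_per_step(accept_rate, draft_len)
    step_cost = draft_len * draft_cost + verify_cost
    return tokens / step_cost


# Hypothetical numbers, for illustration only.
# Dense model: verifying a few tokens costs about the same as one step.
dense = estimated_speedup(accept_rate=0.8, draft_len=3,
                          draft_cost=0.05, verify_cost=1.0)
# MoE model: assume the batched pass touches more experts and costs 2x.
moe = estimated_speedup(accept_rate=0.8, draft_len=3,
                        draft_cost=0.05, verify_cost=2.0)

print(f"dense estimate: {dense:.2f}x")
print(f"MoE estimate:   {moe:.2f}x")
```

With the same acceptance rate, the only thing that changes between the two cases is `verify_cost`, so the estimate drops in direct proportion to how much more expensive the batched pass is. Real numbers would have to be measured on the actual model and hardware.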