Models:
- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...
- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...
Note the README in the Unsloth file list: llama.cpp has a PR in progress to support the gemma4 drafters: https://github.com/ggml-org/llama.cpp/pull/23398. Also note that the PR submitter didn't see much speedup with 26B, which seems typical: MoE models generally don't benefit much from MTP (multi-token prediction).
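For a rough feel of why the speedup can disappear, here is a back-of-the-envelope sketch using the standard speculative decoding estimate. It assumes each drafted token is accepted independently with probability `alpha`. The function names and the `draft_cost` and `verify_cost` parameters are my own, not anything from llama.cpp or the PR. The idea that verifying several tokens at once costs an MoE model more than a single decode step (because the batch can route to more experts) is an assumption I'm using for illustration, not something measured here.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass when k tokens are drafted.

    Assumes each drafted token is accepted independently with probability
    alpha; the pass always yields at least one token from the target model.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float,
                      verify_cost: float = 1.0) -> float:
    """Estimated speedup over plain one-token-at-a-time decoding.

    draft_cost:  cost of one draft step, relative to one normal decode step
                 of the target model (hypothetical parameter).
    verify_cost: cost of verifying the k drafted tokens in one pass, relative
                 to one normal decode step (hypothetical parameter; 1.0 means
                 verification is as cheap as a single-token step).
    """
    return expected_tokens_per_step(alpha, k) / (k * draft_cost + verify_cost)


if __name__ == "__main__":
    # Illustrative numbers only, not benchmarks.
    print("verify as cheap as one step:", estimated_speedup(0.7, 3, 0.1, 1.0))
    print("verify twice as expensive:  ", estimated_speedup(0.7, 3, 0.1, 2.0))
```

If the verification pass gets noticeably more expensive than a single decode step, the estimate drops toward 1x even with a decent acceptance rate, which would be consistent with what the PR submitter reported.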