somewhatrandom9 yesterday at 6:09 PM

Could these quantized models make MTP (Multi-Token Prediction) significantly faster when used as drafters for larger regular Gemma 4 models?


Replies

dist-epoch yesterday at 7:01 PM

Google already released specialized drafters for Gemma 4.
