Hacker News

fulafel today at 8:24 AM

Looks like DeepSeek has also done this since V3: https://deepwiki.com/deepseek-ai/DeepSeek-V3/4.4-multi-token...

Credit for the multi-token prediction (MTP) technique goes to https://arxiv.org/abs/2404.19737 (2024):

Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve
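For anyone unfamiliar with the idea, here is a rough sketch of the training loss as I understand it from the paper. A shared trunk produces one hidden state z_t per position, n separate output heads each predict one of the next n tokens (t+1 through t+n) from that same z_t, and the loss sums the cross-entropy of all n heads. Simplifying assumptions: the trunk outputs are passed in directly, and each head is a plain linear map to vocabulary logits (in the paper each head is its own transformer layer feeding a shared unembedding). Names like multi_token_loss are made up for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per vocabulary entry: vocab_size x hidden_dim


def log_softmax(logits: Vector) -> Vector:
    """Numerically stable log-softmax over a single logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def matvec(w: Matrix, h: Vector) -> Vector:
    """Hypothetical linear head: vocabulary logits = W @ h."""
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w]


def multi_token_loss(hidden: List[Vector], heads: List[Matrix], tokens: List[int]) -> float:
    """Sketch of an MTP loss with n = len(heads) output heads.

    hidden[t] stands in for the shared trunk's output z_t after reading
    tokens[0..t]; heads[i - 1] predicts tokens[t + i] from hidden[t].
    Returns the summed negative log-likelihood over the n heads, averaged
    over every position that has all n future targets available.
    """
    n = len(heads)
    total = 0.0
    count = 0
    for t in range(len(tokens) - n):
        for i, w in enumerate(heads, start=1):
            logp = log_softmax(matvec(w, hidden[t]))
            total -= logp[tokens[t + i]]  # cross-entropy for token t + i
        count += 1
    if count == 0:
        raise ValueError("sequence too short for the number of heads")
    return total / count
```

With a single head (n = 1) this reduces to ordinary next-token cross-entropy, which is why the extra heads can simply be dropped at inference time if you only want standard autoregressive decoding.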