> What kind of benefit does Multi-Token Prediction bring to the inference side? Is it only relevant in pretraining efficiency?
It is only useful for inference and doesn't help with pretraining, which actually points to speculative decoding not being sufficiently general: the same underlying property (some sequences of tokens are easy to predict) could be exploited for training as well. See here: https://goombalab.github.io/blog/2025/hnet-future/#d-footnot...
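For what it's worth, here is a rough sketch (plain PyTorch, toy model, all names made up) of how extra multi-token-prediction heads can be used speculatively at inference: the heads draft a few future tokens from one forward pass, and the main head then keeps the longest prefix it agrees with. A real implementation verifies all drafted positions in a single batched forward pass, which is where the speedup comes from; the loop below only illustrates the accept/reject logic.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for an LM with k extra multi-token-prediction heads."""
    def __init__(self, vocab=100, dim=32, k=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.main_head = nn.Linear(dim, vocab)                 # predicts token t+1
        self.mtp_heads = nn.ModuleList(
            [nn.Linear(dim, vocab) for _ in range(k)])         # predict t+2 .. t+1+k

    def forward(self, ids):
        h = self.embed(ids).mean(dim=1)                        # crude "context" vector
        return self.main_head(h), [head(h) for head in self.mtp_heads]

def self_speculative_step(model, ids):
    """Draft k+1 tokens from one forward pass, then greedily keep the longest
    prefix the main head still agrees with after re-reading the extended context."""
    main_logits, mtp_logits = model(ids)
    draft = [main_logits.argmax(-1)] + [l.argmax(-1) for l in mtp_logits]
    accepted = [draft[0]]                                      # main-head token is always kept
    for tok in draft[1:]:
        ids = torch.cat([ids, accepted[-1].view(1, 1)], dim=1)
        verify_logits, _ = model(ids)
        if verify_logits.argmax(-1).item() != tok.item():
            break                                              # disagreement: discard the rest
        accepted.append(tok)
    return accepted

model = TinyLM()
prompt = torch.randint(0, 100, (1, 5))
print([t.item() for t in self_speculative_step(model, prompt)])
```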
There is no reason that it couldn’t be beneficial for training though.
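Right, and that is essentially what multi-token prediction as a training objective looks like: an extra head predicts the token two steps ahead, and its cross-entropy is added to the ordinary next-token loss as an auxiliary term. A minimal sketch, assuming a single extra head and an illustrative 0.3 weight rather than anyone's published recipe:

```python
import torch
import torch.nn.functional as F

def mtp_loss(main_logits, mtp_logits, targets, mtp_weight=0.3):
    """Next-token loss plus an auxiliary loss for predicting two steps ahead.

    main_logits: (B, T, V) main head, position t predicts token t+1
    mtp_logits:  (B, T, V) extra head, position t predicts token t+2
    targets:     (B, T+1) labels shifted one step past the inputs, with one
                 extra token so the t+2 head has a label at every position
    """
    next_tok  = targets[:, :-1]                      # labels for the t+1 head
    next_next = targets[:, 1:]                       # labels for the t+2 head
    loss_main = F.cross_entropy(main_logits.reshape(-1, main_logits.size(-1)),
                                next_tok.reshape(-1))
    loss_mtp  = F.cross_entropy(mtp_logits.reshape(-1, mtp_logits.size(-1)),
                                next_next.reshape(-1))
    return loss_main + mtp_weight * loss_mtp

B, T, V = 2, 8, 100
loss = mtp_loss(torch.randn(B, T, V), torch.randn(B, T, V),
                torch.randint(0, V, (B, T + 1)))
print(loss.item())
```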