Hacker News

maxiniol yesterday at 11:30 PM

Wondering about Google's multi-token prediction (MTP): why isn't this being implemented in every new major model? Is the 750 tokens/s achieved using this technique?


Replies

adam_arthur yesterday at 11:37 PM

MTP or something similar is probably being used on the backend, but that's transparent to the end user.