Wondering about Google's multi-token prediction (MTP): why isn't it being implemented in every new major model? Is the 750 tokens/s achieved using this technique?
MTP or something similar is probably being used on the backend, but that's transparent to the end user.
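To make "transparent" concrete, here's a toy sketch of one common way extra prediction heads can be used at inference: self-speculative decoding. The heads cheaply draft the next k tokens, the main model verifies them, and only the prefix that matches what the main model would have picked greedily is kept. Everything here is hypothetical (`main_model`, `mtp_draft`, and the deliberate "every third position is wrong" error rule are made up for illustration). It is not how any particular provider does it, and it says nothing about where the 750 tokens/s figure comes from. The point is only that, with greedy verification, the output is identical to plain decoding, while the number of main-model verification passes drops.

```python
from typing import List, Tuple

VOCAB = 50


def main_model(prefix: List[int]) -> int:
    """Hypothetical toy main model: deterministic greedy next token for a prefix."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def mtp_draft(prefix: List[int], k: int) -> List[int]:
    """Hypothetical toy MTP heads: guess the next k tokens in one cheap step.

    They copy the main model but are deliberately wrong whenever the context
    length is a multiple of 3, so verification has something to reject.
    """
    ctx = list(prefix)
    guesses = []
    for _ in range(k):
        t = main_model(ctx)
        if len(ctx) % 3 == 0:  # injected drafting error
            t = (t + 1) % VOCAB
        guesses.append(t)
        ctx.append(t)
    return guesses


def greedy_decode(prompt: List[int], n: int) -> List[int]:
    """Baseline: one main-model pass per generated token."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        out.append(main_model(out))
    return out[len(prompt):]


def mtp_decode(prompt: List[int], n: int, k: int = 3) -> Tuple[List[int], int]:
    """Draft k tokens, then verify them against the main model's greedy choices.

    In a real system the verification is a single batched forward pass over all
    drafted positions; here we simulate it with sequential main_model calls and
    count it as one pass.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        draft = mtp_draft(out, k)
        passes += 1
        for t in draft:
            expected = main_model(out)
            if t == expected:
                out.append(t)  # draft token accepted
            else:
                out.append(expected)  # rejected: keep the main model's token instead
                break
        else:
            # All drafts accepted: the same pass also yields one extra token.
            out.append(main_model(out))
    return out[len(prompt):len(prompt) + n], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, passes = mtp_decode(prompt, 20)
    print(tokens == greedy_decode(prompt, 20), passes)
```

The user only ever sees `tokens`, which match the baseline exactly. The speedup shows up as fewer verification passes, and how big it is depends entirely on how often the drafted tokens are accepted.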