zozbot234 • today at 7:15 AM

I assume these are just extra output layers trained on the hidden state of the larger model; that's how MTP (multi-token prediction) works. It's not a separate drafting model.
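
To make the idea concrete, here is a toy sketch of what "output layers on the hidden state" could look like. Everything in it is an assumption for illustration: the sizes, the random weights, and the helper names (`make_head`, `draft_tokens`) are hypothetical, and real MTP modules can be more elaborate than a single linear layer. The point is only that every head reads the same hidden state from the main model, so no second model is run to produce the draft.

```python
import random

def matvec(W, h):
    """Multiply a matrix W (list of rows) by a vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=xs.__getitem__)

def make_head(vocab, hidden, rng):
    """Hypothetical extra output layer: a vocab x hidden weight matrix.
    Random here; in practice it would be trained."""
    return [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(vocab)]

def draft_tokens(hidden_state, heads):
    """Each head reads the SAME hidden state and proposes one token:
    head 0 for position t+1, head 1 for t+2, and so on."""
    return [argmax(matvec(W, hidden_state)) for W in heads]

rng = random.Random(0)
HIDDEN, VOCAB, N_HEADS = 8, 16, 3  # toy sizes, assumed for illustration

heads = [make_head(VOCAB, HIDDEN, rng) for _ in range(N_HEADS)]

# Stand-in for the larger model's last-layer hidden state at position t.
h = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

draft = draft_tokens(h, heads)
print(draft)  # N_HEADS proposed token ids, one per future position
```

In a speculative-decoding setup the main model would then verify these proposed tokens in one forward pass and keep the prefix it agrees with; that verification step is left out here.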
