Hacker News

zozbot234, today at 7:15 AM

I assume these are just output layers trained on the hidden state of the larger model; that's how MTP (multi-token prediction) works. It's not a separate drafting model.
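To make the idea concrete, here is a rough sketch of extra output heads that all read the same hidden state from the big model, with no second model involved. Everything in it is an assumption for illustration: the names `make_head` and `draft_tokens`, the sizes, and the random weights are hypothetical, and the heads are plain linear projections, which real MTP implementations may replace with something larger.

```python
import random

def matvec(W, h):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=xs.__getitem__)

def make_head(vocab_size, hidden_size, rng):
    """One extra output layer: a hidden_size -> vocab_size projection.
    Random weights stand in for trained ones (assumption)."""
    return [[rng.uniform(-1, 1) for _ in range(hidden_size)]
            for _ in range(vocab_size)]

def draft_tokens(hidden_state, heads):
    """Every head reads the SAME hidden state and guesses one future token:
    heads[0] guesses position t+1, heads[1] guesses t+2, and so on."""
    return [argmax(matvec(W, hidden_state)) for W in heads]

rng = random.Random(0)
HIDDEN, VOCAB, N_HEADS = 8, 16, 3  # toy sizes, chosen arbitrarily

heads = [make_head(VOCAB, HIDDEN, rng) for _ in range(N_HEADS)]
# Stand-in for the larger model's last-layer hidden state at position t.
hidden_state = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

draft = draft_tokens(hidden_state, heads)
print(draft)  # N_HEADS proposed token ids for the main model to verify
```

The point of the sketch is that drafting costs only a few extra projections on a hidden state the main model already computed, which is the difference from running a separate drafting model.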