rao-v, today at 6:06 AM

To the previous poster's point, soft distributions are useful. Even saving just the top 10 logits gives significantly more training signal than the final token alone.
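
A rough sketch of what that could look like, in case it helps. None of this is from a specific library: the function names are made up for illustration, and renormalizing the softmax over only the saved top-k entries is just one assumed way to turn truncated logits into a target distribution. Setting k=1 gives the usual hard-label case for comparison.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a plain list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def topk_soft_targets(teacher_logits, k=10):
    # Hypothetical helper: keep the k largest teacher logits and
    # renormalize them into a distribution (an assumption, not a standard).
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    log_p = log_softmax([teacher_logits[i] for i in top])
    return {i: math.exp(lp) for i, lp in zip(top, log_p)}

def soft_cross_entropy(student_logits, targets):
    # Cross-entropy of the student's full-vocab softmax against sparse targets.
    log_q = log_softmax(student_logits)
    return -sum(p * log_q[i] for i, p in targets.items())

teacher = [2.0, 1.5, 0.2, -1.0, -3.0]   # toy 5-token vocabulary
student = [0.0, 0.0, 0.0, 0.0, 0.0]     # untrained, uniform student

soft = topk_soft_targets(teacher, k=3)  # three tokens carry target mass
hard = topk_soft_targets(teacher, k=1)  # only the argmax token does

print(soft)
print(soft_cross_entropy(student, soft), soft_cross_entropy(student, hard))
```

With the hard target the loss only ever pushes on one token per position. With the top-k target it also tells the student how the teacher ranked the runners-up.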