Hacker News

andai, yesterday at 9:50 PM (2 replies)

What's the downside? Don't they stop when they hit diminishing returns?


Replies

hgoel, today at 4:39 PM

Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?

Ifkaluva, today at 12:49 AM

You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.