
Ifkaluva, yesterday at 9:40 PM

Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model
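For scale, here is a back-of-the-envelope check of those two figures. The 20 tokens-per-parameter baseline is an assumption brought in for comparison (the commonly cited Chinchilla rule of thumb for compute-optimal training), not something stated in this thread:

```python
# Rough tokens-per-parameter check for the figures quoted in the comment.
train_tokens = 38e12   # 38T training tokens, as stated in the comment
params = 8e9           # 8B parameters, as stated in the comment

tokens_per_param = train_tokens / params

# Assumption: ~20 tokens per parameter, the commonly cited Chinchilla
# rule of thumb for compute-optimal training (not a figure from the thread).
CHINCHILLA_TOKENS_PER_PARAM = 20
ratio_vs_chinchilla = tokens_per_param / CHINCHILLA_TOKENS_PER_PARAM

print(f"{tokens_per_param:.0f} tokens per parameter")
print(f"{ratio_vs_chinchilla:.1f}x the rule-of-thumb ratio")
```

That comes out to 4750 tokens per parameter, or 237.5x the rule-of-thumb ratio. The rule of thumb only describes the compute-optimal point for training, so a higher ratio does not by itself show the extra tokens are wasted.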


Replies

andai, yesterday at 9:50 PM

What's the downside? Don't they stop when they hit diminishing returns?
