Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model.
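For a sense of scale, here is some rough arithmetic on the numbers in the post. It takes "8B" and "38T" at face value. The ~20 tokens-per-parameter reference point is the commonly cited compute-optimal heuristic from DeepMind's Chinchilla paper. That is an approximation for spending a fixed *training* budget well. It is not a hard ceiling on useful training.

```python
# Rough tokens-per-parameter arithmetic for the figures quoted in the post.
# Assumption: "38T tokens" and "8B parameters" are taken at face value.
# CHINCHILLA_RATIO is the approximate compute-optimal heuristic (~20 tokens
# per parameter), not an exact or universal constant.

TRAINING_TOKENS = 38e12   # 38T tokens
PARAMETERS = 8e9          # 8B parameters
CHINCHILLA_RATIO = 20     # approx. compute-optimal tokens per parameter


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Return how many training tokens were seen per model parameter."""
    return tokens / params


ratio = tokens_per_parameter(TRAINING_TOKENS, PARAMETERS)
multiple = ratio / CHINCHILLA_RATIO

print(f"{ratio:.0f} tokens per parameter")               # 4750
print(f"{multiple:.1f}x the ~20 tokens/param heuristic")  # 237.5
```

By this measure the model sits far past the "compute-optimal" point. However, that heuristic only optimizes training compute, so going past it is not automatically wasted effort.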
What's the downside? Don't they stop when they hit diminishing returns?