Tokens per second is nice, but I would also like to see quality benchmarks, especially against other models. I mean, eventually someone’s gonna write a blog post comparing models, so why not just do it yourself… that way your marketing department at least gets to control the narrative rather than a random blogger.
It's a checkpoint from the middle of training, so it makes sense to report speed, which will stay the same, and to report quality the way they did.