Every time a new benchmark appears, Chinese models score far lower than existing benchmarks suggest they should. Then after a while they recover :)
The magic of distillation!