Open models, in actual practice, don't match up to even one or two generation prior models from Anthropic/OpenAI/Google. They've clearly been trained on the benchmarks. Entirely possible it was by mistake, but it's definitely happening.