Nobody releases numbers that show them to be worse than competitors lol.
This applies even to OpenAI & Anthropic, who often don't eval on the same datasets.