Can you explain where you're seeing that? From what I see, the first two graphs have OpenAI models above Claude models (including Mythos) on the Technical Non-Expert and the Practitioner evals. Mythos now beats Codex 5.3 on the Expert eval and Opus was already on top for the Apprentice one although now Mythos leads there.
So, even including Mythos, OpenAI still has 2 models on top for the 4 evals listed.
Can you explain where you're seeing that? From what I see, the first two graphs have OpenAI models above Claude models (including Mythos) on the Technical Non-Expert and the Practitioner evals. Mythos now beats Codex 5.3 on the Expert eval and Opus was already on top for the Apprentice one although now Mythos leads there.
So, even including Mythos, OpenAI still has 2 models on top for the 4 evals listed.