superfrank • yesterday at 10:10 PM • 2 replies

Can you explain where you're seeing that? From what I see, the first two graphs have OpenAI models above Claude models (including Mythos) on the Technical Non-Expert and the Practitioner evals. Mythos now beats Codex 5.3 on the Expert eval, and Opus was already on top for the Apprentice one, although Mythos now leads there.

So, even including Mythos, OpenAI still has 2 models on top for the 4 evals listed.

Replies

bonsai_spool • yesterday at 11:43 PM

> From what I see, the first two graphs have OpenAI models above Claude

That's just in that final graph, and that graph is perhaps the least instructive: they talk about ranges of outcomes, but they don't show whether all of the models besides Mythos / Opus 4.6 overlap.

Take a look at all three graphs together and it's clear Anthropic are doing better in this arena

Escafati • yesterday at 10:48 PM

[dead]
