Hacker News

bonsai_spool yesterday at 11:43 PM

> From what I see, the first two graphs have OpenAI models above Claude

That's only true of the final graph, and that graph is perhaps the least instructive: they talk about ranges of outcomes, but they don't show whether all of the models besides Mythos / Opus 4.6 overlap.

Take a look at all three graphs together and it's clear Anthropic are doing better in this arena.


Replies

superfrank today at 2:48 AM

Yes. I know. That was exactly what I said in my first comment.

On individual tasks Claude and GPT are comparable (as shown in the first two graphs), but on multiple step problems that require more autonomy Mythos is far better (as shown in the third graph).

This is the exact wording from my original comment:

> So with that said, I think the graph under the "Cyber range results" is the important one. The ones at the top show that, yes, Mythos isn't too much better than any of the existing models on well constrained problems, but when the models are given ambiguous challenges that require multiple steps it's much, much better than anything on the market.
