Hacker News

superfrank, yesterday at 6:44 PM

This article reinforces something I've heard a lot of people say for a while now, and that I've felt personally: Claude and GPT are fairly evenly matched on any individual task (GPT might even be a little better), but Claude is far more autonomous.

With that said, I think the graph under "Cyber range results" is the important one. The graphs at the top show that, yes, Mythos isn't much better than existing models on well-constrained problems, but when the models are given ambiguous challenges that require multiple steps, Mythos is much, much better than anything on the market.

I think that's why there's been such a big deal made out of Mythos (well, that and marketing). If Mythos really is so much better than the current models at just working autonomously to find security issues then it becomes much more realistic that someone with deep pockets could just spin up an army of them running 24/7 and point them at a target.


Replies

bonsai_spool, yesterday at 6:48 PM

Looking closely at the graphs, the Anthropic models all score clearly higher than the OpenAI models.

Whether the difference is meaningful can't be determined from the graphs (and there's no reasoned basis for picking one graph over the rest, given that the benchmarks are all arbitrary).

PunchTornado, yesterday at 7:43 PM

Look at those graphs again. Claude beats GPT.
