Hacker News

Topfi · yesterday at 9:52 PM

Interesting. I hope there will be some clarification on what "Mythos level" means and why 5.5-cyber doesn't rise to it. Any metric I could come up with (parameters, pre-train compute, benchmark scores, etc.) seems somewhere between imperfect and utterly nonsensical. Pure speculation, but GPT-5 series models, including the new 5.5 pre-train, appear far closer to Sonnet than to Opus or Fable in pure parameter count, so maybe that's it. Still, the "they do not surpass the bar that Mythos set" line sounds more like there is a belief that Mythos/Fable are more capable at cybersecurity tasks, whereas the data [0] doesn't seem to bear this out. I did not do any cybersecurity assessment of Fable 5 myself, partly for personal reasons that make it something I'm abstaining from. My coding evals showed that while it was neck and neck with 5.5 on task adherence and assessment, its task inference was another major jump (something prior Anthropic models already tended to do incredibly well on). That makes it a far better model to work with for UX experiments, but I don't see how that translates to cybersecurity, especially alongside the aforementioned publicly available evals by AISI.

Seeing as neither Mythos nor GPT-5.5 was pre-trained with a particular focus on cybersecurity, this would have to mean that any model that benchmarks better than GPT-5.4 or Opus 4.6 on these tasks cannot be used by non-US citizens. If such guidance isn't enforced for all US labs, I think that's irrefutable evidence that this isn't about cybersecurity or "the bar that Mythos set"...

[0] https://xcancel.com/AISecurityInst/status/205458976317312633...


Replies

handoflixue · today at 3:04 AM

Firefox bugs found per month, actively advertised as a sign of how powerful Mythos is: https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2F...

I am, thus far, not aware of 5.5-Cyber managing anything similar to "Project Glasswing"

That said, the government has also known about Mythos since Project Glasswing was announced on April 7th, two months ago, so if they wanted to block a public release, they had more than enough time to do it in an orderly way.

And basically every sign that Mythos is well above the previous baseline was pretty publicly known by early May, when we started getting stuff like the Firefox bug reports.

I can see an argument that Mythos is just barely enough of a "cut above" to regulate, but I cannot see any argument for doing this by fiat order three days after the release.
