Maybe it's marketing, but I think it's regrettable that Anthropic paired project Glasswing with Mythos. It really makes it seem like Mythos is the threat, rather than the fact that tons of vulnerabilities have always been ignored throughout the software world.
If Glasswing had been started years ago with the goal of applying fixes to AI-found gaps, then this would just be another model to add to that effort. But doing so in the ominous shadow of some new super model boosts panic IMO.
This. Since the announcement, I’ve been hearing panic about Mythos from the non-security community because “zomg z3r0 d4y5!!”, but these are the same people who have been running production servers 10 updates and 2 critical security fixes behind for years.
I don’t need cutting-edge AI to take you down. I need Metasploit with a CVE list that’s been updated in the last 6 months.
You're making a hubris-laden assumption that coders know the gaps they're baking into their software, and that any human has a decent enough grip on the multitudes of spinning logic duct-taped together to make the internet run. Most vulnerabilities aren't "ignored"; they're sitting in a never-ending backlog or are simply unknown.
If you closed all of the AI-discovered security vulnerabilities tomorrow - by the next day there'd be a host of new ones. That's software, baby.
The strongest model we've benchmarked for agentic workflows on our comprehensive, little-known, and difficult-to-game benchmark is still Claude Opus 4.5. That's not a typo.
Interpret that how you will, but if Anthropic had to take cost- and resource-saving measures after the last major release, less than 6 months ago, it's unlikely they have the economics to offer what Mythos is promised to be at any sort of product scale. But I agree, it would be great to get stronger models and start securing all the junk on the web. Of course, that requires maintainers to know how to use these tools.
Benchmarks at https://gertlabs.com/?agentic=all
I'm particularly interested in whether someone with relevant expertise could comment on the types of bugs Mythos found, e.g. the 27-year-old OpenBSD bug.
I ask because the media around Mythos is leaning into the "Mythos is a super intelligence that can find bugs that no human can" story. But in my mind it's pretty obvious that any software that is complex enough will have a lot of lurking zero days, and better tools will asymptotically find more of them. So it seems to me something like Mythos would just be able to do more analysis/searching for bugs at a much faster rate than previously possible. But I'm skeptical that the bugs that were found required an insane amount of analytical ability to locate, so I would really appreciate it if someone could comment on that (e.g. was it "yeah, with enough time we would have found it eventually" vs. "Wow, this was an insanely difficult bug to find in the first place").
I do agree that in the medium/long term tools like Mythos will be a huge boon for cybersecurity, because they will inherently make it easier to write bug-free code in the first place. But yeah, we're now at a point where all these "pre-AI bugs" need to be fixed and patched before folks in the wild find all these zero days.
A year ago the LLMs weren't good enough to find these security issues. They could have done other stuff. But then again, the big tech companies were already doing other stuff, with bug bounties, fuzzing, rewriting key libraries, and so on.
This initiative probably could have started a few months sooner with Opus and similar models, though.
I guess I'm not sure why you frame this as a "rather than". What Anthropic is saying is that the norm of having tons of vulnerabilities lying around historically worked OK, but Mythos shows it will soon become catastrophically not OK, and everyone who's responsible for software security needs to know this so they can take action.
Cybersecurity is taken too lightly, and it mostly boils down to recklessness on the part of developers. They are just "praying" that no one acts on the issues they already know about, and that's something we must start talking about.
Common recklessness obviously includes devs running binaries on their work machines, not using basic isolation (why?), using sticky IP addresses that straight-up identify them, and, even worse, using the same browser to access admin panels and some random memes. There are obviously a hundred more like these that are ALREADY solved and KNOWN by the developers themselves. You literally have developers who still use cleartext DNS (apparently they are OK with their history being accessible to random outsourced employees).