skybrian • today at 3:45 PM • 5 replies

A year ago, LLMs weren't good enough to find these security issues. They could have done other stuff. But then again, the big tech companies were already doing other stuff: bug bounties, fuzzing, rewriting key libraries, and so on.

This initiative probably could have started a few months sooner with Opus and similar models, though.

Replies

adrian_b • today at 4:47 PM

Multiple older open-weights models, used together, can find all the security issues that Mythos found.

However, no single one of those models could find everything that Mythos found.

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...

Nevertheless, the distance between free models and Mythos is not as great as Anthropic's marketing claims, which of course is not surprising.

In general, this should also hold for other applications. No single model is equally good at everything, not even the SOTA models, so trying multiple models may be necessary to get the best results. With open-weights models, trying many of them may add negligible cost, especially if they are hosted locally.
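To make the "no single model, but the combination" point concrete, here is a minimal sketch of how one might measure it. Everything in it is hypothetical: the model names, the finding IDs, and the `coverage` helper are placeholders for illustration, not data from the linked post.

```python
from typing import Dict, Set, Tuple


def coverage(
    findings: Dict[str, Set[str]], reference: Set[str]
) -> Tuple[Dict[str, float], float]:
    """Return per-model recall and the recall of all models combined.

    findings:  model name -> set of issue IDs that model reported
    reference: set of issue IDs treated as the ground truth
    """
    if not reference:
        raise ValueError("reference set must not be empty")
    # Recall of each model on its own (reports outside the reference are ignored).
    per_model = {
        name: len(found & reference) / len(reference)
        for name, found in findings.items()
    }
    # Recall of the union of everything any model reported.
    combined = set().union(*findings.values()) & reference
    return per_model, len(combined) / len(reference)


# Hypothetical placeholder data, not real results.
reference = {"bug-1", "bug-2", "bug-3", "bug-4"}
findings = {
    "model_a": {"bug-1", "bug-2"},
    "model_b": {"bug-2", "bug-3"},
    "model_c": {"bug-4", "bug-9"},  # bug-9 is not in the reference set
}

per_model, combined = coverage(findings, reference)
print(per_model)
print(combined)
```

With placeholder data like this, each model alone covers only part of the reference set while the union covers all of it. Whether that holds in practice depends entirely on the real findings.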

causal • today at 3:49 PM

That's not quite true. Even a year ago, LLMs were finding vulnerabilities, especially when paired with an agent harness and lots of compute. And even before that, security researchers were shouting about systemic fragility.

Mythos certainly represents a big increase in exploitation capability, and we should have seen this coming.

pixel_popping • today at 4:05 PM

If you run Opus 4.6 and GPT 5.4 in a loop right now (maybe 100 times) against top XXXX repos, I guarantee you'll find, at the very least, medium-severity vulnerabilities.

alephnerd • today at 4:37 PM

> A year ago, LLMs weren't good enough to find these security issues

I know of two F100s that already started using foundation models for SCA in tandem with other products back in 2024. It's noisy, but depending on the environment, a false positive is less harmful than a real vulnerability that goes undetected.

vonneumannstan • today at 3:48 PM

>This initiative probably could have started a few months sooner with Opus and similar models, though.

Evidently they tried, and even the most recent Opus 4.6 models couldn't find much. There's been a step change in capabilities here.
