Hacker News

cperciva yesterday at 5:43 AM

I've seen plenty of people saying "Mythos isn't all that exceptional, lots of LLMs can find security vulnerabilities", and indeed there is some evidence for that. It sounds like Anthropic was taken somewhat by surprise at how easily a simple prompt got Mythos to deliver exploits, and didn't immediately distinguish between the effectiveness of Mythos and the effectiveness of the prompt.

But the claim of "LLMs aren't making a difference in vulnerability discovery" has been laughable to anyone who has been reading security advisories for the past 3 months. Just look at the Credits lines.


Replies

wrs yesterday at 6:21 AM

I thought the point was not that Mythos finds more vulnerabilities, but that it can exploit them much more successfully. I thought the report showed it didn’t find much more than Opus 4.8. (Or did I misread?)

fweimer yesterday at 12:21 PM

I have yet to see a single glibc bug that truly matters. I don't have illusions about our code quality, so there must be something to find.

We got many high-quality bug reports, some of them with a security aspect. Several received CVSSv3.1 scores of around 9.8 from the rating agencies, but these high numbers are misleading. (Vulnerability scoring is hard, and it's pretty much impossible for a library without reference to an application that uses the library.) Looking beyond the numbers, everything reported this year (and late last year) has been pretty harmless so far.

Does this mean LLMs are making a difference? For upstream developers, definitely. For end users? Not that much yet.

Maybe the picture changes once the organizations sitting on the good findings figure out how to disclose them to the relevant upstream projects. When I read the announcement of Project Glasswing, I immediately thought that this was going to be the hardest part.