Hacker News

johnfn yesterday at 6:34 PM

> Wasn't the scaffolding for the Mythos run basically a line of bash that loops through every file of the codebase and prompts the model to find vulnerabilities in it? That sounds pretty close to "any gold there?" to me, only automated.

But the entire value is that it can be automated. If you try to automate a small model to look for vulnerabilities over 10,000 files, it's going to say there are 9,500 vulns. Or none. Both are worthless without human intervention.

I definitely breathed a sigh of relief when I read it was $20,000 to find these vulnerabilities with Mythos. But I also don't think it's hype. $20,000 is, optimistically, a tenth the price of a security researcher, and that shift does change the calculus of how we should think about security vulnerabilities.
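For concreteness, the scaffolding described in the quote above could be as little as a loop over the source tree with a per-file prompt. This is a hypothetical sketch, not Anthropic's actual harness; `ask_model` stands in for a real LLM API call:

```python
import pathlib

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return "no vulnerabilities found"

# Loop over every source file and ask the model about each one.
findings = {}
for path in pathlib.Path("codebase").rglob("*.c"):
    findings[str(path)] = ask_model(
        f"Find vulnerabilities in:\n{path.read_text()}"
    )
```

The hard part, as the comment says, is not this loop but making the per-file answers trustworthy without a human reviewing all of them.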


Replies

sweezyjeezy yesterday at 7:30 PM

> But the entire value is that it can be automated. If you try to automate a small model to look for vulnerabilities over 10,000 files, it's going to say there are 9,500 vulns. Or none.

'Or none' is ruled out, since it found the same vulnerability. I agree there's a question of precision with the smaller model, but barring further analysis, '9,500' feels like pure vibes on your part? Also (out of interest) did Anthropic post their false-positive rate?

The smaller model is clearly the more automatable one IMO if it has comparable precision, since it's just so much cheaper - you could even run it multiple times for consensus.
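The consensus idea could be sketched like this. `scan_file` is a hypothetical per-file scanner returning a set of finding IDs; only findings that a majority of runs agree on survive:

```python
from collections import Counter

def consensus_findings(scan_file, path, runs=5, threshold=3):
    """Run a (hypothetical) model scanner `runs` times and keep only
    findings reported in at least `threshold` of those runs."""
    counts = Counter()
    for _ in range(runs):
        counts.update(scan_file(path))
    return {finding for finding, n in counts.items() if n >= threshold}

# Toy example: a deterministic stand-in for five noisy model runs.
outputs = iter([
    {"CWE-787", "CWE-89"},
    {"CWE-787"},
    {"CWE-787", "CWE-416"},
    {"CWE-787", "CWE-89"},
    {"CWE-89"},
])
agreed = consensus_findings(lambda p: next(outputs), "main.c")
# CWE-787 appears in 4/5 runs and CWE-89 in 3/5, so both survive;
# CWE-416 appears only once and is dropped.
```

At a tenth (or a hundredth) of the per-run price, five passes of the small model can still undercut one pass of the big one.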

mnicky yesterday at 8:20 PM

Also, what costs $20,000 today could be $2,000 next year. Or $20...

See e.g. https://epoch.ai/data-insights/llm-inference-price-trends/

integralid yesterday at 6:44 PM

>Or none

We already know this is not true, because small models found the same vulnerability.

sandeepkd yesterday at 11:30 PM

A security researcher charges a premium for all the effort they put into learning the domain. In this case, however, things are being oversimplified: only compute costs are being shared, which is probably not the full invoice one would receive. Training costs and investments need to be recovered, along with salaries.

Machines being faster and more accurate is the differentiating factor once the context is well understood.

john_minsk yesterday at 9:15 PM

In the future there shouldn't be any bugs. I'm not paying $20 per month to get a non-secure codebase from AGI.

ALittleLight yesterday at 10:12 PM

3 years ago the best model was DaVinci. It cost 3 cents per 1k tokens (input and output at the same price). Today, GPT-5.4 Nano is much better than DaVinci was, and it costs 0.02 cents per 1k input tokens and 0.125 cents per 1k output tokens.

In other words, a significantly better model is also 1-2 orders of magnitude cheaper. You can cut that in half by doing batch. You could cut it by another order of magnitude by running something like Gemma 4 on cloud hardware, or by even more on local hardware.

If this trend continues another 3 years, what costs 20k today might cost $100.
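As a sanity check on those ratios, converting the per-1k-token prices quoted above into dollars per million tokens:

```python
# Prices quoted above, in cents per 1k tokens.
DAVINCI = 3.0                    # input and output priced the same
NANO_IN, NANO_OUT = 0.02, 0.125  # GPT-5.4 Nano, in / out

def usd_per_million(cents_per_1k: float) -> float:
    """Convert cents per 1k tokens to dollars per 1M tokens."""
    return cents_per_1k / 100 * 1000

# DaVinci: $30 per 1M tokens; Nano: ~$0.20 in / $1.25 out.
input_ratio = DAVINCI / NANO_IN    # ~150x cheaper on input
output_ratio = DAVINCI / NANO_OUT  # 24x cheaper on output
```

That is roughly 150x on input and 24x on output, consistent with the "1-2 orders of magnitude" claim.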

SpicyLemonZest yesterday at 6:45 PM

What the source article claims is that small models are not uniformly worse at this, and that in fact they might be better at certain classes of false-positive exclusion. This is what Test 1 seems to show.

(I would emphasize that the article doesn't claim and I don't believe that this proves Mythos is "fake" or doesn't matter.)

siva7 yesterday at 8:14 PM

Except you would need about 10,000 security researchers in parallel to inspect the whole FreeBSD codebase. So at least 200 million dollars.

amazingamazing yesterday at 6:38 PM

Citation needed for basically all of this. You're creating a double standard for small models vs. Mythos…
