satvikpendem • today at 6:10 PM • 20 replies

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

And Opus 4.8 is still cheaper for a higher pass rate (let alone open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet, except maybe at the low effort level for trivial tasks where, judging by the graph, I'm fine with it working only 50% of the time. The pricing doesn't really make any sense.

Replies

secretslol • today at 6:27 PM

"Lower ability to perform cybersecurity-related tasks" makes me super concerned it will leave my codebase like Swiss cheese for any American granny with access to Fable 5, while we non-American Brits, or rest-of-worlders, don't have access to it to clean up our own codebases.

zlurker • today at 6:15 PM

They spent months hyping up Mythos and ended up with it banned. I'd assume they want both to differentiate their products and to appeal to regulators here.

MostlyStable • today at 6:23 PM

Why do you think they are bragging? Anthropic has long been the company to give us by far the most in-depth information about their models, both positive and negative. I read this as them just stating a fact about this model that users would want to know.

bluepeter • today at 6:41 PM

Flowers for Algernon. And, sadly, expect this from now on. You saw it with OpenAI releasing Sol/Terra/Luna with a chart showing how they weren't quite as good as Mythos. It's all messaging to the USG to try to avoid/minimize arbitrary review from multiple agencies. 'Hey, it's smart, but look how stupid it is at "cyber."'

kristianc • today at 6:25 PM

There are two classes of models now: the cybersecurity ones that none of us are getting, and the 'safe' models released for general consumption. This is letting us know which side of the divide it sits on.

dgacmu • today at 6:35 PM

One of the best queries I've run with an LLM recently was: "Create a plan for improving the robustness and resilience of this code, particularly to untrusted inputs."

Gemini wouldn't do a security audit. But it came up with a great set of mitigations and identified an extant XSS flaw in the process of improving robustness.

There's an awful lot of good that can come from proactive, defensive use of LLMs. I realize there's also a lot of pain when the difficulty of exploit finding drops suddenly, but in the long term we may all benefit from the defensive side of this.

K0balt • today at 6:22 PM

Restricting the models isn’t about restricting offensive capabilities. They were already very well aligned to reduce that risk.

This recent government interference is about trying to preserve US offensive cyberwarfare and cyberespionage capabilities. It's not about "bad actors". It's about defensive capabilities becoming pervasive and cheap, which would kneecap US cyberoffensive capability.

It’s like making seatbelts illegal so that police chases can be more effective.

pseudosavant • today at 8:23 PM

Probably so that the current US administration doesn't block broad usage of Sonnet 5. They'd have to collect your ID and approve you if it were good at cybersecurity. Because such is the freedom in the U.S. right now.

lanthissa • today at 6:20 PM

So it doesn't get blocked. Last time they said a model was great at cyber, it didn't turn out well.

nozzlegear • today at 7:10 PM

It seems obvious to me that they put that in there in an effort to avoid another reaming out by the long, orange dick of the US government.

Philpax • today at 6:12 PM

To avoid Lutnick getting on their case again.

johnfn • today at 6:41 PM

> Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

What exactly do you want Anthropic to say here? "This model, the one we are about to give to the entire world for cheap, is really good at hacking"? Saying Sonnet is terrible at cybersecurity is the most reasonable thing they can say, out of a lot of bad options.

2001zhaozhao • today at 6:48 PM

They are obviously trying to avoid getting Sonnet 5 blocked.

doctoboggan • today at 6:12 PM

You have to pay more for that, and/or go through some USG vetting process.

WithinReason • today at 6:31 PM

That part is likely directly addressed to the US government.

chvid • today at 6:33 PM

Does it mean it generates code with random security holes?

jayd16 • today at 6:34 PM

Market segmentation?

re-thc • today at 6:34 PM

> And Opus 4.8 is still cheaper for a higher pass rate

Unless it spams as much as Opus, I doubt it. Opus 4.8 literally spews text like puke. On a longer run, especially if you get cache misses here and there, the bulk of the cost is all the extra context it adds.

drcongo • today at 6:33 PM

What makes that a brag?
