Hacker News

davesque · yesterday at 9:45 PM

> We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses.

It feels like this is a losing strategy. Claude should be developing secure software and also properly advising on how to do so. The goals of censoring cybersecurity knowledge and enabling the development of secure software are fundamentally in conflict. Also, unless all AI vendors take this approach, it's not going to have much effect overall. Seems pretty naive of them to see this as a viable strategy. I think they're going to have to give up on this eventually.


Replies

andai · yesterday at 10:23 PM

The fundamental tension is that the models are getting weirdly good at hacking while still sort of sucking at a bunch of economically valuable tasks.

So they've hit the point where the models are simultaneously too smart (dangerous hacking abilities) and too stupid (can't actually replace most employees). At this point they'd need to make the models bigger, but they're already too big.

So the only thing left to do is to make them selectively stupider. I didn't think that would be possible, but it seems like they're already working on that.

SJMG · today at 2:08 AM

Yes, it's a losing strategy; no one else is going to do this. They are inviting parties to partner with them, so it's not totally in conflict. I'm sure there's genuine concern coming out of Anthropic, but I also think at this point they've likely culturally internalized "Dangerous [think: powerful] AI" as a brand narrative.

"Beware of Mythos!" reads to me as standard Anthropic/Dario copy. Is it more true now than it was before? Sure. Is now the moment that the world's digital infrastructure succumbs to waves of hackers using countless exploits? I doubt it.

shohan99 · today at 5:25 AM

While I believe that Mythos is better than the models we have right now, "too dangerous to release" sounds largely like a marketing gimmick to me. Well, it's not for me to speculate; I simply need to wait for the huge wave of security patches to all software in the coming weeks, as per Anthropic's claims.

Jagerbizzle · today at 12:18 AM

This is the company that allowed a vibe-release resulting in the leak of the entirety of the Claude Code codebase. What is the bar you're expecting here, exactly?

zmmmmm · yesterday at 11:40 PM

Curious how the safeguards work and what impact they will have.

In general I feel that over-engineering safeguards in training comes at a noticeable cost to general intelligence. It's like asking someone to solve a problem on a whiteboard in a job interview: the stress alone slices off at least 10% of my IQ.

earthnail · yesterday at 9:50 PM

I feel it’s fine as a short term solution, and probably a good thing. Gives the good guys some time to stay on top.

Always remember: a defender must succeed every time; an attacker only once.

willis936 · today at 12:08 AM

I'm not a security expert and don't know how to properly audit every GitHub repo I come across. Sometimes I want to build GNOME extensions or cool software projects from source, and I want some level of checking along the way for known vulnerabilities. They can't claim this is an obvious win for security when it centralizes, rather than democratizes, security.

slashdave · today at 12:23 AM

I interpreted their actions as proactively giving vendors time to protect themselves against the new model, not as nerfing the models themselves.

Although perhaps I am naive.