Hacker News

velcrovan | last Thursday at 4:20 PM | 5 replies

It's not likely that reviewing your own code for vulnerabilities will fall under "prohibited uses" though.


Replies

convnet | last Thursday at 6:03 PM

> its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities)

I wonder if this means that it will simply refuse to answer certain types of questions, or if they actually trained it to have less knowledge about cyber security. If it's the latter, then it would be worse at finding vulnerabilities in your own code, assuming it is willing to do that.

Kim_Bruning | last Friday at 10:13 AM

I can confirm from experience that reviewing your own code for vulnerabilities has fallen under "prohibited uses" starting with Opus 4.6, as recently as April 10. It forced me to spend a day troubleshooting and quarantining state from my search system.

"This request triggered restrictions on violative cyber content and was blocked under Anthropic's Usage Policy. To learn more, provide feedback, or request an exemption based on how you use Claude, visit our help center: https://support.claude.com/en/articles/8241253-safeguards-wa..."

"stop_reason":"refusal"

To be fair, they do provide a form at https://claude.com/form/cyber-use-case which you can use, and in my case Anthropic actually responded within 24 hours, which I did not expect.

I admit I'm now once bitten twice shy about security testing though.

Opus 4.7 was still 'pausing' (refusing) random things on the web interface when I tested it yesterday, so I can't confirm whether the form applies to 4.7 or how narrow the exemptions are.
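
For anyone who hits the same response from a script: a minimal sketch of detecting the `"stop_reason":"refusal"` field quoted above, so a blocked request can be set aside instead of being treated as a normal answer. It assumes only that the response body is JSON with a top-level `stop_reason` field, as in the quoted output. The helper name `was_refused` and the two sample bodies are hypothetical and trimmed for illustration.

```python
import json


def was_refused(raw_response: str) -> bool:
    """Return True if the JSON response body has stop_reason == "refusal".

    Assumption: the body is a JSON object with a top-level "stop_reason"
    key, as in the output quoted in the comment above.
    """
    body = json.loads(raw_response)
    return body.get("stop_reason") == "refusal"


# Hypothetical, trimmed response bodies (not real API output).
blocked = '{"stop_reason": "refusal", "content": []}'
normal = '{"stop_reason": "end_turn", "content": [{"type": "text", "text": "ok"}]}'

for name, raw in [("blocked", blocked), ("normal", normal)]:
    status = "refused, skip and log" if was_refused(raw) else "usable"
    print(f"{name}: {status}")
```

Checking this field before storing a result is one way to avoid having to quarantine state after the fact.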

nicce | last Thursday at 6:46 PM

There is no way the model can know the origin of the code.

xlbuttplug2 | last Thursday at 5:22 PM

May not be very effective if so.

I'm assuming finding vulnerabilities in open source projects is the hard part and what you need the frontier models for. Writing an exploit given a vulnerability can probably be delegated to less scrupulous models.

whatisthiseven | last Thursday at 5:41 PM

Currently 4.7 is suspicious of literally every line of code. It may be a bug, but it shows how much they care about end users when something with such a massive impact went unnoticed before release.

Good luck trying to do anything about securing your own codebase with 4.7.