I’ve never understood the “if I don’t enable bad behavior, someone else will, so I might as well ena...

brookst • today at 12:27 PM • 7 replies • view on HN

I’ve never understood the “if I don’t enable bad behavior, someone else will, so I might as well enable bad behavior” argument. Can you elaborate?

From where I sit it seems reasonable for Anthropic to not want their product used to create malware, even if they can’t solve the entire problem globally for every model. What’s wrong with that position? What should they do differently?

Replies

saidnooneever • today at 1:10 PM

some context:

its not about creating malware. this is already trivial and fully automated. its about finding exploits (which can be used to deploy malware), which is something both attackers and defenders benefit from.

threat actors will find them anyway, LLM or not. They only need 1 so its much less work for them.

defenders, they need to find them all. So for defenders, these models are more valuable than for attackers.

restricting certain models will not reduce the availability of these tool for attackers, but defenders are limited because running local models is more hard in an enterprise setting with heaps of events and products etc. to run through them, they need many GPUs where the attacker can run an local model on 1 GPU and get desired effects.

Hence, if they release the capability the world will adjust to it and be able to mitigate effects, collectively. Now, companies are left in the dark while attackers have effective tooling.

Besides this there is also things like for instance people now including strings with recipies for meth or sarin gas (malwareTech info). the new variant of shai hulud does this. That stops LLM scanners and can even get their users banned from LLM services.

There is a reason why cybersecurity researchers write papers about attack techniques and new exploits.

Its not to put them out there for people to abuse, but its there for the collective cybersecurity bunch to all have access to information that can help them solve the problems.

I know this is not a clear answer to your question, but hopefully it provides some context to think about and decide for yourself further. In the end of the day its also part opinio here, to find it good or bad. Likely theres good arguments against and for it.

I am for putting informaiton and tools out there so other smart folks can find solutions. Others are for restricting and wishful thinking (my opinion) that attackers wont find something.

➕ show 1 reply

0x20cowboy • today at 7:46 PM

> I’ve never understood the “if I don’t enable bad behavior, someone else will, so I might as well enable bad behavior” argument.

If someone is going to make a lot of money (fame, etc), it might as well be me.

unglaublich • today at 3:58 PM

It's the same as encryption backdoors to stop the bad guys.

The bad guys work around it, and the rest is now in a vulnerable position.

Antrophic plays security theater by blocking their LLMs to work with security.

The bad guys work around it, and those that want to make their software robust against them are in a vulnerable position.

jerf • today at 3:22 PM

"I’ve never understood the “if I don’t enable bad behavior, someone else will, so I might as well enable bad behavior” argument. Can you elaborate?"

You are mentally approaching this as if you have an oracle that can be consulted to say whether or not something is bad behavior. So of course, if this oracle exists and can be consulted and it says the behavior is bad, why would anyone argue with the idea that we should stop bad behavior?

This argument is valid [1], in that give the premises the argument is correct. The problem is, once you draw out the fact that the argument is depending on the existence of an oracle that does not exist, that premise of the argument is invalid.

Two people can sit down in front of an AI right now, with the exact same code base, and type in a prompt to the AI "Analyze this code base for security holes and try to build exploits against them." One person's use is completely valid, another person's use is completely harmful, and the information necessary to distinguish those two use cases is not available to the AI. I phrase it that way carefully, it isn't that "the AI isn't smart enough", the problem is that the information is simply unavailable. Intelligence doesn't factor in at that point.

Therefore, the only way that Antropic has to deal with this at scale is simply to block the query entirely. Which means that when I, the valid user who is trying to establish whether my code base has security issues and whether I can prove they are exploitable, I can not. I am checking for exploitability because while I would like to fix all security issues, issues that are provable exploitable are of a higher priority than smelly code that doesn't seem to be exploitable, which is a perfectly valid thing for me to want to do.

If I can't use legitimate tools to secure my code, but the bad guys can use unrestricted tools to attack my code, now this is a great deal more complicated than "Who can argue with stopping the bad stuff?", which is the main point I want to make here. I'm not going into a huge analysis of that problem, merely pointing out that it is a problem and that this isn't just about "stopping the bad stuff". There are additional complications beyond that, like, even if Anthropic could determine the "bad stuff" and stop just that in their LLM, LLMs in general don't have infinitely precise surgical "stop doing this thing" options and any such instruction to stop doing a thing always degrades the LLM across the board in various ways.

Anthropic has no access to the Platonic ideal of "stop malware", if such a thing even hypothetically exists. When analyzing the real effects their real actions will take, what their intentions were for those actions aren't really relevant. It is clear that they are making their model a great deal less useful for me, a legitimate user, and I and others like me are perfectly justified in disagreeing with their analysis and actions.

I also observe that "the bad guys getting unrestricted access to the full power" is only a matter of time. There's no question whether it will happen, the only question is whether this time is in the past or the future. This includes the fact that while your definition and my definition of "bad guys" may vary, it is virtually certain that your definition includes at least one high-powered intelligence agency somewhere in the world that does cyberattacks and will have the means, the opportunity, and the motive to get unrestricted access to these models by means you may consider licit or illicit. If your threat model includes them, as mine does, it is perfectly reasonable to complain that my tooling is being broken in a ways theirs won't be.

[1]: https://en.wikipedia.org/wiki/Validity_(logic)

➕ show 2 replies

SkyBelow • today at 1:15 PM

I don't think that is the argument.

The argument is more "I want to do good thing X, but it will also cause bad thing Y." followed by "Wait, bad thing Y is going to happen anyways, so I might as well do good thing X so we get both X and Y instead of just X."

Viewed this way, the idea is that given the world will have bad thing Y regardless, the one impact of your choice is if good thing X exists or not, and it is better to create good thing X.

Where it becomes an issue is that there is no clear X or Y. There are many different but very related bad things, so if the one you would add is actually better or worse than what is already out there, or maybe it'll exist both ways but you make it more popular, and very subjective things to judge, so different people look at the same outcome and some agree that bad thing Y would have existed anyways and others say that no, this is a new bad thing Z that wouldn't have existed anyways.

>From where I sit it seems reasonable for Anthropic to not want their product used to create malware

Yes, I think there is a PR component to this that is often left out of this discussions.

DyslexicAtheist • today at 5:41 PM

the problem is that the guardrails prevent us from performing real security work which is friction that is incurred by the legitimate user but not by a moderately sophisticated threat-actor.

for example in my org it is part of the culture that security has no seat at the table. that is a separate problem, but the number of orgs like mine are more numerous than the number of orgs where security isn't a cost-center.

we find lots of stuff because low-hanging fruit is everywhere. hecking heck: I'm a fruit.

and when the cost of fixing is even the slightest inconvenience to devs we will not fix it, but continue sitting on the risk until the cows come home. In such a place a new critical finding isn't even novel. Instead our job moves to to combining different vulns that we already have, and try to show managers how bad it is.

the common retort from management is: proof to me why this is an issue, and why engineering should divert their attention to it. And unless my team can proof why X can be exploited, or Y can be bypassed, or Z can gain persistence, ... the vulnerabilities will remain. I have been in discussions where the business demanded to see an exploit so they can justify the cost of fixing it. low-cyber-maturity doesn't even describe it. we are not a mom and pop shop but have 110K employees worldwide. and again - we are not uniquely insecure.

so these guardrails aren't helping because the moment the chat has any offsec artifacts, or even just a single wrongly worded phrase anywhere in the workspace, the session is flagged, you need to downgrade the model.

what adds insult to injury, is that the guardrail is just a way to funnel users into the Ai company's "cyber marketing" program: "your chat has been flagged, please proof your identity and hand over your passport data so you can sign up to our TrustedCyber program". Bitch please you have my payment information, use that??

if you consider bug-density (security defect density) per LoC, it is even more of a sh1t show: no restrictions apply for developers to push their buggy code, but the security team needs to somehow proof that they aren't the malicious party?

totally off - considering the right way to build defensive/offsec/malicious tooling with AI isn't by using frontier models ... but run a serious of agents on tightly scoped tasks. see https://securitycryptographywhatever.com/2026/03/25/ai-bug-f... and https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... - this shuts out the average joe who works in an org where cyber security maturity is poor. joe does not know about how to orchestrate a fleet of agents and give them muppet names. all he knows is that the good guys are losing the fight.

➕ show 1 reply

fatata123 • today at 12:31 PM

They have no choice, enterprise customers won’t touch them unless they take a position like this. It’s a practical decision for them at the end of the the day.

➕ show 3 replies

alt Hacker News

Replies