the problem is that the guardrails prevent us from performing real security work. that friction is incurred by the legitimate user but not by a moderately sophisticated threat actor.
for example, in my org it is part of the culture that security has no seat at the table. that is a separate problem, but orgs like mine are more numerous than orgs where security isn't a cost center.
we find lots of stuff because low-hanging fruit is everywhere. hecking heck: I'm a fruit.
and when the cost of fixing is even the slightest inconvenience to devs, we will not fix it but continue sitting on the risk until the cows come home. In such a place a new critical finding isn't even novel. Instead our job shifts to combining different vulns that we already have and trying to show managers how bad it is.
the common retort from management is: prove to me why this is an issue, and why engineering should divert their attention to it. And unless my team can prove that X can be exploited, or Y can be bypassed, or Z can gain persistence, ... the vulnerabilities will remain. I have been in discussions where the business demanded to see an exploit so they could justify the cost of fixing it. "low cyber maturity" doesn't even describe it. we are not a mom and pop shop but have 110K employees worldwide. and again - we are not uniquely insecure.
so these guardrails aren't helping, because the moment the chat has any offsec artifacts, or even just a single wrongly worded phrase anywhere in the workspace, the session is flagged and you need to downgrade the model.
what adds insult to injury is that the guardrail is just a way to funnel users into the AI company's "cyber marketing" program: "your chat has been flagged, please prove your identity and hand over your passport data so you can sign up to our TrustedCyber program". Bitch please, you have my payment information, use that??
if you consider bug density (security defect density) per LoC, it is even more of a sh1t show: no restrictions apply to developers pushing their buggy code, but the security team needs to somehow prove that they aren't the malicious party?
totally off - considering the right way to build defensive/offsec/malicious tooling with AI isn't by using frontier models ... but run a series of agents on tightly scoped tasks. see https://securitycryptographywhatever.com/2026/03/25/ai-bug-f... and https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... - this shuts out the average joe who works in an org where cyber security maturity is poor. joe does not know how to orchestrate a fleet of agents and give them muppet names. all he knows is that the good guys are losing the fight.
> the right way to build defensive/offsec/malicious tooling with AI isn't by using frontier models ... but run a series of agents on tightly scoped tasks
The right way to do it is to run a series of agents... many of which are nonetheless built on frontier models (and nearly none of which are built on some local 27B Qwen variant...). One thing the latest models are good at is orchestrating other agents.