> What is not mentioned is that Claude Code also found one thousand false-positive bugs, which developers spent three months ruling out.
Source? I haven't seen this anywhere.
In my experience, the false-positive rate on vulnerabilities with Claude Opus 4.6 is well below 20%.
Same. In my experience, Codex and Claude Code on the latest models are really good at finding bugs and really good at fixing them. Much better than 50% in the latter case, and much faster than I am.
Source: """AI is bad"""
In my experience, the issue has been judging likelihood of exploitation or issue severity. Claude gets it wrong almost all the time.
A threat model matters, and some risks are accepted. Good luck convincing an LLM of that fact.
In TFA:
> I have so many bugs in the Linux kernel that I can’t
> report because I haven’t validated them yet… I’m not going
> to send [the Linux kernel maintainers] potential slop,
> but this means I now have several hundred crashes that they
> haven’t seen because I haven’t had time to check them.
—Nicholas Carlini, speaking at [un]prompted 2026
On the issue of AI-submitted patches being more of a burden than a boon, many projects have decided to stop accepting AI-generated contributions:
https://blog.devgenius.io/open-source-projects-are-now-banni...
These are just a few examples; a Google search can supply more.