> What is not mentioned is that Claude Code also found one thousand false-positive bugs, which developers spent three months ruling out.
Source? I haven't seen this anywhere.
In my experience, the false-positive rate on vulnerabilities with Claude Opus 4.6 is well below 20%.
Same. In my experience, Codex and Claude Code on the latest models are really good at finding bugs and really good at fixing them. Much better than 50% in the latter case, and much faster than I am.
Source: """AI is bad"""
In my experience, the issue has been judging likelihood of exploitation or issue severity. Claude gets it wrong almost all the time.
A threat model matters, and some risks are accepted. Good luck convincing an LLM of that fact.
In TFA:
> I have so many bugs in the Linux kernel that I can’t
> report because I haven’t validated them yet… I’m not going
> to send [the Linux kernel maintainers] potential slop,
> but this means I now have several hundred crashes that they
> haven’t seen because I haven’t had time to check them.
—Nicholas Carlini, speaking at [un]prompted 2026
On the issue of AI-submitted patches being more of a burden than a boon, many projects have decided to stop accepting AI-generated contributions:
https://blog.devgenius.io/open-source-projects-are-now-banni...
These are just a few examples; a Google search can supply more.