I replicated this experiment on several production codebases and got several crits. Lots of dupes, lots of false positives, lots of bugs that weren't actually exploitable, lots of accepted/known risks. But also, crits!
I think this really needs to be part of the message. It's great that Claude found a vulnerability that apparently has been overlooked for a long time. It's even proper for Anthropic to tout the find. But we should all ask about the signal-to-noise ratio that would have been part of the process. If it was only ever successful, that would be worth touting, too. But I expect there was more noise than they'd care to admit.
Or put another way, the context matters.