FWIW Claude Code Opus 4.5 scores ~71% accuracy on the OpenSSF CVE Benchmark in the run we did against DeepSource (https://deepsource.com/benchmarks).
We take a different approach: we use SAST as a fast first pass over the code (this also helps ground the agent, and is more effective than just asking the model to "act like a security researcher"). Then, when the LLM review kicks in, we expose pre-computed static analysis artifacts (data flow graphs, control flow graphs, dependency graphs, taint sources/sinks) as data sources the agent can query. As a result, we're seeing higher accuracy than others.
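To make that concrete, here's a minimal sketch of the shape of it. Everything here (TaintPath, TAINT_INDEX, get_taint_paths, the example data) is illustrative, not our actual implementation; the point is just that the expensive flow analysis happens offline, and the agent looks it up as a tool call instead of re-deriving it token by token:

    from dataclasses import dataclass

    @dataclass
    class TaintPath:
        source: str       # where untrusted data enters, e.g. a request param
        sink: str         # where it lands, e.g. a SQL execute call
        hops: list[str]   # intermediate assignments/calls along the path

    # Pre-computed offline by the SAST pass; keyed by file for simplicity.
    TAINT_INDEX: dict[str, list[TaintPath]] = {
        "app/search.py": [
            TaintPath(
                source="request.args['q']",
                sink="cursor.execute(query)",
                hops=["q = request.args['q']", "query = f'SELECT ... {q}'"],
            )
        ],
    }

    def get_taint_paths(file: str) -> list[TaintPath]:
        """Tool exposed to the agent during review."""
        return TAINT_INDEX.get(file, [])

    # The agent's tool call grounds the LLM's reasoning in real flow facts:
    for path in get_taint_paths("app/search.py"):
        print(f"untrusted {path.source} reaches {path.sink} via {path.hops}")

The design choice is that the model never has to trust its own data-flow reasoning; it confirms or rejects candidate findings against artifacts computed deterministically ahead of time.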
We haven't gotten access to this new feature yet, but when we do, we'll update our benchmarks.