CodeQL seems to raise too many false-positives in my experience.
I’d be interested in what kinds of false positives you’ve seen it produce. The functionality in CodeQL that I have found useful tends to accompany each reported vulnerability with a specific code path that demonstrates how the vulnerability arises. While we might still decide there is no risk in practice for other reasons, I don’t recall ever seeing it make a claim like this that was incorrect from a technical perspective. Maybe some of the other types of checks it performs are more susceptible to false positives and I just happen not to have run into those so much in the projects I’ve worked on.
The previous company I worked at (I left about 6 months ago) had a bunch of microservices, most in Python using FastAPI and Pydantic. At one point the security team turned on CodeQL for a number of them, and we just got a pile of false positives for not validating a UUID URL path parameter in a request handler. In fact the parameter was typed in the handler's function signature, and FastAPI does validate that type. But in this strange case, CodeQL knew these were external inputs, yet didn't know that FastAPI would validate the path parameter's type, so it suggested adding redundant type-check and bail-out code in hundreds of places.
The patterns we had established were as simple, basic, and "safe" as practical, and we advised on and code-reviewed the mechanics of the other teams' services/apps: using database connections/pools correctly, using async correctly, validating input correctly, and so on (while those teams focused on features and business logic). Low-level performance was not really a concern; the issues were mostly high-level, like DB queries or sub-requests that were too expensive or too numerous. The point is, there really wasn't much of anything for CodeQL to find; the basic blunders had mostly been prevented. So it was pretty much all false positives.
Of course, the experience would be far different if we had been more careless or had been working with trickier components/patterns. Compare the base-rate fallacy from medicine: if you run a 99%-accurate test across a population with nothing for it to find, the "1%" false-positive case will dominate the results.
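Concretely (these are made-up illustrative numbers, not measurements from our codebase): even a scanner with only a 1% false-positive rate yields alerts that are entirely noise when there is nothing real to find.

```python
# Base-rate effect with hypothetical numbers: a 99%-accurate scanner
# run over a codebase with zero real vulnerabilities.
checks = 10_000        # locations the scanner examines
real_vulns = 0         # well-reviewed codebase: nothing genuine to find
fp_rate = 0.01         # 1% of clean locations get flagged anyway
tp_rate = 0.99         # 99% of real vulns would be caught (if any existed)

false_positives = (checks - real_vulns) * fp_rate  # 100 bogus alerts
true_positives = real_vulns * tp_rate              # 0 genuine alerts

total_alerts = false_positives + true_positives
precision = true_positives / total_alerts if total_alerts else 0.0
print(false_positives, true_positives, precision)  # 100.0 0.0 0.0
```

Every one of the 100 alerts is a false positive, so alert precision is 0% even though the tool is "99% accurate."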
I also want to mention a tendency among some security teams to decide that their role is to set these things up, turn them on, cover their eyes, and point the hose at the devs. Using these tools makes sense, but these teams think it's not practical to look at the output and judge its quality with their own brains first. And it's all about the numbers: 80 criticals, 2000 highs! (Except they're all the same CVE, and they're all invalid for the same reason.)