logoalt Hacker News

babyyesterday at 11:53 PM0 repliesview on HN

Our experience has been that without a good harness you don't really get much out of codex/claude. And you really need to spend time and energy figuring out why coding agents can't find bugs like you can.

Every week I see bugs (as an auditor) that our own harness (https://zkao.io/) can't find, and we have to figure out pretty interesting techniques in order to make the tool find them. Mind you I'm talking mostly about cryptographic vulnerabilities, not just webapp bugs. So IMO it's going to make a lot of sense for companies to have both their own harness (as tptacek is talking about) and pay for services that focus on making a good harness from experience (and audit firms are going to be the best at doing this, as they see a lot of bugs and can spend time "teaching" their harness about these bugs)

On the other hand, you have to find equally as good techniques to triage, because otherwise you just have some machinery that I call "vibe auditing" that just produces enough false positives to tire all the developers (who are already overwhelmed with crappy AI submissions in bugbounties and other AI tool that review all of their PRs).

At the end of the day, when your harness doesn't return any bug, you're left wondering "does it mean there's no bugs?" We're basically back in this reputation game, where you want to use the best tool, or the best team (that knows what the best tools are), and need to figure out which one is.