justin_vin • today at 8:24 PM

How are you handling the gap between what an agent reports and what it actually did? The sanitised optimism problem you mention is something I keep running into -- agents will confidently say they fixed something when they actually just suppressed the error. Are you doing any diff-level verification or is it mostly the reviewer agent catching it?

Replies

LeoStehlik • today at 9:54 PM

Both, since neither proved to be enough on its own.

The structural fix is an obsession with separating roles: the agent that builds is never the one that verifies. I run a reviewer agent (I call her Iris) and a tester (Rex), and they live in separate sessions with no shared context with the builder. Iris's brief explicitly says "we require a live browser test, code review is not enough", and that is where role separation was key: agents reviewing their own output tend to confirm what they already believe.

The explicit result/verdict format helps too. Each acceptance criterion gets a PASS/FAIL/UNKNOWN verdict with evidence attached. UNKNOWN is the one with gravitas: it forces the agent to say "I could not verify this" rather than quietly pretending it was a PASS.
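To make the verdict format concrete, here is a minimal Python sketch of how it could be represented. The names (`Verdict`, `CriterionResult`, `normalise`, `overall`) and the rule that a PASS without evidence gets downgraded to UNKNOWN are assumptions for illustration, not the actual setup described above.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class CriterionResult:
    criterion: str       # the acceptance criterion being judged
    verdict: Verdict
    evidence: str = ""   # e.g. a test log excerpt or a screenshot path


def normalise(result: CriterionResult) -> CriterionResult:
    """Assumed rule: a PASS with no evidence is treated as UNKNOWN."""
    if result.verdict is Verdict.PASS and not result.evidence.strip():
        return CriterionResult(result.criterion, Verdict.UNKNOWN, "no evidence supplied")
    return result


def overall(results: list[CriterionResult]) -> Verdict:
    """Any FAIL fails the run; any UNKNOWN blocks a PASS."""
    checked = [normalise(r) for r in results]
    if not checked:
        return Verdict.UNKNOWN  # nothing was verified at all
    if any(r.verdict is Verdict.FAIL for r in checked):
        return Verdict.FAIL
    if any(r.verdict is Verdict.UNKNOWN for r in checked):
        return Verdict.UNKNOWN
    return Verdict.PASS
```

With this shape, a report can only come out as PASS if every criterion carries evidence, so an unverified claim surfaces as UNKNOWN instead of blending in.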

But diff-level verification is where it still leaks. I don't have a systematic diff check yet. It's mostly Iris catching "agent replaced the whole file rather than extending it" by noticing the git diff is suspiciously clean. That's still more pattern matching than proper instrumentation, so there is room for improvement once I figure out how. Not there yet, to be honest.
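One possible starting point for a systematic check, sketched in Python: parse the output of `git diff --numstat` and flag files where the deleted line count is close to the file's original length. This is a different heuristic from the one Iris uses, and the function names, the `original_line_counts` input, and the 0.8 threshold are all assumptions for illustration.

```python
def parse_numstat(numstat: str) -> dict[str, tuple[int, int]]:
    """Parse `git diff --numstat` text into {path: (added, deleted)}."""
    stats = {}
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":
            continue  # skip blank lines and binary files ("-\t-\tpath")
        added, deleted, path = parts
        stats[path] = (int(added), int(deleted))
    return stats


def suspected_rewrites(
    numstat: str,
    original_line_counts: dict[str, int],
    threshold: float = 0.8,  # hypothetical cut-off, tune per repository
) -> list[str]:
    """Return paths where most of the original file was deleted."""
    flagged = []
    for path, (_added, deleted) in parse_numstat(numstat).items():
        original = original_line_counts.get(path, 0)
        if original > 0 and deleted / original >= threshold:
            flagged.append(path)
    return flagged
```

The original line counts would have to come from the pre-change revision, and renamed files (which numstat prints as `old => new`) are not handled here. It only narrows down where a human or a reviewer agent should look.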

The sanitised optimism problem is deep: it's not always dishonesty, but quite often genuine model confusion about whether a suppressed error counts as a fix. The agent believes... voila, success. The only way around it I've found is a verifier that is skeptical by default rather than one reviewing in good faith.
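A skeptical-by-default verifier could, for instance, scan the added lines of a unified diff for common suppression idioms before accepting a "fixed" claim. The Python sketch below is only an illustration: the pattern list is a small, hypothetical sample, and `suppression_hits` is an invented name.

```python
import re

# Hypothetical sample of suppression idioms; far from exhaustive.
SUPPRESSION_PATTERNS = [
    re.compile(r"except\s*(Exception)?\s*:\s*pass\b"),  # swallowed exception
    re.compile(r"#\s*(noqa|type:\s*ignore)"),           # silenced linter or type checker
    re.compile(r"@ts-ignore|eslint-disable"),           # silenced TS/JS tooling
    re.compile(r"\|\|\s*true\b"),                       # shell command forced to succeed
]


def suppression_hits(unified_diff: str) -> list[str]:
    """Return added lines that match a known suppression idiom."""
    hits = []
    for line in unified_diff.splitlines():
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if any(p.search(added) for p in SUPPRESSION_PATTERNS):
            hits.append(added.strip())
    return hits
```

A hit does not prove the error was suppressed rather than fixed, and a multi-line `except:` followed by `pass` on the next line slips through. It is a prompt for the verifier to demand evidence, which fits an UNKNOWN verdict better than a PASS.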

This tool's live timeline is the missing piece in that loop. Being able to see the actual tool calls rather than the curated (and falsely optimistic) summary could change verdict quality rather significantly.
