Very nice. But "Models excel at code, but not at visual inspection" is limited. Claude excels at code because that is Anthropic's main focus. Google will leapfrog them soon.
I could never see Claude doing this without a human in the loop, while Google has probably already reverse-engineered a good chunk of the software available on the web.
Doubtful.
When dealing with binaries, Gemini 3.1 Pro is in the same tier as Opus 4.6: https://quesma.com/benchmarks/binaryaudit/. Those results are without humans in the loop, fully end-to-end.
For any practical development, you want humans in the loop exactly as much as needed (e.g. to ask the right questions, to avoid getting steered off course), but no more.