It is healthy to question results: That's good science.
This result wouldn't surprise me if the tooling was limited to, say, copilot :)
It would surprise me if it included tooling like Claude Code. Which seems unlikely, given its recency.