Very interesting paper, and it lines up exactly with my observations. The ROI just isn't there for writing tests up front, and the paper's conclusion lays it out clearly:
> Overall, these findings suggest that agent-written tests often behave more like a habitual software-development routine than a dependable source of validation in this setting. More agent-written tests do not mean more solves; what they more reliably change is the process footprint—API calls, token usage, and interaction patterns. Improving the value of testing for code agents may therefore require better oracles and more actionable validation signals, rather than simply inducing agents to write more tests.
> IMO, where tests clearly help is primarily as an "oracle" applied after generation

Bingo. I'm not against writing tests; it's that the returns are better when they're used as verification feedback and as an "oracle", exactly as you put it.
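
To make the "oracle applied after generation" idea concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration: `ORACLE_CASES`, `passes_oracle`, `first_verified`, and the two candidate functions are hypothetical stand-ins, not anything from the paper. The only point is that the checks exist before generation and are used purely to accept or reject whatever the model produces.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

# Hypothetical oracle: trusted (args, expected) pairs that exist BEFORE
# any code is generated. They are not written by the agent.
ORACLE_CASES: Sequence[Tuple[tuple, int]] = [
    ((2, 3), 5),
    ((-1, 1), 0),
    ((0, 0), 0),
]


def passes_oracle(candidate: Callable[..., int],
                  cases: Sequence[Tuple[tuple, int]] = ORACLE_CASES) -> bool:
    """Return True only if the candidate matches every oracle case."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed verification, not a fatal error.
            return False
    return True


def first_verified(candidates: Iterable[Callable[..., int]]
                   ) -> Optional[Callable[..., int]]:
    """Apply the oracle after generation; keep the first candidate that passes."""
    for candidate in candidates:
        if passes_oracle(candidate):
            return candidate
    return None


# Stand-ins for two model-generated attempts at an `add` function.
def candidate_a(x: int, y: int) -> int:
    return x * y  # wrong: multiplies instead of adding


def candidate_b(x: int, y: int) -> int:
    return x + y  # correct


chosen = first_verified([candidate_a, candidate_b])
print(chosen.__name__ if chosen else "no candidate passed")
```

In a real setup the oracle would more likely be an existing test suite run in a sandbox, and a failure would be fed back to the model as the verification signal rather than just discarding the candidate.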
Just chiming in to say that I've seen the exact same thing you have. Tests are better used after the fact, to validate that what was generated actually works.
That, and even the absolute SOTA models still suck at writing tests.
Which shouldn't be surprising: humans suck at it too most of the time...