
zuzululu today at 1:05 AM

Very interesting paper, and it lines up exactly with my observations. The ROI just isn't there for writing tests up front, and the conclusion in that paper lays it out clearly:

    Overall, these findings suggest that agent-written tests often
    behave more like a habitual software-development routine than a
    dependable source of validation in this setting. More agent-written
    tests do not mean more solves; what they more reliably change is
    the process footprint—API calls, token usage, and interaction
    patterns. Improving the value of testing for code agents may
    therefore require better oracles and more actionable validation
    signals, rather than simply inducing agents to write more tests.
> IMO, where tests clearly help is primarily as an "oracle" applied after generation

Bingo. I'm not against writing tests; it's that the returns are better when they're used as verification feedback and as an "oracle", exactly as you put it.
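
For anyone who wants the "oracle applied after generation" idea spelled out, here is a minimal sketch. It is not taken from the paper: `pick_first_passing`, `add_oracle`, and the two candidate functions are hypothetical names made up for illustration, and the candidates merely stand in for agent-generated patches.

```python
from typing import Callable, Iterable, Optional

Candidate = Callable[[int, int], int]


def pick_first_passing(
    candidates: Iterable[Candidate],
    oracle: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    """Return the first candidate the oracle accepts, or None if none pass."""
    for candidate in candidates:
        try:
            if oracle(candidate):
                return candidate
        except Exception:
            # A candidate that crashes under the oracle counts as a failure.
            continue
    return None


# Hypothetical stand-ins for two generated implementations of "add".
def buggy_add(a: int, b: int) -> int:
    return a - b


def correct_add(a: int, b: int) -> int:
    return a + b


def add_oracle(fn: Candidate) -> bool:
    """A trusted check, written independently of the generated code."""
    return fn(2, 3) == 5 and fn(-1, 1) == 0


# The oracle runs after generation and filters the candidates.
chosen = pick_first_passing([buggy_add, correct_add], add_oracle)
```

The point of the sketch is only that the check exists independently of the generated code and runs after it, instead of the agent writing its own tests up front.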


Replies

girvo today at 4:01 AM

Just chiming in to say that I've seen exactly the same thing you have. Tests are better used to help validate, after the fact, that what was generated actually works.

That, and even the absolute SOTA models still suck at writing tests.

Which shouldn't be surprising: humans suck at it too most of the time...
