Hacker News

esperent, today at 11:16 AM

From that paper:

> This raises a central question: do such tests meaningfully improve issue resolution, or do they mainly mimic a familiar software-development practice while consuming interaction budget?

This is an important question, but it's not the one I'm most interested in when requiring agents to follow TDD. My goal is to lock in behavior, because far too often an agent would successfully fix the issue at hand but break something else that it wasn't supposed to touch.

The tests add another layer, and that's why I always separate out red and green worker subagents. The green worker might get trigger-happy and go beyond scope or break something, but it's not allowed to fudge the tests, so I'll know and can clean up and revert.
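One way to enforce the "green worker can't fudge the tests" rule mechanically is to fingerprint the test files before the green worker runs and compare afterwards. This is only a minimal sketch: the function names `snapshot_tests` and `tampered_tests` and the `test_*.py` naming pattern are assumptions for illustration, not part of any particular agent framework.

```python
import hashlib
from pathlib import Path


def snapshot_tests(root: Path, pattern: str = "test_*.py") -> dict[str, str]:
    """Map each test file's path (relative to root) to a SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
    }


def tampered_tests(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return test files that were added, deleted, or modified between snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

The idea would be to take a snapshot once the red worker is done, let the green worker run, take a second snapshot, and revert if `tampered_tests` returns anything. Changes to non-test files are ignored on purpose; catching those is the job of the tests themselves.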

It's also why I'm not too bothered about perfect red-green TDD. I can add the tests later if needed.


Replies

rsalus, today at 7:23 PM

Tests are an important signal, of course, but the use case you describe doesn't necessarily mean you need to follow TDD. The data suggests that creating the tests after the code is just as effective or even more so, and at a significantly cheaper input cost.

I've been finding that enforcing integrations and behavior structurally (e.g., through codegen/schemagen, e2e tests, etc.) is more reliable than simply instructing the models to write tests. Oftentimes these tests are pretty low quality anyway, and result in their own form of tech debt.
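As a rough illustration of what "structural" enforcement could look like, here is a sketch where a single schema definition rejects drifting payloads at the boundary, so the contract holds whether or not anyone wrote a test for it. The `IssueUpdate` type and `parse_issue_update` helper are hypothetical names made up for this example, and real codegen/schemagen setups would typically generate such types rather than hand-write them.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IssueUpdate:  # hypothetical payload type, for illustration only
    issue_id: int
    status: str


def parse_issue_update(raw: dict) -> IssueUpdate:
    """Build an IssueUpdate, rejecting payloads whose keys or types drift from the schema."""
    expected = {f.name: f.type for f in fields(IssueUpdate)}
    if raw.keys() != expected.keys():
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(raw)}")
    for name, typ in expected.items():
        if not isinstance(raw[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return IssueUpdate(**raw)
```

An agent that renames or drops a field then fails loudly at the integration point instead of relying on a possibly low-quality generated test to notice.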
