My Agent Skill for Test-Driven Development

211 points • by laxmena • last Thursday at 2:10 PM • 96 comments

Comments

I find it hard to believe that these LLM systems, with their enormous training sets and built-in system prompts, have their output meaningfully modified by a few paragraphs of extra prompting in the form of these skill files, but it is cool to see people writing out concise, focused documents like this. These would have been great to have as a young developer, and great for several of the teams I've worked on in the past. I dabble with Python for automating things here and there, and I just learned some new things reading __mharrison__'s skill in the comments here.

This kind of wisdom used to be found in blog posts, or in the heads of more senior developers, but it was never written out as concisely as these skill files. It's kinda funny that billions of dollars had to be spent creating a machine that's a rough human analog needing guidance to get us to produce these documents.

simonw • yesterday at 8:14 PM

This article would benefit from a date. It looks like it's recent (Internet Archive first grabbed it on May 29th) but it's the kind of information that can quickly become stale as models and agents improve.

(I've been getting solid results recently from simply telling Claude Code and Codex "Test with uv run pytest, use red/green TDD".)
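
For readers unfamiliar with the term, here is a minimal sketch of one red/green cycle. The `slugify` function and the tiny `run_check` helper are hypothetical names chosen for illustration; they are not from the article or from pytest, and a real project would let `uv run pytest` do the running.

```python
import re


def run_check(check, impl):
    """Return 'green' if the check passes against impl, else 'red'."""
    try:
        check(impl)
    except (AssertionError, NotImplementedError):
        return "red"
    return "green"


# Step 1 (red): the check is written first, against a stub with no behavior.
def slugify_stub(title):
    raise NotImplementedError


def check_slugify(slugify):
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  TDD  with agents ") == "tdd-with-agents"


# Step 2 (green): the smallest implementation that makes the check pass.
def slugify(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


if __name__ == "__main__":
    print(run_check(check_slugify, slugify_stub))  # red
    print(run_check(check_slugify, slugify))       # green
```

The point of seeing "red" first is that it shows the check can fail at all, before any implementation exists to satisfy it.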

fowlie • yesterday at 10:20 PM

Haven't tried this, but I've recently become a big fan of Matt Pocock's skills. Workflow: /grill-with-docs -> /to-prd -> /to-issue -> /tdd. That will interview relentlessly until there is a "shared understanding" using "ubiquitous language", then it will spec all requirements with user stories, create issues, and implement them using TDD.

zuzululu • yesterday at 9:02 PM

TDD sounds great on paper for agentic development, but you quickly realize it balloons the token cost. Often I write some feature and then it's repurposed or removed, and code is refactored and moved around as time goes on. With TDD I would be taxed heavily and velocity would slow to a crawl.

After trying out TDD, I find the waterfall approach better, especially when you have a multi-agent setup. Also, I found that in some cases the tests were just superficial hallucinations that never actually tested the components written, or there was some context corruption that ultimately triggered a false positive and kicked off a completely unintentional refactoring.

SubiculumCode • today at 12:45 AM

One issue that I've run into with Codex has been excessive use of fallback routines. Perhaps this is good practice in professional programming in many situations, but for mine (in this case, computing geodesic distances and analysis), a silent bad fallback means the processed data is not what I thought it was, e.g. it used an inaccurate geodesic method in place of the accurate one.
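
A rough sketch of the difference between a silent fallback and one that has to be opted into. Everything here is a stand-in: `accurate_distance` and `approximate_distance` are hypothetical placeholders (the "accurate" one is rigged to fail), not the commenter's actual geodesic code.

```python
import math


def accurate_distance(a, b):
    """Hypothetical stand-in for the accurate solver; rigged to fail here."""
    raise RuntimeError("accurate solver unavailable")


def approximate_distance(a, b):
    """Hypothetical cheap approximation: straight-line distance."""
    return math.dist(a, b)


def distance_silent(a, b):
    # The pattern being complained about: the caller gets a number back
    # and cannot tell which method produced it.
    try:
        return accurate_distance(a, b)
    except RuntimeError:
        return approximate_distance(a, b)


def distance_strict(a, b, allow_fallback=False):
    # Raises by default; when a fallback is explicitly allowed, the
    # method that actually ran is returned alongside the value.
    try:
        return accurate_distance(a, b), "accurate"
    except RuntimeError:
        if not allow_fallback:
            raise
        return approximate_distance(a, b), "approximate"
```

With the strict version, a failed accurate method stops the pipeline unless the caller asks for the approximation, and the returned label can be stored with the processed data.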

dluxem • yesterday at 8:34 PM

I believe using a skill here is the wrong approach. LLMs already know what TDD is and how to do it, just like object oriented programming.

If this is encoded in a skill, that skill essentially has to be loaded for everything your LLM is doing. This is probably one of the few areas where direct instructions via AGENTS.md are best, and I don't believe it requires much direction here to force the issue.

But I think the OP is just trying to have their agent work in a very specific way -- that is fine too.

> 5. Show me the test and ask for approval before continuing

realty_geek • yesterday at 11:01 PM

As an aside, check out Jason's podcast (codewithjason.com) - it's pretty good.

The latest one is with "Uncle Bob Martin" who has some interesting takes on coding with AI from .... can I say an oldie?

revlsas • today at 1:28 PM

TDD is unnecessary bloat at this point

Just work with Codex to fill the gaps, and then get it to one shot the implementation

Do review afterwards if needed

All these md files will be increasingly useless as models improve

jvuygbbkuurx • yesterday at 9:06 PM

All of these posts are missing actual comparisons of results. I read exactly the opposite 'you should do x' every day. If TDD actually were better, it would simply be in the system prompts already.

csbartus • today at 5:45 AM

This specify-encode-fulfill loop/method is effective at making agents create bug-free code.

In my version of this workflow I write the spec myself, then let the LLM do the rest.

This way 1.) I'm 100% sure the understanding/spec is good, 2.) it's translated into an executable format so the implementation can be verified, and 3.) the implementation has maximum-code-coverage tests, which steer the AI to produce code that follows standards, fits into the existing codebase, and is very easy to refactor.

So far, this is the one and only advantage of using LLMs in my SWE practice. They glue together (human written) specs with code, with confidence, in no time.

servercobra • yesterday at 9:02 PM

This overall is pretty close to how I've set up my implementation skill. One thing I'm curious about is how well the analogies like "We don't make dinner in a dirty kitchen." work vs something a lot more straightforward. Any input OP?

__mharrison__ • yesterday at 9:11 PM

Testing is so important for development.

Even more so when coding with agents. I think it is probably the biggest lever to keep AI within guardrails.

(It's also why I wrote my latest book, Effective Testing, because I routinely find that my clients are very poor at testing.)

Ampersander • today at 9:03 AM

Testing is obsolete in the AI age. I just one-shot every problem with Claude; it never makes a mistake.

enraged_camel • yesterday at 10:31 PM

Spawning separate agents to review the original agent's implementation results in a very noticeable increase in code quality and decrease in bugs. This is why I encode two or three rounds of sub-agent review during the planning process, where I tell the agent authoring the plan to include those review rounds at the end. If the code is particularly load-bearing, I then ask a fourth agent, usually from the other frontier lab, to review it as well.

All of this burns more tokens of course, but probably way less than coming back to the code later to fix bugs. It is also slower, but in the long run saves time.

nullc • yesterday at 11:28 PM

If you don't follow up with a pass of injecting bugs and validating that the tests fail in the presence of bugs... then you've only confirmed that the tests can pass and they may be substantially useless.
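
A small sketch of that bug-injection pass, done by hand. The `clamp` function, the two buggy variants, and both suites are hypothetical examples made up for illustration; dedicated mutation-testing tools automate the same idea.

```python
def clamp(x, lo, hi):
    """Correct implementation under test."""
    return max(lo, min(x, hi))


def clamp_bug_ignores_lower(x, lo, hi):
    """Injected bug: the lower bound is never applied."""
    return min(x, hi)


def clamp_bug_swapped(x, lo, hi):
    """Injected bug: the two bounds are swapped."""
    return max(hi, min(x, lo))


MUTANTS = [clamp_bug_ignores_lower, clamp_bug_swapped]


def weak_suite(fn):
    # Only exercises the in-range case.
    assert fn(5, 0, 10) == 5


def strong_suite(fn):
    assert fn(5, 0, 10) == 5
    assert fn(-3, 0, 10) == 0
    assert fn(42, 0, 10) == 10


def passes(suite, fn):
    """True if every assertion in the suite holds for fn."""
    try:
        suite(fn)
    except AssertionError:
        return False
    return True


def surviving_mutants(suite, mutants):
    """Names of buggy variants the suite fails to detect."""
    return [m.__name__ for m in mutants if passes(suite, m)]


if __name__ == "__main__":
    print(surviving_mutants(weak_suite, MUTANTS))    # ['clamp_bug_ignores_lower']
    print(surviving_mutants(strong_suite, MUTANTS))  # []
```

Both suites pass against the correct `clamp`, so a green run alone does not distinguish them; only the injected bugs show that the weak suite misses the lower-bound case.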

behnamoh • yesterday at 7:32 PM

Snake oil. Just ask the model, all these custom agents/skills haven't proven that useful in practice.

yieldcrv • today at 1:47 AM

Tests are vanity in agentic engineering.

They do nothing to keep an AI on track in comparison to the aspects that simulate a product manager.

And the AI will just correct the test when it fails, as opposed to correcting the code, because the code didn't miss anything; the specification changed.

My protip: just write tickets, or have the AI write those too. Those, along with the commits and the PRs, will function as the AI's memory better than any client-side markdown file masquerading as a soul.

whateveracct • today at 5:32 AM

/test-me

steno132 • yesterday at 8:38 PM

Test-driven development is one of the worst ideas nowadays in the LLM age. We have models that can consistently write expert-level, usually bug-free code for you and rapidly fix even complex bugs in your codebase.

The token cost and tech debt introduced by tests are just not worth it. There are usually no bugs, and if there are, you can fix them quickly if and when it's needed.

bob1029 • today at 7:57 AM

TDD is fundamentally problematic in every practical implementation I've ever seen. I don't think the same thing, but much faster, is going to help at all. TDD tends to cause adverse, higher order effects.

I am currently observing AI-authored tests creating a massive sense of complacency because a human no longer owns responsibility for the test suite. It's too easy to reject ownership by way of the various agent prompting schemes. I find myself enjoying the idea of it too, primarily because adding tests to even the most trivial functionality is mandatory due to the TDD policy.

Developing good tests is like an art form. Total coverage is a terrible objective. Correctness does not compose upward. It's a game of chasing ghosts if you think you can build a perfectly clean system bottom-up and then magically meet the customer at the top. They're gonna kick your Jenga tower over on day one.
