Hacker News

wild_egg yesterday at 1:11 PM

You need to be telling it to create reproduction test cases first and iterate until it's truly solved. There's no need for you to manually be testing that sort of thing.

The key to success with agents is tight, correct feedback loops so they can validate their own work. Go has great tooling for debugging race conditions. Tell it to leverage those properly and it shouldn't have any problems solving it unless you steer it off course.
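As a rough illustration of the kind of reproduction test meant here, the sketch below hammers a shared counter from several goroutines so that Go's race detector (`go test -race`) has something to catch. `Counter`, `hammer`, and `TestCounterConcurrent` are hypothetical names made up for this example, not anything from the original post; the code under test in a real project would take their place.

```go
package main

import (
	"sync"
	"testing"
)

// Counter is a hypothetical stand-in for the shared state under test.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter. Removing the Lock/Unlock pair
// reintroduces the data race that the test below is meant to expose.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value returns the current count.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// hammer calls Inc from many goroutines at once and returns the final count.
func hammer(c *Counter, goroutines, perGoroutine int) int {
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

// TestCounterConcurrent is the reproduction test. Run it with: go test -race
func TestCounterConcurrent(t *testing.T) {
	const goroutines, perGoroutine = 8, 1000
	want := goroutines * perGoroutine
	if got := hammer(&Counter{}, goroutines, perGoroutine); got != want {
		t.Fatalf("got %d, want %d", got, want)
	}
}
```

With the mutex removed, running the test under `-race` should report the unsynchronized access to `n`, which gives the agent a concrete failure to iterate against instead of a vague bug description.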


Replies

epolanski yesterday at 2:23 PM

+1. Half the time I see such posts, the answer is "harness".

Put the LLM in a situation where it can test and reason about its results.

Someone yesterday at 3:51 PM

If that’s what you have to do, that makes LLMs look more like advanced fuzzers that take textual descriptions as input (“find code that segfaults calling x from multiple threads”, followed by “find changes that make the tests succeed again”) than like something truly intelligent. Or maybe we should see them as diligent juniors who never get tired.

kitd yesterday at 7:36 PM

TDD and the coding agent: a match made in heaven.

It is Valentine's Day after all.

JetSetIlly yesterday at 2:20 PM

I accept what you say about the best way to use these agents. But my worry is that there is nothing that requires people to use them in that way. I was deliberately vague and general in my test. I don't think how Claude responded under those conditions was good at all.

I guess I just don't see what the point of these tools is. If I were to guide the tool in the way you describe, I don't see how that's better than just thinking about and writing the code myself.

I'm prepared to be shown differently of course, but I remain highly sceptical.

treyd yesterday at 1:25 PM

If only there was a way to prevent race conditions by design as part of the language's type system, and in a way that provides rich and detailed error messages that allow coding agents to troubleshoot issues directly (without having to be prompted to write/run tests that just check for race conditions).