
theshrike79 · yesterday at 6:52 AM

When you write tests with LLM-generated code you're not trying to prove correctness in a mathematically sound way.

I think of it more as "locking" the behavior to whatever it currently is.

Either you do the red-green-with-multiple-adversarial-sub-agents thing, or you just build the feature, poke at it manually, and if it looks good, have the LLM write tests that confirm it keeps doing what it's supposed to do.

The #1 reason TDD failed is that writing tests is BOORIIIING. It's a bunch of repetition with slight variations of input parameters, plus a ton of boilerplate or helper functions that cover 80% of the cases; the last 20% is even harder because you need to work around those helpers. Eventually everyone starts copy-pasting crap, and then mistakes creep into the tests themselves.

LLMs will write 20 test cases with zero complaints in two minutes. Of course they're not perfect, but human-made bulk tests rarely are either.
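That "locking" workflow is essentially characterization (golden-master) testing: you record what the code currently does and assert it keeps doing that. A minimal Python sketch, where `slugify` is a hypothetical feature under test and the expected values were recorded from its current output, not derived from a spec:

```python
def slugify(title: str) -> str:
    # Hypothetical stand-in for the feature being locked down.
    return title.strip().lower().replace(" ", "-")

def test_slugify_locks_current_behavior():
    # These outputs were captured from the implementation as-is.
    # The test doesn't prove correctness; it flags any change in behavior.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trailing  ") == "trailing"
    assert slugify("Already-Slugged") == "already-slugged"
```

If the behavior later changes on purpose, the recorded values get updated; if it changes by accident, the test fails.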


Replies

godelski · yesterday at 6:16 PM

  > you're not trying to prove correctness in a mathematically sound way.

  > "locking" the behavior to whatever it currently is.
These two statements are incompatible: a test that merely locks in current behavior also locks in current bugs.

  > The #1 reason TDD failed is
Because the spec is an ever-evolving thing that cannot be determined a priori. And because it strongly incentivized engineers to metric-hack.

  > It's a bunch of repetition with slight variations
If that's how you're writing tests, then you're writing them at the wrong level of abstraction. Abstraction is not a dirty word; it solves exactly these problems. Maybe juniors don't understand a given abstraction and fuck it up while learning, but making abstraction a dirty word is throwing the baby out with the bath water.
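For instance, a table-driven test puts the "slight variations of input parameters" into data instead of copy-pasted code. A minimal Python sketch, with `clamp` as a hypothetical function under test:

```python
def clamp(x: int, lo: int, hi: int) -> int:
    # Hypothetical code under test.
    return max(lo, min(x, hi))

# The loop is the abstraction; the table holds the variation.
# Adding a case is one line, not another copy-pasted test.
CASES = [
    (5,  0, 10, 5),   # in range: unchanged
    (-3, 0, 10, 0),   # below range: clamped to lo
    (42, 0, 10, 10),  # above range: clamped to hi
    (0,  0, 10, 0),   # boundary: lo itself
    (10, 0, 10, 10),  # boundary: hi itself
]

def test_clamp():
    for x, lo, hi, expected in CASES:
        assert clamp(x, lo, hi) == expected, (x, lo, hi)
```

Test frameworks offer the same idea natively (e.g. pytest's `parametrize`), so each case even reports individually.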

  > Eventually everyone starts copy-pasting crap
Which is a horrendous way to write code.