Do you have an example of the tautological tests you're referring to? What comes to mind to me ...

tshaddox • yesterday at 3:39 PM • 4 replies • view on HN

Do you have an example of the tautological tests you're referring to? What comes to mind to me is genuinely logically tautological tests, like "assert(true || expectedResult == actualResult)" which is a mistake I don't even expect modern AI coding tools to make. But I suspect you're talking about a subtler type of test which at first glance appears useful but actually isn't.

Replies

tveita • yesterday at 11:20 PM

I've definitely seen Opus go to town when asked to test a fairly simple builder. Possibly it inferred something about testing the "contract", and went on to test such properties as

  - none of the "final" fields have changed after calling each method
  - these two immutable objects we just confirmed differ on a property are not the same object

In addition to multiple tests with essentially identical code, multiple test classes with largely duplicated tests etc.

jihadjihad • yesterday at 3:49 PM

Among many other possible examples, here are a few [0] from Ruby that I've seen in the wild before LLMs, and still see today spat out by LLMs.

0: https://www.codewithjason.com/examples-pointless-rspec-tests...

➕ show 1 reply

adampunk • yesterday at 8:46 PM

I don’t have examples but I have an LLM driven project with like…2500 tests and I regularly need to prune:

* no-op tests

* unit tests labeled as integration tests

* skipped tests set to skip because they were failing and the agent didn’t want to fix them

* tests that can never fail

Probably at any given time the tests are 2-4% broken. I’d say about 10% of one-shot tests are bogus if you’re just working w spec + chat and don’t have extra testing harnesses.

esafak • yesterday at 3:41 PM

For example, you might write a concurrency test, and the agent will cheerfully remove the concurrency and announce that it passes. They get so hung up on making things work in a narrow sense that they lose track of the purpose.

alt Hacker News

Replies