Do you have an example of the tautological tests you're referring to? What comes to mind to me is genuinely logically tautological tests, like "assert(true || expectedResult == actualResult)" which is a mistake I don't even expect modern AI coding tools to make. But I suspect you're talking about a subtler type of test which at first glance appears useful but actually isn't.
Among many other possible examples, here are a few [0] from Ruby that I've seen in the wild before LLMs, and still see today spat out by LLMs.
0: https://www.codewithjason.com/examples-pointless-rspec-tests...
I don’t have examples but I have an LLM driven project with like…2500 tests and I regularly need to prune:
* no-op tests
* unit tests labeled as integration tests
* skipped tests set to skip because they were failing and the agent didn’t want to fix them
* tests that can never fail
Probably at any given time the tests are 2-4% broken. I’d say about 10% of one-shot tests are bogus if you’re just working w spec + chat and don’t have extra testing harnesses.
For example, you might write a concurrency test, and the agent will cheerfully remove the concurrency and announce that it passes. They get so hung up on making things work in a narrow sense that they lose track of the purpose.
I've definitely seen Opus go to town when asked to test a fairly simple builder. Possibly it inferred something about testing the "contract", and went on to test such properties as
In addition to multiple tests with essentially identical code, multiple test classes with largely duplicated tests etc.