Just chiming in to say that I've seen the exact same thing you have. Tests are better used to validate, after the fact, that what was generated actually works.
That, and even the absolute SOTA models still suck at writing tests.
Which shouldn't be surprising: humans suck at it too most of the time...
Absolutely, there's no reason to believe that agents will be any better at writing tests than at writing any other piece of code. The big payoff is actually verifying the code that was generated.