Agree - all of this is based on vibes (I also use TDD based on vibes FWIW). The only way to settle "does TDD / caveman / [insert random skill here] help" is to replay real PRs from your repo and measure quality