Yeah I specifically tell it not to pre-emptively fix tests that it knows will break as a result of changes its making and instead limit itself only to creating new tests for new changes. I want to see the tests break, then we go through and review each set of breakages versus the mission and assess if they’re regressions or stale assertions. This is a) how I know it’s actually writing meaningful tests b) a very functional and useful form of “code review” versus just trying to catch problems by reading diffs and c) helped me find real problems and regressions.
Almost all the breakages after a big refactor are stale assertions but every time I catch a couple of critical problems that make the entire exercise very worth it.
The whole dev process is so fast compared to writing software manually that I find it absurd that I wouldn’t invest heavily in automated tests.
See my AGENTS.md in nearby comment