Running the code checks whether it works today, whereas code review checks whether it will still work in a year and whether anyone else can understand it.
Tests rarely catch architectural mistakes or time bombs, because they only exercise the cases someone thought to write down. If you remove reviews and rely solely on tests, you end up with a "working" big ball of mud that is impossible to maintain. AI won't help if it's the one generating the mud.
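To make "time bomb" concrete, here is a minimal sketch of code that passes its tests today and fails later. The function name `next_invoice_id` and the `INV-0001` ID format are hypothetical, made up purely for illustration.

```python
def next_invoice_id(existing_ids: list[str]) -> str:
    """Return the next invoice ID, e.g. 'INV-0003' after 'INV-0002'."""
    # Time bomb: max() compares the IDs as strings, which only matches
    # numeric order while every ID has the same number of digits.
    latest = max(existing_ids, default="INV-0000")
    number = int(latest.split("-")[1]) + 1
    return f"INV-{number:04d}"


# The tests a typical suite would contain. They all pass.
assert next_invoice_id([]) == "INV-0001"
assert next_invoice_id(["INV-0001", "INV-0002"]) == "INV-0003"

# What happens after invoice 10000 exists: "INV-9999" still wins the
# string comparison, so the function hands out "INV-10000" a second time.
duplicate = next_invoice_id(["INV-9999", "INV-10000"])
assert duplicate == "INV-10000"
```

No test fixture is likely to contain ten thousand invoices, so the suite stays green until production data crosses that line. A reviewer reading the `max()` call has a fair chance of asking the question the tests never will.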