An example: it had a complete interface to a hash map. The task was to delete elements. Instead of using the hash map API, it iterated through the entire underlying array to remove a single entry. The expected solution was O(1), but it implemented O(n). These decisions compound. The software may technically work, but the user experience suffers.
If you have particular performance requirements like that, then include them. Test for them. You still don’t have to actually look at the code. Either the software meets expectations or it doesn’t, and keep having AI work at it until you’re satisfied.