> can you deterministically test the thing you are asking it to do?
Of course: have it write tests first, then run them to check its work.
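A minimal sketch of that loop, assuming the task is a small pure function. `slugify` and the cases table are hypothetical, made up for illustration. The cases get pinned down first, and the implementation is whatever makes `run_checks()` come back empty.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated slug."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


def run_checks() -> list[str]:
    """Return the names of failing cases; an empty list means all passed."""
    # Illustrative cases, written before the implementation above.
    cases = {
        "basic": ("Hello World", "hello-world"),
        "punctuation": ("Rust & Go: a comparison!", "rust-go-a-comparison"),
        "whitespace": ("  padded  ", "padded"),
        "empty": ("", ""),
    }
    return [
        name
        for name, (given, want) in cases.items()
        if slugify(given) != want
    ]


if __name__ == "__main__":
    failures = run_checks()
    print("all checks passed" if not failures else f"failed: {failures}")
```

The point is that the pass/fail signal comes from running the checks, not from asking the model whether it thinks the code is right.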
Works well for refactoring, but greenfield implementations still rely on a spec that is guaranteed to be incomplete, overcomplete and wrong in many ways.
You can't ask something to check its own work without external reward/penalty. It'll cheat.
Well, if the spec is incomplete, it sounds like you should lower the scope for the AI and go from there. I wouldn't be too keen to give a junior engineer free rein and expect awesomeness.