As people are discovering, natural language is not precise enough to specify edge cases. Any language precise enough to be formally verified against is, in effect, a programming language.
We're going to end up talking past each other - but generally I do agree with you, and I'm not dismissing the importance of formal verification methods. I do think abstractions layered above them are going to dominate the human UX.
One agent generates: Spec -> Code, then
another agent inverts: Code -> Inverted Spec,
then you compare Spec and Inverted Spec.
If there is a Gap, a Human fixes and clarifies the Gap.
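This is loosely like the Generator/Discriminator pairing in GANs, or, probably closer, the encode/decode round trip of an Autoencoder, where the reconstruction is compared against the original input.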
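
A minimal Python sketch of the round trip described above, under some loud assumptions: a spec is modeled as a set of requirement strings, the two agents are plain callables (`generate_code` and `invert_spec` are hypothetical names, not a real API), and the comparison is exact string matching. Real specs would need a semantic comparison, which is the hard part this sketch skips. The toy agents exist only so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Assumption: a spec is a set of requirement statements.
Spec = FrozenSet[str]


@dataclass(frozen=True)
class Gap:
    missing: FrozenSet[str]     # in the original spec, not recovered from the code
    unexpected: FrozenSet[str]  # recovered from the code, not in the original spec

    def is_empty(self) -> bool:
        return not self.missing and not self.unexpected


def round_trip(
    spec: Spec,
    generate_code: Callable[[Spec], str],  # hypothetical agent 1: Spec -> Code
    invert_spec: Callable[[str], Spec],    # hypothetical agent 2: Code -> Inverted Spec
) -> Gap:
    code = generate_code(spec)
    inverted = invert_spec(code)
    # Exact set difference stands in for the Spec vs Inverted Spec comparison.
    return Gap(missing=spec - inverted, unexpected=inverted - spec)


# Toy stand-ins for the two agents.
def toy_generate(spec: Spec) -> str:
    # Deliberately drops any requirement mentioning "empty" to simulate an agent error.
    kept = sorted(r for r in spec if "empty" not in r)
    return "\n".join(f"# implements: {r}" for r in kept)


def toy_invert(code: str) -> Spec:
    prefix = "# implements: "
    return frozenset(
        line[len(prefix):] for line in code.splitlines() if line.startswith(prefix)
    )


spec = frozenset({"sorts ascending", "rejects empty input"})
gap = round_trip(spec, toy_generate, toy_invert)
if not gap.is_empty():
    # This is the point where a human would fix and clarify the gap.
    print("needs human review:", sorted(gap.missing), sorted(gap.unexpected))
```

Here the dropped requirement shows up in `gap.missing`, which is the signal to hand off to a human. Note that an empty gap only shows the two agents agree with each other, not that the code is correct: both could share the same blind spot.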