Sort of agreed. Natural-language specs don't scale. They can't be used to accurately model and verify the behavior of complex systems. But they can be used as a guide to create formal specs that can be used for that purpose. As long as the formal spec is treated as the ground truth, I think it can scale. But yeah, that means some kind of code will be required. :)
Tools like GitHub's Spec Kit seem to be getting a fair amount of usage.
The idea behind treating specs as code now is that you can effectively rebuild the system in the future with newer models. Test requirements could be defined upfront in the specs too, no?