I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all, they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might require some back-and-forth, but on the other hand the source is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.
One problem with this is that there isn't really a "current prompt" that completely describes the current source code; each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
No, it doesn’t get compiled. Compilation is a translation from one formal language to another that can be rigorously modeled and is generally reproducible.
Translating from a natural language spec to code involves a truly massive amount of decision making because the spec is ambiguous. For a nontrivial program, two implementations of the same natural language spec will have thousands of observable differences.
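To make that concrete, here is a minimal sketch. The one-line spec and both function names are made up for illustration; the point is only that two reasonable readings of the same sentence already disagree on a tiny input.

```python
from collections import Counter

# Hypothetical spec: "Return the distinct words in the text, most frequent first."
# The spec says nothing about letter case or how to order ties.

def top_words_a(text: str) -> list[str]:
    # Reading A: case-sensitive; ties keep first-seen order.
    counts = Counter(text.split())
    return [word for word, _ in counts.most_common()]

def top_words_b(text: str) -> list[str]:
    # Reading B: case-insensitive; ties broken alphabetically.
    counts = Counter(text.lower().split())
    return sorted(counts, key=lambda word: (-counts[word], word))

sample = "the cat The dog the cat"
print(top_words_a(sample))  # ['the', 'cat', 'The', 'dog']
print(top_words_b(sample))  # ['the', 'cat', 'dog']
```

Both functions satisfy the sentence as written, and that is with only two unstated decisions. A real program has far more of them.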
Where we are today, with agents still requiring guardrails to keep from spinning out, there is no way to let agents work on code autonomously or constantly recompile specs without all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this, because a test suite that covered all observable behavior would have to be more complex than the code itself, at which point it wouldn’t be any easier for a machine or a human to understand. The only solution to this problem is for LLMs to get better.
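A small sketch of the testing gap, again with a made-up spec and hypothetical function names: a plausible test suite passes for two implementations that still behave differently on inputs the suite never mentions.

```python
import math

# Hypothetical spec: "Round a price to the nearest whole number."

def round_price_a(x: float) -> int:
    # Python's built-in round() sends exact .5 ties to the nearest even integer.
    return round(x)

def round_price_b(x: float) -> int:
    # Rounds exact .5 ties upward instead.
    return math.floor(x + 0.5)

def passes_suite(f) -> bool:
    # A plausible suite for the spec; it never checks an exact .5 tie.
    return f(1.2) == 1 and f(1.8) == 2 and f(3.0) == 3

print(passes_suite(round_price_a), passes_suite(round_price_b))  # True True
print(round_price_a(2.5), round_price_b(2.5))                    # 2 3
```

Closing this one gap means adding tie cases to the suite, and every gap closed that way moves the suite closer to restating the implementation.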
Personally, I think that at the point they can pull this off, they can do any white-collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.