> It's 2026 and the notion that you can get detailed enough requirements and specifications that you can one-shot a perfect solution needs to die.
It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die. Anthropic failed to build even something as simple as a workable C compiler, not only with a perfect spec (and reference implementations, both of which the model trained on) but even with thousands of tests painstakingly written over many person-years. Today's models are not yet capable enough to build non-trivial production software without close and careful human supervision, even with perfect specs and perfect tests. Without a perfect spec and a perfect human-written test suite the task is even harder. Maybe in 2027.
I wonder how knowledgeable in compilation was the engineer that attempted this. I'm pretty confident that I could produce a decent C compiler in a few weeks (or less), if given Opus 4.7 + unlimited tokens + a good test suite. (and this is not blind unsubstantiated belief in AI, I've recently rewritten a somewhat sophisticated interpreter in a week with AI; and have worked on several C++ compilers in the past, including a GCC port to a custom DSP, so I have a bit of an idea about what this would take).
But yeah, this is not a "one shot" project, none of it is. One shot doesn't work even with humans - after all, this is exactly what killed waterfall as a methodology.
Sorry where are we seeing that it failed? It compiled multiple projects successfully albeit less optimized.
" It lacks the 16-bit x86 compiler that is necessary to boot Linux out of real mode. For this, it calls out to GCC (the x86_32 and x86_64 compilers are its own).
It does not have its own assembler and linker; these are the very last bits that Claude started automating and are still somewhat buggy. The demo video was produced with a GCC assembler and linker.
The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler. The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce. "
For faffing about with a multi agent system that seems like a pretty successful experiment to me.
Source: https://www.anthropic.com/engineering/building-c-compiler
Edit: Like I think people don't realize not even 7 months ago it wasn't writing this at all.