> With hard requirements listed, I found out that the generated code missed requirements,
This is hardly a surprise, no? No matter how much training we run, we are still producing a generative model. And a generative model doesn't understand your requirements, let alone check them off one by one. It predicts the next most likely token from a given prompt. If the most statistically plausible way to finish a function is a version that ignores your third requirement, the model will happily follow through. There are no rules in your requirements doc. They are just the conditioning events X in a glorified P(Y|X). I'd venture to guess that sometimes missing a requirement actually increases the probability of the generated tokens, so the model allows the miss. Actually, "allow" is too strong a word. The model does not allow shit. It just generates.
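To make the glorified P(Y|X) concrete, here's a toy greedy decoder. The "model" is a hand-made table of made-up probabilities (everything here is hypothetical, not any real model's numbers); the point is only that argmax over P(Y|X) has no slot for a rule:

```python
# Toy "model": P(next token | context), as a lookup table of invented numbers.
# Imagine the prompt's requirements said "use snake_case" -- but if the training
# data made camelCase the more probable continuation, the requirement is just
# more context, not a constraint.
toy_model = {
    ("def", "get"): {"UserName": 0.6, "_user_name": 0.4},
}

def greedy_next(context):
    dist = toy_model[context]
    # argmax over P(Y|X): pure probability, no rule-checking anywhere.
    return max(dist, key=dist.get)

print(greedy_next(("def", "get")))  # "UserName" -- the requirement loses to the statistics
```

Nothing in `greedy_next` can "miss" a requirement, because nothing in it ever saw one. Sampling instead of argmax changes nothing about that.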
But agents do keep task lists and check the tasks off as they go. Of course that's not perfect either, but it's MUCH better than what a bare LLM can offer on its own.
If you see an agent missing tasks, work with it to write down the task list first, then hold it accountable for completing every item. A spec is not a plan.
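The loop that makes this work is mechanical, not magical: each hard requirement becomes a discrete item, and "done" means the list is empty, not that the output looks plausible. A minimal sketch of that bookkeeping (all task names and types are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    done: bool = False

@dataclass
class TaskList:
    tasks: list = field(default_factory=list)

    def add(self, description):
        self.tasks.append(Task(description))

    def check_off(self, description):
        for t in self.tasks:
            if t.description == description:
                t.done = True
                return
        raise KeyError(f"no such task: {description}")

    def remaining(self):
        # "Done" is defined by this list, not by how finished the code looks.
        return [t.description for t in self.tasks if not t.done]

# Turn each hard requirement into an explicit item up front...
plan = TaskList()
plan.add("validate input lengths")
plan.add("return errors as JSON")
plan.add("log every rejected request")

# ...then refuse to call the work finished while anything remains.
plan.check_off("validate input lengths")
plan.check_off("return errors as JSON")
print(plan.remaining())  # the missed requirement is still sitting right there
```

This is roughly what agent harnesses do for you. The generation step is still just P(Y|X); the list is what turns "statistically plausible" into "actually complete."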