You've missed the subtlety here.
LLMs don't have attention to detail.
This project had extremely comprehensive, easily verifiable tests.
So the LLMs could be as sloppy as they usually are; they just had to keep redoing their work until the code actually worked.
I missed the subtlety?
I linked the paper! I read the paper. Yeah, they wrote the tests, which is how this worked! How the heck do you think it was supposed to work?
The fact that they needed to write the tests was just the means of implementation. It didn't change the non-LLM labor economics of the problem.