I kinda like the analogy of travelling here.
With normal artisanal coding you take your time getting from A to B and you might find out alternate routes while you slowly make your way to the destination. There's also a clear cost in backtracking and trying an alternate route - you already wrote the "wrong" code and now it's useless. But you also gained more knowledge and maybe in a future trip from A to C or C to D you know that a side route like that is a bad idea.
Also because it's you, a human with experience, you know not to walk down ravines or hit walls at full speed.
With LLMs there's very little cost in backtracking. You're pretty much sending robots from A to B and checking if any of them make it every now and then.
The robots will jump down ravines and take useless side routes because they lac the lived in experience "common sense" of a human.
BUT what makes the route easier for both are linters, tests and other syntactic checks. If you manage to do a full-on Elmo style tunnel from A to B, it's impossible to miss no matter what kind of single-digit IQ bot you send down the tube at breakneck speed. Or just adding a few "don't walk down here, stay on the road" signs on the way,
Coincidentally the same process also makes the same route easier for inexperienced humans.
tl;dr If you have good specs and tests and force the LLM to never stop until the result matches both, you'll get a lot better results. And even if you don't use an AI, the very same tooling will make it easier for humans to create good quality code.
That would be great if you were a research lab with unlimited funding. But most business needs to grapple with real user data. Data they've been hired to process or to provide an easier way to process. Trying stuff until something sticks is not a real solution.
Having tests and specs is no guarantee that something will works. The only truth is the code. One analogy that I always take is the linear equation y = ax + b. You cannot write tests that fully proves that this equation is implemented without replicating the formula in the tests. Instead you check for a finite set of tuples (x, y). Those will helps if you chose the wrong values of a or switch to the negative of b, but someone that knows the tests can come up with a switch case that returns the correct y for the x in the tests and garbage otherwise. That is why puzzle like leetcode don't show you the tests.