If that’s what you have to do that makes LLMs look more like advanced fuzzers that take textual descriptions as input (“find code that segfaults calling x from multiple threads”, followed by “find changes that make the tests succeed again”) than as truly intelligent. Or, maybe, we should see them as diligent juniors who never get tired.
I don't see any problems with either of those framings.
It really doesn't matter at all whether these things are "truly intelligent". They give me functioning code that meets my requirements. If standard fuzzers or search algorithms could do the same, I would use those too.