> The model performs a statistical substitution, replacing a 1-of-10,000 token with a 1-of-100 synonym
Do we see this in programming too? I'm not sure we do. Unique, rarely used API methods don't get substituted the same way when refactoring. Perhaps that could give us a clue about how to fix it?
I think that's different, because refactoring usually means calling the same functions/methods, just in a somewhat more readable way.
When LLMs aren't given a clear instruction to "just" refactor, I have had problems with them hallucinating functions that don't exist.