My point is that I don't see any scenario in which sending a ton of input tokens (the whole past conversation) and expecting output that literally does nothing is absolutely pointless.
Like, sure, if the model were "smarter" it would probably generate something like "Okay, I won't do it.". What is the value of the response "Okay, I won't do it."? Why did you just waisted time and compute to generate it?
> Do you never have a second thought after looking at your output, or realize you forgot something you wanted to include? Could it be that they saw "one new function", thought "boy, there should really be two... what happened?" and changed their mind?
Sure, all of those are totally valid. And in each of this cases it would be better to just don't make the request at all or make the request with the correction.
> like calling your intern
LLM is not human. It can't act on it's own without you triggering it to act. Unlike the intern in your example who will be wasting time and getting frustrated if they don't receive a response from you. With model you can just abandon this "conversation" (which is really just a growing context that you send again and again with every request) forever or until you are ready to continue it. There is no situation when just adding "no" to the conversation is useful.