Are you aware that every tool call produces output which also counts as input to the LLM?
Are you aware that a lot of model tool calls are useless and a smarter model could avoid those?
Are you aware that output tokens are typically priced around 5x higher than input tokens?
This has no bearing on my comment. The point is that a better model avoids dozens of prompts and tool calls by making fewer, CORRECT tool calls, so the user needs no additional prompts.
I’m surprised this is even a question; a better prompter obviously has the same properties, and that’s not in dispute.
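The cost arithmetic behind this exchange can be sketched. Every tool call's output is appended to the context, so each later turn re-pays input-token cost for all earlier tool outputs, and the total grows quadratically in the number of calls. A minimal sketch, using purely hypothetical prices and token counts (no provider's real rates), assuming output is billed at 5x the input rate as claimed above:

```python
# Illustrative cost sketch: why fewer, correct tool calls are cheaper.
# All prices and token counts below are hypothetical placeholders.
INPUT_PRICE = 1.0   # $ per 1M input tokens (assumed)
OUTPUT_PRICE = 5.0  # $ per 1M output tokens (assumed 5x input, per the claim above)

def session_cost(n_calls, prompt_tokens=2_000, call_output_tokens=1_500,
                 reasoning_tokens_per_call=300):
    """Cost of an agentic session with n_calls tool calls.

    Each tool call's output joins the context, so every subsequent turn
    pays input-token cost for all earlier tool outputs again.
    """
    total_input = 0
    total_output = 0
    context = prompt_tokens
    for _ in range(n_calls):
        total_input += context                # whole context is re-read each turn
        total_output += reasoning_tokens_per_call
        context += call_output_tokens         # tool result is appended to context
    return (total_input * INPUT_PRICE + total_output * OUTPUT_PRICE) / 1_000_000

# A model needing 12 exploratory calls vs. one needing 3 correct ones:
print(f"12 calls: ${session_cost(12):.4f}")
print(f" 3 calls: ${session_cost(3):.4f}")
```

Under these assumed numbers the 12-call session costs roughly nine times the 3-call one, and the gap widens with more calls, since the accumulated tool output is re-billed as input on every turn.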