Hacker News

OsrsNeedsf2P · yesterday at 11:37 PM · 2 replies

So much misinformation in these comments. As someone who is building an agent[0] with MCP tools, neither the MCP tool description nor the response is the problem. Both of those are easily solved by not bloating them.

The real killer is the input tokens on each step. If you have 100k tokens in the conversation and the LLM calls an MCP tool, the tool output and the entire existing conversation are sent back. So now you've input 200k tokens to the LLM.

Now imagine 10 tool calls per user message - or 50. You're sending 1-5M input tokens, not because the MCP definitions or tool responses are large, but because at each step, you have to send the whole conversation again.
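The growth described above can be sketched with back-of-envelope arithmetic (the 100k base and 2k-per-tool-output figures below are assumptions, not numbers from the comment):

```python
# Illustrative sketch: cumulative input tokens when the whole conversation
# is resent to the LLM on every tool-call step. All numbers are assumptions.
def total_input_tokens(base_tokens, tool_output_tokens, n_calls):
    """Sum the input tokens across n_calls steps; each step appends one
    tool output to the history and resends the whole history as input."""
    total = 0
    conversation = base_tokens
    for _ in range(n_calls):
        conversation += tool_output_tokens  # tool result appended to history
        total += conversation               # entire history sent as input
    return total

print(total_input_tokens(100_000, 2_000, 10))  # 1,110,000
print(total_input_tokens(100_000, 2_000, 50))  # 7,550,000
```

With a 100k-token conversation, even modest 2k-token tool outputs push 10 calls past a million input tokens, and 50 calls into the multi-million range, because the base history dominates each resend.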

"what about caching" - Only 90% savings, also cache misses are surprisingly common (we see as low as 40% cache hit rate)

"MCP definitions are still large" - not compared to any normal conversation. Also these get cached

We've seen the biggest savings by batching/parallelizing tool calls. I suspect the future of LLM tool usage will have a different architecture, but CLIs don't solve these problems either.
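The batching savings follow directly from the resend arithmetic: the history is resent once per round trip, not once per call. A sketch under the same assumed numbers as before (100k-token history, 2k-token tool outputs):

```python
# Assumed numbers: compare input tokens for 10 sequential tool calls
# (history resent after each one) vs. one turn that batches all 10 calls
# (history resent once, carrying all 10 results together).
base, output = 100_000, 2_000
sequential = sum(base + output * i for i in range(1, 11))
batched = base + output * 10  # single round trip with all 10 results

print(sequential)  # 1,110,000
print(batched)     # 120,000
```

Nearly a 10x reduction here, because the dominant cost is how many times the 100k-token base history gets resent, not the tool outputs themselves.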

[0] https://ziva.sh, it's an agent specialized for Godot[1]

[1] https://godotengine.org


Replies

martinald · today at 2:22 AM

But this is just the nature of LLMs (so far). Every "conversation" involves sending the entire conversation history back.

IMO the article misses the main benefit of CLIs vs _current_ MCP implementations [1]: the fact that they can be chained together with some sort of scripting by the agent.

Imagine you want to sum the totals of, say, 150 order IDs (and the API behind the scenes only allows one ID per API call).

With MCP the agent would have to do 150 tool calls and explode your context.

With CLIs the agent can write a for loop in whatever scripting language it needs, parse out the order value and sum, _in one tool call_. This would be maybe 500 tokens total, probably 1% of trying to do it with MCP.
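The one-tool-call loop could look something like this (a sketch in Python as the "whatever scripting language" of the example; `fetch_order` is a hypothetical stand-in for the single-ID CLI/API call, mocked here so the loop is runnable):

```python
# Hypothetical sketch: fetch_order stands in for one CLI/API call that
# returns a single order. Mocked here so the script is self-contained.
def fetch_order(order_id):
    return {"id": order_id, "total": 10.0}  # mock: every order totals 10.0

# The agent runs this entire script as ONE tool call, instead of making
# 150 separate MCP tool calls that each round-trip the whole conversation.
order_ids = range(1, 151)
total = sum(fetch_order(i)["total"] for i in order_ids)
print(total)  # 1500.0
```

Only the final sum (plus the script itself) ever enters the conversation, which is where the roughly-1%-of-the-tokens claim comes from.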

[1] There is actually no reason MCP couldn't be composed like this: the AI harnesses could provide a code execution environment with the MCP tools exposed somehow. But no one does it at the moment, AFAIK. Sort of an MCP-to-"method" shim in a sandbox.
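One way such a shim might look (purely a sketch of the footnote's idea; `call_mcp_tool` and `get_order` are assumed names, with the MCP transport stubbed out):

```python
# Hypothetical "MCP to method" shim: expose each MCP tool as a plain
# Python function inside a code-execution sandbox, so the model can
# compose tools in one script instead of one tool call per invocation.
def call_mcp_tool(name, **kwargs):
    # Stub: a real harness would forward this over the MCP connection.
    return {"tool": name, "total": 10.0}  # mock result for the sketch

def make_method(tool_name):
    def method(**kwargs):
        return call_mcp_tool(tool_name, **kwargs)
    return method

# Build a namespace of callables from the advertised tool names, then
# run model-written code against it (a real sandbox would isolate this).
tools = {name: make_method(name) for name in ["get_order"]}
model_code = "result = sum(get_order(id=i)['total'] for i in range(1, 151))"
namespace = dict(tools)
exec(model_code, namespace)
print(namespace["result"])  # 1500.0
```

The 150 tool invocations happen inside the sandbox; only the script and the single result cross the model boundary.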

sudhirb · today at 1:07 AM

A 90% saving is huge, isn't it?

For long agent sessions, I would expect a very high cache hit rate unless you're editing the system prompt, tools, or history between turns, or some turns take longer than the cache timeout.