Hacker News

kingstnap, today at 5:41 PM

A significant caveat is that there is a pricing mismatch that lets first parties subsidize quite heavily.

Agents are expensive in large part because tool calls require round trips. These APIs are stateless and not streaming, so you have to resend the whole context each time. This means that over any given session you pay for roughly (number of tool calls) x (1/2 the context size) cached input tokens, since the context grows as the session goes on and the average resend is about half the final size. Most API providers overcharge you by a huge amount for cached tokens. An exception is DeepSeek. Paying OpenAI $0.05 for 100k cached GPT5.5 tokens during a possibly 2 second round trip agent tool call is like paying roughly $100/hr for what is likely to be ~10 to 20 GB of VRAM residence (holding the KV cache).

Or it got offloaded to NVMe, and you are paying $0.05 for that much PCIe bandwidth.
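
To make the back-of-envelope math above concrete, here is a rough Python sketch. The $0.05 per 100k cached tokens and the 2 second round trip come from the comment. The 50 tool calls and the function names are made up for illustration, and the "half the context" factor assumes the context grows roughly linearly over the session.

```python
# Rough sketch of the cached-token arithmetic. Not any provider's billing API.

def cached_input_tokens(tool_calls: int, final_context_tokens: int) -> float:
    """Approximate cached input tokens resent over a session.

    Assumes the context grows roughly linearly, so the average resend
    is about half the final context size.
    """
    return tool_calls * final_context_tokens / 2


def implied_hourly_rate(price_usd: float, residency_seconds: float) -> float:
    """Convert a price paid for a short cache residency into dollars per hour."""
    return price_usd * 3600 / residency_seconds


# Figures from the comment: $0.05 per 100k cached tokens, ~2 second round trip.
PRICE_PER_100K_CACHED_USD = 0.05
ROUND_TRIP_SECONDS = 2.0

# Hypothetical session: 50 tool calls, ending at a 100k-token context.
tokens = cached_input_tokens(tool_calls=50, final_context_tokens=100_000)
session_cost = tokens / 100_000 * PRICE_PER_100K_CACHED_USD
hourly = implied_hourly_rate(PRICE_PER_100K_CACHED_USD, ROUND_TRIP_SECONDS)

print(f"cached input tokens over the session: {tokens:,.0f}")
print(f"cached-token cost for the session: ${session_cost:.2f}")
print(f"implied rate for holding the cache: ${hourly:.0f}/hr")
```

With these inputs the implied rate works out to about $90/hr, which is where the "roughly $100/hr" figure comes from. A longer round trip lowers the implied rate proportionally.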