Hacker News

grav · yesterday at 7:51 PM · 5 replies

I fail to understand how two LLMs would "consume" a different number of tokens given the same input. Does it refer to the number of output tokens? Or is it in the context of some "agentic loop" (e.g. Claude Code)?


Replies

lemonfever · yesterday at 7:58 PM

Most LLMs output a whole bunch of tokens to help them reason through a problem, often called chain of thought, before giving the actual response. This has been shown to improve performance significantly, but it uses a lot of tokens.
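
To make that concrete, here is a minimal sketch assuming the Anthropic Python SDK and its extended-thinking option; the model ID, prompt, and budget numbers are placeholders, not the specific models discussed above. The point is that the same prompt reports very different output-token usage once thinking is enabled, because the thinking tokens are counted as output:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    prompt = [{"role": "user", "content": "Rewrite this recursive function iteratively: ..."}]

    plain = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=2048,
        messages=prompt,
    )

    with_thinking = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=8192,            # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": 4096},
        messages=prompt,
    )

    # usage.output_tokens includes the thinking tokens, so the second call
    # typically reports far more output for the same question.
    print("plain   :", plain.usage.output_tokens)
    print("thinking:", with_thinking.usage.output_tokens)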

jcims · yesterday at 7:55 PM

One very specific and limited example: when asked to build something, 4.6 seems to do more web searches in the domain to gather the latest best practices for various components/features before planning/implementing. A rough sketch of why that costs tokens is below.
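
Here is a rough sketch of the kind of agentic loop involved, again assuming the Anthropic Python SDK; the web_search tool and the model ID are hypothetical stand-ins, not the product's actual search integration. Every extra search is another round trip in which the whole transcript plus the search results is re-sent as input tokens:

    import anthropic

    client = anthropic.Anthropic()

    # Hypothetical client-side search tool; a real one would call a search API.
    def web_search(query: str) -> str:
        return f"(stub) top results for {query!r}"

    tools = [{
        "name": "web_search",
        "description": "Search the web and return a short summary of the results.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }]

    messages = [{"role": "user", "content": "Set up a new FastAPI project using current best practices."}]
    total_in = total_out = 0

    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder model ID
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        total_in += resp.usage.input_tokens
        total_out += resp.usage.output_tokens
        if resp.stop_reason != "tool_use":
            break
        # Each additional search adds a round trip: the growing transcript plus
        # the tool results are sent again as input tokens on the next call.
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": web_search(block.input["query"])}
            for block in resp.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})

    print(f"input tokens: {total_in}, output tokens: {total_out}")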

andrewchilds · yesterday at 7:58 PM

I've found that Opus 4.6 is happy to read a significant amount of the codebase in preparation for doing something, whereas Opus 4.5 tends to be much more efficient and targeted about pulling in relevant context.
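
That difference shows up directly in input-token usage, since everything pulled into context gets re-sent on every subsequent turn. A back-of-the-envelope sketch: the ~4-characters-per-token figure is a rule of thumb, and the "targeted vs. broad" file split is hypothetical:

    from pathlib import Path

    def approx_tokens(text: str) -> int:
        # Crude ~4 chars/token heuristic; real tokenizers vary by model.
        return len(text) // 4

    def context_cost(files, turns: int) -> int:
        """Estimated input tokens if these files sit in context for `turns` turns."""
        per_turn = sum(approx_tokens(f.read_text(errors="ignore")) for f in files)
        return per_turn * turns

    repo = sorted(Path(".").rglob("*.py"))
    targeted = repo[:2]   # a couple of directly relevant files
    broad = repo[:20]     # a much larger slice of the codebase

    print("targeted:", context_cost(targeted, turns=10), "tokens (approx)")
    print("broad   :", context_cost(broad, turns=10), "tokens (approx)")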

Gracana · yesterday at 8:40 PM

They're talking about the output consuming tokens from the pool allowed by the subscription plan.

bsamuels · yesterday at 7:54 PM

Thinking tokens, output tokens, etc., plus being more clever about file reads/tool calling.