Hacker News

grav · yesterday at 7:51 PM · 5 replies

I fail to understand how two LLMs would "consume" a different number of tokens given the same input. Does it refer to the number of output tokens? Or is it in the context of some "agentic loop" (e.g. Claude Code)?


Replies

lemonfever · yesterday at 7:58 PM

Most LLMs output a whole bunch of tokens to help them reason through a problem, often called chain of thought, before giving the actual response. This has been shown to improve performance significantly, but it uses a lot of tokens.
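
To make that concrete, here is a minimal sketch assuming the Anthropic Python SDK and its extended-thinking option; the model ID, prompt, and budget numbers are placeholders, not the specific models discussed above. The point is that the same prompt reports very different output-token usage once thinking is enabled, because the thinking tokens are counted as output:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    prompt = [{"role": "user", "content": "Rewrite this recursive function iteratively: ..."}]

    plain = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=2048,
        messages=prompt,
    )

    with_thinking = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=8192,            # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": 4096},
        messages=prompt,
    )

    # usage.output_tokens includes the thinking tokens, so the second call
    # typically reports far more output for the same question.
    print("plain   :", plain.usage.output_tokens)
    print("thinking:", with_thinking.usage.output_tokens)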

jcims · yesterday at 7:55 PM

One very specific and limited example: when asked to build something, 4.6 seems to do more web searches in the domain to gather the latest best practices for various components/features before planning/implementing. A rough sketch of why that costs tokens is below.
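
Here is a rough sketch of the kind of agentic loop involved, again assuming the Anthropic Python SDK; the web_search tool and the model ID are hypothetical stand-ins, not the product's actual search integration. Every extra search is another round trip in which the whole transcript plus the search results is re-sent as input tokens:

    import anthropic

    client = anthropic.Anthropic()

    # Hypothetical client-side search tool; a real one would call a search API.
    def web_search(query: str) -> str:
        return f"(stub) top results for {query!r}"

    tools = [{
        "name": "web_search",
        "description": "Search the web and return a short summary of the results.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }]

    messages = [{"role": "user", "content": "Set up a new FastAPI project using current best practices."}]
    total_in = total_out = 0

    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder model ID
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        total_in += resp.usage.input_tokens
        total_out += resp.usage.output_tokens
        if resp.stop_reason != "tool_use":
            break
        # Each additional search adds a round trip: the growing transcript plus
        # the tool results are sent again as input tokens on the next call.
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": web_search(block.input["query"])}
            for block in resp.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})

    print(f"input tokens: {total_in}, output tokens: {total_out}")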

andrewchilds · yesterday at 7:58 PM

I've found that Opus 4.6 is happy to read a significant amount of the codebase in preparation for doing something, whereas Opus 4.5 tends to be much more efficient and targeted about pulling in relevant context.
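
That difference shows up directly in input-token usage, since everything pulled into context gets re-sent on every subsequent turn. A back-of-the-envelope sketch: the ~4-characters-per-token figure is a rule of thumb, and the "targeted vs. broad" file split is hypothetical:

    from pathlib import Path

    def approx_tokens(text: str) -> int:
        # Crude ~4 chars/token heuristic; real tokenizers vary by model.
        return len(text) // 4

    def context_cost(files, turns: int) -> int:
        """Estimated input tokens if these files sit in context for `turns` turns."""
        per_turn = sum(approx_tokens(f.read_text(errors="ignore")) for f in files)
        return per_turn * turns

    repo = sorted(Path(".").rglob("*.py"))
    targeted = repo[:2]   # a couple of directly relevant files
    broad = repo[:20]     # a much larger slice of the codebase

    print("targeted:", context_cost(targeted, turns=10), "tokens (approx)")
    print("broad   :", context_cost(broad, turns=10), "tokens (approx)")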

Gracana · yesterday at 8:40 PM

They're talking about the output consuming tokens from the pool allowed by the subscription plan.

bsamuels · yesterday at 7:54 PM

Thinking tokens, output tokens, etc., plus being more clever about file reads/tool calling.