logoalt Hacker News

bryanlarsenyesterday at 3:29 PM1 replyview on HN

It works better, until you run out of tokens. Running out of tokens is something that used to never happen to me, but this month now regularly happens.

Maybe I could avoid running out of tokens by turning off 1M tokens and max effort, but that's a cure worse than the disease IMO.


Replies

cube2222yesterday at 6:27 PM

I would risk a guess that people have a wrong intuition about the long-context pricing and are complaining because of that.

Yeah, the per-token price stays the same, even with large context. But that still means that you're spending 4x more cache-read tokens in a 400k context conversation, on each turn, than you would be in a 100k context conversation.