This effect is likely even larger when you consider that the raw cost per inferred token grows linea...

zozbot234 • today at 5:21 AM • 1 reply • view on HN

This effect is likely even larger when you consider that the raw cost per inferred token grows linearly with context, rather than being constant. So longer tasks performed with higher-context models will cost quadratically more. The computational cost also grows super-linearly with model parameter size: a 20B-active model is more than four times the cost of a 5B-active model.

Replies

tibbar • today at 5:28 AM

Doesn't context cacheing mostly eliminate this problem? (I suppose for enough context the 90% discount is eventually a lot anyway)

➕ show 1 reply

alt Hacker News

Replies