Hacker News

woadwarrior01 yesterday at 1:36 PM

Yeah, even if one efficiency trick lands, people will end up spending the saved budget right back on bigger models and/or more "thinking" tokens.


Replies

EthanHeilman yesterday at 4:13 PM

Not if the bigger models have diminishing returns. Let's say you figure out a way to reduce RAM requirements 100x, but increasing RAM usage by 2x only gets you a 1% increase in effectiveness, and 3x does not get you any noticeable increase over 2x at all. Sure, you can reduce the price per token, but you might have already saturated the market. Even if you haven't saturated the market, your hardware-based moat just got smaller, and that is going to reduce your margins even more.

Just noticed that pydry made a similar point: https://news.ycombinator.com/item?id=47574216