Hacker News

louiereederson · yesterday at 6:16 PM

For a score of 56.7 on the Artificial Analysis Intelligence Index, GPT 5.5 used 22M output tokens. For a score of 57, Opus 4.7 used 111M output tokens.

The efficiency gap is enormous. Maybe it's the difference between an NVIDIA GB200 NVL72 and an Amazon Trainium chip?
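
A quick back-of-the-envelope, using only the figures quoted above (treating tokens-per-point as a rough efficiency proxy is my own framing, not an official metric):

    # Back-of-the-envelope from the numbers above; "tokens per index point"
    # is one crude way to slice efficiency, not an official metric.
    gpt_score, gpt_tokens = 56.7, 22e6      # GPT 5.5: 22M output tokens
    opus_score, opus_tokens = 57.0, 111e6   # Opus 4.7: 111M output tokens

    gpt_rate = gpt_tokens / gpt_score       # ~388k tokens per point
    opus_rate = opus_tokens / opus_score    # ~1.95M tokens per point
    print(f"Opus spends {opus_rate / gpt_rate:.1f}x the tokens per point")  # ~5.0x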


Replies

swyx · yesterday at 6:18 PM

why would the chip affect token quantity? this is all the model.

karmasimida · yesterday at 6:19 PM

Chips don't impact output quality at this magnitude

AtNightWeCode · yesterday at 9:35 PM

You need to compare total cost. Token count is irrelevant.
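
A sketch of that comparison; the per-million-token prices below are placeholders I made up, not either vendor's actual list prices:

    # Hypothetical $/1M-output-token prices -- placeholders, not real list prices.
    price_gpt_per_m = 10.0
    price_opus_per_m = 25.0

    gpt_cost = 22 * price_gpt_per_m      # 22M output tokens
    opus_cost = 111 * price_opus_per_m   # 111M output tokens
    print(f"GPT 5.5 run: ${gpt_cost:,.2f}, Opus 4.7 run: ${opus_cost:,.2f}")

With made-up prices like these, a 5x token gap can widen or shrink considerably once per-token pricing is applied, which is the point: the bill depends on price times tokens, not token count alone.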

dist-epoch · yesterday at 8:24 PM

If it's a new pretrain, the token embeddings could be wider - you can pack more info into a token making its way through the system.

Like Chinese versus English - you need fewer Chinese characters to say something than you need English letters to say the same thing.

So this model internally could be thinking in much more expressive embeddings.
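
A toy sketch of that idea; the hidden sizes and per-dimension precision below are made up, since neither model's real dimensions are public:

    # Made-up hidden sizes; neither model's actual d_model is public.
    bits_per_dim = 8  # assumed effective precision per dimension, also a placeholder
    for d_model in (4096, 12288):
        capacity = d_model * bits_per_dim  # rough upper bound on bits per token
        print(f"d_model={d_model}: ~{capacity:,} bits of room per token")

Under that (very rough) counting, a 3x wider residual stream gives each token 3x the representational room, so a model could in principle say the same thing in fewer, denser tokens.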