Hacker News

embedding-shape, today at 8:46 AM

> melted away in the face of massive context sizes

If only. There is a huge difference between "Gives good responses/can easily spot things within N context size" and "Technically works but sucks within N context size". Almost all models basically become cave-people once you go beyond 50% of the "supported" context size, meaning that while they may technically work with 1 million tokens, those last 500K tokens are gonna be massively "dumber" than the first 500K tokens.


Replies

stalfie, today at 12:21 PM

There's at least one benchmark that attempts to measure this, but it has been running for over a year and is only infrequently updated now.

https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/o...

fooster, today at 11:54 AM

I don’t find that to be true?
