
carterschonwald • today at 1:55 PM

Context window size isn't quite the issue, though. It's that the attention mass spreads out too much: as the context grows, the weight on any one relevant token shrinks, and everything kinda converges to a sort of global-average region full of what we know to be slop! There are some really cool ways to mitigate this at the harness or model layer; it just isn't often prioritized by the labs.
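A toy illustration of that dilution effect, with made-up numbers: one "relevant" key scores 2.0, every other key scores 0, and the values are 1.0 for the relevant token and 0.0 for the rest, so the attention output equals the softmax weight on the relevant token. The `log_scale` option multiplies logits by log(n), a length-dependent temperature that is one known mitigation. It's a swapped-in example, not necessarily what the comment has in mind, and `attend` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; equals log(n) for a uniform distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def attend(n, signal=2.0, log_scale=False):
    """Toy single-query attention over n keys (hypothetical setup).

    Key 0 is the one relevant token with score `signal`; the other n-1
    keys are distractors scoring 0. Values are 1.0 for the relevant token
    and 0.0 for distractors, so the output equals the weight on key 0.
    If log_scale is True, logits are multiplied by log(n), an assumed
    length-dependent temperature, not any specific model's recipe.
    """
    logits = [signal] + [0.0] * (n - 1)
    if log_scale:
        scale = math.log(n)
        logits = [x * scale for x in logits]
    weights = softmax(logits)
    values = [1.0] + [0.0] * (n - 1)
    output = sum(w * v for w, v in zip(weights, values))
    return output, entropy(weights)


for n in (10, 100, 1_000, 10_000):
    plain, h_plain = attend(n)
    scaled, _ = attend(n, log_scale=True)
    # Entropy approaching log(n) means the weights are nearly uniform.
    print(f"n={n:>6}  plain={plain:.4f}  "
          f"entropy={h_plain:.2f}/{math.log(n):.2f}  log-scaled={scaled:.4f}")
```

With plain softmax the weight on the relevant token falls from about 0.45 at n=10 to under 0.001 at n=10,000, and the entropy of the weights approaches log(n), i.e. near-uniform averaging. The log-scaled variant keeps the relevant token dominant (about 0.92 at n=10 and rising), at the cost of being much sharper everywhere.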
