I dislike the non-specificity of "models" here. Different models have different attention ...

deliciousturkey • today at 12:33 PM • 0 replies • view on HN

I dislike the non-specificity of "models" here. Different models have different attention architectures, and can therefore have significant differences in long-context behavior. It's true that long context is an issue can most models do drop off in quality, but I would not extrapolate behavior of old models to new ones.

alt Hacker News