It probably depends on what you're using the models for. If you use them for web search or summarizing web pages, I can imagine there's a plateau, and we're probably already hitting it.
For coding, though, there is kind of no limit to the complexity of software. The more invariants and potential interactions the model can be aware of, the better, presumably. It can handle larger codebases, probably past the point where humans could work on those codebases unassisted (which brings other potential problems).
> summarizing web pages
For summarizing creative writing, I've found Opus and Gemini 3 Pro are still only okay, and actively bad once the text gets over 15K tokens or so.
A lot of long-context and attention improvements have been focused on needle-in-a-haystack-type scenarios (pulling one specific fact out of a long input), which is the opposite of what summarization needs: weighing the whole text at once.