Has anyone tested how good the 1M context window is?
i.e., given an actual document 1M tokens long, can you ask it a question that relies on attending to two different parts of the context and get a good response?
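For anyone who wants to try this themselves, here's a rough sketch of how you could build such a test case: plant two related facts far apart in filler text, then ask a question that can only be answered by combining both. The codename/month facts and the filler vocabulary here are made up for illustration; scale `filler_tokens` up toward the 1M range for a real test.

```python
import random

def build_two_needle_prompt(filler_tokens=1000, seed=0):
    """Build a long document with two related facts planted far apart,
    plus a question that requires combining both facts to answer.
    The facts and filler words are invented purely for illustration."""
    rng = random.Random(seed)
    filler = " ".join(rng.choice(["alpha", "beta", "gamma", "delta"])
                      for _ in range(filler_tokens))
    fact_a = "Note: the project codename is Heron."
    fact_b = "Note: the Heron project shipped in March."
    # Plant one fact near the start and the other near the end,
    # so answering forces attention across the whole context.
    doc = f"{fact_a}\n{filler}\n{fact_b}"
    question = ("In which month did the project with the codename "
                "mentioned earlier ship?")
    return doc, question

doc, question = build_two_needle_prompt()
```

You'd then send `doc` plus `question` to the model and check whether the answer links the codename to the month.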
I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
Did you see the graph benchmark? I found it quite interesting: the model had to do a graph traversal on a natural-text representation of a graph. Pretty much your problem.
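I don't know the exact setup that benchmark used, but the general idea is easy to reproduce at home. A sketch, with the "road between cities" phrasing being my own invention: describe a random connected graph in plain sentences, then ask the model for the shortest path and score it against a BFS reference answer.

```python
import random
from collections import deque

def graph_as_text(n=8, extra_edges=6, seed=1):
    """Describe a small random connected graph in natural-language
    sentences (hypothetical phrasing, not the benchmark's own)."""
    rng = random.Random(seed)
    edges = set()
    for i in range(1, n):  # spanning tree keeps the graph connected
        edges.add((rng.randrange(i), i))
    while len(edges) < n - 1 + extra_edges:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    sentences = [f"There is a road between city {a} and city {b}."
                 for a, b in sorted(edges)]
    return " ".join(sentences), edges

def shortest_hops(edges, n, src, dst):
    """Reference answer via BFS, to score the model's reply against."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # unreachable (shouldn't happen with a spanning tree)

text, edges = graph_as_text()
```

Ask the model "What is the fewest number of roads between city 0 and city 7?" with `text` as context, and compare against `shortest_hops(edges, 8, 0, 7)`. Bumping `n` up makes the text long enough to stress the context window.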