Ok wow that is incredibly interesting, what a test. I would've honestly expected just random noise (like if you gave this same task a human, lol) but you can even see related models draw similar results. Maybe it is an indicator of overall knowledge, or how consistent the world model is. It also could not correlate at all with non-geographical knowledge.