Speaking as your local vision nut: their claims about "SOTA" vision are absolute BS in my tests.
Sure, it's SOTA on standard vision benchmarks. But on tasks that require proper image understanding (see, for example, BabyVision[0]), it appears very much lacking compared to Gemini 3 Pro.
Gemini remains the only usable vision fm :(