Look, just give the Qwen3-VL models a go. I've found them to be fantastic at this kind of thing so far, and what I'm seeing on display here is laughable in comparison. A closed-source, closed-weight paid model with worse performance than an open one? Come on. OpenAI really is a bubble.