The most important benchmark:
https://boutell.dev/misc/qwen3-max-pelican.svg
I used Simon Willison's usual prompt.
It thought for over 2 minutes (on a free account). Its commentary on the result was even more glowing than the image itself.
It has a certain charm.