Qwen's flamingo is artistically far more interesting. It's a one-eyed flamingo with sunglasses and a bow tie who smokes pot. Meanwhile Opus just made a boring, somewhat dorky flamingo. Even the ground and sky are more interesting in Qwen's version
But in terms of making something physically plausible, Opus certainly got a lot closer
"artistically interesting" is IMHO both a subjective and 'solved' problem. These models are trained with an "artistically interesting" reward model that tries to guide the model towards higher quality photos.
I think getting the models to generate realistic and proportional objects is a much harder and important challenge (remember when the models would generate 6 fingers?).
The Opus bike isn't very physically plausible though.
Given adherence is a more significant practical barrier, it's probably the better signal. That is, if we decide too look for signal here.