gemini isn't even that good. just tested 3.5 on the usual complex prompts I give to opus/chat 5.5. meh
Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks?
Well, my first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.
Are you really comparing flash to opus? Shouldn't you be comparing pro?