Hacker News

GeorgeOldfield yesterday at 7:53 PM

Gemini isn't even that good. I just tested 3.5 on the complex prompts I usually give to opus/chat 5.5. Meh.



k8sToGo yesterday at 8:19 PM

Are you really comparing flash to opus? Shouldn't you be comparing pro?

bachmeier yesterday at 8:52 PM

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks.

kmac_ yesterday at 8:31 PM

Well, my first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.