Hacker News

pitched, yesterday at 11:59 AM

That score is on par with Gemini 3 Flash, but from scrolling through the results, these scores look much more affected by the agent used than by the model.


Replies

varispeed, yesterday at 12:24 PM

Gemini 3 Flash is pure rubbish. It easily gets stuck in a loop, spouting output no different from a Markov chain and repeating it over and over.