That score is on par with Gemini 3 Flash, but from scrolling through the results, these scores look much more affected by the agent used than by the model.
Gemini 3 Flash is pure rubbish. It easily gets stuck in a loop, spouting output no different from a Markov chain and repeating it over and over.