
XCSme • today at 1:42 AM

The questions do specifically ask for the answer only, and many of them include an example of the expected format.

Note that all reasoning models are tested with reasoning effort set to "medium".

The benchmarks are questions and data-processing tasks of the kind an average user is likely to ask, not coding questions (I haven't added any coding tests yet).

Gemini models also tend to be very consistent. Asking the same question will likely give the same result.
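If you want to put a number on that kind of consistency yourself, here is a minimal Python sketch, assuming you rerun the same prompt several times and collect the final answers. The `consistency_rate` function and its normalization step (strip whitespace, lowercase) are hypothetical illustrations, not how this benchmark actually measures anything.

```python
from collections import Counter


def consistency_rate(answers: list[str]) -> float:
    """Fraction of repeated runs that agree with the most common answer.

    Hypothetical metric for illustration only. Answers are normalized by
    stripping surrounding whitespace and lowercasing before comparison.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    # most_common(1) returns [(answer, count)] for the modal answer
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


# Five repeated runs of the same prompt (made-up example answers).
runs = ["42", "42", " 42", "41", "42"]
print(consistency_rate(runs))  # 0.8
```

A score of 1.0 means every run returned the same answer; a model that "will likely give the same result" would sit close to that.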

The two models you mention scored the same. The only difference is that Gemini was better at domain-specific questions (i.e., when you ask about something quite technical or niche).
