I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.
Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon.
You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems:
Frontier models are still nowhere near solving it, but progress has been rapid.
* o3 (high), ~1.5 years ago: 1.4%
* GPT-5.4 (xhigh): 23.4%
* GPT-5.5 (xhigh): 27.1%
* GPT-5.5 Pro (xhigh): 30.6%
Deep Think still makes far more mistakes than GPT-5.5 Pro on math.
There are many indications that model progress is slowing down, so that claim isn't entirely accurate.
> there's no reason to believe the progress of LLMs [...] will stop anytime soon
Wrong. Every advancement so far has followed an S-curve. Where we are on that curve is anyone's guess. Or maybe "this time it's different".