
Flux159 · yesterday at 6:41 PM · 6 replies

I'm a bit confused by this branding (I never even noticed there was a 5.2-Instant). It's not a super-fast 1000 tok/s Cerebras-based model like the one they have for codex-spark; it's just 5.2 without the router, i.e. "non-thinking" mode?

I feel like OpenAI is going to get right back to where they were pre-GPT-5, with a ton of different options and no one knowing which model to use for what.


Replies

tedsanders · yesterday at 7:11 PM

Yeah, for a while ChatGPT Plus has been powered by two series of models under the hood.

One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.

The second series is the Thinking series, which is more accurate and more tuned to professional knowledge work, but slower (because it uses more reasoning tokens).

We'd also prefer to have a simple experience with just one option, but picking just one would pull back the Pareto frontier for some group of people/preferences. So for now we continue to serve two models, with manual control for people who want to choose and an imperfect auto-switcher for people who don't want to be bothered. Could change down the road - we'll see.

(I work at OpenAI.)

0xbadcafebee · yesterday at 8:06 PM

It's because people like choice and control, and "5.2" vs "5.2 thinking" is confusing. Making them "5.2 instant" and "5.2 thinking" is less confusing to more people. Their competitors already do this (Gemini 3 Fast & Gemini 3 Thinking).

NitpickLawyer · yesterday at 7:15 PM

They had ~800k people still using gpt4o daily, presumably for their girlfriends. They need to address that group somehow. Plus, serving "thinking" models is much more expensive than serving "instant" models. So they want to keep the horny people hornying on their platform, but at a lower cost.

TrainedMonkey · yesterday at 7:09 PM

Will need to wait for real benchmarks, but based on OpenAI marketing, Instant is their latency-optimized offering. For a voice interface you don't actually need high tok/s, because speech is slow; time to first token matters much more.

josalhor · yesterday at 8:01 PM

Reminder that OpenAI serves a lot of customers for free; most of the people I know use the free tier. There's a tight limit on thinking queries on the free tier, so a decent non-thinking model is probably positive ROI for them.