GPT-5.5

1437 points • by rd • yesterday at 6:01 PM • 953 comments

Comments

Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?

bandrami • yesterday at 10:52 PM

Cool. Now there will be a week of "this is the greatest model ever and I think mine just gained sentience", followed by a week of "I think they must have just nerfed it because it's not as good as it was a week ago", followed by three weeks of smart people cargo culting the specific incantations they then convince themselves make it work best.

bradley13 • yesterday at 7:46 PM

"our strongest set of safeguards to date"

How much capability is lost, by hobbling models with a zillion protections against idiots?

Every prompt gets evaluated, to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...

Maybe just...leave that all off? I know, I know, individual responsibility no longer exists, but I can dream.

nullbyte • yesterday at 6:09 PM

82.7% on Terminal Bench is crazy

rarisma • today at 12:17 AM

I like that it's more consistent than in the 4o and o4 days, but 5.4, 5.3, 5.2, etc. are still a mess. For example, 5.2 and 5.1 don't have mini models, and 5.3 was Codex only.

Anthropic is slightly better, but where is 4.6 or 4.7 Haiku, or 4.7 Sonnet, etc.?

benjx88 • yesterday at 7:18 PM

Good job on the release notice. I appreciate that it isn't just marketing fluff, but actually includes the technical specs for those of us who care, and doesn't concentrate only on coding agents.

I hope GPT 5.5 Pro isn't cutting corners and neutered from the start; you've got the compute for it not to be.

extr • yesterday at 6:50 PM

Seems like a continuation of the current meta where GPT models are better in GPT-like ways and Claude models are better in Claude-like ways, with the differences between each slightly narrowing with each generation. 5.5 is noticeably better to talk to, 4.7 is noticeably more precise. Etc etc.

nickandbro • yesterday at 7:10 PM

Very impressive! Interesting how it seems to surpass Opus 4.7 on all other benchmarks except SWE-Bench Pro (Public). You would think that, doing so well at Cyber, it would naturally possess more abilities there. I wonder what makes up the actual difference.

GenerWork • yesterday at 7:19 PM

Looking at the space/game/earthquake tracker examples makes me hopeful that OpenAI is going to focus a bit more on interface visual development/integration from tools like Figma. This is one area where Anthropic definitely reigns supreme.

impulser_ • yesterday at 6:15 PM

What is the reason behind OpenAI being able to release new models very fast?

Since Feb when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex we have seen GPT-5.4 and GPT-5.5 but only Opus 4.7 and no new Gemini model.

Both of these are pretty decent improvements.

aetherspawn • yesterday at 10:26 PM

Umm yeah but this is like every release in the last 3 years.

The big question is: does it still just write slop, or not?

Fool me once, fool me twice, fool me for the 32nd time, it’s probably still just slop.

YmiYugy • yesterday at 6:14 PM

So according to the benchmarks somewhere in between Opus 4.7 and Mythos

neuroelectron • today at 11:20 AM

Are they using RTX 5090s now?

RayVR • today at 11:13 AM

My first experience with 5.5 via ChatGPT was immensely disappointing. It was a massive reduction in quality compared to 5.4, which already had issues.

w10-1 • yesterday at 9:42 PM

NYTimes article - on the same day?

  https://www.nytimes.com/2026/04/23/technology/openai-new-model.html

I can see how some model releases would meet the NY Times news-worthy threshold if they demonstrated significance to users - i.e., if most users were astir and competitors were re-thinking their situation.

However, this same-day article came out before people really looked at it. It seems largely intended to contrast OpenAI with Anthropic's caution, before there has been any evidence that the new model has cyber-security implications.

It's not at all clear that the broader discourse is helping, if even the NY Times is itself producing slop just to stoke questions.

ionwake • yesterday at 6:30 PM

Is there anywhere I can try it? (I just stopped my Pro sub.) I was wondering if there is a playground or third party so I can just test it briefly.

deaux • yesterday at 10:41 PM

ctrl+f "cutoff", 0 results

Surely it doesn't still have the same ancient data cutoff as 5.4 did?

k2xl • yesterday at 6:23 PM

Surprised to see SWE-Bench Pro only a slight improvement (57.7% -> 58.6%) while Opus 4.7 hit 64.3%. I wonder what Anthropic is doing to achieve higher scores on this - and also what makes this test particularly hard to do well in compared to Terminal Bench (which 5.5 seemed to have a big jump in).

cynicalpeace • yesterday at 6:13 PM

It's possible that "smarter" AI won't lead to more productivity in the economy. Why?

Because software and "information technology" generally didn't increase productivity over the past 30 years.

This has long been known as Solow's productivity paradox. There are lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.

But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you more lazy.

AI's main application has been information space so far. If that continues, I doubt you will get more productivity from it.

If you give AI a body... well, maybe that changes.

AbuAssar • yesterday at 7:46 PM

This is the first time OpenAI has included competing models in their benchmarks; until now they always included only OpenAI models.

tantalor • yesterday at 6:58 PM

> A playable 3D dungeon arena

Where's the demo link?

Manik_agg • today at 5:05 AM

OpenAI finally catching up with claude

zerotosixty • yesterday at 8:09 PM

For those who are using GPT-5.5, how does it compare to Opus 4.6 / 4.7 in terms of code generation?

renecito • yesterday at 11:21 PM

Why do the stats of every AI on every release look about the same?

Are the tests getting harder and harder so the older AIs look worse and the new ones look like they are "almost there"?

faxmeyourcode • yesterday at 6:34 PM

How does it compare to Mythos?

adam12 • yesterday at 9:57 PM

"Sometime with GPT-5.5 I become lazy"

I don't want to be lazy.

immanuwell • today at 8:28 AM

Big claims from OpenAI as usual - GPT-5.5 sounds impressive on paper, but we've been down this road before, so I'll believe the 'no speed tradeoff' part when I see it in the wild

objektif • yesterday at 6:10 PM

Are there faster mini/nano versions as well?

Pooge • today at 8:59 AM

Up until now I only paid LLM subscriptions to Anthropic but I'm going to give ChatGPT a chance when my current subscription runs out next month.

Schlagbohrer • yesterday at 9:50 PM

entering this comments area wondering if it will be full of complaints about the new personality, as with every single LLM update

cchrist • yesterday at 7:50 PM

Which is better GPT-5.5 or Opus 4.7? And for what tasks?

senko • yesterday at 7:21 PM

I might just be following too many AI-related people on X, but omg the media blitz around 5.5 is aggressive.

Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".

I prefer to see for myself, but the gradual rollout, combined with a full-on marketing campaign, is annoying.

phillipcarter • yesterday at 6:49 PM

... sigh. I realize there's little that can be done about this, but I just got through a real-world session determining if Opus 4.7 is meaningfully better than Opus 4.6 or GPT 5.4, and now there's another one to try things with. These benchmark results generally mean little to me in practice.

Anyways, still exciting to see more improvements.

egorfine • yesterday at 7:47 PM

> We are releasing GPT‑5.5 with our strongest set of safeguards to date

...

> we’re deploying stricter classifiers for potential cyber risk which some users may find annoying initially

So we should be expecting to not be able to check our own code for vulnerabilities, because inherently the model cannot know whether I'm feeding my code or someone else's.

vardump • yesterday at 7:01 PM

I just can't bear to use services from this company after what they did to the global DRAM markets.

I'm not trying to make any kind of moral statement, but the company just feels toxic to me.

woeirua • yesterday at 6:48 PM

Nice to see them openly compare to Opus-4.7… but they don’t compare it against Mythos which says everything you need to know.

The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they’ll be too busy posting slop content about how “GPT-5.5 changes everything”.

throwaway2027 • yesterday at 6:43 PM

Good timing, I had just renewed my subscription.

I_am_tiberius • yesterday at 6:24 PM

I'd really like to see improvements like these:

- Some technical proof that data is never read by OpenAI.
- Proof that no logs of my data or derived data are saved.

etc...

numbers • yesterday at 6:31 PM

I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena and looking for the actual benchmark comparisons.

https://arena.ai/leaderboard/code

ace2pace • yesterday at 8:18 PM

I hear it's as good as Opus 4.7.

The battle has just begun

nickandbro • yesterday at 10:09 PM

I just prompted GPT-5.5 Pro "Solve Nuclear Fusion" and it one shotted it (kidding obviously)

debba • yesterday at 6:51 PM

Cannot see it in Codex CLI

theihtisham • today at 2:34 AM

I just installed Codex and gave GPT 5.5 a try. It's good compared to the previous one.

PilotJeff • today at 2:47 AM

So exhausted from all this endless BS... Keep releasing. This reminds me of all the .com software during that era, where, wow, we are already at version 3.0 and it’s only been 60 days.

c0rruptbytes • yesterday at 10:16 PM

literally cannot launch the codex app anymore
