Hacker News

GPT-5.5

1437 points by rd yesterday at 6:01 PM | 953 comments

Comments

pants2 yesterday at 8:06 PM

Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?

bandrami yesterday at 10:52 PM

Cool. Now there will be a week of "this is the greatest model ever and I think mine just gained sentience", followed by a week of "I think they must have just nerfed it because it's not as good as it was a week ago", followed by three weeks of smart people cargo culting the specific incantations they then convince themselves make it work best.

bradley13 yesterday at 7:46 PM

"our strongest set of safeguards to date"

How much capability is lost, by hobbling models with a zillion protections against idiots?

Every prompt gets evaluated, to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...

Maybe just...leave that all off? I know, I know, individual responsibility no longer exists, but I can dream.

nullbyte yesterday at 6:09 PM

82.7% on Terminal Bench is crazy

rarisma today at 12:17 AM

I like that it's more consistent than the 4o and o4 days, but 5.4, 5.3, 5.2, etc. are still a mess: for example, 5.2 and 5.1 don't have mini models, and 5.3 was Codex-only.

Anthropic is slightly better, but where is 4.6 or 4.7 Haiku, or 4.7 Sonnet, etc.?

benjx88 yesterday at 7:18 PM

Good job on the release notice. I appreciate that it isn't just marketing fluff, but actually includes the technical specs for those of us who care and aren't focused only on coding agents.

I hope GPT 5.5 Pro isn't cutting corners and neutered from the start; you've got the compute for it not to be.

extr yesterday at 6:50 PM

Seems like a continuation of the current meta where GPT models are better in GPT-like ways and Claude models are better in Claude-like ways, with the differences between each slightly narrowing with each generation. 5.5 is noticeably better to talk to, 4.7 is noticeably more precise. Etc etc.

nickandbro yesterday at 7:10 PM

Very impressive! Interesting how it seems to surpass Opus 4.7 on every benchmark except SWE-Bench Pro (Public). You would think that, doing so well at Cyber, it would naturally possess more abilities there. I wonder what makes up the actual difference.

GenerWork yesterday at 7:19 PM

Looking at the space/game/earthquake tracker examples makes me hopeful that OpenAI is going to focus a bit more on interface visual development/integration from tools like Figma. This is one area where Anthropic definitely reigns supreme.

impulser_ yesterday at 6:15 PM

What is the reason behind OpenAI being able to release new models very fast?

Since Feb, when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex, we have seen GPT-5.4 and GPT-5.5, but only Opus 4.7 and no new Gemini model.

Both of these are pretty decent improvements.

aetherspawn yesterday at 10:26 PM

Umm yeah but this is like every release in the last 3 years.

The big question is: does it still just write slop, or not?

Fool me once, fool me twice, fool me for the 32nd time, it’s probably still just slop.

YmiYugy yesterday at 6:14 PM

So, according to the benchmarks, it's somewhere in between Opus 4.7 and Mythos.

neuroelectron today at 11:20 AM

Are they using RTX 5090s now?

RayVR today at 11:13 AM

My first experience with 5.5 via ChatGPT was immensely disappointing. It was a massive reduction in quality compared to 5.4, which already had issues.

w10-1 yesterday at 9:42 PM

NYTimes article - on the same day?

  https://www.nytimes.com/2026/04/23/technology/openai-new-model.html

I can see how some model releases would meet the NY Times news-worthy threshold if they demonstrated significance to users - i.e., if most users were astir and competitors were re-thinking their situation.

However, this same-day article came out before people really looked at it. It seems largely intended to contrast OpenAI with Anthropic's caution, before there has been any evidence that the new model has cyber-security implications.

It's not at all clear that the broader discourse is helping, if even the NY Times is itself producing slop just to stoke questions.

ionwake yesterday at 6:30 PM

Is there anywhere I can try it? (I just stopped my Pro sub.) I was wondering if there is a playground or third party so I can just test it briefly.

deaux yesterday at 10:41 PM

ctrl+f "cutoff": 0 results

Surely it doesn't still have the same ancient data cutoff as 5.4 did?

k2xl yesterday at 6:23 PM

Surprised to see SWE-Bench Pro only a slight improvement (57.7% -> 58.6%) while Opus 4.7 hit 64.3%. I wonder what Anthropic is doing to achieve higher scores on this, and also what makes this test particularly hard to do well on compared to Terminal Bench (where 5.5 seemed to have a big jump).

cynicalpeace yesterday at 6:13 PM

It's possible that "smarter" AI won't lead to more productivity in the economy. Why?

Because software and "information technology" generally didn't increase productivity over the past 30 years.

This has long been known as Solow's productivity paradox. There are lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.

But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you more lazy.

AI's main application has been information space so far. If that continues, I doubt you will get more productivity from it.

If you give AI a body... well, maybe that changes.

AbuAssar yesterday at 7:46 PM

This is the first time OpenAI has included competing models in their benchmarks; they always included only OpenAI models before.

tantalor yesterday at 6:58 PM

> A playable 3D dungeon arena

Where's the demo link?

Manik_agg today at 5:05 AM

OpenAI finally catching up with Claude

zerotosixty yesterday at 8:09 PM

For those who are using GPT-5.5: how does it compare to Opus 4.6 / 4.7 in terms of code generation?

renecito yesterday at 11:21 PM

Why do the stats of every AI on every release look about the same?

Are the tests getting harder and harder, so the older AIs look worse and the new ones look like they are "almost there"?

faxmeyourcode yesterday at 6:34 PM

How does it compare to Mythos?

adam12 yesterday at 9:57 PM

"Sometime with GPT-5.5 I become lazy"

I don't want to be lazy.

immanuwell today at 8:28 AM

Big claims from OpenAI as usual - GPT-5.5 sounds impressive on paper, but we've been down this road before, so I'll believe the 'no speed tradeoff' part when I see it in the wild

objektif yesterday at 6:10 PM

Are there faster mini/nano versions as well?

Pooge today at 8:59 AM

Up until now I've only paid for LLM subscriptions from Anthropic, but I'm going to give ChatGPT a chance when my current subscription runs out next month.

Schlagbohrer yesterday at 9:50 PM

entering this comments area wondering if it will be full of complaints about the new personality, as with every single LLM update

cchrist yesterday at 7:50 PM

Which is better GPT-5.5 or Opus 4.7? And for what tasks?

senko yesterday at 7:21 PM

I might just be following too many AI-related people on X, but omg the media blitz around 5.5 is aggressive.

Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".

I prefer to see for myself, but the gradual rollout, combined with a full-on marketing campaign, is annoying.

phillipcarter yesterday at 6:49 PM

... sigh. I realize there's little that can be done about this, but I just got through a real-world session determining if Opus 4.7 is meaningfully better than Opus 4.6 or GPT 5.4, and now there's another one to try things with. These benchmark results generally mean little to me in practice.

Anyways, still exciting to see more improvements.

egorfine yesterday at 7:47 PM

> We are releasing GPT‑5.5 with our strongest set of safeguards to date

...

> we’re deploying stricter classifiers for potential cyber risk which some users may find annoying initially

So we should expect not to be able to check our own code for vulnerabilities, because inherently the model cannot know whether I'm feeding it my code or someone else's.

vardump yesterday at 7:01 PM

I just can't bear to use services from this company after what they did to the global DRAM markets.

I'm not trying to make any kind of moral statement, but the company just feels toxic to me.

woeirua yesterday at 6:48 PM

Nice to see them openly compare to Opus-4.7… but they don’t compare it against Mythos which says everything you need to know.

The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they’ll be too busy posting slop content about how “GPT-5.5 changes everything”.

throwaway2027 yesterday at 6:43 PM

Good timing, I had just renewed my subscription.

I_am_tiberius yesterday at 6:24 PM

I'd really like to see improvements like these:

- Some technical proof that data is never read by OpenAI.

- Proof that no logs of my data or derived data are saved.

etc...

numbers yesterday at 6:31 PM

I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena and looking for the actual benchmark comparisons.

https://arena.ai/leaderboard/code

ace2pace yesterday at 8:18 PM

I hear it's as good as Opus 4.7.

The battle has just begun

nickandbro yesterday at 10:09 PM

I just prompted GPT-5.5 Pro "Solve Nuclear Fusion" and it one shotted it (kidding obviously)

debba yesterday at 6:51 PM

Cannot see it in Codex CLI

theihtisham today at 2:34 AM

I just installed Codex and gave GPT 5.5 a try. It's good compared to the previous one.

PilotJeff today at 2:47 AM

So exhausted from all this endless BS... Keep releasing. This reminds me of all the .com software during that era, where it was "wow, we are already at version 3.0 and it's only been 60 days."

c0rruptbytes yesterday at 10:16 PM

literally cannot launch the codex app anymore
