Cool. Now there will be a week of "this is the greatest model ever and I think mine just gained sentience", followed by a week of "I think they must have just nerfed it because it's not as good as it was a week ago", followed by three weeks of smart people cargo culting the specific incantations they then convince themselves make it work best.
"our strongest set of safeguards to date"
How much capability is lost, by hobbling models with a zillion protections against idiots?
Every prompt gets evaluated to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...
Maybe just...leave that all off? I know, I know, individual responsibility no longer exists, but I can dream.
I like that it's more consistent than the 4o and o4 days, but 5.4, 5.3, 5.2, etc. are still a mess: for example, 5.2 and 5.1 don't have mini models, and 5.3 was Codex-only.
Anthropic is slightly better, but where is 4.6 or 4.7 Haiku, or 4.7 Sonnet, etc.?
Good job on the release notice. I appreciate that it isn't just marketing fluff but actually includes the technical specs for those of us who care, and isn't concentrated on coding agents only.
I hope GPT-5.5 Pro is not cutting corners and neutered from the start; you've got the compute for it not to be.
Seems like a continuation of the current meta where GPT models are better in GPT-like ways and Claude models are better in Claude-like ways, with the differences between each slightly narrowing with each generation. 5.5 is noticeably better to talk to, 4.7 is noticeably more precise. Etc etc.
Very impressive! Interesting how it seems to surpass Opus 4.7 on every benchmark except SWE-Bench Pro (Public). You would think that doing so well at Cyber would naturally translate into more ability there. Wonder what makes up the actual difference.
Looking at the space/game/earthquake tracker examples makes me hopeful that OpenAI is going to focus a bit more on interface visual development/integration from tools like Figma. This is one area where Anthropic definitely reigns supreme.
What is the reason behind OpenAI being able to release new models very fast?
Since Feb, when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex, we have seen GPT-5.4 and GPT-5.5, but only Opus 4.7 and no new Gemini model.
Both of these are pretty decent improvements.
Umm yeah but this is like every release in the last 3 years.
The big question is: does it still just write slop, or not?
Fool me once, fool me twice, fool me for the 32nd time, it’s probably still just slop.
So according to the benchmarks, somewhere in between Opus 4.7 and Mythos.
Are they using RTX 5090s now?
My first experience with 5.5 via ChatGPT was immensely disappointing. It was a massive reduction in quality compared to 5.4, which already had issues.
NYTimes article - on the same day?
https://www.nytimes.com/2026/04/23/technology/openai-new-model.html
I can see how some model releases would meet the NY Times newsworthiness threshold if they demonstrated significance to users - i.e., if most users were astir and competitors were re-thinking their situation. However, this same-day article came out before people really looked at it. It seems largely intended to contrast OpenAI with Anthropic's caution, before there has been any evidence that the new model has cybersecurity implications.
It's not at all clear that the broader discourse is helping, if even the NY Times is itself producing slop just to stoke questions.
Is there anywhere I can try it? (I just stopped my Pro sub.) Was wondering if there is a playground or third party so I can just test it briefly?
ctrl+f "cutoff, 0 results"
Surely it doesn't still have the same ancient data cutoff as 5.4 did?
Surprised to see SWE-Bench Pro only a slight improvement (57.7% -> 58.6%) while Opus 4.7 hit 64.3%. I wonder what Anthropic is doing to achieve higher scores on this - and also what makes this test particularly hard to do well in compared to Terminal Bench (where 5.5 seemed to have a big jump).
It's possible that "smarter" AI won't lead to more productivity in the economy. Why?
Because software and "information technology" generally didn't increase productivity over the past 30 years.
This has long been known as Solow's productivity paradox. There are lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.
But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you more lazy.
AI's main application so far has been in the information space. If that continues, I doubt you will get more productivity from it.
If you give AI a body... well, maybe that changes.
This is the first time OpenAI has included competing models in its benchmarks; previously it included only OpenAI models.
> A playable 3D dungeon arena
Where's the demo link?
OpenAI finally catching up with Claude.
Those who are using GPT-5.5: how does it compare to Opus 4.6 / 4.7 in terms of code generation?
Why do the stats of every AI on every release look around the same?
Are the tests getting harder and harder, so the older AIs look worse and the new ones look like they are "almost there"?
How does it compare to Mythos?
"Sometime with GPT-5.5 I become lazy"
I don't want to be lazy.
Big claims from OpenAI as usual - GPT-5.5 sounds impressive on paper, but we've been down this road before, so I'll believe the 'no speed tradeoff' part when I see it in the wild
Up until now I only paid for LLM subscriptions to Anthropic, but I'm going to give ChatGPT a chance when my current subscription runs out next month.
Entering this comments section wondering if it will be full of complaints about the new personality, as with every single LLM update.
Which is better GPT-5.5 or Opus 4.7? And for what tasks?
I might just be following too many AI-related people on X, but omg the media blitz around 5.5 is aggressive.
Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".
I prefer to see for myself, but the gradual rollout, combined with full-on marketing campaign, is annoying.
... sigh. I realize there's little that can be done about this, but I just got through a real-world session determining if Opus 4.7 is meaningfully better than Opus 4.6 or GPT 5.4, and now there's another one to try things with. These benchmark results generally mean little to me in practice.
Anyways, still exciting to see more improvements.
> We are releasing GPT‑5.5 with our strongest set of safeguards to date
...
> we’re deploying stricter classifiers for potential cyber risk which some users may find annoying initially
So we should expect not to be able to check our own code for vulnerabilities, because the model inherently cannot know whether I'm feeding it my code or someone else's.
I just can't bear to use services from this company after what they did to the global DRAM markets.
I'm not trying to make any kind of moral statement, but the company just feels toxic to me.
Nice to see them openly compare to Opus 4.7… but they don't compare it against Mythos, which says everything you need to know.
The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they’ll be too busy posting slop content about how “GPT-5.5 changes everything”.
Good timing: I had just renewed my subscription.
I'd really like to see improvements like these:
- Some technical proof that data is never read by OpenAI.
- Proof that no logs of my data or derived data are saved.
- etc.
I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena and looking for the actual benchmark comparisons.
I hear it's as good as Opus 4.7.
The battle has just begun
I just prompted GPT-5.5 Pro "Solve Nuclear Fusion" and it one shotted it (kidding obviously)
I just installed Codex and gave GPT-5.5 a try. It's good compared to the previous one.
So exhausted from all this endless BS… Keep releasing; this reminds me of all the dot-com era software: wow, we're already at version 3.0 and it's only been 60 days.
I literally cannot launch the Codex app anymore.
Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?