Does the model really improve? I tried several tasks today and most of them failed, even though they were super easy ones.
Maybe it's just because the GPT-5.2 in Cursor is super stupid?
The benchmarks are very impressive. Codex and Opus 4.5 are really good coders already and they keep getting better.
No wall yet and I think we might have crossed the threshold of models being as good or better than most engineers already.
GDPval will be an interesting benchmark, and I'll happily use the new model to test spreadsheet (and other office work) capabilities. If they can keep going like this just a little bit further, many office workers will stop being useful... I don't know yet how to feel about this.
Great for humanity, probably, but what about the individuals?
It's funny how they don't compare themselves to Gemini and Claude anymore.
I use it every day but have been told by friends that Gemini has overtaken it.
A classic long-form sales pitch. Someone's been reading their Patio11...
Funnily enough, their front-page demo has a mistake. For the waves simulation, the user asks:
>- The UI should be calming and realistic.
Yet what it did was make a sleek frosted-glass UI with rounded edges. What it should have done was call a wellness check on the user on suspicion of a CO2 leak leading to delirium.
I recently built a webapp to summarize HN comment threads. Sharing a summary since there is a lot here: https://hn-insights.com/chat/gpt-52-8ecfpn.
Can this be used without uploading my code base to their server?
For those curious about the question: "How well does GPT-5.2 build Counter-Strike?"
We tried the same prompts we gave previous models today, and found out [1].
The TL;DR: Claude is still better on the frontend, but 5.2 is comparable to Gemini 3 Pro on the backend. At the very least, 5.2 did better on just about every prompt compared to 5.1 Codex Max.
The two surprises with the GPT models when it comes to coding: 1. They often use REPLs rather than reading docs (a rough sketch of what that looks like follows below). 2. In this instance, 5.2 was more sheepish about running CLI commands; it would instead ask me to run them.
Since this isn't a codex fine-tuned model, I'm definitely excited to see what that looks like.
[1] The full video and some details in the tweet here: https://x.com/instant_db/status/1999278134504620363
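For readers who haven't watched a model do this, here is a rough illustration of the "use a REPL rather than read docs" behavior mentioned above. It is entirely my own sketch (not taken from the linked video), probing the standard-library json module instead of opening its documentation:

    # Hypothetical example of REPL-style probing: discover an API by inspecting it
    # and running a tiny call, instead of reading the documentation.
    import inspect
    import json

    # Check what parameters json.dumps accepts.
    print(inspect.signature(json.dumps))

    # Confirm the behavior of one of those parameters with a quick call.
    print(json.dumps({"a": 1, "b": [2, 3]}, indent=2))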
I'm happy about this, but there are all these math and science benchmarks; has anyone ever made a communicates-like-a-human benchmark? Or an isn't-frustrating-to-talk-with benchmark?
Every new model is ‘state-of-the-art’. This term is getting annoying.
Hmmm, is there any insight into whether these are really getting much better at coding? Will hand coding be dead within a few years, with humans just typing in English?
So how much better is it than Opus or Gemini?
Are gpt-5.2 and gpt-5.2-chat-latest the same token price? Isn't the latter non-thinking and more akin to -nano or -mini?
Is the training cutoff date known?
Slight increase in model cost, but looks like benefits across the board to match.
Per 1M tokens (input / cached input / output):
gpt-5.2: $1.75 / $0.175 / $14.00
gpt-5.1: $1.25 / $0.125 / $10.00
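To make the delta concrete, here's a quick back-of-the-envelope comparison under the prices above; the 100K-input / 10K-output workload is a made-up example, and prompt caching is ignored:

    # Per-1M-token prices (USD) from the table above; the workload sizes are hypothetical.
    prices = {"gpt-5.2": (1.75, 14.00), "gpt-5.1": (1.25, 10.00)}  # (input, output)
    input_tokens, output_tokens = 100_000, 10_000

    for model, (in_price, out_price) in prices.items():
        cost = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
        print(f"{model}: ${cost:.3f} per request")

    # Both the input and output rates rise by the same factor (1.75/1.25 = 14/10 = 1.4),
    # i.e. roughly a 40% across-the-board price increase.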
My god, what terrible marketing, totally written by AI. No flow whatsoever.
I use Gemini 3 with my $10/month Copilot subscription in VS Code. I have to say, Gemini 3 is great. I can do the work of four people. I usually run out of premium tokens in a week. But I'm actually glad there is a limit or I would never stop working. I was a skeptic, but it seems like there is a wider variety of patterns in the training distribution.
I have already cancelled. Claude is more than enough for me. I don’t see any point in splitting hairs. They are all going to keep lying more and more sneakily.
So, right off the bat: talking through code with 5.2 (via Codex) feels really nice. The first coding attempt was a little meh compared to 5.1 Codex Max (reflecting what they wrote themselves), but simply planning / discussing things felt markedly better than anything I remember from any previous model, from any company.
I remain excited about new models. It's like finding my coworker be 10% smarter every other week.
I'm not interested in using OpenAI anymore because Sam Altman is so untrustworthy. All you see on X.com is him and Greg Brockman kissing David Sacks' ass, trying to make inroads with him, asking Disney for investments, and shit. Are you kidding? Who wants to support these clowns? Let's let Google win. Let's let Anthropic win. Anyone but Sam Altman.
Incidentally, this is also the 10th anniversary of OpenAI's founding, to the exact day.
Does it still use the word ‘fluff’ in 90% of its preambles, or is it finally able to get straight to the point?
$168.00 / 1M output tokens is hilarious for their "Pro". Can't wait to hear all the bitching from orgs next month. Literally the dumbest product of all time. Do you people seriously pay for this?
"Investors are putting pressure, change the version number now!!!"
They just keep flogging that dead horse.
The winner in this race will be whoever gets small local models to perform as well on consumer hardware. It'll also pop the tech bubble in the US.
>>> Already, the average ChatGPT Enterprise user says AI saves them 40–60 minutes a day
If this is what AI has to offer, we are in a gigantic bubble
They’re definitely just training the models on the benchmarks at this point
are we doomed yet?
Seems not yet with 5.2
Still 256K input tokens. So disappointing (predictable, but disappointing).
Did Calmmy Sammy say that this is the version that will finally cure cancer? The shakeout in the AI industry is going to be brutal. I can't see how private equity is going to get the little guy to be left holding the giant bag of excrement, but they will figure that out. AI: smart enough to replace you, but not quite smart enough to replace the CEO or the hedge fund bros.
Isn't it delusional to only compare your models against your own previous variants? Where is an actual comparison with Google, Anthropic, or OSS models?
“…where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations.”
What a sociopathic way to sell
Is this another GPT-4.5?
GPT-5.2 System Card PDF: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...
I told all my friends to upgrade or they're not my friends anymore /s
No, thank you. OpenAI and ChatGPT don't cut it for me.
The halving of error rates for image inputs is pretty awesome; this makes it far more practical for issues where it isn't easy to input all the needed context. When I get lazy I'll just Shift+Win+S the problem and ask one of the chatbots to solve it.
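As a minimal sketch of that screenshot workflow, using the OpenAI Python SDK's chat completions image input (the "gpt-5.2" model name is an assumption and the file path is a placeholder):

    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A screenshot captured with Shift+Win+S and saved to disk (placeholder path).
    with open("screenshot.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-5.2",  # assumed model identifier; substitute one your account exposes
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's going wrong in this screenshot, and how do I fix it?"},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }
        ],
    )
    print(response.choices[0].message.content)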