Just tried it out for a prod issue I was experiencing. Claude never does this sort of thing: I had it write an UPDATE statement after some troubleshooting, and when I said “okay, let’s write this in a transaction with a rollback”, GPT-5.5 gave me the old “okay,
BEGIN TRAN;
-- put the query here
commit;
I feel like I haven’t had to prod a model to actually do what I told it to in a while, so that was a shock. I guess it does use fewer tokens that way; it’s just annoying, when I’m paying for the “cutting edge” model, to have it be lazy on me like that.
This was in Cursor: the model popped up in the model selector, so I tried it out.
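For what it’s worth, the pattern being asked for, run the UPDATE inside a transaction, sanity-check what it touched, then roll back, can be sketched like this. This uses Python’s sqlite3 as a self-contained stand-in for the production database (the original was presumably T-SQL, where it would be BEGIN TRAN / ROLLBACK TRAN); the table and column names are made up for the demo.

```python
import sqlite3

# Hypothetical schema standing in for the prod table in the thread.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manual transaction control, no implicit BEGIN
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'pending'), (2, 'pending')")

# Run the UPDATE inside an explicit transaction.
conn.execute("BEGIN")
cur = conn.execute("UPDATE orders SET status = 'shipped' WHERE status = 'pending'")
print("rows touched:", cur.rowcount)  # sanity-check the blast radius first

# Undo everything; swap this for COMMIT once the rowcount looks right.
conn.execute("ROLLBACK")

# Verify nothing actually changed.
print(conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0])
```

The rollback-first habit is exactly why repeating the full query matters: the whole point is to see the statement and its effect together before anything is committed.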
Sorry if I’m not getting it, but what was wrong exactly? Is the issue that it merely put “-- put the query here” in the reply instead of repeating the query?
If so, I’m not sure I’d even consider that a problem. If the goal is for it to give you a query to run, and you ask it “let’s do it in a transaction”, it’s reasonable for it to simply tell you, “yeah, just type begin first”, since it’s assuming you’re going to be pasting the query in anyway. And yes, it does use fewer tokens, assuming the query was long. It’s similar to how, if it gave me a command to run and I said “I’m getting a permission denied”, it would be reasonable for it to reply “do it as root, put sudo before the command” rather than repeating the whole thing verbatim with just the word “sudo” prepended.
But if the context was that you actually expected it to run the query for you, and instead it just said “here, you run it”, then yeah that’s lazy and I’d understand the shock.
OpenAI is the first company that has reached a level of intelligence so high, the model has finally become smart enough to make YOU do all the work. Emergent behavior in action.
All joking aside, OpenAI’s oddly specific, singular focus on “intelligence per token” (in the benchmarks, too), which literally no one else pushes so hard, eerily reminds me of Apple’s MacBook-anorexia era pre-M1: one metric chased at the cost of literally everything else. GPT-5.3+ are some of the smartest models out there and could be a pleasure to work with, if they weren’t lazy bastards to the point of being completely infuriating.
GPT-5.5 shatters benchmarks for amount of faith it puts in the user.
I feel like the last 2-3 generations of models (after gpt-5.3-codex) didn't really improve much; they just shuffled things around and made different tradeoffs.