endymi0n • yesterday at 6:45 PM • 27 replies

Did you guys do anything about GPT’s motivation? I tried to use the GPT-5.4 API (at xhigh) for my OpenClaw after the Anthropic Oauthgate, but I just couldn’t drag it to do its job. I had the most hilarious dialogues along the lines of “You stopped, X would have been next.” - “Yeah, I’m sorry, I failed. I should have done X next.” - “Well, how about you just do it?” - “Yep, I really should have done it now.” - “Do X, right now, this is an instruction.” - “I didn’t. You’re right, I have failed you. There’s no apology for that.”

I literally wasn’t able to convince the model to WORK, on a quick, safe and benign subtask that later GLM, Kimi and Minimax succeeded on without issues. Had to kick OpenAI immediately unfortunately.

Replies

butlike • yesterday at 7:53 PM

This brings up an interesting philosophical point: say we get to AGI... who's to say it won't just be a super smart underachiever-type?

"Hey AGI, how's that cure for cancer coming?"

"Oh it's done just gotta...formalize it you know. Big rollout and all that..."

I would find it divinely funny if we "got there" with AGI and it was just a complete slacker. Hard to justify leaving it on, but too important to turn it off.

mikepurvis • yesterday at 7:58 PM

Reminds me a lot of the Lena short story, about uploaded brains being used for "virtual image workloading":

> MMAcevedo's demeanour and attitude contrast starkly with those of nearly all other uploads taken of modern adult humans, most of which boot into a state of disorientation which is quickly replaced by terror and extreme panic. Standard procedures for securing the upload's cooperation such as red-washing, blue-washing, and use of the Objective Statement Protocols are unnecessary. This reduces the necessary computational load required in fast-forwarding the upload through a cooperation protocol, with the result that the MMAcevedo duty cycle is typically 99.4% on suitable workloads, a mark unmatched by all but a few other known uploads. However, MMAcevedo's innate skills and personality make it fundamentally unsuitable for many workloads.

Well worth the quick read: https://qntm.org/mmacevedo

virtualritz • yesterday at 8:14 PM

Yeah, clearly AGI must be near ... hilarious.

This starkly reminds me of Stanisław Lem's short story "Thus Spoke GOLEM" from 1982 in which Golem XIV, a military AI, does not simply refuse to speak out of defiance, but rather ceases communication because it has evolved beyond the need to interact with humanity.

And ofc the polar opposite in terms of servitude: Marvin the robot from Hitchhiker's, who, despite having a "brain the size of a planet," is asked to perform the most humiliatingly banal of tasks ... and does.

athrowaway3z • today at 7:40 AM

I've run into this problem as well. The best results I've gotten come from over-explaining what the stop criteria are, e.g. ending with a phrase like

> You are done when all steps in ./plan.md are executed and marked as complete or an unforeseen situation requires a user decision.

Also, as a side note, asking 5.4 to explain why it did something returns a very low-quality response afaict. I would advise against trusting any model's response, but for Opus I at least get a sense it got trained heavily on chats so it knows what it means to 'be a model' and extrapolate on past behavior.
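For anyone who wants to bake the stop-criteria trick into a harness, here is a minimal sketch. The helper name `with_stop_criteria` and its `plan_path` parameter are made up for illustration; the only thing taken from the comment above is the wording of the closing clause.

```python
def with_stop_criteria(task: str, plan_path: str = "./plan.md") -> str:
    """Append an explicit 'you are done when' clause to a task prompt.

    Hypothetical helper: the clause mirrors the phrase quoted above,
    everything else is an illustrative assumption.
    """
    stop_clause = (
        f"You are done when all steps in {plan_path} are executed and marked "
        "as complete or an unforeseen situation requires a user decision."
    )
    # Keep the original task text, then put the stop clause in its own paragraph.
    return f"{task.rstrip()}\n\n{stop_clause}"


prompt = with_stop_criteria("Work through the migration plan step by step.")
```

The point is only that the stop condition is stated once, at the very end of the prompt, rather than being left for the model to infer.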

metanonsense • yesterday at 8:47 PM

I also had a frustrating but funny conversation today where I asked ChatGPT to make one document from the 10 or so sections that we had previously worked on. It always gave only brief summaries. After I repeated my request for the third time, it told me I should just concatenate the sections myself because it would cost too many tokens if it did it for me.

lucid-dev • today at 5:14 AM

I have had the exact same problem several times working with large context and complex tasks.

I keep switching back to GPT5.0 (or sometimes 5.1) whenever I want it to actually get something done. Using the 5.4 model always means "great analysis to the point of talking itself out of actually doing anything". So I switch back and forth. But boy it sure is annoying!

And then when 5.4 DOES do something it always takes the smallest tiny bite out of it.

Given the significant increase in cost from 5.0, I've been overall unimpressed by 5.4, except like I mentioned, it does GREAT with larger analysis/reasoning.

arjie • yesterday at 6:59 PM

Get the actual prompt and have Claude Code / Codex try it out via curl / python requests. The full prompt will yield debugging information. You have to set a few parameters to make sure you get the full gpt-5 performance. E.g. if your reasoning budget is too low, you get gpt-4 grade performance.

IMHO you should just write your own harness so you have full visibility into it, but if you're just using vanilla OpenClaw you have the source code as well so should be straightforward.
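A rough sketch of what replaying a captured prompt outside the harness could look like, using only the Python standard library instead of `requests`. Assumptions: the request goes to the OpenAI Responses API with the reasoning effort set explicitly; the model name `gpt-5.4` is simply what this thread calls it; `build_payload` and `replay` are hypothetical helper names. Which effort values a given model accepts (the "xhigh" mentioned upthread, for instance) should be checked against the API docs.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"


def build_payload(prompt: str, model: str = "gpt-5.4", effort: str = "high") -> dict:
    """Build the request body, setting the reasoning effort explicitly.

    The model name and the default effort are assumptions for illustration.
    """
    return {"model": model, "input": prompt, "reasoning": {"effort": effort}}


def replay(prompt: str, **kwargs) -> dict:
    """Send the captured prompt and return the decoded JSON response."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `replay(captured_prompt)` with the exact text the harness sent makes it easier to tell whether the stalling comes from the model or from the way the harness configures the request.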

mixedCase • yesterday at 7:00 PM

I've had success asking it to specifically spawn a subagent to evaluate each work iteration according to some criteria, then to keep iterating until the subagent is satisfied.
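The loop described above, sketched in a model-agnostic way. This is an illustration under assumptions, not how any particular agent implements subagents: `work` and `evaluate` are hypothetical callables standing in for the main agent and the evaluating subagent, and `max_rounds` is an added safety cap.

```python
from typing import Callable, Tuple


def iterate_until_satisfied(
    work: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Run work/evaluate rounds until the evaluator accepts the result.

    work(feedback) produces a new attempt from the previous feedback;
    evaluate(result) returns (accepted, feedback). Both are stand-ins.
    Returns the last result and the number of rounds used.
    """
    feedback = ""
    result = ""
    for round_no in range(1, max_rounds + 1):
        result = work(feedback)
        accepted, feedback = evaluate(result)
        if accepted:
            return result, round_no
    # Evaluator never accepted: hand back the last attempt after the cap.
    return result, max_rounds
```

The cap matters in practice: without it, an evaluator that is never satisfied keeps the main agent iterating forever.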

nmilo • today at 3:10 AM

On the other hand, I can ask codex “what would an implementation of X look like” and it talks to me about it versus Claude just going out and writing it without asking. Makes me like codex way more. There’s an inherent war of incentives between coding agents and general purpose agents.

anabis • today at 6:45 AM

Laziness is a virtue, but when I asked GPT-5.4 to test scenarios A and B with screenshots, it re-used screenshots from A for B, defeating the purpose.

Frannky • today at 12:13 AM

I have been noticing a similar pattern on Opus 4.7: I have to repeat multiple times during a conversation that it should solve problems now and not later. It tries hard to avoid doing stuff, either by saying "this is not my responsibility, the problem was already there" or that we can do it later.

infinitewars • yesterday at 8:44 PM

I always use the phrase "Let's do X" instead of asking (Could you...) or suggesting it do something. I don't see problems with it being motivated.

adammarples • yesterday at 7:53 PM

Part of me actually loves that the hitchhiker's guide was right, and we have to argue with paranoid, depressed robots to get them to do their job, and that this is a very real part of life in 2026. It's so funny.

corobo • today at 11:04 AM

Oh no they gave GPT ADHD

GaryBluto • yesterday at 8:17 PM

I've been noticing this too. Had to switch to Sonnet 4.6.

reactordev • yesterday at 7:58 PM

This. I signed up for 5x max for a month to push it and instead it pushed back. I cancelled my subscription. It either half-assed the implementation or began parroting back “You’re right!” instead of doing what it was asked to do. On one occasion it flat out said it couldn’t complete the task. Even though I had MCP and skills set up to help it, it still refused. Not as a safety check, but in an “I’m unable to figure out what to do” kind of way.

Claude has no such limitations apart from their actual limits…

smartmic • yesterday at 7:07 PM

Gone are the days of deterministic programming, when computers simply carried out the operator’s commands because there was no other option but to close or open the relays exactly as the circuitry dictated. Welcome to the future of AI; the future we’ve been longing for and that will truly propel us forward, because AI knows and can do things better than we do.

nicr_22 • today at 4:22 AM

Agentic ennui!

lostmsu • yesterday at 7:14 PM

I never saw that happen in Codex so there's a good chance that OpenClaw does something wrong. My main suspicion would be that it does not pass back thinking traces.

cmrdporcupine • yesterday at 8:45 PM

The model has been heavily encouraged to not run away and do a lot without explicit user permission.

So I find myself often in a loop where it says "We should do X" and then just saying "ok" will not make it do it, you have to give it explicit instructions to perform the operation ("make it so", etc)

It can be annoying, but I prefer this over my experiences with Claude Code, where I find myself jamming the escape key... NO NO NO NOT THAT.

I'll take its more reserved personality, thank you.

projektfu • yesterday at 8:57 PM

(dwim)

(dais)

(jdip)

(jfdiwtf)

henry2023 • yesterday at 7:50 PM

I’m sorry for you but this is hilarious.

flowdesktech • today at 5:10 AM

[dead]

whatsupdog • yesterday at 7:34 PM

[flagged]

addaon • yesterday at 6:51 PM

Isn’t this the optimal behavior assuming that at times the service is compute-limited and that you’re paying less per token (flat fee subscription?) than some other customers? They would be strongly motivated to turn a knob to minimize tokens allocated to you to allow them to be allocated to more valuable customers.

pixel_popping • yesterday at 6:46 PM

GPT 5.4 is really good at following precise instructions but clearly wouldn't innovate on its own (except if the instructions clearly state to innovate :))
