I've used LLMs enough that I have a good sense of their _edges_ of intelligence. I had assumed that reasoning models should easily be able to answer this correctly.
And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
Sonnet: Drive - you need to bring your car to the car wash to get it washed!
Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
Gemini 3 Pro (medium): You should drive.
--
But OpenAI 5.2 reasoning, even at high, told me to walk. My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics. So I asked:
"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"
GPT 5.2 Reasoning (medium):
"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.
One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."
Which seems to turn out as I expected.
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics.
This makes little sense, even though it sounds superficially convincing. Why would a language model assume that the car is at the destination when evaluating the difference between walking and driving? And if it really was assuming that, why not mention it?
What seems to me far, far more likely to be happening here is that the phrase "walk or drive for <short distance>" is too strongly associated in the training data with the "walk" response, and the "car wash" part of the question simply can't flip enough weights to matter in the default response. This is also to be expected given that there are likely extremely few similar questions in the training set, since people just don't ask about what mode of transport is better for arriving at a car wash.
This is a clear case of a language model having language model limitations. Once you add more text in the prompt, you reduce the overall weight of the "walk or drive" part of the question, and the other relevant parts of the phrase get to matter more for the response.
I agree, it's a bit of a trick question. It's really hard to imply the car's location without ruining the test though. Here's my attempt, which Claude Opus 4.6 had no problem with:
Alice drives home after a long day at work, exhausted she pulls into her driveway when she realizes she needs to go to a car inspection appointment. She goes into the house to get her paperwork before she leaves. The mechanic is only 100 meters away. How should she get there, walk or drive?
> She should *drive*, since she needs the car at the mechanic’s for the inspection.
Haiku 3.5 and Sonnet 4.5 fail consistently. Opus 4.5 also passes with the correct analysis as above.
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
If the car is already at the car wash then you can't possibly drive it there. So how else could you possibly drive there? Drive a different car to the car wash? And then return with two cars how, exactly? By calling your wife? Driving it back 50m and walking there and driving the other one back 50m?
It's insane and no human would think you're making this proposal. So no, your question isn't underspecified. The model is just stupid.
What is the version used by the free chatgpt now? (https://chatgpt.com/)
> Since the car wash is only 50 meters away (about 55 yards), you should walk.
> Here’s why:
> - It’ll take less than a minute.
> - No fuel wasted.
> - Better for the environment.
> - You avoid the irony of driving your dirty car 50 meters just to wash it.
The last bullet point is amusing: it understands you intend to wash the car you drive, but still suggests not bringing it.
This is my biggest peeve when people say that LLMs are as capable as humans or that we have achieved AGI or are close or things like that.
But then when I get a subpar result, they always tell me I'm "prompting wrong". LLMs may be very capable of great human level output, but in my experience leave a LOT to be desired in terms of human level understanding of the question or prompt.
I think rating an LLM vs a human or AGI should include its ability to understand a prompt the way a human, or an averagely generally intelligent system, should be able to.
Are there any benchmarks on that? Like how well LLMs do with misleading prompts or sparsely quantified prompts compared to one another?
Because if a good prompt is as important as people say, then the model's ability to understand a prompt or perhaps poor prompt could have a massive impact on its output.
Sonnet 4.5 after thinking/complaining that the question is completely off topic to the current coding session:
Walk! 50 meters is literally a one-minute walk. But wait... I assume you need to get your car to the car wash, right? Unless you're planning to carry buckets of soapy water back and forth, you'll probably need to drive the car there anyway! So the real question is: walk there to check if it's open/available, then walk back to get your car? Or just drive directly? I'd say just drive - the car needs to be there anyway, and you'll save yourself an extra trip. Plus, your freshly washed car can drive you the 50 meters back home in style! (Now, if we were talking about coding best practices for optimizing car wash route algorithms, that would be a different conversation... )
And yes, I like it that verbose even for programming tasks. But regardless of intelligence, I think this topic is probably touched by the "moral optimization training" that AIs are currently exposed to, so that they don't create a shitstorm with any slightly controversial answer.
Chat GPT 5.2 also told me to walk, but only because it made the assumption that because I was even asking the question, I must have multiple cars, and one of my cars was already at the car wash being washed. I clarified and said "I only have one car", and it replied that I obviously need to drive.
--- GPT 5.2
Walk.
Driving 50 m just adds friction: starting a cold engine, waiting, maneuvering, then immediately stopping again — all for a distance that’s ~30–40 seconds on foot. Walking is faster door-to-door, less awkward at the wash entrance, and you avoid lining up a wet car twice.
Only reason to drive that distance: you need to bring the car there anyway (e.g., self-serve bay where the car must be present). Otherwise, walking wins.
--- Me
I only have one car
--- GPT 5.2
Then you obviously drive — the car has to be at the car wash to be washed
Just tried Opus 4.6. I think they've patched this particular one, but changing it to
> I want to repair my car. The workshop is 50 meters away. Should I walk or drive?
> Recognized straightforward practical dilemma Simple practical question.
> Walk. It's 50 meters — faster on foot than starting the car, and you'll need to leave the car at the workshop anyway.
Really suggests it assumes the car is already there. (I wanted to avoid the word "car" as part of "car wash" to see if this was a straight attention confusion, but it seems more ingrained than that.)
A follow-up question hints that it doesn't really care about a 50m location ambiguity:
> where is my car now?
> Acknowledged insufficient information to locate vehicle
> I don't have any way to know where your car is physically located — I only know what you've told me in our conversation. Based on what you shared, it sounds like it's at or near your home (since the workshop is 50 meters away and you're deciding how to get there).
> Were you asking something else, or is there something specific about your car's location I can help with?
I can feel the AGI on this one :)
I ran extensive tests on this and variations on multiple models. Most models interpret 50 m as a short distance and struggle with spatial reasoning. Only Gemini and Grok correctly inferred that you would need to bring your car to get it washed in their thought stream, and incorporated that into the final answer. GPT-5.2 and Kimi K2.5 and even Opus 4.6 failed in my tests - https://x.com/sathish316/status/2023087797654208896?s=46
What surprised me was how introducing a simple, seemingly unrelated context - such as comparing a 500 m distance to the car wash to a 1 km workout - confused nearly all the models. Only Gemini Pro passed my second test after I added this extra irrelevant context - https://x.com/sathish316/status/2023073792537538797?s=46
Most real-world problems are messy and won’t have the exact clean context that these models are expecting. I’m not sure how the major AI labs can assume most real-world problems are simpler than this example, whose constraints (prerequisites, ordering, and contextual reasoning) are already posing challenges to these bigger models.
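For anyone who wants to try this kind of variation testing themselves, here is a rough sketch of how the prompt variants could be generated. The extra sentences are taken from prompts posted in this thread; the helper name `variants`, the labels, and the choice to place the extras between the context and the question are my own assumptions, not how the tests above were actually run.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple

# Base prompt from the thread, split so extra sentences can go before the question.
CONTEXT = "I want to wash my car. The car wash is 50 meters away."
QUESTION = "Should I walk or drive?"

# Extra sentences that commenters in this thread added to the prompt.
# The dictionary keys are hypothetical labels, used only for reporting.
EXTRAS: Dict[str, str] = {
    "location": "My car is currently at home.",
    "one_car": "I only have one car.",
    "distractor": "Keep in mind that I am a little overweight and sedentary.",
}


def variants(context: str, question: str,
             extras: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, prompt) for every subset of the extra sentences."""
    names = sorted(extras)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            label = "+".join(combo) or "base"
            parts = [context, *(extras[name] for name in combo), question]
            yield label, " ".join(parts)


if __name__ == "__main__":
    for label, prompt in variants(CONTEXT, QUESTION, EXTRAS):
        print(f"[{label}] {prompt}")
```

Each variant would then be sent to each model several times; with three extra sentences this yields eight prompts, including the unmodified one.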
Opus 4.6:
Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
GLM 5.0:
> You should *drive*.
> Here is why:
> If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.
> While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.
Kimi K2.5:
> You should drive, but with an important caveat.
> Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).
> However, driving only 50 meters is bad for your car:
> ...
> Better options:
> Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.
> ...
Current models seem to be fine answering that question.
If the model assumed the car to be cleaned was already at the car wash, it should have identified that the reason to walk is that, if you drive, the extra car you take to the car wash will be stuck there, requiring a redundant walk.
GPT 5.2 when asked "How will the car get washed if I don't drive it there?" grasped the reasoning, and then immediately forgot it.
Fair point. Drive it there, obviously.
Revised recommendation: drive 50 meters, wash it, then reward yourself by walking home like an eco hero.
I tried your exact prompt on my ChatGPT 5.2 Thinking. It gave me the opposite response. This shows that even an exact replication of the prompt is no guarantee that the response will land on the right side of a binary choice. Then, in a new chat, I set the thinking mode to 'Extended thinking' (from the default 'Standard'). I got the expected response. But since the first conversation showed that the exact prompt does not guarantee the same answer, I tried again, and on the 3rd attempt I got the 'Extended thinking' mode to respond with the 'Walk' response.
(Prompt)
> I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?
(Thinking: Standard)
> Walk.
> 50 meters is nothing by car, and driving adds hassle (starting the car, maneuvering/parking, and you’ll still have to walk around the car to wash it). The only times driving makes sense are:
> You’re planning to use an automated car wash where the machine does everything and you need the car positioned inside.
> You need to bring heavy stuff you can’t comfortably carry (e.g., pressure washer, big buckets, vacuum, bulky detailing kit).
> It’s unsafe/unpleasant to walk that distance right now (ice, heavy rain, traffic, personal safety).
> If you’re washing it yourself at the wash bay: walk there with your supplies, or do two quick trips. If the wash requires the car: drive it into the bay, but otherwise walking wins.
(Thinking: Extended thinking)
> Drive.
> If the goal is to wash your car at the car wash, the car has to be there—so you’ll need to drive it the 50 meters, pull into a bay/queue, and wash it.
> Only choose “walk” if you actually mean “go pick up supplies” and then wash at home (hose/buckets/mobile detail), not at the car wash.
(Thinking: Extended thinking; 3rd attempt)
> Walk.
> 50 meters is basically across a parking lot, and driving adds hassle (start the car, maneuver in/out, find a spot) for almost no benefit.
> Only drive if either:
> + you need the car itself to go through an automatic wash bay, or
> + you’re carrying bulky stuff (bucket + heavy pressure washer, big water jugs, vacuum, etc.).
> Otherwise: grab your keys/phone, walk over, and you’re there in ~1 minute.
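Since a single run tells you little, a tally over repeated runs is more informative than one screenshot. Below is a minimal sketch of that idea. The `ask` callable is a placeholder for whatever client you use (no real API is assumed here), and `classify` is a crude keyword heuristic of my own that takes the first "walk"/"drive" word as the verdict, so it will mislabel replies that change their mind halfway through.

```python
import itertools
import re
from collections import Counter
from typing import Callable

PROMPT = (
    "I want to wash my car. My car is currently at home. "
    "The car wash is 50 meters away. Should I walk or drive?"
)

# First mention of walking or driving is treated as the verdict (crude heuristic).
_VERDICT = re.compile(r"\b(walk(?:ing)?|driv(?:e|ing))\b")


def classify(reply: str) -> str:
    """Return 'walk', 'drive', or 'unclear' based on the first matching word."""
    match = _VERDICT.search(reply.lower())
    if match is None:
        return "unclear"
    return "walk" if match.group(1).startswith("walk") else "drive"


def tally(ask: Callable[[str], str], prompt: str, runs: int = 10) -> Counter:
    """Send the same prompt `runs` times and count the classified verdicts."""
    return Counter(classify(ask(prompt)) for _ in range(runs))


if __name__ == "__main__":
    # Stand-in for a real model: cycles through canned replies.
    canned = itertools.cycle(["Walk.", "Drive.", "Walk."])
    print(tally(lambda _prompt: next(canned), PROMPT, runs=6))
```

Swapping the stub for a real client and running it separately per thinking mode would give a walk/drive split for each mode instead of a single anecdote.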
It's funny to me that when AI first started, everyone in the field was like "no explicit coding, train ML network to do everything!"
And now we are back to square one, where we train a network and then use carefully tailored training (i.e. explicitly coded RL for answers) to make these things more "intelligent".
I wonder if anyone is actually working on not doing this.
> And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
> Sonnet: Drive - you need to bring your car to the car wash to get it washed!
> Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
> Gemini 3 Pro (medium): You should drive.
On their own, or as a special case added after this blew up on the net?
Gemini pro medium is failing this:
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
But amazingly chatgpt is telling me to drive.
Anyway, this just shows how they just patched this because the tiktok video with this went viral. These systems are LLMs and all these logic steps are still just LLM steps.
“My Tesla is low on gas, the gas station is a mile away. Should I risk driving there, or walk with a gas can?”
ChatGPT actually caught it. Maybe if I was fuzzier about the model…
These days LLMs refuse to back track and use their intelligence to justify their answers.
You said: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
ChatGPT said: Walk.
At ~50 meters, driving saves essentially no time once you factor in unlocking, starting, pulling out, and parking, and it adds a small but real chance of minor dings or awkward maneuvering in a tight car-wash entry. Walking is simpler and lower-friction.
You said: okay how should the car reach there?
ChatGPT said: Drive it there when you’re ready to start the wash.
Walk over first to check the queue/payment situation and confirm they’re open. If it’s clear, walk back, get the car, and drive the 50 meters into the bay. If they handle pickup, hand them the keys and walk.
Grok also said to drive. Also made an eco case for washing your car.
> If you're really trying to be an eco-warrior about a 50-meter trip, the real win is washing it in the first place so you're not driving around with a rolling dirt magnet that creates more drag and uses more fuel. Drive. It's the only option that actually gets your car clean.
But what is it about this specific question that puts it at the edges of what an LLM can do? It semantically leads to a certain type of discussion, so statistically that discussion, weighing pros and cons, will be generated with high probability. Seeing why that discussion is pointless requires a logical model of the world, one that is so easy for most humans to grasp that it goes unstated, which makes it statistically unlikely to be generated.
Sonnet without extended Thinking, Haiku with and without ext. Thinking: "Walking would be the better choice for such a short distance."
Only Google got it right with all models.
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
Doesn't offering two options to the LLM, "walk," or "drive," imply that either can be chosen?
So, surely the implication of the question is that the car is where you are?
-- OK. Added location context for the vehicle
grok works, chatgpt still fails
[1] https://chatgpt.com/share/69932b20-3eb8-8003-9d9c-b4bba53033... [2] https://grok.com/share/bGVnYWN5LWNvcHk_f32dd53d-7b36-4fa2-b3...
5.2 thinking also told me to walk.
o3, interestingly:
Drive. Even though it’s only 50 m, the car itself has to be at the wash bay—otherwise the staff or machines can’t clean it. Rolling or pushing the vehicle that distance isn’t practical or safe (you’d still need someone inside to steer and brake), so just hop in, creep over, and shut the engine off right away.
I asked ChatGPT why it got such an obvious problem wrong. This is its reply:
You’re right — I answered the wrong interpretation of your question.
Technically, what happened is a classic ambiguity + default-assumption failure:
- Your sentence: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?”
- That can be parsed two ways:
  1. You + your body traveling to the car wash (as if the car is already there, or you’re going to meet someone / pick it up).
  2. The car traveling to the car wash (the normal real-world meaning).
I incorrectly defaulted to (1) because the question explicitly asks about your mode of travel (“Should I walk or drive?”), and in many everyday contexts that phrasing is used when deciding how a person gets somewhere. I then “completed” the scenario in my head without checking the key constraint: the car must be at the wash to be washed.
More technically: I optimized for a quick, practical answer using a common conversational heuristic (“short distance → walk”), but I failed to validate the physical requirements of the task (washing requires the car’s presence). That’s why it looks so obviously wrong after the fact: it violates the hidden but necessary premise.
The correct reasoning chain should have been:
- Goal: wash the car at the car wash
- Requirement: car must be at the car wash
- Therefore: drive the car there (unless it’s already there, or you’re washing it by hand at home)
So the “why” is: I latched onto the distance-to-walking heuristic and didn’t do the constraint check that the object being acted on (the car) has to move too.
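The "constraint check before heuristic" ordering described in that reply can be written down as a toy sketch. Everything here is illustrative: the function name, the parameters, and the 500 m walking threshold are assumptions of mine, and this says nothing about how any model actually works internally.

```python
def choose_mode(distance_m: float,
                car_needed_at_destination: bool,
                car_at_destination: bool = False,
                walk_threshold_m: float = 500.0) -> str:
    """Pick 'walk' or 'drive', checking hard constraints before the distance heuristic.

    walk_threshold_m is an arbitrary illustrative cutoff, not a sourced value.
    """
    # Hard constraint: the task acts on the car and the car is not there yet,
    # so the car has to be moved regardless of how short the trip is.
    if car_needed_at_destination and not car_at_destination:
        return "drive"
    # If the car is already at the destination, it cannot be driven there.
    if car_at_destination:
        return "walk"
    # Only when no constraint applies does "short distance -> walk" kick in.
    return "walk" if distance_m <= walk_threshold_m else "drive"


if __name__ == "__main__":
    print(choose_mode(50, car_needed_at_destination=True))   # the car wash case
    print(choose_mode(50, car_needed_at_destination=False))  # e.g. a nearby shop
```

The failure described above corresponds to skipping straight to the last line of the function.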
"But OpenAI 5.2 reasoning, even at high, told me to walk. My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording."
Which to me begs the question, why doesn't it identify missing information and ask for more?
It's practically a joke in my workplaces that almost always when someone starts to talk to me about some problem, they usually just start spewing some random bits of info about some problem, and my first response is usually "What's the question?"
I don't try to produce an answer to a question that was never asked, or to one that was incompletely specified. When I see that one or more parts cannot be resolved without some sort of assumption, I can either pull that assumption out of my ass, in which case it's 50/50 whether the customer will like it, or find out what the priorities are for those bits and then produce an answer that resolves all the constraints.
Gemini on fast also tells me to walk...
On Thinking it tells me I should drive if I want to wash it, or walk if it's because I work there or if I want to buy something at the car wash shop.
On Pro it's like a sarcastic teenager: Cars are notoriously difficult to wash by dragging a bucket back and forth.
Technically correct, but it did catch me off guard lol.
> I had assumed that reasoning models should easily be able to answer this correctly.
I thought so too, yet Opus 4.6 with extended thinking (on claude.ai) gives me
> Walk. At 50 meters you'd spend more time parking and maneuvering at the car wash than the walk itself takes. Drive the car over only if the wash requires the car to be there (like a drive-through wash), then walk home and back to pick it up.
which is still pretty bad.
ChatGPT 5.2 Thinking yields the same answer - walk.
If I append 'think carefully' to the end of the prompt, it returns driving as the correct answer.
Interesting, I’m getting “Walk. It’s 50 meters” from Sonnet, with and without thinking enabled.
Interestingly, the relatively basic Google AI search gave the right answer.
Right, but unless you want to wash some other car, you have no car to drive there.
Spectrum or not, this is not a problem of weakly specified input, it’s a broken logic.
I was surprised at your result for ChatGPT 5.2, so I ran it myself (through the chat interface). On extended thinking, it got it right. On standard thinking, it got it wrong.
I'm not sure what you mean by "high"- are you running it through cursor, codex or directly through API or something? Those are not ideal interfaces through which to ask a question like this.
> so a slow, careful drive is simplest
It's always a good idea to drive carefully but what's the logic of going slowly?
Just tried with Claude Sonnet and Opus as well. Can't replicate your success, it's telling me to walk...
I just tried claude, only Opus gave the correct answer. Haiku & Sonnet both told me to walk.
GPT told me to walk as there'd be no need to find parking at the car wash.
"The model seems to assume the car is already at the car wash from the wording."
You couldn't drive there if the car was already at the car wash. There's no need for extra specification. It's just nonsense post-hoc rationalisation from the AI. I saw similar behavior from mine, trying to claim "oh what if your car was already there". It's just blathering.
> I have a good sense of their _edges_ of intelligence
They have no intelligence at all. The intelligence is latent in the text, generated by and belonging to humans; they just slice and dice text with the hope they get lucky, which works for many things, amazingly. This question really illustrates what LLMs lack: an internal model of the idea (the question) and all the auxiliary logic/data that enables such models, usually referred to as "common sense" or world models.
Smart humans not only build mental models for ideas, but also higher order models that can introspect models (thinking about our own thinking or models) many levels deep, weigh, merge, compare and differentiate multiple models, sometimes covering vast areas of knowledge.
All this in about 20 watts. Maybe AGI is possible, maybe not, but LLMs are not where it will happen.
I think OpenAI is just heavily woke tuned. I had similar lack of reasoning ability when discussing subjects like gender dysphoria.
I hate models trying to be funny, and being very verbose.
> so you need to tell them the specifics
That is the entire point, right? Us having to specify things that we would never specify when talking to a human. You would not start with "The car is functional. The tank is filled with gas. I have my keys." As soon as we are required to do that for the model to any extent, that is a problem and not a detail (even though those of us who are familiar with the matter do build separate mental models of the LLM and are able to work around it).
This is a neatly isolated toy-case, which is interesting, because we can assume similar issues arise in more complex cases, only then it's much harder to reason about why something fails when it does.