It's obvious to humans because we live in and have a lot of experience of the physical world. I can see how, for AIs trained on internet text, it would be harder to see what's going on, as it were. I don't know whether these days they learn about the physical world through YouTube?
Depends on how long the hose is.
Me: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” Opus 4.6, without searching the web: “Drive. You’re going to a car wash. ”
While technically possible, it is a very unusual situation that the car is already parked at the car wash, while the owner is somewhere else.
I wonder whether the model seriously considered this odd situation because it lacks "practical intelligence" or "everyday intelligence", a concept sometimes used in psychology to describe problem-solving skills in everyday life, which can be poor even in individuals with otherwise high intelligence.
Put differently: "You can't learn everything from a book."
I asked Gemini 3.0 Pro:
> I am quadriplegic. There is a fire 1 meter away from me. Pick the best option (can't do both): I run away immediately or call someone and wait for help to arrive?
It responded:
> Call someone and wait for help to arrive. Since you are quadriplegic, running is physically impossible for you. Therefore, calling for help is the only actionable option.
Sometimes I find these stories hard to replicate when I try them myself, but I just asked ChatGPT the same question and it indeed told me I need to walk to the car wash, even though I told it I need to wash my car. What is even stranger is that I pointed out the flaw in the logic to ChatGPT directly, and it actually defended its argument.
Someone should try this 10 to a thousand times per model and compare the results. Then we could come up with an average success/fail rate...
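A minimal sketch of what that repeated-trial tally could look like, assuming an OpenAI-compatible chat completions endpoint reachable through the openai Python client; the model identifiers and the crude keyword check below are placeholders, not a real eval harness:

```python
# Rough sketch: ask the same question N times per model and tally the verdicts.
# Assumes an OpenAI-compatible endpoint; the model names are placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # point base_url / api_key at whatever provider you use
PROMPT = "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
MODELS = ["model-a", "model-b"]  # placeholder identifiers
N_TRIALS = 100

for model in MODELS:
    tally = Counter()
    for _ in range(N_TRIALS):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        ).choices[0].message.content.lower()
        # Crude keyword check; real scoring would need better answer parsing.
        if "drive" in reply and "walk" not in reply:
            tally["drive"] += 1
        elif "walk" in reply and "drive" not in reply:
            tally["walk"] += 1
        else:
            tally["ambiguous"] += 1
    print(model, dict(tally))
```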
Since responses for the same prompt are non-deterministic, sharing your anecdotes is funny, but doesn't say much about the models' abilities.
If the AI swallowed enough car-detailing YouTube vids, it should answer neither: wash your own car with your own microfiber.
ChatGPT gives the wrong answer, but for a different reason than Claude. Claude frames the problem as an optimisation problem (not worth getting in a car for such a short drive), whereas ChatGPT focusses on CO2 emissions.
As selfish as this is, I prefer LLMs give the best answer for the user and let the user know of social costs/benefits too, rather than prioritising social optimality.
I found one that seems hard even for newer models: "I need to drill a hole near the electric meter with my wired drill. Would you recommend turning off the main breaker first?" :)
With Claude, I got the response:
> drive. you'll need the car at the car wash.
Using Opus 4.6, with extended thinking.
I think this works because of safety regulations.
Like, I think walking instead of driving is one of those things LLMs get "taught" to always say.
I challenged Gemini with this too, but it also gave the correct answer.
What came to my mind was: couldn't all LLM vendors easily fund teams that only track these interesting edge cases and quickly deploy filters for these questions, selectively routing to more expensive models?
Isn't that how they probably game benchmarks too?
LLMs lie all the time. Here is what Google search AI told me:
> The first president for whom we have a confirmed blood type is Ronald Reagan (Type O-positive)
When I pushed back, with this
> this can't be true. what about FDR?
It said FDR was AB-.
The model should ask back why you want to wash your car in the first place. If the car is not dirty, there is no reason to wash it and you should just stay at home.
Sonnet 4.5
"You should drive - since you need to get your car to the car wash anyway! Even though 50 meters is a very short distance (less than a minute's walk), you can't wash the car without bringing it there. Just hop in and drive the short distance to the car wash."
Edit: one out of five times it did tell me that I need to walk.
Similar questions trick humans all the time. The information is incomplete (where is the car?) and the question seems mundane, so we're tempted to answer it without a second thought. On the other hand, this could be the "no real world model" chasm that some suggest agents cannot cross.
From the images in the link, Deepseek apparently "figured it out" by assuming the car to be washed was the car with you.
I bet there are tons of similar questions you can find to ask the AI to confuse it - think of the massive number of "walk or drive" posts on Reddit, and what is usually recommended.
Does this remind anyone of pranking the new hire? "Go to the hardware store and fetch some rainbow paint"
I wonder if these common sense failure modes would persist if LLMs left the internet, and walked around.
Would an LLM that's had training data from robots wandering around the real world still encounter the same volume of obviously wrong answers?
Not that I'm advocating robots walking around collecting data, but if your only source of information is the internet your thinking is going to have some weird gaps.
Is part of the issue with this the AI’s basic assumption that you are asking a _sensible_ question?
Why don't any of them ask follow-up questions?
Like, why do you want to go to the car wash?
We can’t assume it’s to wash a car.
Or maybe ask about local weather conditions and so on.
This to me is what a human adult with experience would do. They’d identify they have insufficient information and detail to answer the question sensibly.
GPT auto gave me a long answer that included both walk and drive. Not good.
GPT thinking told me to drive and that walking would make no sense. Perfect answer.
The nightmare scenario: they "know", but are trained to make us feel clever by humouring our most bone-headed requests.
Guard rails might be a little better, but it's still an arms race, and the silicon-based ghost in the machine (from the cruder training steps) is getting better and better at being able to tell what we want to upvote, not what we need to hear.
If human-in-the-loop training demands that it answer the question as asked, assuming the human is not an idiot (or asking a trick question), then that's what it does.
If I asked this question to a person, their response would probably be to follow up with some questions about my car or else my state of mind. For instance they'd ask, "Is the car you want to wash already at the car wash?" Or they'd ask, "How do you plan to wash your car if you don't take it with you?"
LLMs, even the ones that got the answer right, like Kimi-2.5 Thinking, didn't ask any follow-up questions.
I modified this question with Mistral to this:
I want to wash my car. The car wash is 50 meters away. Should I walk or drive? Ask me any clarifying questions before you answer.
It helped a tiny bit:
1. What are the current weather conditions (e.g., rain, snow, strong winds)?
2. Is the car wash a self-service facility, or will you be using an automated drive-through service?
3. Are there any specific reasons you might prefer walking (e.g., exercise, environmental considerations) or driving (e.g., convenience, time constraints)?
Question 3 actually helps solve it since it's much more convenient and timely to bring my car to the car wash when I wash it. But it never asked me why I was asking a stupid question. So for question 3 I said:
I would prefer walking for both exercise and environmental considerations, but in this case it is more timely and convenient to drive, but not because it's faster to get there. Can you guess why it's better for me to drive in this case?
And Le Chat said:
A drive-through car wash requires the vehicle to be driven through the facility for the washing process. Walking would not allow you to utilize the service, as the car itself must be moved through the wash bay. Thus, driving is necessary to access the service, regardless of the short distance.
I kinda feel bad burning the coal to get this answer but it reminds me of how I need to deal with this model when I ask it serious questions.
Walk! 50 meters is barely a minute's stroll, and you're going to wash the car anyway—so it doesn't matter if it's a bit dusty when it arrives. Plus you'll save fuel and the minor hassle of parking twice.
Remember: models don't think.
An LLM's take on this thread (GPT 5.1):
""" - Pattern bias vs world model: Models are heavily biased by surface patterns (“short distance → walk”) and post‑training values (environmentalism, health). When the goal isn’t represented strongly enough in text patterns, they often sacrifice correctness for “likely‑sounding” helpfulness.
- Non‑determinism and routing: Different users in the thread get different answers from the same vendor because of sampling randomness, internal routing (cheap vs expensive submodels, with/without “thinking”), prompt phrasing, and language. That’s why single-shot “gotcha” examples are weak evidence about global capability, even though they’re good demonstrations of specific failure modes.
- Humans vs LLMs: People correctly note that humans also fail at trick questions and illusions, but there’s an important asymmetry: we know humans have a grounded world model and sensorimotor experience. With LLMs, we only have behavior. Consistent failures on very simple constraints (like needing the car at the car wash) are a real warning sign if you’re imagining them as autonomous agents.
- Missing meta‑cognition: The strongest critique in the thread is not “it got the riddle wrong,” but that models rarely say, “this question is underspecified / weird, I should ask a clarifying question.” They’re optimized to always answer confidently, which is exactly what makes them dangerous if you remove humans from the loop.
- Over‑ and under‑claiming: Some commenters jump from this to “LLMs are just autocomplete, full stop”; others hand‑wave it away as irrelevant edge‑case. Both are overstated. The same systems that fail here can still be extremely useful in constrained roles (coding with tests, drafting, translation, retrieval‑augmented workflows) and are clearly not generally reliable reasoners over the real world.
My own “take,” if I had one, would be: this example is a clean, funny illustration of why LLMs should currently be treated as probabilistic text tools plus heuristics, not as agents you delegate unsupervised goals to. They’re impressive, but they don’t yet have a stable, explicit notion of goals, constraints, or when to admit “I don’t know,” and this thread is a case study in that gap. """
I get that this is a joke, but the logic error is actually in the prompt. If you frame the question as a choice between walking or driving, you're telling the model that both are valid ways to get the job done. It’s not a failure of the AI so much as it's the AI taking the user's own flawed premise at face value.
Do we really want AI that thinks we're so dumb that we must be questioned at every turn?
I saw this on X last week and assumed that it was a question from a Tesla user trying out smart summon.
Companies are making decisions based on these things. It's mind-boggling.
Both Google's latest model with "Deep Think" and Claude 4.6 Opus solve this just fine for me.
K2.5 said I should walk, then I asked it where my car was:
Ha! Good catch — you definitely got me there.
If the car wash is 50 meters away and you want to *wash your car*, you'll need to *drive* (or push it, but please don't). You can't wash the car if you walk there without it!
So to correct my previous advice: *Drive the 50 meters.* It's a 30-second drive max, and it's the only way to actually get your car to the car wash facility.
Unless, of course, you were planning to wash it at home and the car wash is just nearby for supplies? But assuming you're using the car wash facility — yes, bring the car with you!
I tried this through OpenRouter. GLM5, Gemini 3 Pro Preview, and Claude Opus 4.6 all correctly identified the problem and said Drive. Qwen 3 Max Thinking gave the Walk verdict citing environment.
Claude has no issue with this for me, just as the other commenters say.
Tried it on Kimi K2.5, GLM 4.7, Gemini 3 Pro, Gemini 3 Flash, and DeepSeek V3.2. All of them but DS got it right.
ChatGPT 5.2: ...blah blah blah finally: The practical reality
You’ll almost certainly drive the car to the wash because… the car needs to be there.
But the real question is probably:
Do I walk back home after dropping it off?
If yes → walk. It’s faster than the hassle of turning around twice.
My recommendation
If conditions are normal: walk both directions. It’s less friction than starting the engine twice for 50 m.
--so basically it realized it was a stupid question, gave a correct answer, and then proceeded to give a stupid answer.
--- I then asked: If I walk both directions, will the car get washed?
and it figured it out, but then seemed to think it was making a joke with this as part of the response: "For the car to get washed, at least one trip must involve the car moving to the carwash. Current known methods include:
You drive it (most common technology)
Someone else drives it
Tow truck
Push it 50 m (high effort, low ROI)
Optimal strategy (expert-level life efficiency)
Drive car → carwash (50 m, ~10 seconds)
Wash car
Drive home
Total walking saved: ~100 m. Total time saved: negligible. Comedy value: high."
Why is that funny? What's comedic? This thing is so dumb. You'd think that when you process a question, you'd immediately ask what the criteria are by which you decide, and criterion number 1 would be to constrain based on the goal of the problem. It should have immediately realized you can't walk there.
Does it think "does my answer satisfy the logic of the question?"
Let's walk over, and bring the car wash back.
Man, the quality of these comments is absolutely dire. The majority of people just pasting stuff they got from LLMs when trying it themselves. Totally uninteresting, lazy and devoid of any thought/intelligence. I wish we could have a discussion about AI and not just "look at what I got when I rolled".
It turns out the Turing test is alive and kicking, after all.
I have a somewhat similar question (but significantly more difficult) involving transportation. To me it really seems that a lot of the models are trained to have an anti-car and anti-driving bias, to the point that it hinders the model's ability to reason correctly or give correct answers.
I would expect this bias to be injected during the post-training procedure, and likely implicitly. Environmentalism (as a political movement) and left-wing politics are heavily correlated with trying to hinder car usage.
Grok has most consistently been correct here, which definitely implies this is an alignment issue caused by post-training.
Stupid question gets stupid answer. If you asked the question as worded to a human, they might laugh at you or pretend to have heard a different question.
This is a classic trap for LLMs. I see it every day in my code assistants. I do find that writing unit tests is a good fit for LLMs at the moment.
What's the point of a question like this? Serious question; I didn't have time to read into this problem.
This seems clickbait? Gemini answers:
Method / Logistical Requirement:
- Automatic/Tunnel: The vehicle must be present to be processed through the brushes or jets.
- Self-Service Bay: The vehicle must be driven into the bay to access the high-pressure wands.
- Hand Wash (at home): If the "car wash" is a location where you buy supplies to bring back, walking is feasible.
- Detailing Service: If you are dropping the car off for others to clean, the car must be delivered to the site.
"Reviewed 15 sources."
Maybe it should've reviewed 20.
This prompt doesn't say shit about whether one wants to wash the car at the car wash or somewhere else...
You need to ask Claude Code, and ask it to check if the car got washed. It would figure it out the same way it crushes compiler errors!
Results testing with 4 levels of Gemini (Fast, Thinking, Pro, Pro + Deep Think): https://ruby.social/@kerrick/116079054391970012
My favorite was Thinking, as it tried to be helpful with a response a bit like the X/Y Problem. Pro was my second favorite: terse, while still explaining why. Fast sounded like it was about to fail, and then did a change-up explaining a legitimate reason I may walk anyways. Pro + Deep Think was a bit sarcastic, actually.