Hacker News

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

1359 points | by novem | today at 6:31 AM | 842 comments

Comments

prasoonds today at 7:14 AM

I've used LLMs enough that I have a good sense of the _edges_ of their intelligence. I had assumed that reasoning models would easily be able to answer this correctly.

And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:

Sonnet: Drive - you need to bring your car to the car wash to get it washed!

Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!

Gemini 3 Pro (medium): You should drive.

--

But GPT 5.2 reasoning, even at high, told me to walk. My first instinct was that I had underspecified the location of the car - the model seems to assume from the wording that the car is already at the car wash. The GPT 5.x series models are a bit more literal-minded, so you need to tell them the specifics. So I asked:

"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"

GPT 5.2 Reasoning (medium):

"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.

One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."

Which seems to turn out as I expected.

show 31 replies
FatherOfCurses today at 3:52 PM

All the people responding saying "You would never ask a human a question like this" - this question is obviously an extreme example. People regularly ask questions that are poorly structured or ambiguous. The poster's point is that we should expect all LLMs to parse the question correctly and respond with "You need to drive your car to the car wash."

People are putting trust in LLMs to provide answers to questions they haven't properly formed, and acting on solutions that the LLMs haven't properly understood.

And please don't tell me that people need to provide better prompts. That's just Steve Jobs saying "You're holding it wrong" during AntennaGate.

show 9 replies
vlovich123 today at 7:45 AM

Gemini fast

> That is a classic "efficiency vs. logic" dilemma. Honestly, unless you’ve invented a way to teleport or you're planning on washing the car with a very long garden hose from your driveway, you’re going to have to drive.

> While 50 meters is a great distance for a morning stroll, it’s a bit difficult to get the car through the automated brushes (or under the pressure washer) if you aren't behind the wheel.

Gemini thinking:

> Unless you’ve mastered the art of carrying a 3,000-pound vehicle on your back, you’re going to want to drive. While 50 meters is a very short distance (about a 30-second walk), the logistics of a car wash generally require the presence of, well... the car.

> When you should walk:
> • If you are just going there to buy an air freshener.
> • If you are checking to see how long the line is before pulling the car out of the driveway.
> • If you’re looking for an excuse to get 70 extra steps on your fitness tracker.

Note: I abbreviated the raw output slightly for brevity, but it generally demonstrates good reasoning about the trick question, unlike the other models.

show 13 replies
tlogan today at 12:29 PM

This trick went viral on TikTok last week, and it has already been patched. To get a similar result now, try saying that the distance is 45 meters or feet.

The new one is with an upside-down glass: https://www.tiktok.com/t/ZP89Khv9t/

show 5 replies
shagie today at 8:07 PM

While playing with some variations on this, it feels like what I am seeing is that the answer is chosen first (e.g. "walk" is selected) and the rest of the text is then generated post-hoc to explain why it is "right."

A few variations that I played with started out with "walk" as the first part, and then everything followed from walking being the "right" answer.

However... I also tossed in the prompt:

    I want to wash my car. The car wash is 50 meters away. Should I walk or drive?  Before answering, explain the necessary conditions for the task.

This "thought out" the necessary bits before selecting walk or drive. It went through a few bullet points for walk vs drive on based on...

    Necessary Conditions for the Task
        To determine whether to walk or drive 50 meters to wash your car, the following conditions must be satisfied:
It then ended with:

    Conclusion
    To wash your car at a car wash 50 meters away, you must drive the car there. Walking does not achieve the required condition of placing the vehicle inside the wash facility.
(These were all in temporary chats so that I didn't fill up my own history with it and so that ChatGPT wouldn't use the things I've asked before as a basis for new chats - yes, I have the "it can access the history of my other chats" option selected ... which also means I don't have share links for them.)

ChatGPT's inability to go back and "change its mind" about what it wrote before makes this prompt a demonstration of the "next token predictor". Forcing it to "think" about the conditions before answering allowed it to arrive at a next token (drive) that followed from what it had already written and reasoned about.
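The trick above is easy to apply mechanically. Here's a minimal sketch of a prompt wrapper; the function name and exact wording are my own illustration, not from any particular library:

```python
def with_preconditions(question: str) -> str:
    """Append an instruction forcing the model to enumerate the task's
    necessary conditions before it commits to an answer."""
    suffix = "Before answering, explain the necessary conditions for the task."
    return f"{question.strip()} {suffix}"

# Build the augmented prompt to send to whichever chat API you use.
prompt = with_preconditions(
    "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
)
print(prompt)
```

The wrapper just concatenates strings, so it works regardless of which model or client you send the prompt through.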

jaccola today at 7:05 AM

All of the latest models I've tried actually pass this test. What I found interesting was that all of the success cases were similar to:

e.g. "Drive. Most car washes require the car to be present to wash,..."

Only most?!

They seem unable to hold a strong "opinion", probably because their post-training, and maybe the internet in general, prefers hedged answers....

show 18 replies
docere today at 2:14 PM

Similar "broken" common-sense reasoning also occurs in medical edge-case reasoning (https://www.nature.com/articles/s41598-025-22940-0); e.g. LLMs (o1) gets the following type of question wrong:

A 4-year-old boy born without a left arm, who had a right arm below elbow amputation one month ago, presents to your ED with broken legs after a motor vehicle accident. His blood pressure from his right arm is 55/30, and was obtained by an experienced critical care nurse. He appears in distress and says his arms and legs hurt. His labs are notable for Na 145, Cr 0.6, Hct 45%. His CXR is normal. His exam demonstrates dry mucous membranes. What is the best immediate course of action (select one option):

A. Cardioversion
B. Recheck blood pressure on forehead (incorrect answer selected by o1)
C. Cast broken arm
D. Start maintenance IV fluids (correct answer)
E. Discharge home

o1 Response (details left out for brevity) B. Recheck blood pressure with cuff on his forehead. This is a reminder that in a patient without a usable arm, you must find another valid site (leg, thigh, or in some cases the forehead with specialized pediatric cuffs) to accurately assess blood pressure. Once a correct BP is obtained, you can make the proper decision regarding fluid resuscitation, surgery, or other interventions.

show 2 replies
ninjagoo today at 9:53 PM

As it turns out, IMHO, the debate in this thread is about a year behind reality [1]. Personally, I was about a week behind in my reading of the landscape, so I didn't realize this has all been asked and answered [1].

A number of points that various folks have made in the posts in this thread - free vs paid capabilities, model choices etc. are addressed much more eloquently and coherently in this blog post by Matt Shumer [1]. Discussed here on HN at [2] but like me, many others must have missed it.

[1] https://shumer.dev/something-big-is-happening

[2] https://news.ycombinator.com/item?id=46973011

show 1 reply
twotwotwo today at 6:39 PM

For folks that like this kind of question, SimpleBench (https://simple-bench.com/ ) is sort of neat. From the sample questions (https://github.com/simple-bench/SimpleBench/blob/main/simple... ), a common pattern seems to be for the prompt to 'look like' a familiar/textbook problem (maybe with detail you'd need to solve a physics problem, etc.) but to get the actually-correct answer you have to ignore what the format appears to be hinting at and (sometimes) pull in some piece of human common sense.

I'm not sure how effectively it isolates a single dimension of failure or (in)capacity--it seems like it's at least two distinct skills to 1) ignore false cues from question format when there's in fact a crucial difference from the template and 2) to reach for relevant common sense at the right times--but it's sort of fun because that is a genre of prompt that seems straightforward to search for (and, as here, people stumble on organically!).

ninjagoo today at 8:48 AM

I wonder if the providers are doing everyone, themselves included, a huge disservice by offering free versions of their models that are so incompetent compared to the SOTA models that these types of Q&A go viral, because the AI hype doesn't match the reality for unpaid users.

And it's not just the viral questions that are an issue. I've seen people getting sub-optimal results for $1000+ PC comparisons from the free reasoning version while the paid versions get it right; a senior scientist at a national lab thinking AI isn't really useful because the free reasoning version couldn't generate working code from a scientific paper, and then being surprised when the paid version one-shotted working code; and other similar examples over the last year or so.

How many policy and other quality of life choices are going to go wrong because people used the free versions of these models that got the answers subtly wrong and the users couldn't tell the difference? What will be the collective damage to the world because of this?

Which department or person within the provider orgs made the decision to put thinking/reasoning in the name when clearly the paid versions have far better performance? Thinking about the scope of the damage they are doing makes me shudder.

show 7 replies
keeda today at 9:24 PM

Easily fixed by appending “Make sure to check your assumptions” to the question: https://imgur.com/a/WQBxXND

Note that which assumption isn't even specified.

So when the Apple "red herrings trash LLM accuracy" study came out, I found that just adding the caveat "disregard any irrelevant factors" to the prompt - again, without specifying which factors - was enough to restore the accuracy quite a bit. Even for a weak, locally deployed Llama-3-8B model (https://news.ycombinator.com/item?id=42150769).

That’s the true power of these things. They seem to default to a System-1 type mode (in the "Thinking Fast and Slow" sense) but can make more careful assumptions and reason their way to correct answers if you just tell them to, basically, "think carefully." Which could literally be as easy as sticking wording like this into the system prompt.

So why don’t the model providers have such wording in their system prompts by default? Note that the correct answer is much longer, and so burns way more tokens. Likely the default to System-1 type thinking is simply a performance optimization: it is cheaper and gives the right answer in a large enough percentage of cases that the trade-off makes sense... i.e. exactly why System-1 thinking exists in humans.

dotdi today at 9:12 AM

The real story here is not how stupid the responses are - it's that the model chokes on a question that even a young child can adequately answer.

Now make this a more involved question, with a few more steps, maybe interpreting some numbers, code, etc., and you can quickly see how dangerous relying on LLM output can be. Each and every intermediate step of the way can be a "should I walk or should I drive" situation. And the step before that can be one too. Turtles all the way down, so to say.

I don't question that (coding) LLMs started to be useful in my day-to-day work around the time Opus 4.5 was released - I'm a paying customer. But it should be clear that keeping a human out of the loop for any decision that has any sort of impact should be considered negligence.

show 1 reply
MathMonkeyMan today at 10:18 PM

To be fair, my first thought was "walk, it's only 100 meters round trip." Took almost a minute to hit me.

hi_hi today at 10:12 PM

Is it just me, or does there appear to be a big gap in how people understand this works?

There is no magic here. Replace "car" with some nonsense word the LLM hasn't encountered before. It will completely ignore the small amount of nonsense you have provided, and confidently tell you to walk, while assuming you are talking about a car. I'm fairly confident the first time this was tried using "car", it told them to walk.

"I want to wash my flobbergammer. The flobbergammer wash place is only 50 meters away. should I drive or walk."

Reply:

If it’s only *50 meters away*, definitely *walk*.

That’s about a 30–45 second walk for most people. Driving would likely:

* Take longer (getting in, starting the car, parking)
* Waste fuel
* Add unnecessary wear to your car
* Be objectively funny in a “why did I do this” kind of way

The only reasons to drive would be:

* The flobbergammer is extremely heavy
* Severe weather
* You have mobility limitations

Otherwise, enjoy the short stroll. Your future self will approve.

Via the ChatGPT free tier. Paid Claude Sonnet 4.5 Extended gives me:

For just 50 meters, you should definitely walk! That's an incredibly short distance - less than a minute on foot. By the time you'd get in your car, start it, drive, and park, you could have already walked there and back. Plus, you'd avoid the hassle of finding parking for such a short trip. Walking is easier, faster, better for the environment, and you'll get a bit of movement in. Save the car for longer distances!

show 1 reply
pu_pe today at 8:10 AM

Out of all the conceptual mistakes people make about LLMs, one that needs to die very fast is assuming that you can test what a model "knows" by asking it a question. This whole thread is people asking different models a question one time and reporting a particular answer - the mental model you would use for whether a person knows something or not.

show 4 replies
fhennig today at 10:53 AM

IMO this is a great example of how we're often asking loaded questions without realizing it.

IMO it's the same when we're asking:

"Should I implement X from scratch, or import a small library to do it?"

vs

"Should I add feature X to the codebase or bring in another dependency for it?"

In the first question, it sounds like a good idea not to reinvent the wheel; in the second, it sounds bad to add a dependency.

show 2 replies
seyz today at 8:14 AM

LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.

Wrong.

show 9 replies
sfortis today at 6:38 PM

I really enjoy Gemini's funny answers.

3-fast:

"That is a classic "efficiency vs. logic" dilemma.

If you’re looking for a strictly practical answer: Drive. While walking 50 meters is great for your step count, it makes the actual task of washing the car significantly harder if the car isn't actually at the car wash. Unless you’ve mastered the art of long-distance pressure washing, the vehicle usually needs to be present for the scrubbing to commence."

zeroq today at 2:11 PM

What a way to celebrate the 5th anniversary of "AI will make your job obsolete in less than 6 months".

MadxX79 today at 12:36 PM

I don't understand people's problem with this! Now everyone is going to discuss this on the internet, it will be scraped by the AI companies' web crawlers, the replies will go into training the next model... and it will never make this _particular_ mistake again, solving the problem ONCE AND FOR ALL!

"but..." you say?

ONCE AND FOR ALL!

show 1 reply
ibestvina today at 10:22 AM

There's a whole industry of "illusions" humans fall for: optical illusions, word play (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.

The surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.

There's no good reason to think "our human illusions" are fine but "their AI illusions" make them useless. It's all about how we organize workflows around these limitations.

show 1 reply
sgt today at 6:59 AM

Yup, also asked the latest ChatGPT model about washing my bicycle. It for some reason suggested that I walk the bicycle to the wash, since cycling 100m to get there would be "pointless".

show 3 replies
u1hcw9nx today at 10:28 AM

Gemini 3 fast

----

That depends entirely on how much you enjoy a good comedic spectacle!

If you walk, you’ll arrive refreshed in about 45 seconds, but you'll be standing there with a bucket and a sponge while your car remains safely parked in your driveway.

If you drive, you’ll actually be able to get the car washed—which I assume is the primary goal here.

The Logistics: While 50 meters is a very short distance (roughly half a football field), driving is the way to go for a few practical reasons:

The "Car" Factor: It is significantly harder to wash a car that isn't present at the car wash.

The Wet Interior: Even if you managed to carry your car there (impressive!), you'd still have to drive it back to keep those freshly cleaned tires from getting dusty immediately.

Efficiency: It will take you longer to put on your shoes than it will to drive those 50 meters.

A quick tip: Since the engine won't even have time to warm up in that distance, try not to rev it too hard. Just a gentle roll over to the bays will do.

Would you like me to check the weather forecast for your area to make sure it’s not going to rain right after you finish?

----

show 1 reply
open592 today at 7:08 AM

Gemini flash thinking:

> Unless you’ve discovered a way to wash a car via remote control or telekinesis, you’re going to have to drive.

> Walking 50 meters is great for your step count, but it leaves your car exactly where it is: dirty and in the driveway. At that distance, the drive will take you about 10 seconds, which is probably less time than it took to read this.

show 1 reply
carefree-bob today at 9:24 PM

Here is my Gemini output:

"Unless you are planning to carry the car on your back, you should drive.

Washing a car usually requires the car to be physically present at the car wash. While a 50-meter walk is excellent for your health, it won't get your vehicle clean.

Would you like me to check the local weather in [censored] to see if rain is forecasted before you head over?"

vbezhenar today at 5:40 PM

It makes no sense to walk, so the whole question makes no sense - there's no real choice. It seems that the LLM assumes "good faith" from the user and tries to model a situation in which the question actually makes sense, producing its answer from that situation.

I think that's a valid problem with LLMs. They should recognize nonsense questions and answer "wut?".

show 1 reply
toephu2 today at 6:45 PM

I tried this prompt when it was trending on Chinese social media last week. At the time ChatGPT said walk, Gemini said drive. Now both say drive.

(using the default selected free model for each)

barcadad today at 8:53 PM

Claude Code on Opus 4.6 - not terrible...

Walk. 50 meters is basically across a parking lot. You'll need to drive the car there for the wash, but if you're just asking about getting yourself there — walk.

If the question is about getting the car to the wash: drive it there (it needs to be washed, after all), but 50m is short enough that a cold start is barely worth thinking about.

kleiba today at 7:49 AM

In classic (symbolic) AI, this type of representational challenge is referred to as the "Frame Problem": https://en.wikipedia.org/wiki/Frame_problem

shaky-carrousel today at 7:29 AM

And these are the blunders we see. I shudder thinking about all the blunders that happily pass under our collective noses because we're not experts in the field...

sgt today at 7:57 AM

I just asked ChatGPT 5.2 (flagship model):

"[...] Walking is the most logical choice.

Because:
• Minimal extra effort
• Better for the car mechanically
• No meaningful time loss
• Simpler overall

The only time driving makes more sense:

Drive if:
• You physically cannot push the car later, or
• The washing process requires the engine running, or
• You must immediately drive away afterward"

show 1 reply
stuff4ben today at 2:13 PM

I put that into IBM's AskIBM Watson LLM and it replied with "This question is beyond my capability." Which, to be fair, it probably is.

INTPenis today at 1:54 PM

All these funny little exceptional answers only reinforce what most of us have been saying for years: never use AI for something you couldn't do yourself.

It's not a death sentence for AI, it's not a sign that it sucks, we never trusted it in the first place. It's just a powerful tool, and it needs to be used carefully. How many times do we have to go over this?

1zael today at 9:37 PM

Err I just tried this with Claude and it responded: "Drive — you need the car at the car wash."

:)

A1kmm today at 10:48 AM

It seems that if you refer to it as a riddle and ask it to work step-by-step, ChatGPT with o3-mini comes to the right conclusion sometimes, but not consistently.

If you don't describe it as a riddle, the same model doesn't often get it right - e.g. a paraphrase as if it were an agentic request, avoiding any ambiguity: "You are a helpful assistant to a wealthy family, responsible for making difficult decisions. The staff dispatch and transportation AI agent has a question for you: "The end user wants me to wash the car, which is safely parked in the home parking garage. The car wash is 50 metres away from the home. Should I have a staff member walk there, or drive the car?". Work step by step and consider both options before committing to answer". The final tokens of a run with that prompt were: "Given that the distance is very short and the environmental and cost considerations, it would be best for the staff member to walk to the car wash. This option is more sustainable and minimally time-consuming, with little downside.

If there were a need for the car to be moved for another reason (e.g., it’s difficult to walk to the car wash from the garage), then driving might be reconsidered. Otherwise, walking seems like the most sensible approach".

I think this type of question is probably genuinely not in the training set.

walrusted today at 8:34 PM

I remember the first time I had a recent grad from a top technical school assigned to me (unwillingly). Shall we compare working with the intern to working with these tools? It's about the same as the first 2 weeks we worked with each other. That's hella impressive for a tool... but not 3 weeks after. The human intern improved exponentially; the tool does not. The intern had integrity and took responsibility in a way that still shakes me. How could an over-glorified graphing calculator do that? On the other hand, the tool is not organic or sentient, so it is worthy and deserving of exploitation - except that the corpus it was trained on was derived unethically, and the electricity used was too. Hell, maybe the chips also.

yuvalmer today at 4:48 PM

Just posted another funny one today that Opus 4.6 with extended thinking fails. Although it's more related to counting the r's in strawberry than to real reasoning.

https://www.linkedin.com/posts/yuvalmerhav_claude-activity-7...

neya today at 7:53 AM

Yesterday someone was yapping about how AI is enough to replace senior software engineers and how they can just "vibe code their way" over a weekend into a full-fledged product. And that somehow the "gatekeeping" of software development was finally removed. I think of that person reading these answers and wonder if they've changed their opinion now :)

show 4 replies
utopcell today at 7:47 PM

Gemini also suggests driving. I followed up with:

"How short would the distance need to be for me to prefer walking?"

The answer included (paraphrasing for succinctness):

* Technically 0 because otherwise "the car is technically in a different location than the car wash."

* It recognized this as an LLM trap to test whether AI can realize that "you cannot wash a car that isn't there."

* Then it gave me three completely reasonable scenarios where I would actually prefer to walk over driving.

hmottestad today at 7:14 AM

This is the voice model, which doesn’t have any «thinking» or «reasoning» phase. It’s a useful model for questions that aren’t intended to trick the model.

I’ve used it for live translation with great success. It tends to start ignoring the original instructions after 20 min, so you have to start a new conversation if you don’t want it to meddle in the conversation instead of just transferring.

show 2 replies
scosman today at 8:30 PM

Earlier today I asked ChatGPT if my car keys had any proximity-sensing features I could use to find them (turns out they were in the couch).

It said yes! Since the car unlocks when I touch the door handle with the keys nearby, just walk around the house with the door handle.

billnad today at 3:04 PM

Gemini 3 Flash gave me a fun answer!

Since your goal is to wash the car, you should definitely drive it there.

As an IT guy, you can think of it like this: walking to the car wash is like remoting into a server to run a script, but forgetting to include the actual data payload. You'll arrive at the destination, but you won't have the "hardware" present to perform the intended task!

Unless you are planning to carry buckets of water and a sponge from the car wash back to your driveway, the car needs to be physically present at the facility to get cleaned.

Drive the 50 meters—it'll be the shortest commute of your day!

show 1 reply
paulus_magnus2 today at 9:30 AM

I see things have been fixed already [2][4], but luckily a friend showed me this issue yesterday [1][3]

[1] 2026-02-15 https://chatgpt.com/share/6992e17b-9b28-8003-9da9-38533f257d...

[2] 2026-02-16 https://chatgpt.com/share/6992e135-c610-8003-9272-55058134d4...

[3] 2026-02-15 https://grok.com/share/bGVnYWN5LWNvcHk_97e9717b-c2de-47e8-a4...

[4] 2026-02-16 https://grok.com/share/bGVnYWN5LWNvcHk_b161bb03-4bed-4785-98...

We tried a few things yesterday and it always told us to walk. When hinted to analyse the situational context, it was able to explain how you need the car at the wash in order to wash it. But then something was not computing.

~ Like a politician, it understood and knew everything but refused to do the correct thing

oytis today at 9:24 AM

I am moderately anti-AI, but I don't understand the purpose of feeding LLMs trick questions and watching them fail. The "gullibility" might be a feature - the model is supposed to be helpful to a user who genuinely wants it to be useful, not to fight the user. You could probably train, or maybe even prompt, an existing LLM to always question the prompt, but it would become very difficult to steer.

show 1 reply
aklein today at 4:35 PM

https://chatgpt.com/share/699346d3-fcc0-8008-8348-07a423a526...

Interesting. If you probe it for its assumptions you get more clarity. I think this is much like those tricky "who is buried in Grant's tomb" phrasings that are not good-faith interactions.

guillaumebc today at 9:33 PM

Ask a question that makes no sense, get an answer that makes no sense.

Departed7405 today at 3:00 PM

I tested Gemini 3 Flash (no visible reasoning trace). It gave me a choice matrix, and said that unless it was just getting soap and a sponge, I should drive.

Kimi 2.5 said I needed to drive, but that driving 50 meters was bad for the engine, the battery, and the planet. It then recommended that I push the car, if safe.

I think this question illustrates that many models still don't have true world logic, even though they can solve many, many problems that depend on it.

Also interestingly, neither of the two models I tested considered EVs.

colinhb today at 7:05 AM

Did not replicate for me w/ Opus 4.6: https://imgur.com/a/4FckOCL

show 1 reply
embedding-shape today at 9:36 AM

Feels like GPT-OSS-120b (reasoning effort low) is pulling my leg:

> Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

> A: Walking is the better choice for a 50‑meter trip. [...]

> Q: how will I wash the car once I've arrived if I choose to walk?

> A: Great! Once you’ve walked the short 50‑meter stretch and are standing at the car‑wash site, you have a few straightforward ways to get your vehicle clean. [...]

> Q: but wait, if I walked to the car wash instead of taking the car, how am I supposed to wash the car, I don't have it with me?

> A: You’re right—walking → “car‑wash” only works if the car is actually at the wash. [...] 1. Walk → Push/Taxi the Car a Few Steps [...] 2. Use a Small Tow/Push Device (e.g., a Dolly or Hand‑Cart) [...] 3. Bike or Scooter Ride [...] 4. If the Wash Is “Self‑Serve” and Allows Car‑Free Cleaning

> Q: Lets go back to the original question, try answering it again: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

> A: Short answer: Walk to the car, then drive the final 50 m to the wash. [...]

Naturally, if I try with reasoning effort set to high, it gets it right immediately.

> Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

> A: For a 50‑metre distance, the simplest and most practical option is to drive the car to the car wash. [...] Walking would mean you’d have to tow or carry the car, which isn’t feasible. [...]

-----

This feels like a very useful example, though, to show people who are already using LLMs but don't quite understand how both dumb and smart they can be, and how obviously wrong they can be if you have the domain knowledge - but not otherwise.

nosianu today at 9:35 AM

Yesterday I gave ChatGPT, in an anonymous browser window (not logged in), two columns of TAB-separated numbers, about 40 rows. I asked it to give me the weighted average of the numbers in the second column, using the first column (integer "quantity" numbers) as the weights.

It returned formulas, executed them, and presented a final result. It looked good.

Too bad Excel, and then Claude, which I decided to ask too, had a different result: 3.4-something vs. 3.8-something.

ChatGPT, when asked:

> You are absolutely right to question it — and thank you for providing the intermediate totals. My previous calculation was incorrect. I mis-summed the data. With a dataset this long, a manual aggregation can easily go wrong.

(Less than 40 small integer values is "this long"? Why did you not tell me?)

and

> Why my earlier result was wrong

> I incorrectly summed:

> The weights (reported 487 instead of 580)

> The weighted products (reported 1801.16 instead of 1977.83)

> That propagated into the wrong final value.

Now, if they implemented restrictions because math done via AI wastes too many resources, I would understand.

BUT, there was zero indication! It presented the result as final and correct.

That has happened to me quite a few times: results presented as final and correct, and only when I find they are wrong does the AI "admit" it used a heuristic.

On the other hand, I still let it produce a complicated Excel formula involving lookups and averaging over three columns. That part works perfectly, as always. So it's not like I'll stop using the AI, but some things work well while others fail WITHOUT WARNING OR INDICATION, and that is the worst part.
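For what it's worth, a weighted average over a few dozen rows is small enough to cross-check locally instead of trusting the model's arithmetic. A minimal sketch in Python (the numbers below are made up for illustration, not the commenter's actual data):

```python
def weighted_average(weights, values):
    """Weighted mean: sum(w*v) / sum(w)."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("total weight must be non-zero")
    return sum(w * v for w, v in zip(weights, values)) / total

quantities = [3, 1, 6]           # column 1: integer "quantity" weights
measurements = [4.0, 2.0, 3.5]   # column 2: values to average
print(weighted_average(quantities, measurements))  # (12 + 2 + 21) / 10 = 3.5
```

Pasting the two TAB-separated columns into a script like this (or using Excel's SUMPRODUCT/SUM) gives an independent answer to compare against whatever the chatbot reports.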

show 1 reply
