Hacker News

ChatGPT Health fails to recognise medical emergencies – study

184 points by simonebrunozzi today at 3:44 PM | 137 comments

Comments

unstyledcontent today at 4:06 PM

I have had some incredible medical advice from ChatGPT. It has saved me from small mystery issues, like a rash on my face. Small enough issues that I probably wouldn't have bothered to go to a doctor. BUT it also failed to diagnose a medical issue that ended up with a trip to the ER and emergency surgery.

A few weeks before the ER, I was having stomach pain. I went to the doctor with theories from ChatGPT in hand; they checked me for those things and then didn't check me for what ended up being a pretty obvious issue. What's interesting is that I mentioned to the doctor that I used ChatGPT, and the doctor seemed to value that opinion and did not consider other options (what it ultimately turned out to be was rare, but really obvious in retrospect; I think most doctors would have checked for it). I do feel I biased the first doctor's opinion with my "research."

WarmWash today at 4:04 PM

I'd greatly prefer a blind study comparing doctors to AI, rather than a study of doctors feeding AI scenarios and seeing if it matches their predetermined outcome.

Edit: People seem confused here. The study fed the AI structured clinical scenarios and looked at its results. It was not a live analysis of AI being used in the field to treat patients.

traceroute66 today at 8:02 PM

> ChatGPT was trained on the same medical textbooks and research papers that doctors are.

There is a reason why the majority of a doctor's 8 years of training is spent doing the rounds as a junior doctor in hospital wards ....

iainctduncan today at 6:42 PM

I think the worse situation is the bad AI summaries that search engines show for health issues.

We had a potential pet poisoning, so we were naturally searching for resources. Google had a summary with a "dose of concern" that was an order of magnitude off. Someone could have read that, thought all was fine, and ended up with a dead cat.

(BTW the cat is fine, it turned out to be a false alarm, but public service announcement: cats are allergic to aspirin, and Pepto-Bismol has aspirin in it. Don't leave demented plastic-chewing cats around those bottles, in case you too have a lovely but demented cat.)

nerdjon today at 4:08 PM

Even though these tools are showing time and time again that they have serious reliability issues, somehow people still think it is a good idea to use them for critical decisions.

I still regularly get wrong information from Google's search AI.

I'm really starting to wonder whether common sense is ever going to catch up with new tech, but I fear it's going to require something truly catastrophic to happen.

spicyusername today at 4:05 PM

And how often are we reviewing doctors' performance?

I suspect many, many doctors also fail to regularly recognize medical emergencies.

andersmurphy today at 6:31 PM

Is this unsurprising? It's a fancy Markov chain. It's like using a slot machine to diagnose medical conditions. I guess it's a slot machine with really good marketing.

SoftTalker today at 3:59 PM

I really only use ChatGPT as a better search engine. But it's often wrong, which has actually ended up costing me money. I don't put a lot of trust in it. Certainly would not try to use it as a doctor.

rendleflag today at 7:02 PM

There is a concept of "the burden of knowledge": doctors know the worst thing that could happen, so they recommend the most cautious approach. My son had stomach pain one time when he was young. We took him to urgent care because of the stomach ache. The doctor there said we needed to go to the ER because it could be appendicitis. So we trucked to the ER. Close to $2000 later he was diagnosed with idiopathic stomach pain and told to wait it out at home.

So when I read "they then compared the platform's recommendations with the doctors' assessments" and see a mismatch, I wonder whether it's because the human doctors were overly cautious or because the AI was wrong.

But all of that pales next to what could be the actual issue. I can't read the original study, but if it was conducted in the USA, it's understandable why people are turning to AI for health advice. Healthcare is painfully expensive here. Even a simple trip to the ER (e.g. a $2000 stomach ache) is beyond a lot of people's ability to spend. That's just a reality.

With that in mind, the real question is: "should I do nothing about my symptoms because I can't afford healthcare, or should I at least ask AI, knowing it could be wrong?"

Scoundreller today at 4:49 PM

Search engines and Dr. Google must be feeling like they've dodged some major artillery-level bullets in this debate.

hayleox today at 4:54 PM

I think there is so much potential for AI in healthcare, but we absolutely HAVE to go through the existing ruleset of conducting years of research and trials and approvals before pushing anything out to patients. Move fast and break things is simply not an option in healthcare.

dipflow today at 5:07 PM

Adding normal lab results made the suicide crisis banner disappear? That's a weird failure mode. You'd expect unrelated context to be ignored, not to override the risk signal.

ben5 today at 6:18 PM

I know this isn't always the best answer, but if you need real medical advice - see a doctor. Not the internet.

WalterBright today at 5:04 PM

Doctors also miss things.

A friend of mine had an accident. He was taken to the emergency room, but the doctors there thought his injuries were minor. My friend insisted that he was bleeding out internally. They finally checked for that, and it turns out he was minutes from dying.

AI wasn't involved in this case, but it's good to have both AI and a trained doctor in the decision loop.

francisofascii today at 6:24 PM

The reality is that entering the healthcare system can result in thousands of dollars in bills. People make a risk/cost judgement about whether or not to go to the hospital.

nilamo today at 7:45 PM

Amazing that some people thought a pseudorandom number generator would be good at diagnosing health issues it can't even see.

josefritzishere today at 3:56 PM

It continues to amaze me how recklessly some people cram AI into spaces where it performs poorly and the consequences include death.

bsoles today at 6:17 PM

>> "securely" (my emphasis) connect medical records and wellness apps” to generate health advice and responses.

No, no, no, and no. Are we never going to learn? Sharing medical data with AI tools is going to come back and bite you.

system2 today at 8:36 PM

Maybe it's because human interaction, which is part of a doctor's training, is not documented in internet blog posts, so ChatGPT never learned it and failed because of that? An LLM only learns from what's written.

jbverschoor today at 4:46 PM

Sounds exactly like a GP in the Netherlands

nashashmi today at 4:13 PM

Has anyone tried it with sudoku puzzles? In the middle of a hard game I will submit a screenshot to Copilot or Gemini and it hallucinates suggestions for the next move.

ml_giant today at 3:54 PM

I’m not surprised.

dyauspitr today at 4:47 PM

I feel like these need to be run against case histories from already-determined cases, not cases where the doctors set up the scenarios knowing they're going to be run against ChatGPT.

TZubiri today at 4:21 PM

How about we allow ChatGPT to be used alongside human MD diagnosis?

Win win right?

selridge today at 6:48 PM

I’ve never heard of in my entire life a doctor failing to recognize a medical emergency. /s

One of the things that people need to come to grips with is that, like Wikipedia, people will use ChatGPT because it is there. The alternative is to be rich and have a primary care doctor that you can reach out to at a moment's notice. Until that changes, people will use these web services. It's the same thing as Wikipedia or WebMD.

qsera today at 4:03 PM

[flagged]

varispeed today at 4:09 PM

I find that 5.2 has been completely dumbed down. It feels more like talking to early versions of Gemini, where it quickly enters a loop state.