Also, you need to see an analysis of the incorrect calls. The goal of a human doctor is not to get the highest accuracy; it's to limit total harm to the patient. There can be cases where the odds favor picking X (though perhaps not by much), but the safe thing to do is to rule out some other option first, or to start a safe treatment that covers several other possibilities.
Simply getting the "high score" on this evaluation is not necessarily good medical treatment.
Yeah 100% this. We've all used AI. It's obvious that it can sometimes outperform humans in a "did it get the right answer" benchmark while being wildly worse overall because of worse failure modes.
I bet the AI's incorrect answers are less "I don't know, let's get a second opinion" and more "you're perfectly fine, 0% chance this is cancer".
Exactly this. Most diagnosis isn’t about pinpointing the exact underlying cause; it’s about ruling out the really bad stuff and minimising harm. Picking the single most likely diagnosis, the way this benchmark scores it, just isn’t real-world medicine.