These faux questions always have a valid interpretation that the asker doesn't admit (for some reason). The model is then castigated for not making an opinionated choice.
When the question is a test, explaining away peculiar answers by imagining unlikely outlier scenarios is not the counter you seem to think it is.
For most of them, we’d worry that a human answerer using maximum effort to produce the same outcome was having a stroke.
That’s not what’s happening.
The question is revealing that the model has a model of language but not of reality. It knows which words go together, but not the real-world concepts behind them.