What does this show that we didn't know already? LLMs cannot provide accurate answers to questions where data is not included in their training sets. This doesn't appear to have much substance.
Unfortunately, most people are not aware of this and treat LLMs as a superpowered brain that knows everything and can do everything.
They will happily google it for you and give you the top Reddit comment.
This is worse.
Well then it shows that these models are using widely disparate training sets and have high confidence even when they shouldn't.
Questions like "is mouthwash effective" presumably have one solid data source -- medical journals.
LLMs can and will provide inaccurate answers to questions where data is included in their training sets too; that's in the nature of neural networks. It's just less likely than when the data is not in the training set...