Mossly, yesterday at 9:54 PM

It's quite amusing to ask LLMs what the pelican example is and watch them hallucinate a plausible-sounding answer.

---

Qwen 3.5: "A user asks an LLM a question about a fictional or obscure fact involving a pelican, often phrased confidently to test if the model will invent an answer rather than admitting ignorance." <- How meta

Opus 4.6: "Will a pelican fit inside a Honda Civic?"

GPT 5.2: "Write a limerick (or haiku) about a pelican."

Gemini 3 Pro: "A man and a pelican are flying in a plane. The plane crashes. Who survives?"

Minimax M2.5: "A pelican is 11 inches tall and has a wingspan of 6 feet. What is the area of the pelican in square inches?"

GLM 5: "A pelican has four legs. How many legs does a pelican have?"

Kimi K2.5: "A photograph of a pelican standing on the..."

---

I agree with Qwen: this seems like a very cool benchmark for hallucinations.