
gf000 · yesterday at 9:06 AM

> the fact that this goes viral just goes to show how rare it is

No, it shows that it is trivial to reproduce and that people get a nice, easy-to-process reminder that LLMs are not omnipotent.

Your logic doesn't follow here: you conclude that it is rare, but hallucinations and bad logic are absolutely common failure modes of LLMs. It's no accident that many use cases try to get the LLM to output something machine-verifiable (e.g. all those "LLM solved PhD-level math problem" articles just have it write a bunch of candidate proofs, and only when one checks out does a human take a look). So it's more of a "statistical answer generator" that may contain a correct solution next to a bunch of bullshit replies - and one should be aware of that.
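
To make the "machine-verifiable" point concrete, here is a minimal sketch of that generate-and-verify loop in Python. Everything in it is illustrative: ask_llm is a hypothetical stand-in for whatever model call you would actually use, and the verifier here simply runs candidate code against known test cases rather than checking proofs.

    import random

    def ask_llm(prompt: str) -> str:
        """Hypothetical stand-in for a real model call (assumption, not a real API)."""
        # Pretend the model is unreliable: sometimes correct, sometimes a plausible-looking bug.
        good = "def add(a, b):\n    return a + b"
        bad = "def add(a, b):\n    return a - b"
        return good if random.random() > 0.5 else bad

    def verify(candidate_src: str) -> bool:
        """Machine check: execute the candidate and test it against known cases."""
        scope: dict = {}
        try:
            exec(candidate_src, scope)
            return all(scope["add"](a, b) == a + b for a, b in [(1, 2), (-3, 3), (10, 0)])
        except Exception:
            return False

    def solve(prompt: str, attempts: int = 10) -> str | None:
        """Keep sampling until a candidate passes verification; only then show it to a human."""
        for _ in range(attempts):
            candidate = ask_llm(prompt)
            if verify(candidate):
                return candidate
        return None

    if __name__ == "__main__":
        print(solve("Write a Python function add(a, b) that returns their sum."))

The point of the pattern is exactly what the comment describes: the model is treated as a statistical answer generator, and only the outputs that survive an automatic check ever reach a human reviewer.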


Replies

TZubiri · yesterday at 9:07 PM

If I tasked you with finding a novel hallucination in a leading LLM, how long would it take you? I used to be able to find these and run into them often, but right now I can't generate new failure modes; I just have my list of known failures and run into them organically once every couple of weeks.

I don't think anyone at this stage believes that they don't make mistakes, but we prefer to use them for the times they are useful.

It can do very difficult things, and fail at very basic things. If you look at either of those and try to extrapolate, you can generate a hot take that it's super smart, or super dumb, sure. But it's a reductionist take that fails to see the bigger picture either way.
