godelski · today at 2:13 AM

  > I personally would stay away from calling someone, or an LLM, 'stupid' for making this mistake because of several reasons.
I wouldn't, because there's a difference between calling someone's action stupid and calling the person stupid; which one you mean depends entirely on the context of the claim. Smart people frequently do stupid stuff. I have a PhD, and by some metric that makes me "smart", but you'll also see me do plenty of stupid stuff every single day. Language is fuzzy...

But I think responses like yours are entirely dismissive of what these examples are attempting to show: how easily the models are fooled. Another popular example right now is the cup with a sealed top and an open bottom (lol, "world model"?).

  > There are a lot of 'gotcha' articles
The point isn't to score some gotcha; it's to provide a clear and concise example of how these systems fail.

What would not be a clear and concise example is one that requires domain expertise. That's absolutely useless as an example for everyone who isn't a subject matter expert.

The point of these types of experiments is to make people think: "if they're making these types of errors, which I can easily tell are foolish, then how often are they making errors where I'm unable to vet or evaluate the accuracy of their outputs?" This is literally the Gell-Mann Amnesia Effect in action[0].

  > I totally agree with the language ambiguity point. I think that is a feature and not a bug.
So does everybody. But there are limits to natural language, and we've been discussing them for quite a long time[1]. There is, in fact, a reason we invented math and programming languages.
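To make that concrete, here's a toy illustration (the sentence is mine, picked for its ambiguity): the English instruction "add 1 and 2 and multiply by 3" has two readings, while code admits exactly one.

    # Two valid readings of "add 1 and 2 and multiply by 3":
    reading_a = (1 + 2) * 3   # -> 9
    reading_b = 1 + 2 * 3     # -> 7

    # In a formal language the grammar forces a single parse:
    # operator precedence makes the second reading the default,
    # and parentheses make the first one explicit.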

  > Finally, we often really don't know enough but we still need to say something and like gradient descent, an ambiguous statement may take us a step closer to a useful answer.
Was this sentence an illustrative example?

Sometimes I think we don't need to say anything. I think we all (myself included) could benefit from spending a bit longer before opening our mouths, or from not opening them as often. There are times when it's important to speak out, but there are also times when it's important not to speak. It's okay to not know things, and it's okay to not be an expert on everything.

[0] https://themindcollection.com/gell-mann-amnesia-effect/

[1] https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


Replies

jason_oster · today at 3:02 AM

> This is literally the Gell-Mann Amnesia Effect in action.

Absolutely! But there is some nuance here. The failure mode shows up on ambiguous questions, and handling ambiguity is an open research topic. There is no objectively correct answer to "Should I walk or drive?" given the provided constraints.
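To illustrate what I mean by "provided constraints", here's a toy decision rule (every name and parameter in it is hypothetical, just to make the point): any procedure for answering the question needs inputs the prompt never supplied.

    def should_walk(distance_km, raining, time_budget_min, walking_speed_kmh=5):
        """Toy decision rule; every parameter here is information
        the original question never provided."""
        walk_minutes = distance_km / walking_speed_kmh * 60
        return (not raining) and walk_minutes <= time_budget_min

    # Without values for distance, weather, and time budget,
    # "Should I walk or drive?" has no determinate answer.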

Because handling ambiguity is a problem that researchers are actively working on, I have confidence that models will improve in these situations. The improvements may asymptotically approach zero, leading to ever more absurd examples of the failure mode. But that's okay, too. It means the models will increase in accuracy without becoming perfect. (I think I agree with Stephen Wolfram's take on computational irreducibility [1]: that handling ambiguity is a computationally irreducible problem.)

EWD was right, of course, and so are you to point to rigorous languages. But the interactivity with an LLM is different. A programming language cannot ask clarifying questions; it can only produce broken code or throw a compiler error. We prefer the compiler errors, because broken code does not work, by definition. (Ignoring the "feature, not a bug" gag.)
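A concrete sketch of that one-sidedness (illustrative only): faced with an ambiguous expression, the language can only fail; there is no channel for a follow-up question.

    # Python cannot ask "did you mean string concatenation or addition?"
    # It can only fail:
    try:
        result = 1 + "2"
    except TypeError as e:
        print(e)  # unsupported operand type(s) for +: 'int' and 'str'

    # The only "answers" available are an error or a silently wrong guess;
    # an LLM, by contrast, could reply with a clarifying question.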

Most of the current models are fine-tuned to "produce broken code" rather than "throw a compiler error" in these situations. They have the capability to ask clarifying questions; they just tend not to, because the RL schedule doesn't reward it.
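As a toy sketch of that incentive (entirely hypothetical; real RL reward models are learned, not hand-written rules like this): if the reward only scores final answers, asking is strictly dominated by guessing.

    # Hypothetical reward shaping: names and numbers are illustrative only.
    def reward(response, answered_correctly):
        if response.strip().endswith("?"):  # model asked a clarifying question
            return 0.0                      # no credit for deferring the answer
        return 1.0 if answered_correctly else 0.2  # partial credit for guessing

    # Under this schedule, guessing (expected reward > 0) always beats
    # asking (reward 0), so fine-tuning pushes the model toward confident
    # answers even on ambiguous prompts.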

[1]: https://writings.stephenwolfram.com/2017/05/a-new-kind-of-sc...