jason_oster • yesterday at 9:19 PM • 0 replies

Discovering bugs and exploiting them is anything but laziness. We used to call that property cleverness. Being too clever has always had a negative connotation.

My best guess is that there is a sort of XY problem happening in these cases. The model needs to do X but doesn't know how. It knows how to do Y, and that sets it on the path to working around X. Or maybe sampling from the next-token probability distribution sends it away from X and toward Y.

Compounding the problem, thinking models almost never discard their current approach when it proves fruitless and start fresh with a new perspective. Sometimes they try to, but by then the context window is already polluted with Y when they should be doing X.