Hacker News

robot-wrangler · today at 4:20 AM

Oh, this is a very damning paper. Using simple languages from their definitions alone is a great proxy for studying truly out-of-distribution reasoning. It also tests whether models can follow simple rules and instructions correctly, because a simple enough language is practically just a grammar. This paper is terrible news for anyone who wants to argue that models do those things well.
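Since a toy language is nothing but its grammar, checking whether a model "follows the rules" is mechanical: collect its outputs and test them for grammar conformance. A minimal sketch of that idea (the grammar and function names here are illustrative, not taken from the paper):

```python
# Hypothetical example: a "language" defined entirely by one grammar rule,
# S -> "ab" | "a" S "b", i.e. the classic a^n b^n. Rule-following on such a
# language reduces to a membership check against the grammar.

def in_language(s: str) -> bool:
    """Return True iff s is a^n b^n for some n >= 1."""
    n = len(s)
    if n == 0 or n % 2 != 0:
        return False
    half = n // 2
    return s[:half] == "a" * half and s[half:] == "b" * half

def score(model_outputs: list[str]) -> float:
    """Fraction of a model's outputs that conform to the grammar."""
    return sum(in_language(o) for o in model_outputs) / len(model_outputs)
```

For example, `score(["aabb", "ab", "aab"])` gives 2/3, since `"aab"` violates the grammar. The point is that there is no training-set ambiguity to hide behind: either the output conforms to the stated rules or it doesn't.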

To the extent today's AI can reason, add this to the pile of evidence that you definitely need a harness. Counter to what you often hear, that seems true for SOTA and frontier models, not just toy models. Lots of people were saying years ago that someone should test exactly this, because it's obvious. Someone at a megacorp probably did try it and decided not to publish because they thought the results were bad optics.