Hacker News

mvkel yesterday at 3:08 PM

If you read the charter of the eval (or any eval, really), this statement is pretty silly.

The whole point of each eval version is to identify a chunk of challenges that humans do well but AI can't. When AI gets to ~80, you move to the next chunk. When you run out of challenges, you have AGI.


Replies

dwaltrip yesterday at 10:03 PM

HN occasionally devolves into “supremely pedantic and nitpicky” mode. Today is one of those days.