Hacker News

raincole today at 5:12 AM

No one is claiming an agent can do 50% of arbitrary tasks. It's just 50% of METR's benchmark set.

> I think you're overestimating, or oversimplifying

Yeah, if you only read comments on HN but not the actual linked article, you will get an oversimplified conclusion. Like, duh?


Replies

TeMPOraL today at 8:52 AM

> Yeah, if you only read comments on HN but not the actual linked article, you will get an oversimplified conclusion. Like, duh?

Curiously, for most submissions it's the opposite - comments are much more useful and nuanced than the source being discussed.

boxedemp today at 5:14 AM

Sorry for stating something so obvious. I'll comment less from now on.