No one is claiming an agent can do 50% of arbitrary tasks. It's just 50% of METR's benchmark set.
> I think you're overestimating, or oversimplifying
Yeah, if you only read the comments on HN but not the actual linked article, you will get an oversimplified conclusion. Like, duh?
Sorry for stating something so obvious. I'll comment less from now on.
> Yeah, if you only read the comments on HN but not the actual linked article, you will get an oversimplified conclusion. Like, duh?
Curiously, for most submissions it's the opposite: the comments are much more useful and nuanced than the source being discussed.