If you gave me an agent that succeeded 50% of tasks I gave it, I could take over the world in a week. Faster if I wasn't so lazy.
I think you're overestimating, or oversimplifying. Maybe both.
No one is claiming an agent can do 50% of arbitrary tasks. It's just 50% of METR's benchmark set.
> I think you're overestimating, or oversimplifying
Yeah, if you only read the comments on HN and not the actual linked article, you will get an oversimplified conclusion. Like, duh?
> If you gave me an agent that succeeded 50% of tasks I gave it, I could take over the world in a week. Faster if I wasn't so lazy.
Assuming you used o3, that would cost $58,800 per week. That’s an expensive bet for only 50% odds in your favor.
Of course, the agents are only that good on benchmarks; in reality your odds are worse. Maybe roulette instead?