Hacker News

boxedemp today at 4:45 AM

If you gave me an agent that succeeded 50% of tasks I gave it, I could take over the world in a week. Faster if I wasn't so lazy.

I think you're overestimating, or oversimplifying. Maybe both.


Replies

jurgenburgen today at 7:42 AM

> If you gave me an agent that succeeded 50% of tasks I gave it, I could take over the world in a week. Faster if I wasn't so lazy.

Assuming you used o3, that would cost $58,800 per week. That’s an expensive bet for only 50% odds in your favor.

Of course, the agents are only that good on benchmarks; in reality your odds are worse. Maybe roulette instead?

raincole today at 5:12 AM

No one is claiming an agent can do 50% of arbitrary tasks. It's just 50% of METR's benchmark set.

> I think you're overestimating, or oversimplifying

Yeah, if you only read comments on HN but not the actual linked article, you will get an oversimplified conclusion. Like, duh?
