
MadxX79 • today at 12:57 PM

I love the total lack of humility on that site. "What if the METR study turns out not to capture anything relevant? We just add a constant gap to be conservative!" But I guess these guys aren't really scientists, so it's probably a lot to ask that they look critically at what they are doing and be honest about the limitations of their methods.

What if it turns out that the more you scale, the more your LLM resembles a lobotomized human? It looks like it goes really well in the beginning, but you are just never going to get to Einstein. How does that affect everything?

What if it turned out that those AI companies had a whole bunch of humans solving the problems that are currently just below the 50% reliability threshold they set, and were fine-tuning on those solutions? That would make their models perform better on the benchmark, but it's just training for the test... would the constant gap be a good approximation then?
