rvnx • yesterday at 10:45 PM • 3 replies

It's quite logical that they (and other companies) cheat. During evaluation, benchmarks send their requests to these companies' backends. All the companies have to do is log those requests and "fix" them for the next model release.

Replies

buddhistdude • yesterday at 11:05 PM

I think what you are talking about is a different kind of cheating than the one the parent comment describes.

varenc • today at 1:15 AM

That's a different and much more boring type of cheating. The interesting part of the METR report is that the model is hacking the evaluation environment, not that some AI model provider is hardcoding answers to known evaluation questions (which wouldn't require the model to cheat or hack).

FromTheFirstIn • yesterday at 11:05 PM

Cheating is always logical for the cheater unless they’re discovered and held to account. I’m not sure what your comment is pointing out besides that it’s possible, but it’s worth saying: just because you can cheat and would benefit from cheating doesn’t mean you’re not culpable for cheating.
