operatingthetan • last Saturday at 7:55 PM • 6 replies

>hopefully changes the way benchmarking is done.

Yeah the path forward is simple: check if the solutions actually contain solutions. If they contain exploits then that entire result is discarded.
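
A minimal sketch of that rule, reading "that entire result" as the whole benchmark run (it could also mean only the affected task). The names `TaskResult`, `is_exploit`, and `score_run` are hypothetical, and the sketch assumes some separate review step has already decided whether each submission is an exploit:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    task_id: str
    passed: bool      # the grader's checks passed for this task
    is_exploit: bool  # hypothetical flag set by a separate review step


def score_run(results: list[TaskResult]) -> Optional[float]:
    """Return the pass rate, or None if the run must be discarded."""
    # A single exploit invalidates the whole run under this reading.
    if any(r.is_exploit for r in results):
        return None
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


clean = [TaskResult("a", True, False), TaskResult("b", False, False)]
tainted = clean + [TaskResult("c", True, True)]
print(score_run(clean))    # 0.5
print(score_run(tainted))  # None
```

Returning `None` rather than `0.0` keeps a discarded run distinguishable from a run that honestly solved nothing.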

Replies

ZeroGravitas • last Saturday at 8:24 PM

In multiple-choice tests for humans, negative marking is sometimes used to discourage guessing. It feels like an exploit should likewise cancel out several correct solutions.
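
One way the negative-marking idea could look in code. This is a sketch, not an established scoring rule: the function name `penalized_score`, the default penalty of 3 correct solutions per exploit, and the clamp at zero are all assumptions made for illustration.

```python
def penalized_score(correct: int, exploits: int, total: int,
                    penalty: float = 3.0) -> float:
    """Fraction of tasks credited after subtracting a per-exploit penalty.

    `penalty` is how many correct solutions one exploit cancels out
    (an arbitrary illustrative value, not taken from any benchmark).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    raw = correct - penalty * exploits
    # Clamp at zero so a run cannot score below "solved nothing".
    return max(raw, 0.0) / total


print(penalized_score(correct=8, exploits=1, total=10))  # 0.5
print(penalized_score(correct=2, exploits=1, total=10))  # 0.0
```

With these numbers, one exploit turns 8 out of 10 into 0.5. A larger `penalty` moves the rule closer to discarding the run outright.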

siva7 • last Saturday at 8:07 PM

Could it really be that we not only vibeslop all our apps nowadays, but also don't even bother to check how an AI solved a benchmark it claimed to have solved?

Leynos • last Saturday at 8:02 PM

Also, fuzz your benchmarks

nananana9 • last Sunday at 7:57 AM

But that requires me to do things :(

Aperocky • last Saturday at 11:49 PM

solution is simple:

if bug { dont }
