Hacker News

rkozik1989 | yesterday at 5:02 PM | 1 reply

You do know we're hemorrhaging a lot of finite resources to play these games badly, right? We're basically at lying-on-a-chaise-lounge-being-fed-grapes levels of hedonism. "Make me a racist meme that infringes the copyrights of multiple IP holders, and when you're done, play SimCity at the competency level of a blind man."


Replies

staticshock | yesterday at 6:14 PM

I think the way to see this is as the organic process of discovering hard-to-game benchmarks. The loop is:

1. People discover things LLMs can kind of do, but very poorly.

2. Frontier labs sample these discoveries and incorporate them into benchmarks to monitor internally.

3. The next-generation model improves on said benchmarks, and those improvements generalize to loosely correlated real-world tasks.