throwuxiytayq • yesterday at 8:40 PM • 6 replies

I literally cannot believe that people are wasting their time doing this either as a benchmark or for fun. After every single language model release, no less.

Replies

sharkjacobs • yesterday at 8:44 PM

It feels like the results stopped being interesting a little while ago but the practice has become part of simonw's brand, and it gives him something to post even when there is nothing interesting to say about another incremental improvement to a model, and so I don't imagine he'll stop.

cedws • yesterday at 10:22 PM

It’s not a waste of time. As the boundaries of AI are pushed we increasingly struggle to define what intelligence actually is. It becomes more useful to test what models cannot do instead of what they can. Random tasks like the pelican test can show how general the intelligence really is, putting aside the obvious flaw that the labs can optimise for such a simple public benchmark.

recursive • yesterday at 9:57 PM

Fun is so unproductive. Everyone doing things for "fun" is going to be sorry when they look back and realize they were wasting time having a "good time" rather than optimizing their KPIs.

bschwindHN • today at 1:55 AM

I do wonder how much energy collectively has been burned on this useless "benchmark".

segmondy • yesterday at 9:29 PM

I can't believe you're such a party pooper. It's exciting times, the silly things do matter!

Marciplan • yesterday at 11:44 PM

I also can't understand how this goes so viral every time on Hacker News lol
