Hacker News

forrestthewoods yesterday at 7:39 PM

At the end of the day “feel” is what people rely on to pick which tool they use.

Is “feel” unscientific and broken? Sure, maybe, why not.

But at the end of the day I’m going to choose what I see with my own two eyes over a number in a table.

Benchmarks are a sometimes-useful tool. But we are in prime Goodhart’s Law territory.


Replies

AstroBen yesterday at 7:44 PM

yeah, to be honest it probably doesn't matter too much. I think the major models are very close in capabilities
