Someone needs to make an actually good benchmark for LLMs that matches real-world expectations; there's more to benchmarks than accuracy against a dataset.
We don't need real-world benchmarks; if they were good for real-world tasks, people would already use them. We need scientific benchmarks that tease out the nature of intelligence, and there are plenty of unsaturated ones. Solving chess using "mostly" language modeling is still an open problem. Beyond that: creating a machine that can explain why a given move is likely optimal at some depth, or an AI that can predict the output of another AI.
this reminds me of that joke where someone says "it's crazy that we have ten different standards for doing this", and then there are 11 standards