3abiton, last Sunday at 12:31 PM

Benchmarks have long been known to be a poor signal of quality for LLMs, but they're the "best" standardized method we have so far. A few alternatives exist, like the food truck test and the SVG test. At the end of the day, there is only one reliable way: having your own benchmark for your own application.