Hacker News

snemvalts, today at 2:43 PM

What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.