Hacker News

DanielHall, today at 10:58 AM

These small models, having been fine-tuned for the test, achieve frighteningly high scores, yet perform abysmally in real-world scenarios.