Hacker News

oceanplexian, yesterday at 4:10 AM

Except coding, where it’s essentially middle of the pack, and coding is the only area you can build objective benchmarks around. The fact that people on LM Arena prefer its output has no relationship to how intelligent the model actually is.