Hacker News

ACCount37, yesterday at 1:42 PM

Which "tests", exactly? Do tell. Tests where LLMs don't beat a human baseline are genuinely hard to come by nowadays.