Hacker News

eli today at 3:53 AM

I actually really like subjective benchmarks, so long as it's a human (ideally me) grading the results. LLM as judge never made much sense.


Replies

charcircuit today at 4:50 AM

The issue is that you can't do unsupervised learning if you require humans.
