Hacker News

oreally, today at 4:33 PM

To add some context, the experiment you're describing is called a *blind judging test*. Remove the branding and labels, let judges sample the results, and see whether they can tell which one is supposed to rank higher.
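
For what it's worth, here is a rough sketch of how such a test could be scored. It is not OP's actual setup: the names (`run_blind_test`, `p_value_at_least`, the `judge` callback) are made up for illustration, and it assumes each trial is a pair where you already know which item is "supposed" to be better.

```python
import random
from math import comb

def run_blind_test(pairs, judge, seed=0):
    """Count how often a judge picks the nominally better item.

    pairs: list of (better, worse) items, labels already stripped.
    judge: hypothetical callback judge(a, b) returning 0 or 1 for its pick.
    """
    rng = random.Random(seed)
    correct = 0
    for better, worse in pairs:
        # Randomize presentation order so position gives nothing away.
        if rng.random() < 0.5:
            correct += judge(better, worse) == 0
        else:
            correct += judge(worse, better) == 1
    return correct

def p_value_at_least(correct, trials):
    """One-sided binomial tail: the chance of getting at least `correct`
    hits out of `trials` by pure coin-flip guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
```

A judge who is just guessing should land around half correct, with a large tail probability. Only a small value from `p_value_at_least` (say, below 0.05) would suggest the judge can actually tell the items apart.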

Blind wine tastings are one example. There have been cases where journalists invited renowned, established wine tasters and put them through blind tastings. It turned out the judges couldn't tell which wine was which. Pretty embarrassing.

It speaks volumes about how poorly people can judge the value of things. There is research by a network scientist that says you generally can't tell the top 1% apart from the rest of the top, though you can tell the really bad from the generally good. What OP's experiment might tell us is that the competitive advantage between LLMs is so small that no one can tell which is objectively better.