That's because LLM output is "average": if you're below average, it will obviously look better than what you can do, and vice versa. It will be interesting to see what happens when current LLM output becomes the bottom, once everyone worse has pulled themselves up to that level.