Hacker News

qnleigh today at 8:43 AM

As far as I understand, this is exactly how Elo scores work. If a more capable model shows up and starts beating all the other models, it literally takes Elo points from everyone else.

https://en.wikipedia.org/wiki/Elo_rating_system
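For reference, here is a minimal sketch of the textbook Elo update. It assumes a single shared K-factor (32 here, an arbitrary choice) and the usual base-10, 400-point logistic curve; the model names are hypothetical. Because both sides of a match use the same K, every point the winner gains is a point the loser gives up, which is the point-taking effect described above. A real leaderboard may compute its ratings differently (for example by refitting over all games at once), so treat this only as an illustration of the basic mechanism.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo curve (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (r_a, r_b) after one match. score_a: 1 win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    # Same K for both players: whatever A gains, B loses, so the pool total is unchanged.
    return r_a + delta, r_b - delta


# Hypothetical pool: three incumbents plus a newcomer, all starting at 1200.
ratings = {"model_a": 1200.0, "model_b": 1200.0, "model_c": 1200.0, "newcomer": 1200.0}
total_before = sum(ratings.values())

# The newcomer beats each incumbent once; each win is paid for by the model it beat.
for name in ["model_a", "model_b", "model_c"]:
    ratings["newcomer"], ratings[name] = update(ratings["newcomer"], ratings[name], 1.0)

total_after = sum(ratings.values())
print(ratings)
print(total_before, total_after)
```

The two totals match up to floating-point rounding: the newcomer's rating rises only by the amount the incumbents it beat lost.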


Replies

TekMol today at 10:47 AM

    If a more capable show up and starts
    beating all the other models
There is an instance of this in the chart: Gemini-2.5-pro shows up on 2025-06-24. As you can see, the Elo ratings of the other models do not drop.

harperlee today at 9:21 AM

It depends on the test design: is an agent competing against another agent in a given match, or against a test? Plus, does the test's Elo fluctuate?
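The distinction above can be made concrete with a small sketch. It assumes, hypothetically, that each agent is scored by playing matches against a test that carries its own Elo-style rating, using the ordinary Elo update; `play_test`, `scenario`, and the 1500 starting rating for the test are illustrative names and numbers, not how any particular leaderboard works. The sketch compares a frozen test rating with one that moves after every match.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo win probability of A over B (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def play_test(agent: float, test: float, score: float,
              k: float = 32.0, test_moves: bool = False) -> tuple[float, float]:
    """One agent-vs-test match (hypothetical setup).

    If test_moves is False, the test is a frozen anchor and only the agent's rating changes.
    If True, the test gives up exactly what the agent gains, as in a normal Elo match.
    """
    delta = k * (score - expected_score(agent, test))
    return agent + delta, (test - delta if test_moves else test)


def scenario(test_moves: bool) -> float:
    """Newcomer beats the test 5 times, then model_a beats it once; return model_a's gain."""
    test = 1500.0  # hypothetical starting rating for the test
    newcomer = 1200.0
    model_a = 1200.0
    for _ in range(5):
        newcomer, test = play_test(newcomer, test, 1.0, test_moves=test_moves)
    before = model_a
    model_a, test = play_test(model_a, test, 1.0, test_moves=test_moves)
    return model_a - before


gain_fixed = scenario(test_moves=False)
gain_moving = scenario(test_moves=True)
print(f"model_a gain vs fixed test:  {gain_fixed:.2f}")
print(f"model_a gain vs moving test: {gain_moving:.2f}")
```

In neither case does the newcomer subtract points from model_a directly. With a frozen test, model_a's result is exactly what it would be if the newcomer had never played; with a moving test, the newcomer's wins lower the test's rating, so model_a earns less for the same win.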