
andrepd yesterday at 12:54 PM

Is it an arithmetic average of relative error over the given range? Because if so, it can be misleading, and potentially a bad metric for ranking alternatives (though the HTML report includes a graph over the input range, which is quite nice, so I'm talking only about the accuracy number).

In the limit, an alternative with 10x better accuracy when x>10^150 and 10x worse in 1<x<10^150 would rank higher :) but more generally, not all inputs are equally important.
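To make that concrete, here is a rough sketch with made-up numbers (the error values, the 10^300 upper bound, and both sampling schemes are my assumptions for illustration, not how any particular tool samples). It compares the average error of the lopsided alternative when inputs are spread uniformly by value versus uniformly by exponent:

```python
# Hypothetical error levels, chosen only to mirror the example above.
ERR_A = 1e-16        # alternative A: same relative error everywhere
ERR_B_LOW = 1e-15    # alternative B: 10x worse for 1 < x < 10^150
ERR_B_HIGH = 1e-17   # alternative B: 10x better for x > 10^150

SPLIT_EXP = 150      # boundary between the two regimes, as a power of 10
MAX_EXP = 300        # assumed upper end of the sampled range, 10^300


def mean_error(frac_low, err_low, err_high):
    """Average error when a fraction frac_low of inputs falls in the low regime."""
    return frac_low * err_low + (1.0 - frac_low) * err_high


# Uniform by value: almost every sample lands above 10^150.
frac_low_linear = (10.0**SPLIT_EXP - 1.0) / (10.0**MAX_EXP - 1.0)
# Uniform by exponent: half of the samples land below 10^150.
frac_low_log = SPLIT_EXP / MAX_EXP

b_linear = mean_error(frac_low_linear, ERR_B_LOW, ERR_B_HIGH)
b_log = mean_error(frac_low_log, ERR_B_LOW, ERR_B_HIGH)

print(f"A: {ERR_A:.2e}  B (by value): {b_linear:.2e}  B (by exponent): {b_log:.2e}")
```

Under these assumptions B beats A when sampling by value and loses when sampling by exponent, so the ranking depends entirely on how the inputs are weighted.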

Furthermore, floats have underflow to 0 and overflow to infinity, which screw all this up because it can lead to infinite relative error.

Because of this you have some of the funny cases reported elsewhere in this thread :p
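A minimal sketch of the overflow case, assuming the textbook definition of relative error and a naive hypotenuse formula as the stand-in expression (both are my choices for illustration). A single overflowing input is enough to make the arithmetic mean infinite:

```python
import math


def relative_error(approx, reference):
    """Textbook relative error; assumes reference is nonzero."""
    return abs(approx - reference) / abs(reference)


def naive_hypot(x, y):
    # x * x overflows to inf once |x| exceeds roughly 1.3e154,
    # even though the true result is perfectly representable.
    return math.sqrt(x * x + y * y)


x = y = 1e200
reference = math.hypot(x, y)   # library routine that avoids the overflow
approx = naive_hypot(x, y)     # inf
err = relative_error(approx, reference)

# Two well-behaved samples plus the overflowing one.
errors = [1e-16, 1e-16, err]
mean_err = sum(errors) / len(errors)

print(approx, err, mean_err)
```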

I'm not sure what would be a better approach though. Weight the scores with a normal distribution around 0? Around 1? Exponents around 0?


Replies

pavpanchekha yesterday at 2:04 PM

It's documented here, but yes, it's an average of something similar to, but not exactly the same as, relative error: https://herbie.uwplse.org/doc/latest/error.html

It's true that averages can be misleading, but we encourage users to think about it instead as a percentage of inputs. In practice the error distribution is very bimodal, the two modes being "basically fine" (a few ulps of error) and "garbage" (usually 0 instead of some actual value).
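For readers unfamiliar with ulps, here is a rough sketch of one way to count them for doubles and turn the count into a "bits of error" figure. The helper names and the log2 scaling are assumptions for illustration; see the linked documentation for the definition the tool actually uses.

```python
import math
import struct


def ordered_int(x):
    """Map a double to an integer so that adjacent floats map to adjacent integers."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set; mirror them below zero
    # so the mapping is monotonic and +0.0 and -0.0 coincide.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulps_between(a, b):
    """Number of representable doubles separating a and b."""
    return abs(ordered_int(a) - ordered_int(b))


def bits_of_error(approx, exact):
    """log2 of the ulp distance (plus one, so an exact answer scores 0)."""
    return math.log2(ulps_between(approx, exact) + 1)


fine = bits_of_error(1.0 + 2.0**-52, 1.0)   # off by a single ulp
garbage = bits_of_error(0.0, 1.0)           # returned 0 instead of 1

print(f"basically fine: {fine:.1f} bits, garbage: {garbage:.1f} bits")
```

On a scale like this the two modes sit far apart: about one bit for a result that is off by an ulp, and about 62 bits for a 0 where the answer should have been 1.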