Really cool, numerical stability can be tricky as errors can accumulate with each operation and suddenly 53 bits of precision is not enough.
Also nice to see an article thats not about AI or politics
I resurrected the Rust Herbie lint (now using dylint) a while ago: https://github.com/urschrei/herbie-lint
Some trivial cases produce... interesting results.
For x in [−1.79e308, 1.79e308]:
Initial Program: 100.0% accurate, 1.0× speedup
def code(x):
return math.sqrt((x + 1.0))
Alternative 1: 67.5% accurate, 5.6× speedup def code(x):
return 1.0How useful is this when you are using numbers in a reasonable range, like 10^-12 to 10^12? Generally I try to scale my numbers to be in this range, whether by picking the right units or scaling constraints and objectives when doing nonlinear programming/ optimization.
Like looking at this example,
https://herbie.uwplse.org/demo/b070b371a661191752fe37ce0321c...
It is claimed that for the function f(x) =sqrt(x+1) -1
Accuracy is increased by from 8.5% accuracy to 98% for alternative 5 Which has f(x) = 0.5x
Ok so x=99, the right answer is sqrt(100) -1 = 9
But 0.5 * 99 = 49.5 which doesn't seem too accurate to me.
I wonder, is there a way to only request reformulations that don’t involve branches? The tool already seems quite nice, but that might be a good feature.
Also, I’m not sure I understand the speedup. Is it latency or throughput?
Working with tensor datatypes in numerical computing, I've been wondering if it would be possible to somehow add an extra dimension to tensors that would serve as the "floating point precision" dimension, instead of a data type. After all why couldn't the bit depth be one of the tensor dims? Maybe it would be possible to implement arbitrary floating point precision that way?
This is an awesome piece of software, one of my favorite little pieces of magic. Finding more precise or more stable floating point formulas is often arduous and requires a lot of familiarity with the behavior of floats. This finds good formulas completely automatically. Super useful for numerical computation.
This is very interesting! I made an interval arithmetic calculator a few months ago [0]. I wonder if you could combine the two techs to make something even more powerful.
I wonder who decided to use a step function for the speed accuracy plot. They must have thought the convex hull would be wrong because you can't really make linear combinations of algorithms (you could, but you'd have to use time not speed to make it linear). So I get why you would use step functions, but the step is the wrong way around. The current plot suggests accuracy doesn't drop if you need higher speeds
Thanks for reminding me to try this for audio approximations. :)
I posted this and it picked up steam over night, so I thought I'd add how I'm using it:
I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.
The agent then reasons through the implementation and writes core expressions into an FPCore file next to the existing tests, running several passes, refining the pres based on realistic caller input. This logs Herbie's proposed improvements as output FPCore transformations. The agent then reasons through solutions (which is required, Herbie doesn't know algorithm design intent, see e.g. this for a good case study: https://pavpanchekha.com/blog/herbie-rust.html), and once convinced of a gap, creates additional unit tests and property tests (FsCheck/QuickCheck) to prove impact. Then every once in a while I review a batch to see what's next.
Generally there are multiple types of issues that can be flagged:
a) Expression-level imprecision over realistic input ranges: this is Herbie's core strength. Usually this catches "just copied the textbook formula" instance of naive math. Cancellation, Inf/NaN propagation, etc. The fixes are consistently using fma for accumulation, biggest-factor scaling to prevent Inf, hypot use, etc.
b) Ill-conditioned algorithms. Sometimes the text books lie to you, and the algorithms themselves are unfit for purpose, especially in boundary regions. If there are multiple expressions that have a <60% precision and only a 1 to 2% improvement across seeds, it's a good sign the algo is bad - there's no form that adequately performs on target inputs.
c) Round-off, accumulation errors. This is more a consequence of agent reasoning, but often happens after an apparent "100% -> 100%" pass. The agent is able to, via failing tests, identify parts of an algorithm that can benefit from upgrading the context to e.g. double-word arithmetic for additional precision.