> Unums have proven difficult to build efficient HW implementations for
Valid point, but not quite true anymore. It basically comes down to the latency of count_leading_ones/zeros for decoding the regime, on which everything else depends. But work has been done in the past ~2 years, and we can now have posit units with lower latency than FP units of the same width! https://arxiv.org/abs/2603.01615
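To make the "everything else depends on it" part concrete, here's a rough software sketch of regime decoding for a positive posit. It's only an illustration, not how any hardware unit (or the linked paper) does it, and the names `count_leading_run` and `decode_regime` are made up for this example. The point is that you don't know where the exponent and fraction bits start until the run length is known:

```python
def count_leading_run(body: int, width: int) -> int:
    """Length of the run of identical bits at the top of a width-bit field.

    This plays the role of count_leading_ones / count_leading_zeros.
    """
    if width == 0:
        return 0
    first = (body >> (width - 1)) & 1
    run = 0
    for i in range(width - 1, -1, -1):
        if (body >> i) & 1 != first:
            break
        run += 1
    return run


def decode_regime(posit: int, n: int = 8) -> tuple[int, int]:
    """Return (k, remaining_bits) for a positive n-bit posit bit pattern.

    k is the regime value; remaining_bits is how many bits are left for
    the exponent and fraction after the regime run and its terminator.
    Assumes the sign bit is 0 (negative posits are two's-complemented first).
    """
    body_width = n - 1
    body = posit & ((1 << body_width) - 1)  # drop the sign bit
    run = count_leading_run(body, body_width)
    first = (body >> (body_width - 1)) & 1
    # A run of m ones encodes k = m - 1; a run of m zeros encodes k = -m.
    k = run - 1 if first else -run
    # The terminating bit is absent when the run fills the whole body.
    remaining = max(body_width - run - 1, 0)
    return k, remaining
```

For example, `decode_regime(0b01000000)` gives a regime of 0 with 5 bits left over, while `decode_regime(0b01111111)` (the largest posit8) uses all 7 bits for the regime. In hardware that variable-length field means a leading-bit count plus a shift sit in front of the rest of the datapath.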
> IEEE floats have a few warts like any other 1980s standard, but they're a fantastic design.
Hmm, I don't know if I would call it a fantastic design x) The "standard" is less a standard than a rough formalisation of a specific FPU design from back in the 1980s, and that design was in turn not really the product of a forward-thinking visionary but something made to fit the technical and business constraints of that specific piece of hardware.
It has more than a few warts and we can probably do much better nowadays. That's not really a diss on IEEE floats or their designers, it's just a matter of fact (which honestly applies to a great many things that are 40 years old, let alone those designed under the constraints of IEEE 754).
I'm sure you're much more knowledgeable about this than I am, but that's kind of my point. A month-old preprint is the first thing to compare favorably with implementations of a mildly evolved, warty old standard from 40 years ago. I consider that fantastic.
Thanks for the paper though. Looking forward to reading it more closely when I have time.