Hacker News

adrian_b · today at 10:18 AM

While emulating double precision by double-single arithmetic may be a solution in some cases, the article fails to mention the overflow/underflow problem, which is critical in scientific/technical computing (a.k.a. HPC).

With the method from the article, the exponent range remains the same as in single precision, instead of being increased to that of double precision.
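
To make this concrete, here is a minimal C sketch of the double-single idea (my own names, not the article's code), using Knuth's TwoSum to capture the rounding error of an FP32 addition. Both components are plain FP32, so the representable magnitude still tops out at FP32's limit of about 3.4e38:

```c
/* Minimal double-single sketch: a value is the unevaluated sum hi + lo
 * of two FP32 numbers. Compile without -ffast-math, or the compiler
 * will optimize away the rounding-error arithmetic. */
#include <stdio.h>

typedef struct { float hi, lo; } ds_t;  /* value = hi + lo */

/* Knuth's TwoSum: s is the rounded FP32 sum of a and b,
 * err is the exact rounding error, so a + b == s + err exactly. */
static ds_t two_sum(float a, float b) {
    float s   = a + b;
    float bb  = s - a;
    float err = (a - (s - bb)) + (b - bb);
    return (ds_t){ s, err };
}

int main(void) {
    ds_t x = two_sum(1.0f, 1e-8f);           /* 1e-8 is below ulp(1) in FP32 */
    printf("hi=%.9g lo=%.9g\n", x.hi, x.lo); /* lo preserves the 1e-8 */

    /* The exponent range is still FP32's: this overflows to inf long
     * before FP64's ~1.8e308 limit would be reached. */
    printf("2e38 + 2e38 = %g\n", 2e38f + 2e38f);
    return 0;
}
```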

There are many applications for which such an exponent range would cause far too frequent overflows and underflows. This could be avoided by introducing carefully chosen scaling factors in all formulae, but that tedious work would remove the main advantage of floating-point arithmetic, i.e. the very reason computations are not done in fixed point.

The general solution to this problem is to emulate double precision with three numbers: two FP32 values for the significand and a third number for the exponent, either floating-point or integer, depending on which format is more convenient on a given GPU.
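
For illustration, a hedged sketch of that three-number layout (the types, names, and normalization policy are my own choices, not a standard): the FP32 significand pair is kept near magnitude 1 by renormalizing with frexpf/ldexpf, and the wide exponent is carried in a plain integer, so the range is no longer bounded by FP32's 8-bit exponent:

```c
/* Hedged sketch of the three-number idea: value = (hi + lo) * 2^exp.
 * The FP32 pair holds the significand; the integer holds the wide
 * exponent. Names and layout are illustrative, not a standard. */
#include <math.h>
#include <stdio.h>

typedef struct { float hi, lo; int exp; } xds_t;

/* Pull the binary exponent out of hi so the significand pair stays
 * near [0.5, 1) and can no longer overflow or underflow in FP32. */
static xds_t xds_normalize(float hi, float lo, int exp) {
    int e;
    float h = frexpf(hi, &e);            /* hi == h * 2^e, h in [0.5, 1) */
    return (xds_t){ h, ldexpf(lo, -e), exp + e };
}

/* Multiplication sketch: significands multiply inside FP32 range,
 * exponents add as integers, so the product can go far beyond FP32's
 * 2^127 limit. (The double-single cross terms are omitted here.) */
static xds_t xds_mul(xds_t a, xds_t b) {
    float p = a.hi * b.hi;
    return xds_normalize(p, 0.0f, a.exp + b.exp);
}

int main(void) {
    xds_t big = xds_normalize(1.5f, 0.0f, 400);  /* 1.5 * 2^400 ~ 3.9e120 */
    xds_t sq  = xds_mul(big, big);               /* ~1.5e241, > FP32 max */
    printf("significand=%g exponent=%d\n", sq.hi, sq.exp);
    return 0;
}
```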

This is possible, but it considerably lowers the achievable ratio between emulated FP64 throughput and hardware FP32 throughput. Even so, the ratio remains better than the vendor-enforced 1:64 ratio.

Nevertheless, for now any small business or individual user can get much better FP64 performance per dollar by buying Intel Battlemage GPUs, which have a 1:8 FP64/FP32 throughput ratio. This is much better than anything achievable by emulating FP64 on NVIDIA or AMD GPUs.

The Intel B580 is a small GPU, so its FP64 throughput is only about equal to that of a Ryzen 9 9900X and lower than that of a Ryzen 9 9950X. However, it provides that throughput at a much lower price. Thus, if you start from a PC with a 9900X/9950X, you can double or almost double its FP64 throughput at a low additional cost by adding an Intel GPU, and multiple GPUs will multiply the throughput proportionally.

The sad part is that, with the current Intel CEO and with NVIDIA now a shareholder of Intel, it is unclear whether Intel will continue to compete in the GPU market or will abandon it, leaving us at the mercy of NVIDIA and AMD, both of which refuse to provide products with good FP64 support to small businesses and individual users.


Replies

fp64enjoyer · today at 4:12 PM

Yeah, fair enough. The exponent field of an FP32 has only 8 bits instead of FP64's 11, so the magnitude range tops out near 3.4e38 instead of 1.8e308. I'll make an edit to make this explicit.

It's also fairly interesting how NVIDIA handles this for the Ozaki scheme: https://docs.nvidia.com/cuda/cublas/#floating-point-emulatio.... They generally need to align all the numbers in a matrix row to the row's maximum exponent, but depending on the scale difference between two numbers this may not be feasible without significantly extending the number of mantissa bits. So they either decide dynamically (Dynamic Mantissa Control) whether to use Ozaki's scheme or to execute on native FP64 hardware, or they let the user fix the number of mantissa bits (Fixed Mantissa Control), which is faster but no longer carries the FP64 precision guarantees. A rough sketch of the alignment step is below.
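
Here is my reconstruction of that row-alignment step in C (an illustration inferred from the docs, not cuBLAS code; NBITS and SLICES are arbitrary stand-ins for the mantissa-control parameters). Every element is sliced relative to the row's maximum exponent, and an element much smaller than the row maximum loses its low bits, which is exactly the scale-difference problem:

```c
/* Rough sketch of Ozaki-style row alignment (not cuBLAS code).
 * Each element is decomposed into SLICES chunks of NBITS significand
 * bits, all scaled relative to the row's maximum binary exponent, so
 * the chunks can later be multiplied exactly in lower precision. */
#include <math.h>
#include <stdio.h>

#define NBITS  10  /* bits per slice: stand-in for a "mantissa control" knob */
#define SLICES 3

static void split_row(const double *row, int n, double slices[][SLICES]) {
    /* Find the largest binary exponent in the row. */
    int emax = 1, e;
    for (int i = 0; i < n; i++) {
        frexp(row[i], &e);
        if (row[i] != 0.0 && e > emax) emax = e;
    }
    /* Peel off NBITS at a time, always relative to the shared emax.
     * An element much smaller than the row maximum falls entirely
     * below the kept slices: the "scale difference" problem. */
    for (int i = 0; i < n; i++) {
        double r = row[i];
        for (int k = 0; k < SLICES; k++) {
            double scale = ldexp(1.0, emax - (k + 1) * NBITS);
            double top   = trunc(r / scale) * scale;
            slices[i][k] = top;
            r -= top;
        }
    }
}

int main(void) {
    double row[2] = { 1.0e6, 3.14159 };  /* large scale difference */
    double s[2][SLICES];
    split_row(row, 2, s);
    for (int i = 0; i < 2; i++)
        printf("%g = %g + %g + %g (dropped %g)\n", row[i],
               s[i][0], s[i][1], s[i][2],
               row[i] - s[i][0] - s[i][1] - s[i][2]);
    return 0;
}
```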

nsajko · today at 11:15 AM

Yeah, double-word floating-point loses many of the desirable properties of the usual floating-point.