Even the latest CPUs have a 2:1 fp64:fp32 performance ratio - plus the effects of 2x the data size i...

kimixa • yesterday at 8:24 PM • 2 replies • view on HN

Even the latest CPUs have a 2:1 fp64:fp32 performance ratio - plus the effects of 2x the data size in cache and bandwidth use mean you can often get greater than a 2x difference.

If you're in a numeric heavy use case that's a massive difference. It's not some outdated "Ancient Lore" that causes languages that care about performance to default to fp32 :P

Replies

pixelesque • yesterday at 9:30 PM

> Even the latest CPUs have a 2:1 fp64:fp32 performance ratio

Not completely - for basic operations (and ignoring byte size for things like cache hit ratios and memory bandwidth) if you look at (say Agner Fog's optimisation PDFs of instruction latency) the basic SSE/AVX latency for basic add/sub/mult/div (yes, even divides these days), the latency between float and double is almost always the same on the most recent AMD/Intel CPUs (and normally execution ports can do both now).

Where it differs is gather/scatter and some shuffle instructions (larger size to work on), and maths routines like transcendentals - sqrt(), sin(), etc, where the backing algorithms (whether on the processor in some cases or in libm or equivalent) obviously have to do more work (often more iterations of refinement) to calculate the value to greater precision for f64.

alt Hacker News

Replies