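> the latency between float and double is almost always the same on the most recent AMD/Intel CPUs

If you are developing for ARM, note that some systems have hardware support for FP32 but fall back to software emulation for FP64 (for example, microcontroller cores with a single-precision-only FPU such as the Cortex-M4). On those targets double is noticeably slower than float.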
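
One way to check which case a given build falls into is the `__ARM_FP` macro from the Arm C Language Extensions (ACLE), which compilers such as GCC and Clang define as a bitmap of hardware-supported precisions. Below is a minimal sketch, assuming an ACLE-compliant compiler; the enum and the function names `classify_arm_fp` and `describe_target` are hypothetical helpers made up for illustration, and only the macro itself comes from ACLE.

```c
/* Hypothetical helper names; only the __ARM_FP macro is defined by ACLE. */
enum fp_support { FP_SOFT, FP_SINGLE_ONLY, FP_DOUBLE };

/* Decode an ACLE __ARM_FP bitmap: bit 0x04 = hardware FP32, bit 0x08 = hardware FP64. */
static enum fp_support classify_arm_fp(unsigned bits)
{
    if (bits & 0x08u)
        return FP_DOUBLE;       /* double is handled in hardware */
    if (bits & 0x04u)
        return FP_SINGLE_ONLY;  /* float in hardware, double emulated in software */
    return FP_SOFT;             /* neither precision reported as hardware-supported */
}

/* Describe the floating-point support of the target this file was compiled for. */
static const char *describe_target(void)
{
#if defined(__ARM_FP)
    switch (classify_arm_fp((unsigned)__ARM_FP)) {
    case FP_DOUBLE:      return "ARM: hardware FP32 and FP64";
    case FP_SINGLE_ONLY: return "ARM: hardware FP32 only, FP64 is emulated";
    default:             return "ARM: no hardware FP32/FP64 reported";
    }
#elif defined(__arm__) || defined(__aarch64__)
    /* Assumption: no __ARM_FP means soft-float or a compiler that does not define it. */
    return "ARM: __ARM_FP not defined (soft-float or non-ACLE compiler)";
#else
    return "not an ARM target";
#endif
}
```

Printing `describe_target()` once at startup (for example with `puts`) is enough to see whether double arithmetic will go through emulation routines. The result reflects the compiler flags used for the build (such as `-mfpu` and `-mfloat-abi` on GCC), not a runtime probe of the chip.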