Apple GPUs run fp16 at the same rate as fp32 (except on the phone GPUs, where fp16 is faster), so the quoted rate is comparable for ML. Nobody runs inference from fp32 weights anyway.
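For what it's worth, you can check the fp16-vs-fp32 rate claim yourself. A minimal sketch, assuming PyTorch with the MPS backend on an Apple Silicon machine; the matrix size and iteration count are arbitrary, and a real benchmark would add warmup runs:

```python
import time
import torch

assert torch.backends.mps.is_available(), "needs an Apple GPU with MPS"
dev = torch.device("mps")

def bench(dtype, n=4096, iters=50):
    # Square matmul throughput at the given precision.
    a = torch.randn(n, n, device=dev, dtype=dtype)
    b = torch.randn(n, n, device=dev, dtype=dtype)
    torch.mps.synchronize()            # make sure setup has finished
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.mps.synchronize()            # wait for queued kernels to drain
    dt = time.perf_counter() - t0
    return 2 * n**3 * iters / dt / 1e12  # TFLOP/s (2*n^3 FLOPs per matmul)

print(f"fp32: {bench(torch.float32):.2f} TFLOP/s")
print(f"fp16: {bench(torch.float16):.2f} TFLOP/s")
```

On an M-series part the two numbers should come out roughly equal if the same-rate claim holds.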
But the point was about area efficiency, not raw throughput.