Hacker News

brigade · last Saturday at 8:30 PM

Apple GPUs run fp16 at the same rate as fp32 (except on phones), so fp16 throughput is comparable for ML. And no one runs inference from fp32 weights anyway.
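One reason inference is run from reduced-precision weights is storage and bandwidth: fp16 halves both relative to fp32. A minimal sketch (NumPy used purely for illustration; the matrix size is arbitrary):

```python
import numpy as np

# Hypothetical weight matrix for one layer; the shape is illustrative.
weights_fp32 = np.random.rand(4096, 4096).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# fp16 halves the storage (and thus memory bandwidth) for the weights.
print(weights_fp32.nbytes // 2**20, "MiB fp32")  # 64 MiB
print(weights_fp16.nbytes // 2**20, "MiB fp16")  # 32 MiB

# The rounding error from the fp32 -> fp16 cast is typically small
# enough to be acceptable for inference workloads.
max_err = np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))
print("max abs rounding error:", max_err)
```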

But the point was about area efficiency.