At least one benchmark I saw claimed the ANE can be 7x faster than the GPU (Metal / MPS):
https://discuss.pytorch.org/t/apple-neural-engine-ane-instea...
It seems intuitive that if they design hardware very specifically for these applications (beyond just fast matmuls on a GPU), they could squeeze out more performance.
Performance doesn't matter. Nothing is ever about performance.
It's about the performance/power ratio, i.e. performance per watt.
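
To make the ratio concrete, here is a rough sketch of the arithmetic. The throughput and power figures are made-up placeholders, not measurements of any real ANE or GPU, and `perf_per_watt` is just a helper name invented for this example. It only shows how a slower accelerator can still come out ahead once you divide by power draw.

```python
def perf_per_watt(throughput: float, power_watts: float) -> float:
    """Return throughput per watt (e.g. inferences/s per W)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput / power_watts


# Hypothetical placeholder numbers, NOT benchmark results.
gpu = perf_per_watt(throughput=100.0, power_watts=20.0)  # faster, but power hungry
ane = perf_per_watt(throughput=80.0, power_watts=4.0)    # slower, but frugal

# How many times more work per watt the second device does.
efficiency_ratio = ane / gpu

print(f"GPU: {gpu} per W, ANE: {ane} per W, ratio: {efficiency_ratio}x")
```

With these placeholder inputs the device with lower raw throughput does more work per watt, which is the metric that matters on a battery or under a thermal limit.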