I've successfully used the ANE to accelerate text-to-speech models on iOS (as an aside, this was much more straightforward than the equivalent on Android).
I did, however, struggle to run a diffusion model on the ANE, but found that mlx-swift and the iPhone GPU sufficed: https://www.duration.ai/blog/generating-images-with-a-2020-i...
At the release of Apple Silicon, there was this repo, https://github.com/hollance/neural-engine, which references a lot of discovery and reverse engineering work on the ANE.
It does not seem to cover the Neural Accelerators, Apple's equivalent of Tensor Cores. They were only released with the M5 platform. This is probably the most important part to cover.
This Neural Engine seems useless for LLMs. It's trapped in the wrong architecture.
Is there a non-slop version of this information available?
I am reading up on GPU / ML micro architecture and am looking for some good sources.
If anyone is interested in doing something seriously useful with these neural cores, there is this incredible write-up on getting ModernBERT running on them: https://stephenpanaro.com/blog/modernbert-on-apple-neural-en...
I really wish this author would blog more; this piece is incredible and includes the code.
Also, ModernBERT is amazing and worth spending time with if you haven’t used it before. I've used it myself for classification tasks and it’s very impressive.