zozbot234 • yesterday at 5:35 PM

Some outdated information about bare-metal use of the ANE is available from the Whisper.cpp pull request: https://github.com/ggml-org/whisper.cpp/pull/1021 Even more outdated information is at: https://github.com/eiln/ane/tree/33a61249d773f8f50c02ab0b9fe... In short, the early (M1/M2) versions of the ANE are unlikely to be useful for modern LLM inference, because they seem to focus exclusively on statically scheduled FP16 and INT8 MADDs (multiply-adds).

More extensive information at https://github.com/tinygrad/tinygrad/tree/master/extra/accel... (from the Tinygrad folks; note that this is similarly outdated) seems to basically confirm the above.

(The jury is still out on the M3/M4, which currently have no Asahi support and thus offer no current prospects for driving the ANE bare-metal. Note, however, that the reported performance numbers for the M3/Pro/Max ANE are quite close to those of the M2 version, so there may not be a real improvement there either. The M3 Ultra and especially the M4 series may be a different story.)

Replies

kamranjon • yesterday at 6:32 PM

I wouldn't say that they aren't useful for inference (there are pretty clear performance improvements even from the Asahi effort you linked). It's just that you have to convert the model ahead of time to be compatible with the ANE, which is explained in the README docs for whisper.cpp that I linked above.

I would say, though, that this likely excludes them from being useful for training purposes.
