Whisper.cpp has a CoreML option which, according to the docs, gives a 3x speedup over CPU-only: https://github.com/ggml-org/whisper.cpp?tab=readme-ov-file#c...
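If anyone wants to try it, the build steps are roughly the following (from memory of that README, so double-check the link; the model conversion step needs Python with coremltools installed):

    # convert the Whisper encoder to Core ML, e.g. for base.en
    ./models/generate-coreml-model.sh base.en

    # build whisper.cpp with Core ML support enabled
    cmake -B build -DWHISPER_COREML=1
    cmake --build build -j --config Release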
> a 3x speedup over CPU-only
Which is still painfully slow. CoreML is not a real ML platform.
Some outdated information about bare-metal use of the ANE is available in this whisper.cpp pull request: https://github.com/ggml-org/whisper.cpp/pull/1021

Even more outdated information is at: https://github.com/eiln/ane/tree/33a61249d773f8f50c02ab0b9fe...

In short, the early (M1/M2) versions of the ANE are unlikely to be useful for modern LLM inference, given their seemingly exclusive focus on statically scheduled FP16 and INT8 MADDs.
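For contrast, the only sanctioned way to reach the ANE is indirect: you hint Core ML's scheduler and hope. A minimal Swift sketch (the model path is a placeholder):

    import CoreML
    import Foundation

    // Ask Core ML to prefer the ANE over the GPU when placing ops.
    // (.cpuAndNeuralEngine needs macOS 13 / iOS 16; .all is the older catch-all.)
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine

    // "SomeModel.mlmodelc" stands in for any compiled Core ML model.
    // Core ML still decides per-op whether the ANE actually runs it
    // (FP16-friendly ops map best); placement cannot be forced and is
    // hard to inspect at runtime.
    let model = try! MLModel(
        contentsOf: URL(fileURLWithPath: "SomeModel.mlmodelc"),
        configuration: config
    )

There is no public API below that level, which is why the bare-metal reverse-engineering efforts linked above exist in the first place.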
More extensive information at https://github.com/tinygrad/tinygrad/tree/master/extra/accel... (from the Tinygrad folks; similarly outdated) seems to broadly confirm the above.
(The jury is still out for the M3/M4, which currently have no Asahi support and thus no near-term prospects for driving the ANE bare-metal. Note, however, that the reported ANE performance numbers for the M3/Pro/Max are quite close to the M2's, so there may not be a real improvement there either. The M3 Ultra and especially the M4 series may be a different story.)