Whisper.cpp has a CoreML option which, according to the docs, gives a 3x speedup over CPU-only: https://github.com/ggml-org/whisper.cpp?tab=readme-ov-file#c...
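If anyone wants to try it, the build steps are roughly the following (from memory of that README, so double-check the link; the model conversion step needs Python with coremltools installed):

    # convert the Whisper encoder to Core ML, e.g. for base.en
    ./models/generate-coreml-model.sh base.en

    # build whisper.cpp with Core ML support enabled
    cmake -B build -DWHISPER_COREML=1
    cmake --build build -j --config Release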
> a 3x speedup over CPU-only
Which is still painfully slow. CoreML is not a real ML platform.
Some outdated information about bare-metal use of the ANE is available in this whisper.cpp pull request: https://github.com/ggml-org/whisper.cpp/pull/1021

Even more outdated information is at: https://github.com/eiln/ane/tree/33a61249d773f8f50c02ab0b9fe...

In short, the early (M1/M2) versions of the ANE are unlikely to be useful for modern LLM inference, given their seemingly exclusive focus on statically scheduled FP16 and INT8 MADDs.
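For contrast, the only sanctioned way to reach the ANE is indirect: you hint Core ML's scheduler and hope. A minimal Swift sketch (the model path is a placeholder):

    import CoreML
    import Foundation

    // Ask Core ML to prefer the ANE over the GPU when placing ops.
    // (.cpuAndNeuralEngine needs macOS 13 / iOS 16; .all is the older catch-all.)
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine

    // "SomeModel.mlmodelc" stands in for any compiled Core ML model.
    // Core ML still decides per-op whether the ANE actually runs it
    // (FP16-friendly ops map best); placement cannot be forced and is
    // hard to inspect at runtime.
    let model = try! MLModel(
        contentsOf: URL(fileURLWithPath: "SomeModel.mlmodelc"),
        configuration: config
    )

There is no public API below that level, which is why the bare-metal reverse-engineering efforts linked above exist in the first place.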
More extensive information at https://github.com/tinygrad/tinygrad/tree/master/extra/accel... (from the Tinygrad folks; similarly outdated) seems to broadly confirm the above.
(The jury is still out for the M3/M4, which currently have no Asahi support and thus no near-term prospects for driving the ANE bare-metal. Note, however, that the reported ANE performance numbers for the M3/Pro/Max are quite close to the M2's, so there may not be a real improvement there either. The M3 Ultra and especially the M4 series may be a different story.)