This sorta reminds me of the lie that was pushed when the Snapdragon X laptops were released last year. Qualcomm implied the NPU would be used for LLMs, and I bought into the BS without looking into it. I still use a Snapdragon laptop as my daily driver (it's fine), but for running models locally it's still a joke. Despite Qualcomm's claims about running 13B-parameter models, software like LM Studio only runs on the CPU, with NPU support merely "planned for future updates" (per XDA). The NPU isn't even faster than the CPU for LLMs; it's just more power-efficient for small models, not the big ones people actually want to run. Their GPUs aren't much better for this purpose either. The only hope for LLMs is the Vulkan support on the Snapdragon X, and even that is still half-baked.
I always felt that the Neural Engine was wasted silicon; they could add more GPU cores in that die space and redirect the neural-processing API to the GPU as needed (see the sketch below). But I'm no expert, so if anyone here has a different opinion I'd love to learn from it.
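For what it's worth, Core ML already exposes a knob for exactly that kind of redirection: you can tell it which compute units a model may use when you load it. A minimal sketch with coremltools ("model.mlpackage" is just a placeholder path for any converted model):

```python
import coremltools as ct

# Steer a Core ML model away from the ANE and onto the GPU (or CPU) at load time.
mlmodel = ct.models.MLModel(
    "model.mlpackage",                          # placeholder: any converted model
    compute_units=ct.ComputeUnit.CPU_AND_GPU,   # skip the ANE entirely
)
# Other options: ct.ComputeUnit.ALL, CPU_ONLY, CPU_AND_NE
```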
The README lacks the most important thing: how many more tokens/sec does this get at the same quantization compared to llama.cpp / MLX? Switching default platforms is only worth it if there is a major improvement.
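Even a rough apples-to-apples timing would help: same prompt, same quant, for each backend. Something like the sketch below, where `generate` is a hypothetical stand-in for whichever runtime you're timing (this project, llama.cpp bindings, MLX):

```python
import time

def tokens_per_second(generate, prompt: str, max_tokens: int = 256) -> float:
    """Time a single generation and return wall-clock tokens/s.

    `generate` is a placeholder: any callable that runs the backend under test
    and returns the number of tokens it actually produced.
    """
    start = time.perf_counter()
    n_generated = generate(prompt, max_tokens)
    return n_generated / (time.perf_counter() - start)
```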
I'm trying to figure out what the secret sauce for this is. It depends on https://github.com/apple/coremltools - is that the key trick or are there other important techniques going on here?
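From a skim, coremltools does look like the main ingredient, since converting to a Core ML "ML Program" is the only sanctioned route onto the ANE. A hedged sketch of what that conversion step generally looks like (this is generic coremltools usage, not this project's actual code):

```python
import torch
import coremltools as ct

# Generic PyTorch -> Core ML conversion; Core ML then decides whether the
# resulting ML Program runs on the CPU, GPU, or ANE.
block = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).eval()
example = torch.randn(1, 512)
traced = torch.jit.trace(block, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.macOS13,
)
mlmodel.save("block.mlpackage")
```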
Apple is a competitive choice simply because unified memory lets you get enough RAM to run larger models that would otherwise take multiple GPUs' worth of VRAM.
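The arithmetic is simple enough to sketch (rough numbers, weights only, ignoring KV cache and runtime overhead):

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    # weights only; KV cache and activations come on top of this
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(weight_gb(70, 4))   # ~35 GB: fits in a 64 GB Mac, needs several 24 GB GPUs
print(weight_gb(70, 16))  # ~140 GB: only the largest unified-memory configs
```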
But Core ML utilizes the ANE, right? Is there some bottleneck in Core ML that requires lower-level access?
Man, Apple's tight grip on the ANE is kinda nuts - would love to see the day they let folks get real hands-on. You ever think companies hold stuff back just to keep control, or is there actually some big technical reason for it?
Is there a performance benefit for inference speed on M-series MacBooks, or is the primary task here simply to get inference working on other platforms (like iOS)? If there is a performance benefit, it would be great to see tokens/s of this vs. Ollama.
I am curious if anyone knows whether the neural cores in Apple silicon machines are at all useful for training? I've been using the MLX framework but haven't seen them mentioned anywhere, so I'm wondering if they are only useful for inference. I know whisper.cpp takes advantage of them in the inference context.
Edit: I changed llama.cpp to whisper.cpp - I didn't realize that llama.cpp doesn't have a Core ML option like whisper.cpp does.
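As far as I can tell the ANE is inference-only from a third-party point of view; MLX dispatches everything, gradients included, to the Metal GPU (or CPU) and has no ANE backend. A quick sketch of what I mean:

```python
import mlx.core as mx

print(mx.default_device())  # Device(gpu, 0): the Metal GPU, not the neural cores

def loss(w, x, y):
    return mx.mean((x @ w - y) ** 2)

w = mx.zeros((8, 1))
x = mx.random.normal((32, 8))
y = mx.random.normal((32, 1))
grad = mx.grad(loss)(w, x, y)  # the gradient is computed on the GPU as well
print(grad.shape)
```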
Yet Siri is still dumber than a doorknob…
What about Microsoft Copilot AI notebooks? Would they run anything quickly enough to be useful?
Even a 1 GB model is prohibitively big for phones if you want mass adoption.
Getting anything to work on Apple's proprietary junk is such a chore.
btw, don't bother buying a bunch of Mac boxes to run LLMs in parallel; for a single generation stream it won't be any faster than one box, since token generation is bound by memory bandwidth. Clustering mostly just lets you fit larger models.
I wonder if Apple ever followed up with this: https://github.com/apple/ml-ane-transformers
They claim their ANE-optimized models achieve "up to 10 times faster and 14 times lower peak memory consumption compared to baseline implementations."
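The core trick in that repo, as I understand it, is a data-layout change: keep activations as (B, C, 1, S) channels-first tensors and replace linear layers with 1x1 convolutions, which the ANE's hardware handles well and which avoids costly transposes. A rough paraphrase of the idea (not their actual code):

```python
import torch
import torch.nn as nn

class ANEFriendlyFFN(nn.Module):
    """Feed-forward block in the ANE-preferred (B, C, 1, S) layout."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # 1x1 conv2d stands in for nn.Linear on channels-first tensors
        self.up = nn.Conv2d(d_model, d_ff, kernel_size=1)
        self.down = nn.Conv2d(d_ff, d_model, kernel_size=1)

    def forward(self, x):  # x: (batch, d_model, 1, seq_len)
        return self.down(torch.relu(self.up(x)))

x = torch.randn(1, 512, 1, 128)                 # (B, C, 1, S) instead of (B, S, C)
print(ANEFriendlyFFN(512, 2048)(x).shape)       # torch.Size([1, 512, 1, 128])
```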
AFAIK, neither MLX nor llama.cpp supports the ANE, though llama.cpp has started exploring the idea [0].
What's weird is that MLX is made by Apple, and yet even they can't support the ANE because its API is closed! [1]
[0]: https://github.com/ggml-org/llama.cpp/issues/10453
[1]: https://github.com/ml-explore/mlx/issues/18#issuecomment-184...