rz2k • 05/03/2025

I was trying to figure the same thing out a couple months ago, and didn't find much information.

It looked like even ANEMLL offers only limited low-level control over directing work specifically to the Apple Neural Engine, because Core ML still acts as the orchestrator and decides where each operation runs. What you can do instead is set options when converting a PyTorch or TensorFlow model: ANE-optimized operations, quantization, and parameters hinting at compute targets or optimization strategies. For example, setting `MLModelConfiguration.computeUnits = .cpuAndNeuralEngine` when loading the model (or passing `compute_units=ct.ComputeUnit.CPU_AND_NE` to coremltools' `ct.convert` at conversion time) excludes the GPU cores entirely, leaving Core ML to place each operation on either the ANE or the CPU.
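For reference, `MLModelConfiguration.computeUnits` is applied when a model is loaded. A minimal Swift sketch, assuming you already have a compiled model (`.mlmodelc`, e.g. produced by Xcode or `MLModel.compileModel(at:)`). The function name `loadModelOnCPUAndNeuralEngine` is hypothetical; `MLModelConfiguration`, `computeUnits`, and `MLModel(contentsOf:configuration:)` are Core ML APIs. The `#if canImport(CoreML)` guard only keeps the snippet compiling on non-Apple platforms.

```swift
#if canImport(CoreML)
import CoreML
import Foundation

/// Hypothetical helper: loads a compiled Core ML model so that it may run only
/// on the CPU and the Apple Neural Engine. The GPU is excluded, but Core ML
/// still decides, per operation, whether the ANE or the CPU executes it.
@available(macOS 13.0, iOS 16.0, *)
func loadModelOnCPUAndNeuralEngine(at compiledModelURL: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine  // MLComputeUnits: CPU + ANE, no GPU
    return try MLModel(contentsOf: compiledModelURL, configuration: config)
}
#endif
```

Operations the ANE can't execute still fall back to the CPU, so this constrains placement rather than guaranteeing ANE execution.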

Anyway, I didn't actually experiment with this, but at the time I thought there might be room for a speculative decoding setup: a small ANE-compatible model acting as the draft model, paired with a larger target model running on the GPU cores. The idea is that the ANE's low latency and power efficiency could produce draft tokens cheaply, so the GPU model would only need to verify them, which could speed up generation overall.
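To make the draft/target idea concrete, here is a toy Swift sketch of greedy speculative decoding. The "models" are hypothetical stand-in functions mapping a token context to the next token (no Core ML or ANE involved); in the setup above, `draft` would be the small ANE model and `target` the larger GPU model. This is the greedy variant, not the rejection-sampling scheme used with sampled decoding; with greedy decoding the output is exactly what the target alone would produce.

```swift
// Hypothetical stand-in for a model: maps the token context to its greedy next token.
typealias GreedyModel = ([Int]) -> Int

/// Baseline: run the target alone, one call per generated token.
func greedyDecode(prompt: [Int], maxNewTokens: Int, model: GreedyModel) -> [Int] {
    var tokens = prompt
    for _ in 0..<maxNewTokens {
        tokens.append(model(tokens))
    }
    return tokens
}

/// Greedy speculative decoding: the draft proposes `lookahead` tokens, the target
/// checks them, and the longest agreeing prefix is kept plus one target token.
func speculativeDecode(prompt: [Int], maxNewTokens: Int, lookahead: Int,
                       draft: GreedyModel, target: GreedyModel) -> (tokens: [Int], targetPasses: Int) {
    precondition(lookahead >= 0, "lookahead must be non-negative")
    var tokens = prompt
    var targetPasses = 0
    while tokens.count - prompt.count < maxNewTokens {
        // 1. Draft proposes a short continuation autoregressively (the cheap part).
        var proposal: [Int] = []
        for _ in 0..<lookahead {
            proposal.append(draft(tokens + proposal))
        }
        // 2. Target checks every proposed position. A real transformer scores all
        //    of them in one batched forward pass, so this counts as one pass.
        targetPasses += 1
        var accepted: [Int] = []
        for i in 0...lookahead {
            let expected = target(tokens + accepted)
            accepted.append(expected)
            // Stop at the first disagreement (the target's token replaces the draft's),
            // or after the bonus token that follows a fully accepted proposal.
            if i == lookahead || proposal[i] != expected { break }
        }
        // 3. Keep the verified tokens without overshooting maxNewTokens.
        let room = maxNewTokens - (tokens.count - prompt.count)
        tokens.append(contentsOf: accepted.prefix(room))
    }
    return (tokens: tokens, targetPasses: targetPasses)
}

// Toy models: the draft matches the target except after token 3.
let target: GreedyModel = { ctx in (ctx.last! * 3 + 1) % 10 }
let draft: GreedyModel = { ctx in ctx.last! == 3 ? 5 : (ctx.last! * 3 + 1) % 10 }

let result = speculativeDecode(prompt: [1], maxNewTokens: 8, lookahead: 3,
                               draft: draft, target: target)
print(result.tokens, result.targetPasses)  // prints [1, 4, 3, 0, 1, 4, 3, 0, 1] 3
```

In this toy run the target does 3 verification rounds instead of 8 sequential steps. Whether real hardware sees a speedup would depend on how often the draft agrees with the target and on the cost of moving data between the ANE and the GPU, which the sketch does not model.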

However, I would be interested to hear the perspective of people who actually know something about the subject.

alt Hacker News