smpanaro • 05/03/2025

coremltools is the only way to run on the ANE, so it's less of a trick and more of a requirement.

The tricks are more around optimizing for the hardware capabilities/constraints. For instance:

- conv2d is faster than linear (see Apple's post [0]), so you rewrite the model's linear layers as convolutions (example from the repo [1])

- inputs/outputs are static shapes, so KV cache requires some creativity (I wrote about that here [2])

- compute is float16 (not bfloat16), so occasionally you have to avoid activation overflows (float16 tops out at 65504, far below bfloat16's range)
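
To make the first and last points concrete, here is a rough pure-Python sketch (no coremltools or PyTorch involved, so it says nothing about speed). It shows why a linear layer can be rewritten as a 1x1 conv2d without changing the math, and what a float16 overflow looks like. The function names are made up for illustration, the (channels, 1, sequence) layout is my reading of the layout in Apple's post, and the pre-scaling at the end is just one possible workaround, not necessarily what the repo does.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def linear(x, weight):
    """x: [seq][in_features], weight: [out][in] -> [seq][out]."""
    return [[sum(w * v for w, v in zip(row, tok)) for row in weight] for tok in x]


def to_channels_first(x):
    """[seq][features] -> [features][1][seq] (assumed ANE-friendly layout)."""
    return [[[tok[c] for tok in x]] for c in range(len(x[0]))]


def conv1x1(x_chw, weight):
    """1x1 conv2d over [in_ch][1][seq]; weight is [out_ch][in_ch]
    (the trailing 1x1 kernel dimensions are dropped for brevity)."""
    in_ch, seq = len(x_chw), len(x_chw[0][0])
    return [
        [[sum(weight[o][c] * x_chw[c][0][s] for c in range(in_ch)) for s in range(seq)]]
        for o in range(len(weight))
    ]


def to_fp16(x):
    """Round x to float16 and back; values that do not fit become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses to pack out-of-range halves
        return math.copysign(math.inf, x)


def fp16_square(x):
    """Square a value with the result stored in float16."""
    return to_fp16(to_fp16(x) * to_fp16(x))


# Same weights, two layouts: the conv output is the linear output transposed.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # 3 tokens, 2 features
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 outputs, 2 inputs
y_linear = linear(x, w)
y_conv = conv1x1(to_channels_first(x), w)

# An activation of 300 is fine in float16, but its square (90000) is not.
overflowed = fp16_square(300.0)
# Hypothetical workaround: scale down before squaring, undo the scale later
# in a wider dtype (or fold it into a following normalization).
rescued = fp16_square(300.0 / 16.0)
```

The equivalence is exact because both versions compute the same dot products; only the tensor layout differs. The overflow half is the kind of thing that bites in sum-of-squares steps (norm layers, for example), where intermediate values can exceed 65504 even when the inputs and final outputs are small.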

thadk • 05/04/2025
