But Core ML uses the ANE, right? Is there some bottleneck in Core ML that requires lower-level access?
Memory bandwidth is the main bottleneck, though it got better with M3/M4. The ANE is really fast in terms of FLOPS, but its memory bandwidth is low.
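To make the "fast in FLOPS, low in bandwidth" point concrete, here is a rough sketch using the standard roofline model. This is not something from the comment above, just a common way to reason about this kind of bottleneck. The function names and all the numbers are assumptions picked for illustration; they are not measured ANE or M-series figures.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline model: throughput is capped by compute or by memory traffic.

    arithmetic_intensity is FLOPs performed per byte moved from memory.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    # The ridge point is the intensity at which both limits meet;
    # below it, memory bandwidth is the limiting factor.
    ridge_point = peak_flops / mem_bandwidth
    return arithmetic_intensity < ridge_point


# Placeholder hardware figures, purely hypothetical (NOT real ANE specs).
PEAK_FLOPS = 10e12     # assumed 10 TFLOPS of compute
MEM_BANDWIDTH = 50e9   # assumed 50 GB/s of memory bandwidth

# Assumed workloads: ~1 FLOP/byte (e.g. weights read once per use)
# versus ~500 FLOPs/byte (e.g. heavy reuse of data already loaded).
for intensity in (1.0, 500.0):
    flops = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    bound = "memory" if is_memory_bound(PEAK_FLOPS, MEM_BANDWIDTH, intensity) else "compute"
    print(f"{intensity:g} FLOPs/byte: {flops / 1e12:.2f} TFLOPS attainable ({bound}-bound)")
```

With these made-up numbers the low-intensity workload reaches only a small fraction of peak compute, which is the general shape of the problem: a high FLOPS rating does not help if the workload cannot be fed from memory fast enough.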