Have you tried optimizing for MLX? It seems like a waste to have neural cores and not use them.
I've often wondered why there's so much hype around Apple's neural cores when 99% of software doesn't use them.