Is there a performance benefit for inference speed on M-series MacBooks, or is the primary task here simply to get inference working on other platforms (like iOS)? If there is a performance benefit, it would be great to see tokens/s of this vs. Ollama.
See my other comment for results.
MLX is much faster, but ANEMLL appeared to use only 500 MB of memory, compared to the 8 GB MLX used.
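
For anyone who wants to put actual tokens/s numbers on this, here is a rough sketch of one way to measure decode speed on both sides: Ollama's non-streaming /api/generate response reports eval_count and eval_duration, and the MLX side can just be timed around mlx_lm's generate(). This assumes a local Ollama server and an installed mlx-lm; the model names are placeholders, not a recommendation, and the MLX timing lumps prompt processing in with generation.

```python
import time
import requests                      # assumes a local Ollama server on :11434
from mlx_lm import load, generate    # assumes mlx-lm is installed

PROMPT = "Explain the Neural Engine in one paragraph."
MAX_TOKENS = 256

# Ollama: eval_count = tokens generated, eval_duration = nanoseconds spent generating them.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2:1b", "prompt": PROMPT,
          "options": {"num_predict": MAX_TOKENS}, "stream": False},
).json()
ollama_tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)

# MLX: time generate() and count the tokens in the returned text.
model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct-4bit")
start = time.perf_counter()
text = generate(model, tokenizer, prompt=PROMPT, max_tokens=MAX_TOKENS)
elapsed = time.perf_counter() - start
mlx_tps = len(tokenizer.encode(text)) / elapsed

print(f"Ollama: {ollama_tps:.1f} tok/s   MLX: {mlx_tps:.1f} tok/s")
```

Peak memory is harder to compare apples-to-apples (Ollama runs in a separate server process), so watching the processes in Activity Monitor during generation is probably the simplest way to reproduce the 500 MB vs 8 GB observation above.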