.. who is running LLMs on CPU instead of GPU or TPU/NPU
Depends on the size of the model and how much VRAM you have (and how long you're willing to wait).
Not all of us own GPUs worth using. Now, among people using Macs... maybe if you had a hardware failure?
Actually, that's a really good question - I hadn't considered that the comparison here is really CPU vs Metal (CPU+GPU).
To answer the question, though - I think this would be useful when you're building an app that wants to run a small AI model while keeping the GPU free for graphics work, which I'm guessing is why Apple put the Neural Engine into their hardware in the first place.
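For concreteness, here's a minimal Swift sketch of what that looks like with Core ML - the computeUnits option is the real API for steering work toward the Neural Engine, but "MyModel" is just a placeholder for whatever Xcode-generated model class you'd be loading:

    import CoreML

    // Placeholder: "MyModel" stands in for an Xcode-generated model class.
    let config = MLModelConfiguration()

    // Restrict Core ML to the CPU and the Neural Engine, leaving the GPU
    // free for rendering (.cpuAndNeuralEngine needs macOS 13 / iOS 16+).
    config.computeUnits = .cpuAndNeuralEngine

    do {
        let model = try MyModel(configuration: config)
        // ... run predictions with `model` while the GPU handles graphics ...
    } catch {
        print("Failed to load model: \(error)")
    }

If you instead set computeUnits to .all, Core ML is free to schedule work on the GPU as well, which is exactly what you'd want to avoid in the graphics-heavy case above.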
Here is an interesting comparison between the two from a whisper.cpp thread - ignoring startup times - the CPU+ANE seems about on par with CPU+GPU: https://github.com/ggml-org/whisper.cpp/pull/566#issuecommen...