
jorvi · 05/03/2025 · 4 replies

.. who is running LLMs on CPU instead of a GPU or TPU/NPU?


Replies

kamranjon · 05/03/2025

Actually, that's a really good question; I hadn't considered that the comparison here is just CPU vs. Metal (CPU+GPU).

To answer the question, though: I think this is for cases where you're building an app that wants to use a small AI model while keeping the GPU free for graphics work, which I'm guessing is why Apple put the Neural Engine in its hardware in the first place.
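
For what it's worth, here is a minimal sketch (not from the thread, and the model name is a placeholder) of how an app would ask Core ML for that split: setting MLModelConfiguration.computeUnits to .cpuAndNeuralEngine keeps inference off the GPU so it stays free for rendering.

    import CoreML
    import Foundation

    // Sketch: route a small Core ML model to the CPU + Neural Engine so the
    // GPU stays available for graphics. "SmallModel.mlmodelc" is a placeholder
    // for a compiled Core ML model bundled with the app.
    func loadSmallModel() throws -> MLModel {
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine  // macOS 13+ / iOS 16+
        guard let url = Bundle.main.url(forResource: "SmallModel",
                                        withExtension: "mlmodelc") else {
            throw CocoaError(.fileNoSuchFile)
        }
        return try MLModel(contentsOf: url, configuration: config)
    }

Swapping in .cpuAndGPU (or .all) is what gives you the Metal path that the benchmarks compare against.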

Here is an interesting comparison between the two from a whisper.cpp thread; ignoring startup times, CPU+ANE seems about on par with CPU+GPU: https://github.com/ggml-org/whisper.cpp/pull/566#issuecommen...

fc417fc802 · 05/04/2025

Depends on the size of the model and how much VRAM you have (and how long you're willing to wait).

yjftsjthsd-h · 05/03/2025

Not all of us own GPUs worth using. Now, among people using Macs... maybe if you had a hardware failure?

thot_experiment · 05/03/2025

[flagged]
