Hacker News

jorvi, yesterday at 6:15 PM

... who is running LLMs on CPU instead of a GPU or TPU/NPU?


Replies

kamranjon, yesterday at 6:41 PM

Actually, that's a really good question; I hadn't considered that the comparison here is just CPU versus Metal (CPU+GPU).

To answer the question, though: I think this would be used when you're building an app that wants to run a small AI model while keeping the GPU free for graphics work, which I'm guessing is why Apple put these into their hardware in the first place.

Here is an interesting comparison between the two from a whisper.cpp thread; ignoring startup times, CPU+ANE seems about on par with CPU+GPU: https://github.com/ggml-org/whisper.cpp/pull/566#issuecommen...
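For context, whisper.cpp can route its encoder through Core ML, which is what lets macOS schedule the work on the ANE rather than the GPU. A rough sketch of that setup, based on the flags in the whisper.cpp README (the exact option names and binary paths may have changed since, so treat this as an outline rather than a recipe):

```shell
# Sketch: build whisper.cpp with Core ML support so the encoder can run on
# the ANE instead of the default Metal (GPU) path. Flag names and script
# paths are taken from the whisper.cpp README and may drift over time.

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp

# Convert a Whisper model to Core ML format (needs Python + coremltools).
./models/generate-coreml-model.sh base.en

# Build with Core ML enabled; at runtime the library picks up the
# generated .mlmodelc alongside the regular ggml model file.
cmake -B build -DWHISPER_COREML=1
cmake --build build -j --config Release

# Transcribe: the encoder goes through Core ML (ANE-eligible),
# the rest runs on CPU.
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
```

Note that Core ML only makes the ANE *available*; the OS decides whether a given layer actually lands on the ANE, GPU, or CPU.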

fc417fc802, today at 10:35 AM

Depends on the size of the model and how much VRAM you have (and how long you're willing to wait).

yjftsjthsd-h, yesterday at 8:30 PM

Not all of us own GPUs worth using. Now, among people using Macs... maybe if you had a hardware failure?

thot_experiment, yesterday at 6:40 PM

[flagged]
