Hacker News

PlatoIsADisease today at 2:57 PM · 3 replies

You might want to clarify that this is more of a "look, it technically works,"

not an "I actually use this."

The difference between waiting 20 minutes for an answer to the prompt '1+1=' and actually using it for something useful is massive here. I wonder where this idea of running AI on the CPU comes from. Was it Apple astroturfing? Was it Apple fanboys? I don't see people wasting time on non-Apple CPUs. (Although I did do this for a 7B model.)


Replies

mholm today at 3:54 PM

The reason Macs get recommended is the unified memory, which is usable as VRAM for the GPU. People are similarly using the AMD Strix Halo for AI, which has a similar memory architecture. Time to first token for something like '1+1=' would be seconds, and then you'd be getting ~20 tokens per second, which is plenty fast for regular use. Tokens/s slows down at the higher end of context, but it's still practical for a lot of use cases. Though I agree that agentic coding, especially over large projects, would likely get too slow to be practical.
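For a rough sense of what ~20 tokens/s means in practice, here's a back-of-the-envelope sketch in Python (the figures are just the ballpark numbers from this comment, not measurements):

    # Rough latency estimate at the ballpark rates above (illustrative, not a benchmark).
    ttft_s = 2.0          # time to first token, in seconds
    tokens_per_s = 20.0   # steady-state decode speed
    answer_tokens = 500   # a medium-length reply

    total_s = ttft_s + answer_tokens / tokens_per_s
    print(f"~{total_s:.0f}s for a {answer_tokens}-token reply")  # ~27s

So a medium-length answer comes back in under half a minute, which is usable interactively, but an agentic loop that makes dozens of such calls adds up fast.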

simonw today at 5:25 PM

MLX uses the GPU.

That said, I wouldn't necessarily recommend spending $20,000 on a pair of Mac Studios to run models like this. The performance won't be nearly as good as the server-class GPU hardware that hosted models run on.
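For concreteness, a minimal mlx-lm sketch of what this looks like on a Mac (the model repo name is an assumption; any quantized model from the mlx-community org that fits in unified memory would do) — MLX dispatches the compute to the GPU via Metal:

    # Minimal mlx-lm example; the model repo below is illustrative —
    # swap in whatever quantized weights fit in your unified memory.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    print(generate(model, tokenizer, prompt="1+1=", max_tokens=16))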

tucnak today at 3:32 PM

The Mac Studio route is not "AI on CPU": the M2/M4 are complex SoCs that include a GPU with unified memory access.
