logoalt Hacker News

Majromaxlast Wednesday at 1:30 AM2 repliesview on HN

Are Macs/etc compute bound with their 'it fits in unified memory' language models? Certainly by the time you're streaming weights from SSD you must be back in a bandwidth-bound regime.


Replies

dd8601fnlast Wednesday at 3:23 PM

From what I understood, if we’re talking a single user on a mac (not batching) you’re rarely compute bound in the first place. More rows per pass is nearly free that way when cores were sitting idle anyway.

If that’s wrong I would certainly appreciate being corrected, though. But if it’s right, a 2.9x speed-up after rejected tokens, nearly for free, sounds amazing.

nodjalast Thursday at 12:15 AM

That will depend on the model, but they'll hit compute limits before a typical GPU in almost all cases. Macs will still benefit a speedup from this, just not one as big as the one reported.