Hacker News

regularfry, yesterday at 3:01 PM

They're claiming 20+ tok/s inference on a MacBook with the Unsloth quant.


Replies

embedding-shape, yesterday at 5:48 PM

Yeah, I'm guessing Mac users still aren't very fond of sharing how long the prefill takes. They usually only share the output tok/s, never the input.
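
To put rough numbers on why the prefill figure matters, here is a minimal sketch. The 8,000-token prompt, 500-token output, and 200 tok/s prefill rate are made-up illustrative values, not measurements from anyone in this thread; only the 20 tok/s decode rate echoes the claim above. The function names are hypothetical too.

```python
def end_to_end_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Wall-clock time = prefill (prompt processing) + decode (generation)."""
    prefill_s = prompt_tokens / prefill_tps   # time before the first output token
    decode_s = output_tokens / decode_tps     # time spent generating output
    return prefill_s + decode_s


def effective_output_tps(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Output tokens per second once the prefill wait is counted."""
    total_s = end_to_end_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps)
    return output_tokens / total_s


if __name__ == "__main__":
    # Hypothetical workload: all values below are assumptions for illustration.
    prompt, output = 8000, 500
    prefill_tps, decode_tps = 200.0, 20.0

    total = end_to_end_seconds(prompt, output, prefill_tps, decode_tps)
    eff = effective_output_tps(prompt, output, prefill_tps, decode_tps)
    print(f"prefill wait: {prompt / prefill_tps:.1f} s")
    print(f"total time:   {total:.1f} s")
    print(f"effective output rate: {eff:.2f} tok/s (vs {decode_tps:.0f} tok/s quoted)")
```

Under those assumed numbers the prefill wait alone is longer than the whole generation phase, so a decode-only figure says little about how a long-prompt workload would feel.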