Hacker News

jaggederest, yesterday at 12:25 AM

35B-A3B runs at ~100 tokens per second on the best M5 Max GPU configuration.


Replies

ctkhn, yesterday at 1:46 PM

I got around 50-60 tokens/s on my M3 Max, so 100 tokens/s seems very realistic for a chip two generations newer with double the RAM.