Hacker News

sosodev, yesterday at 8:44 PM

Around 20 tokens a second with a 6-bit quant at very long context lengths on my AMD Ryzen AI Max+ 395.

I’m trying to use local models whenever possible. Still need to lean on the frontier models sometimes.