Around 20 tokens a second with a 6-bit quant at very long context lengths on my AMD AI Max 395+.
I’m trying to use local models whenever possible, though I still need to lean on the frontier models sometimes.