Around 20 tokens a second with a 6-bit quant at very long context lengths on my AMD AI Max 395+.
I’m trying to use local models whenever possible, though I still need to lean on the frontier models sometimes.