Anthropic serves quantized versions of their models and you can run q8 locally.
I don't even use Sonnet anymore. Current feels worse than Claude 3.5 couple years ago. They have quantized that much? Switched to GPT 5.5, let's see how long it will stay good.
I don't even use Sonnet anymore. Current feels worse than Claude 3.5 couple years ago. They have quantized that much? Switched to GPT 5.5, let's see how long it will stay good.