>Meanwhile GGUF Q2 and Q3 quantizations on llama.cpp keep getting better
Can you tell me more about this? It's been about a year since I last looked into it, but at the time quality seemed to drop off hard below Q4. I'd love to see more about this.
Also, what's a good way to run them? I mostly use Ollama, which only goes down to Q4. I think it supports HF URLs though?
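For what it's worth, both Ollama and llama.cpp can pull a GGUF at a specific quant tag straight from Hugging Face, so you're not limited to the quants in Ollama's own library. A rough sketch (the repo name and quant tag below are just placeholders, pick whatever repo/quant you actually want):

```shell
# Ollama: run a Hugging Face GGUF repo directly, picking the quant via the tag
ollama run hf.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q3_K_M

# llama.cpp: the CLI can fetch from Hugging Face with -hf, same repo:quant form
llama-cli -hf bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q3_K_M -p "Hello"
```

Both commands download the model on first run, so expect a multi-GB fetch.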
This recent discussion is still open and may provide some helpful info:
How to run Qwen 3.5 locally https://news.ycombinator.com/item?id=47292522