>Meanwhile GGUF Q2 and Q3 quantizations on llama.cpp keep getting better
Can you tell me more about this? It's been about a year since I last looked into it, but at the time quality seemed to drop off hard below Q4. I'd love to see more about this.
Also, what's a good way to run them? I mostly use Ollama, which only goes down to Q4. I think it supports HF URLs though?
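For what it's worth, both Ollama and llama.cpp can pull a GGUF at a specific quant tag straight from Hugging Face, so you're not limited to the quants in Ollama's own library. A rough sketch (the repo name and quant tag below are just placeholders, pick whatever repo/quant you actually want):

```shell
# Ollama: run a Hugging Face GGUF repo directly, picking the quant via the tag
ollama run hf.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q3_K_M

# llama.cpp: the CLI can fetch from Hugging Face with -hf, same repo:quant form
llama-cli -hf bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q3_K_M -p "Hello"
```

Both commands download the model on first run, so expect a multi-GB fetch.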
This recent discussion is still open and may provide some helpful info:
How to run Qwen 3.5 locally https://news.ycombinator.com/item?id=47292522