Hacker News

andai · today at 5:35 PM

>Meanwhile GGUF Q2 and Q3 quantizations on llama.cpp keep getting better

Can you tell me more about this? It's been about a year since I looked into it, but back then quality seemed to drop off hard below Q4. I'd love to see what's changed.

Also, what's a good way to run them? I mostly use Ollama, which only goes down to Q4 in its own library. I think it supports HF URLs though?
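(For context, the two usual routes look roughly like this. This is a sketch, not a tested recipe; the repo and file names are illustrative, and quant tag availability depends on what the uploader published.)

```shell
# Ollama can pull a GGUF straight from Hugging Face by URL,
# and you can request a specific quant below Q4 with a tag
# (repo name is illustrative):
ollama run hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q2_K

# Or run llama.cpp directly against a downloaded GGUF file:
llama-cli -m ./qwen2.5-7b-instruct-Q2_K.gguf -p "Hello" -n 64
```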


Replies

password4321 · today at 7:11 PM

This recent discussion is still open and may provide some helpful info:

How to run Qwen 3.5 locally https://news.ycombinator.com/item?id=47292522