
walrus01 • today at 6:55 AM

Absolutely. The difference between Q6 and Q8 isn't as immediately noticeable, but if I start from a blank context and give the model the same complicated task with a Q4 versus a Q8 GGUF file loaded, the difference is apparent. The Q4 will struggle or do 'stupid' things even with simple bash or Python. Q4 might not be as noticeable in purely text-based, one-on-one conversation with an LLM, but when you dig into something that's more esoteric in the training data than a chat conversation, there's absolutely a big gap.

I think some of the folks in the local LLM social media communities are using them for things like company-hosted customer service chatbots, or purely English text-writing tasks, where Q4 probably won't cause a problem. For more precise technical work I stick pretty much exclusively to Q8.
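For context, the Q4/Q6/Q8 labels refer to GGUF quantizations that keep roughly 4, 6 or 8 bits per weight. The toy Python sketch below gives some intuition for why Q4 degrades first. It is a simplified symmetric per-block quantizer, not llama.cpp's actual GGUF code. The block size of 32, the absmax scaling, and the Gaussian stand-in "weights" are all assumptions made for illustration.

```python
import math
import random


def quantize_block(values, bits):
    """Toy symmetric absmax quantization of one block (an assumption, not GGUF's exact scheme).

    scale = max|x| / (2**(bits-1) - 1); each value is rounded to the nearest
    signed integer level and then dequantized back to a float.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 levels each side for 8 bits, 7 for 4 bits
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * len(values)
    scale = amax / qmax
    # Clamp is defensive; |v| <= amax already keeps round(v / scale) in range.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def rms_error(weights, bits, block_size=32):
    """Root-mean-square reconstruction error after block-wise quantization."""
    recon = []
    for start in range(0, len(weights), block_size):
        recon.extend(quantize_block(weights[start:start + block_size], bits))
    sq = sum((w - r) ** 2 for w, r in zip(weights, recon))
    return math.sqrt(sq / len(weights))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in weights: Gaussian noise, not a real model.
    weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]
    for bits in (8, 6, 4):
        print(f"{bits}-bit RMS error: {rms_error(weights, bits):.2e}")
```

With 8 bits each block has 127 levels on each side of zero; with 4 bits it has only 7, so the rounding step is about 18 times coarser. Raw rounding error doesn't directly predict how well a model handles a bash or Python task, but it helps explain why the gap shows up first on less common, more demanding inputs.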

Replies

theanonymousone • today at 9:45 AM

Thanks a lot. How about Q8 vs FP16/BF16? Have you checked them too?
