Hacker News

flux3125, today at 8:12 PM

In my experience, if you're coding or doing something else that requires precision, quantizing the KV cache is definitely not worth it.

If you're just chatting or doing less precision-sensitive things, it's 1000% worth going down to Q8, or sometimes even Q4.
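Here is a rough sketch of the trade-off in Python. It makes several assumptions. The model shape (32 layers, 8 KV heads, head dim 128, 32K context) is a hypothetical example. The quantizer is a simplified symmetric per-block scheme, loosely inspired by ggml-style Q8/Q4 types but not their exact layout, and it ignores the per-block scale overhead. Random Gaussian values stand in for real K/V activations.

```python
import random


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_elem):
    # K and V each store n_layers * n_kv_heads * head_dim values per token.
    # Simplification: per-block scale storage is not counted.
    n_elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return n_elems * bits_per_elem / 8


def quantize_roundtrip(block, bits):
    # Simplified symmetric quantization: one float scale per block,
    # integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in block)
    scale = amax / qmax if amax > 0 else 1.0
    # Quantize to integers, then dequantize back to floats.
    return [round(x / scale) * scale for x in block]


def mean_abs_error(values, bits, block_size=32):
    # Average absolute round-trip error over all values, block by block.
    total = 0.0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        deq = quantize_roundtrip(block, bits)
        total += sum(abs(a - b) for a, b in zip(block, deq))
    return total / len(values)


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen for illustration only.
    dims = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
    for name, bits in [("F16", 16), ("Q8", 8), ("Q4", 4)]:
        gib = kv_cache_bytes(bits_per_elem=bits, **dims) / 2**30
        print(f"{name}: {gib:.1f} GiB")

    # Gaussian stand-in for K/V values; real activations may differ.
    random.seed(0)
    values = [random.gauss(0.0, 1.0) for _ in range(4096)]
    for bits in (8, 4):
        print(f"{bits}-bit mean abs error: {mean_abs_error(values, bits):.5f}")
```

With those assumed dimensions, the F16 cache comes to 4 GiB. Q8 halves that and Q4 quarters it, while the round-trip error grows as the bit width drops. That error is the precision cost that matters more for code than for casual chat.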