logoalt Hacker News

zozbot234last Saturday at 7:01 PM0 repliesview on HN

Models may be released as unquantized (and even then they are gradually shifting towards lower precisions over time), but most people are going to be running them in a quantized version simply because that gives you the best bang for your buck (you can fit more interesting models on the same hardware). Of course this is strictly about local LLM inference, though one may reasonably assume that the big players are also doing something similar.