
Catloafdev, last Monday at 6:01 PM

Nobody runs unquantized; there's literally no reason to. Q8 would be the largest anyone actually runs on consumer hardware for inference.


Replies

bityard, last Monday at 9:15 PM

Halving the precision of the weights is not a free lunch...
