Hacker News

bityard, last Monday at 9:15 PM

Halving the precision of the weights is not a free lunch...


Replies

Catloafdev, last Monday at 11:30 PM

Q8 is virtually lossless. Quantization loss becomes much more noticeable around Q4 and below. On consumer hardware, going from FP16 to Q8 gives 2x the speed at ~99.99% of the quality.
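To get a rough feel for why 8-bit holds up so much better than 4-bit, here is a minimal sketch that round-trips some made-up weights through a simple symmetric absmax quantizer and compares the error. Everything in it is an assumption for illustration: the function names, the block size of 32, and the Gaussian "weights" are hypothetical, and this is not the exact Q8 or Q4 format of any particular runtime. It also measures weight reconstruction error only, which is not the same thing as model output quality.

```python
import math
import random


def quantize_roundtrip(weights, bits, block_size=32):
    """Quantize to signed integers with one absmax scale per block, then dequantize.

    Hypothetical helper: a simplified scheme, not a real runtime's file format.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    restored = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / qmax if absmax > 0 else 1.0
        for w in block:
            # Round to the nearest integer level and clamp to the valid range.
            q = max(-qmax, min(qmax, round(w / scale)))
            restored.append(q * scale)
    return restored


def relative_rms_error(original, restored):
    """Norm of the reconstruction error divided by the norm of the original."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    norm = math.sqrt(sum(a * a for a in original))
    return err / norm


random.seed(0)
# Assumed stand-in for a weight tensor: 4096 small Gaussian values.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err_q8 = relative_rms_error(weights, quantize_roundtrip(weights, bits=8))
err_q4 = relative_rms_error(weights, quantize_roundtrip(weights, bits=4))

print(f"8-bit relative RMS error: {err_q8:.4%}")
print(f"4-bit relative RMS error: {err_q4:.4%}")
```

With 127 levels per side instead of 7, the 8-bit step size is about 18 times finer for the same block scale, so the rounding error shrinks by roughly the same factor. Real formats differ in block layout and scale storage, so treat the printed numbers as illustrative only.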
