
scrlk, today at 12:09 PM (2 replies)

IME, unquantised -> FP8 is pretty much lossless. What matters more is keeping the KV cache unquantised: an FP8 KV cache can cause a significant drop in quality.
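
For a rough sense of what "pretty much lossless" means numerically, here is a minimal sketch that rounds a few values onto an E4M3-style FP8 grid and measures the round-trip error. It assumes the common E4M3 layout (3 mantissa bits, smallest normal exponent -6, largest finite value 448) and ignores the per-tensor or per-channel scale factors that real inference stacks apply. `fake_fp8_e4m3` is a hypothetical helper written for illustration, not any library's API, and the sample values are made up.

```python
import math

def fake_fp8_e4m3(x: float) -> float:
    """Round x to the nearest value on an E4M3-style grid (simulation only)."""
    if x == 0.0 or math.isnan(x):
        return x
    mag = min(abs(x), 448.0)         # saturate at the largest finite value (assumed 448)
    _, e = math.frexp(mag)           # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)             # below 2**-6 the grid spacing stops shrinking
    step = math.ldexp(1.0, exp - 3)  # 3 mantissa bits: 8 steps per power of two
    return math.copysign(round(mag / step) * step, x)

# Made-up sample values standing in for weights or KV-cache entries.
samples = [0.0123, -0.37, 1.07, 3.3, 500.0]
for v in samples:
    q = fake_fp8_e4m3(v)
    print(f"{v:>8} -> {q:<10} relative error {abs(q - v) / abs(v):.3f}")
```

The round trip is never exact for arbitrary inputs: in the normal range each value can move by up to 2^-4 (6.25%) of its magnitude, and anything above the maximum is clipped. Whether that much per-value error matters for output quality is an empirical question that this sketch does not answer; it only shows the size of the rounding error.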


Replies

johnnyApplePRNG, today at 3:28 PM

>unquantised -> FP8 is pretty much lossless

Claude Shannon is rolling in his grave.

ComputerGuru, today at 2:23 PM

Do infra providers reveal that level of implementation detail?
