IME, unquantised -> FP8 is pretty much lossless. What matters more is keeping the KV cache unquantised: using an FP8 KV cache can cause a significant drop in quality.
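For a rough feel of the numbers, here is a minimal sketch that simulates FP8 rounding in NumPy and measures the round-trip error on a weight-like tensor. Everything in it is my own assumption rather than any provider's pipeline: the E4M3 layout (3 mantissa bits, max 448, saturating on overflow), a single per-tensor scale, and the helper names `fp8_e4m3_round` and `roundtrip_error`. It only measures numerical round-off, not model quality, and it says nothing about how errors in a quantised KV cache build up over a long context.

```python
import numpy as np

# Assumed format: FP8 E4M3 (4 exponent bits, 3 mantissa bits, bias 7).
E4M3_MAX = 448.0          # largest finite value
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent; below this, subnormals
MANTISSA_BITS = 3


def fp8_e4m3_round(x):
    """Round values to the nearest E4M3-representable number (simulated in float64)."""
    x = np.asarray(x, dtype=np.float64)
    # frexp gives x = m * 2**exp with 0.5 <= |m| < 1, so floor(log2|x|) = exp - 1.
    _, exp = np.frexp(x)
    e = np.maximum(exp - 1, E4M3_MIN_NORMAL_EXP)
    # Spacing between representable values in this binade.
    step = np.ldexp(1.0, e - MANTISSA_BITS)
    q = np.round(x / step) * step
    # Saturate instead of overflowing (an assumption about the cast behaviour).
    return np.clip(q, -E4M3_MAX, E4M3_MAX)


def roundtrip_error(w):
    """Relative L2 error after scaling to the FP8 range, rounding, and scaling back."""
    w = np.asarray(w, dtype=np.float64)
    scale = E4M3_MAX / np.max(np.abs(w))  # hypothetical per-tensor scale
    w_hat = fp8_e4m3_round(w * scale) / scale
    return np.linalg.norm(w - w_hat) / np.linalg.norm(w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.02, size=100_000)  # synthetic, not real model weights
    print(f"relative L2 round-trip error: {roundtrip_error(weights):.4f}")
```

With 3 mantissa bits, each normal value is off by at most 2^-4 (about 6%) in relative terms, so the cast is not lossless in the strict sense; "pretty much lossless" is a claim about downstream quality, which this sketch does not test.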
>unquantised -> FP8 is pretty much lossless
Claude Shannon is rolling in his grave.
Do infra providers reveal that level of implementation detail?