Hacker News

ComputerGuru, yesterday at 2:23 PM

Do infra providers reveal that level of implementation detail?



scrlk, yesterday at 2:46 PM

I've seen a few articles from providers talking about KV cache quantisation, but it's not something they explicitly point out like they do with weights.

So you could end up paying more for unquantised weights, only to get silently hit with a quantised KV cache...