Hacker News

landl0rd, yesterday at 4:03 AM

What they're probably doing is speculative decoding, given they've said the output distribution is identical at 2.5x the speed. That's roughly the speedup you'd expect from it; 10x is not.
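For context, here's a minimal Python sketch of standard speculative sampling as published (Leviathan et al., 2023; Chen et al., 2023). It is not anything this provider has disclosed. The accept/resample rule is what keeps the output distribution identical to the target model's. The speedup function is the paper's expected walltime improvement, which assumes an i.i.d. per-token acceptance rate alpha, gamma drafted tokens per step, and a drafter cost c relative to one target forward pass. The toy distributions and parameter values are illustrative, not measurements.

```python
# Sketch of speculative sampling math (Leviathan et al., 2023).
# Assumptions: toy distributions are plain lists of probabilities over a tiny vocab;
# alpha is an i.i.d. per-token acceptance rate; c is drafter cost / target cost.

def accept_prob(p, q, x):
    """Probability of keeping drafted token x: min(1, p(x) / q(x))."""
    return min(1.0, p[x] / q[x]) if q[x] > 0 else 0.0

def residual(p, q):
    """Distribution to resample from on rejection: normalize(max(0, p - q))."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(r)
    if total == 0.0:  # p == q, so rejection never happens
        return list(p)
    return [ri / total for ri in r]

def output_distribution(p, q):
    """Exact marginal of one step: draft x ~ q, keep it or resample from residual."""
    accepted = [q[x] * accept_prob(p, q, x) for x in range(len(p))]
    reject_mass = 1.0 - sum(accepted)
    res = residual(p, q)
    return [a + reject_mass * r for a, r in zip(accepted, res)]

def expected_tokens_per_step(alpha, gamma):
    """Expected tokens emitted per target forward pass with gamma drafted tokens."""
    if alpha >= 1.0:
        return gamma + 1.0  # every draft accepted, plus one token from the target
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def walltime_speedup(alpha, gamma, c):
    """Expected walltime improvement: tokens per step / (gamma * c + 1)."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical target-model distribution
    q = [0.2, 0.5, 0.3]  # hypothetical drafter distribution
    print([round(v, 6) for v in output_distribution(p, q)])  # ≈ p
    print(round(walltime_speedup(alpha=0.8, gamma=4, c=0.05), 2))  # ≈ 2.8
    print(round(walltime_speedup(alpha=0.99, gamma=9, c=0.0), 2))  # ≈ 9.56
```

The ceiling is gamma + 1 tokens per target pass, so even a free drafter with 99% acceptance and nine drafted tokens lands just under 10x, while numbers in the 2-3x range fall out of fairly ordinary settings.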

It's also absolute highway robbery (or at least overly aggressive price discrimination) to charge 6x for speculative decoding, by the way. It isn't that expensive, and under certain conditions (usually a very cheap drafter and a high acceptance rate) it can actually decrease total cost. In any case, it's unlikely to be even a 2x cost increase, let alone 6x.
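To put rough numbers on the cost argument, here's a back-of-envelope model in Python. It's a sketch under stated assumptions: an i.i.d. acceptance rate, a verification pass over gamma + 1 positions that takes about as long as one ordinary decode step (the memory-bound regime), and drafter time/FLOPs given as a fraction of one target forward pass. The function names and parameter values are hypothetical, not anything the provider has published. It reports two ratios against plain decoding: accelerator-time per token, which drops when the speedup outweighs the drafter's cost, and raw FLOPs per token, which rise because rejected drafts are wasted work.

```python
# Back-of-envelope cost of speculative decoding relative to plain decoding (1.0 = same).
# Assumptions (hypothetical): i.i.d. acceptance rate alpha; gamma drafted tokens per step;
# verifying gamma + 1 positions costs about one decode step of wall time;
# draft_time / draft_flops are per drafted token, as a fraction of one target pass.

def tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target verification pass."""
    if alpha >= 1.0:
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def relative_time_cost(alpha: float, gamma: int, draft_time: float) -> float:
    """Accelerator-seconds per emitted token vs. plain decoding."""
    return (gamma * draft_time + 1) / tokens_per_step(alpha, gamma)

def relative_flop_cost(alpha: float, gamma: int, draft_flops: float) -> float:
    """FLOPs per emitted token vs. plain decoding, counting rejected drafts as waste."""
    target = gamma + 1  # target computes logits at gamma + 1 positions per step
    drafter = gamma * draft_flops  # drafter runs once per drafted token
    return (target + drafter) / tokens_per_step(alpha, gamma)

if __name__ == "__main__":
    print(round(relative_time_cost(alpha=0.8, gamma=4, draft_time=0.05), 2))  # ≈ 0.36
    print(round(relative_flop_cost(alpha=0.8, gamma=4, draft_flops=0.05), 2))  # ≈ 1.55
```

With these made-up inputs, time per token falls to roughly a third while FLOPs per token rise by about half. The FLOP overhead grows as the acceptance rate falls or gamma rises.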