amluto • yesterday at 7:05 PM • 1 reply

How useful is speculative decoding in a batched setting where you get paid for throughput (aggregated across users) and you mostly don’t get paid for latency or single-session throughput?

Replies

onlyrealcuzzo • yesterday at 7:07 PM

It's useful at the local level, where there will be SOTA models developed...
