
onlyrealcuzzo • yesterday at 5:54 PM • 1 reply

> Frontier labs have their own variants of MLA

Yes, variants that are typically 2-3x less good...

Same with speculative decoding... They all do something, but there are known techniques that are substantially better, which just weren't known when they started developing the previous models.
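
For reference, here is a minimal sketch of the textbook speculative-sampling verification step that the "better" techniques build on: a small draft model proposes gamma tokens, one target forward pass scores all of them, each draft token is accepted with probability min(1, p/q), and the first rejection is resampled from the normalized residual max(0, p - q). This is the baseline rule, not whatever improved variant the comment has in mind; representing distributions as plain dicts and the names `sample`, `residual`, and `verify` are hypothetical choices for illustration.

```python
import random
from typing import Dict, List

# A probability distribution over token ids, e.g. {42: 0.7, 7: 0.3}.
Dist = Dict[int, float]


def sample(dist: Dist, rng: random.Random) -> int:
    """Draw one token from a normalized, non-empty distribution."""
    r = rng.random()
    cumulative = 0.0
    for tok, prob in dist.items():
        cumulative += prob
        if r < cumulative:
            return tok
    return tok  # round-off guard: fall back to the last token


def residual(p: Dist, q: Dist) -> Dist:
    """Normalized max(0, p - q): where to resample after a rejection."""
    diff = {t: max(0.0, pt - q.get(t, 0.0)) for t, pt in p.items()}
    z = sum(diff.values())
    # z == 0 means p == q, in which case a rejection cannot actually occur.
    return {t: v / z for t, v in diff.items()} if z > 0 else dict(p)


def verify(draft: List[int], q: List[Dist], p: List[Dist],
           rng: random.Random) -> List[int]:
    """
    draft: gamma tokens proposed by the draft model.
    q[i]:  draft model's distribution at position i  (len gamma).
    p[i]:  target model's distribution at position i (len gamma + 1,
           all produced by ONE forward pass of the target model).
    Returns the tokens emitted this step (between 1 and gamma + 1).
    """
    out: List[int] = []
    for i, tok in enumerate(draft):
        # Accept the draft token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[i].get(tok, 0.0) / q[i][tok]):
            out.append(tok)
        else:
            # First rejection: resample from the residual and stop.
            out.append(sample(residual(p[i], q[i]), rng))
            return out
    # Every draft token accepted: take a "bonus" token from the target.
    out.append(sample(p[len(draft)], rng))
    return out
```

The property that makes this worthwhile is that every emitted token is distributed exactly as if the target model had sampled it directly, so any gain comes purely from emitting several tokens per target pass.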

Replies

amluto • yesterday at 7:05 PM

How useful is speculative decoding in a batched setting where you get paid for throughput (aggregated across users) and you mostly don’t get paid for latency or single-session throughput?
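
A rough way to put numbers on this question is the standard expected-tokens-per-pass estimate for speculative decoding. The sketch below assumes each of the gamma draft tokens is accepted independently with rate alpha (a simplification), and that at large batch the target forward pass is compute-bound, so its cost scales with the gamma + 1 positions scored per sequence per step. The function names and the `draft_cost` parameter are hypothetical, and this is a toy model rather than a measurement of any real serving stack.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """
    Expected tokens emitted per target verification pass, i.e. the closed
    form of 1 + alpha + alpha^2 + ... + alpha^gamma (i.i.d. acceptance).
    """
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def compute_bound_efficiency(alpha: float, gamma: int,
                             draft_cost: float = 0.0) -> float:
    """
    Tokens per unit of compute relative to plain decoding (one target
    position per token), assuming cost scales with positions scored.
    draft_cost: hypothetical cost of one draft token, in units of one
    target position.
    """
    tokens = expected_tokens_per_step(alpha, gamma)
    cost = (gamma + 1) + gamma * draft_cost
    return tokens / cost


if __name__ == "__main__":
    # Sweep the draft length at a fixed acceptance rate.
    for gamma in (1, 2, 4, 8):
        print(gamma,
              round(expected_tokens_per_step(0.8, gamma), 3),
              round(compute_bound_efficiency(0.8, gamma), 3))
```

With alpha = 0.8 and gamma = 4, this gives about 3.36 tokens per target pass but only about 0.67 of plain decoding's tokens per unit of target compute, before counting the draft model at all. Under these assumptions the gain shows up as fewer sequential passes (latency); the per-pass token count is closer to a real speedup only when a pass is memory-bandwidth-bound and its cost is roughly flat in gamma.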
