Hacker News

onlyrealcuzzo, yesterday at 5:54 PM

> Frontier labs have their own variants of MLA

Yes, variants that are typically 2-3x worse...

Same with speculative decoding... They all do something, but there are known techniques that are substantially better; they just weren't known when development of the previous models started.


Replies

amluto, yesterday at 7:05 PM

How useful is speculative decoding in a batched setting where you get paid for throughput (aggregated across users) and you mostly don’t get paid for latency or single-session throughput?
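One rough way to frame that question, as a back-of-the-envelope sketch rather than a measurement: speculation emits more tokens per target-model pass, but each pass has to verify k drafted tokens plus one. The expected-tokens formula below is the standard one under the assumption that each drafted token is accepted independently with probability `alpha`. Everything else is a made-up illustration: `compute_bound_fraction` is a hypothetical knob (0 = purely memory-bandwidth bound, so verifying extra tokens is free; 1 = purely compute bound, so cost scales with tokens verified), the linear cost model is an assumption, and `draft_cost` is a hypothetical per-token draft cost relative to one target pass.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha, and one extra token always comes from the target.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def relative_throughput(alpha: float, k: int,
                        compute_bound_fraction: float,
                        draft_cost: float = 0.0) -> float:
    """Aggregate throughput with speculation divided by throughput without.

    ASSUMPTION (illustrative only): the cost of verifying k + 1 tokens in one
    pass interpolates linearly between 1 (memory bound) and k + 1 (compute
    bound). Real serving stacks will not follow this model exactly.
    """
    verify_cost = 1.0 + compute_bound_fraction * k
    step_cost = verify_cost + draft_cost * k
    return expected_tokens_per_step(alpha, k) / step_cost


if __name__ == "__main__":
    for f in (0.0, 0.5, 1.0):
        ratio = relative_throughput(alpha=0.8, k=4, compute_bound_fraction=f)
        print(f"compute_bound_fraction={f:.1f}: relative throughput {ratio:.2f}")
```

Under these assumptions the ratio is above 1 when decoding is memory bound (small batches) and drops below 1 as large batches push the hardware toward being compute bound, which is the regime the question is about.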
