epolanski • yesterday at 10:55 AM

Chinese papers and techniques have been very influential and copied by US labs.

Multi-head Latent Attention (MLA), multi-token prediction (MTP), and the MoE architecture are some of the most famous examples.

HarHarVeryFunny • yesterday at 12:48 PM

MoE is from Google (Noam Shazeer)

MTP is from Meta

Another DeepSeek advance that the West is copying is DeepSeek Sparse Attention (DSA)

alt Hacker News