Hacker News

epolanski yesterday at 10:55 AM

Chinese papers and techniques have been very influential and copied by US labs.

Multi-head Latent Attention (MLA), Multi-Token Prediction (MTP), and the MoE architecture are some of the most famous examples.


Replies

HarHarVeryFunny yesterday at 12:48 PM

MoE is from Google (Noam Shazeer)

MTP is from Meta

Another DeepSeek advance that the West is copying is DeepSeek Sparse Attention (DSA)
