Chinese papers and techniques have been very influential and have been copied by US labs.
Multi-head Latent Attention (MLA), Multi-Token Prediction (MTP), and the Mixture-of-Experts (MoE) architecture are some of the most famous examples.
MoE, however, is from Google (Noam Shazeer), and MTP is from Meta.
Another DeepSeek advance that the West is copying is DeepSeek Sparse Attention (DSA).