> The training and deployment of LongCat-2.0 are built on large-scale clusters of tens of thousands of AI ASIC superpods. Compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed. We have therefore put significant effort into building a stable, secure, and scalable infrastructure.
This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m
huh? who knows what they did, it's not like any of it is audited. it sounds like they started with DeepSeek V4 Pro, made a bunch of random changes to it, and gave the parts different names?
If they really managed everything from pre-training through post-training of a 1.6T-parameter model without NVIDIA, Dwarkesh Patel got what he wanted.