NitpickLawyer • yesterday at 8:24 AM

New arch seems cool, and it's amazing that we have these published in the open.

That being said, Qwen models are extremely overfit. They can do some things well, but they are very limited in generalisation compared to closed models. I don't know if it's simply scale, or training recipes, or regimes. But if you test them out of distribution (OOD), the models utterly fail to deliver, where the closed models still provide value.

Replies

vintermann • yesterday at 8:28 AM

Could you give some practical examples? I don't know what Qwen's 36T-token training set is like, so I don't know what it's overfitting to...
