Seems impressive. I believe better architectures are really the path forward; I don't think you need more than 100B params, given what this model and GPT OSS 120B can achieve.
New arch seems cool, and it's amazing that we have these published in the open.
That being said, Qwen models are extremely overfit. They can do some things well, but compared to closed models they are very limited in generalisation. I don't know if it's simply scale, training recipes, or regimes, but if you test them out of distribution (OOD), the models utterly fail to deliver where the closed models still provide value.
We definitely need more parameters; low-param models are hallucination machines. A low active-parameter count is probably fine, though, assuming the routing is good (see the sketch below).
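For anyone unfamiliar with the "low actives" distinction: in a mixture-of-experts (MoE) layer, a router picks only a few experts per token, so the parameters actually used per forward pass are a small fraction of the total. Here's a minimal toy sketch of top-k routing in PyTorch; all sizes and names are illustrative, not the actual config of Qwen or GPT OSS:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer: total params scale with
    num_experts, but only k expert MLPs run per token ("low actives")."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                     # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # choose k experts per token
        weights = F.softmax(weights, dim=-1)        # renormalise over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(10, 64))  # only 2 of 8 expert MLPs fire per token
```

The point of the sketch: total capacity (8 experts' worth of weights) can keep hallucinations down, while per-token compute stays at 2 experts' worth, which only pays off if the router sends tokens to the right experts.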