Hacker News

syntaxing · yesterday at 9:14 AM

The craziest part is how far MoE has come thanks to Qwen. This beats all those 72B dense models we had before, and it runs faster than a 14B model depending on how you offload between VRAM and CPU. That's insane.
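
To put rough numbers on the speed claim: an MoE only reads the routed experts' weights for each token, so decode throughput tracks active parameters rather than total size. A back-of-the-envelope sketch in Python, assuming a Qwen3-30B-A3B-style split (~30B total / ~3B active), 4-bit weights, and memory-bandwidth-bound decoding; the figures are illustrative, not benchmarks:

    # Back-of-the-envelope decode-speed comparison: dense vs MoE.
    # Assumptions (mine, not from the thread): ~30B total / ~3B active
    # parameters, 4-bit weights (~0.5 bytes/param), and decoding limited
    # by memory bandwidth rather than compute.

    GB = 1e9

    def decode_tok_per_s(active_params_billions, bytes_per_param=0.5, bandwidth_gb_s=300):
        """Rough tokens/s ceiling: bandwidth divided by bytes read per token."""
        bytes_per_token = active_params_billions * 1e9 * bytes_per_param
        return bandwidth_gb_s * GB / bytes_per_token

    # Dense 14B: every weight is read for every token.
    print(f"dense, 14B active: {decode_tok_per_s(14):5.0f} tok/s")

    # MoE with ~3B active: only the routed experts plus shared layers are read.
    print(f"MoE,    3B active: {decode_tok_per_s(3):5.0f} tok/s")

The catch is that all ~30B worth of weights still have to live somewhere, which is exactly where the VRAM/CPU offload trade-off comes in.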


Replies

halJordan · yesterday at 7:57 PM

Qwen isn't directing the forward progress of LLMs. SOTA LLMs have been MoE since GPT-4. The OG 4.

Out of context, but I honestly hate how HN let itself get so far behind the times that this is the sort of inane commentary we get on AI.

moffkalast · yesterday at 10:11 AM

In retrospect it's actually funny that last year Meta spent so many resources training a dense 405B model that both underperforms models a tenth its size and is impractical to run at a reasonable speed on anything short of a multi-GPU server.
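
For scale, a rough weight-only footprint estimate (the 405B parameter count is Llama 3.1's; the precisions and the 80 GB-per-accelerator figure are just illustrative assumptions):

    # Rough weight-only memory footprint of a dense 405B model at common
    # precisions. Every parameter participates in every token, so all of it
    # has to sit in fast memory; KV cache and activations come on top.

    PARAMS = 405e9

    for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("4-bit", 0.5)]:
        gb = PARAMS * bytes_per_param / 1e9
        gpus = -(-gb // 80)  # ceiling division: 80 GB accelerators, weights only
        print(f"{name:>5}: {gb:6.0f} GB of weights (~{int(gpus)} x 80 GB GPUs)")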
