In retrospect it's actually funny that last year Meta spent so many resources training a dense 405B model that both underperforms compared to models a tenth its size and is impossible to run at a reasonable speed on any hardware in existence.
It's not that clear. Yes, it underperforms on recent benchmarks and use cases (e.g. agentic stuff), but it is still one of the strongest open models in terms of "knowledge". Dense does have that advantage over MoE, even if it's extremely expensive to run inference on.
Check out this great exercise - https://open.substack.com/pub/outsidetext/p/how-does-a-blind...
Strong disagree.
Llama 4's release in 2025 was (deservedly) panned, but Llama 3.1 405B does not deserve that slander.
https://artificialanalysis.ai/#frontier-language-model-intel...
Do not compare 2024 models to the current cutting edge. At the time, Llama 3.1 405B was the very first open source (open weights) model to come close to the closed source cutting edge. It was very close in performance to GPT-4o and Claude 3.5 Sonnet.
In essence, it was DeepSeek R1 before DeepSeek R1.