Hacker News

reissbaker · today at 10:36 AM

Dense is (much) worse in terms of training budget. At inference time, dense is somewhat more intelligent per gigabyte of VRAM, but much slower, so for a given compute budget it's still usually worse in terms of intelligence per dollar, even ignoring training cost. If you're willing to spend more, you're typically better off training and running a larger sparse model than training and running a dense one.
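As a back-of-the-envelope illustration of the inference-cost gap (all model sizes below are hypothetical, not taken from the thread), the common ~2 × active-parameters FLOPs-per-token approximation shows why a sparse model is cheaper to serve even when its total parameter count is larger:

```python
# Sketch: why sparse (MoE) wins on inference cost per token.
# Uses the rough "forward pass ~= 2 FLOPs per active parameter per token"
# approximation (ignores attention overhead). Sizes are illustrative.

def flops_per_token(active_params: float) -> float:
    # Only parameters that actually fire for a token cost compute.
    return 2 * active_params

dense_active = 405e9   # dense model: every parameter is active each token
moe_total = 700e9      # sparse model: big total parameter count...
moe_active = 40e9      # ...but only a few experts activate per token

print(f"dense  FLOPs/token: {flops_per_token(dense_active):.2e}")
print(f"sparse FLOPs/token: {flops_per_token(moe_active):.2e}")
print(f"compute ratio: {flops_per_token(dense_active) / flops_per_token(moe_active):.1f}x")
# The sparse model needs ~10x less compute per token here, despite
# holding far more total parameters in VRAM.
```

This is why the per-dollar comparison favors sparse for large-scale serving: VRAM to hold the extra inactive experts is a one-time fixed cost per replica, while per-token compute scales with active parameters only.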

Dense is nice for local model users because they only need to serve a single user and VRAM is expensive. For the people training and serving the models, though, dense is really tough to justify. You'll see small dense models released to capitalize on marketing hype from local model fans but that's about it. No one will ever train another big dense model: Llama 3.1 405B was the last of its kind.


Replies

Der_Einzige · today at 2:42 PM

You want to take bets on this? I'm willing to bet $500 that an open-access dense model of at least 300B parameters is released by some lab within 3 years.