lucb1e • yesterday at 6:10 PM • 1 reply

I read the general parts and skimmed the inner workings but I can't figure out what the high-level news is. What does this concretely do that Gemma didn't already do, or what benchmark/tasks did it improve upon?

Before it gets into the inner details (MatFormer, per-layer embeddings, caching...), the only sentence I've found that concretely mentions a new thing is "the first model under 10 billion parameters to reach [an LMArena score over 1300]". So it's supposed to be better than other models up to those that use 10GB+ RAM, if I understand that right?

Replies

awestroke • yesterday at 6:28 PM

> What does this concretely do that Gemma didn't already do

Open weights
