Hacker News

lucb1e yesterday at 6:10 PM

I read the general parts and skimmed the inner workings but I can't figure out what the high-level news is. What does this concretely do that Gemma didn't already do, or what benchmark/tasks did it improve upon?

Until it goes into the inner details (MatFormer, per-layer embeddings, caching...), the only sentence I've found that concretely mentions a new thing is "the first model under 10 billion parameters to reach [an LMArena score over 1300]". So it's supposed to be better than other models, up to the ones that use 10GB+ RAM, if I understand that right?


Replies

awestroke yesterday at 6:28 PM

> What does this concretely do that Gemma didn't already do

Open weights
