It’s a 6bn model. Totally different class. I’m more excited about “frontier small language models” tbh.
It's a 119B model, 6B active.
That's still roughly 3-13x smaller by total parameter count than the other models in that graph though (400B, 1T, 1.5T).