Hacker News

kristopolous yesterday at 8:16 PM (5 replies)

That M versus B is way too subtle. 0.026B is my suggestion


Replies

bigyabai today at 12:18 AM

The "M" nomenclature has been around since at least BERT and T5/FLAN. It's valid to use it even if today's LLM devs are more familiar with billion-scale models.

DrammBA today at 1:50 AM

I was so confused by many comments in this post, but thanks to you I realized that some people are apparently reading it as 26B, and that's why their comments make no sense.

HenryNdubuaku yesterday at 8:34 PM

Haha, we were trying not to be too hand-wavy :)

dymk yesterday at 10:23 PM

[flagged]
