Hacker News

naasking today at 3:21 PM

It's a two-year-old base model with only 3B parameters, trained on only 100B tokens. It's still a research project at this point.


Replies

gardnr today at 4:02 PM

The new model they just released has impressive benchmark results: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T

Except on GSM8K and math...
