While Qwen2.5 was pre-trained on 18 trillion tokens, Qwen3 uses nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.
Thanks for the info, but I don't think it answers the question. I mean, you could train a 20-node network on 36 trillion tokens. Wouldn't make much sense, but you could. So I was asking more about the number of nodes / parameters or GB of file size.
In addition, there seem to be many different versions of Qwen3. See, e.g., the list of tags in the Ollama library: https://ollama.com/library/qwen3/tags
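For the file-size part of the question, a rough rule of thumb is: disk size ≈ parameter count × bits per parameter ÷ 8. A minimal back-of-envelope sketch follows; the parameter counts are examples matching some of the dense Qwen3 tags, and the bit widths are nominal (real GGUF files add metadata and often mix quantization types per tensor, so actual downloads will differ somewhat):

```python
def estimated_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough on-disk size in decimal GB for a given parameter count
    and quantization bit width. Ignores metadata and mixed-quant layouts."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# Example dense Qwen3 sizes (in billions of parameters) at two precisions.
for params in (0.6, 4.0, 8.0, 32.0):
    for bits, label in ((16, "fp16"), (4, "~4-bit quant")):
        print(f"{params:>5}B @ {label}: ~{estimated_size_gb(params, bits):.1f} GB")
```

So, for instance, an 8B model is roughly 16 GB at fp16 and roughly 4 GB at a 4-bit quantization, which is why the same model name appears at several different download sizes in the tag list above.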