> By scaling up model parameters and leveraging substantial computational resources
So, how large is that new model?
While Qwen2.5 was pre-trained on 18 trillion tokens, Qwen3 uses nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.
https://qwen.ai/blog?id=qwen3