> By scaling up model parameters and leveraging substantial computational resources
So, how large is that new model?
While Qwen2.5 was pre-trained on 18 trillion tokens, Qwen3 uses nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.
https://qwen.ai/blog?id=qwen3