Hmm. Let's just say that if this is true, and it really is better at such a much lower total parameter count, it's the greatest accomplishment in over a year of LLM development. With the backdrop of benchmaxxing in 2025, I'll believe it when I see results on closed benchmarks and SimpleBench. My concern is this might be a hallucination machine.
Might be. FWIW, my experience with the Qwen3 30b model basically took ChatGPT out of rotation for me. It's not hard to imagine an 80b model pushing that further, especially with thinking enabled.
I recommend playing with the free hosted models to draw your own conclusions: https://chat.qwen.ai/
In my testing this model is quite bad and far behind 235b a22b. https://fiction.live/stories/Fiction-liveBench-Sept-12-2025/...