1. DeepSeek V4 is still in preview (training is not finished).
2. Qwen is much more demanding and borderline unusable on consumer hardware because it's a dense model: all 27B parameters are active for every token. It's not a MoE architecture, where a router activates only a subset of the parameters for each token.
3. Qwen doesn't tolerate quantization well at all.
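
To make the dense vs. MoE point concrete, here is a rough back-of-the-envelope sketch in Python. The 27B dense figure comes from the point above; the MoE layout (3B shared, 24B spread over 64 experts, 4 routed per token) and the helper `active_params` are made-up numbers and names for illustration only, not the configuration of any real model.

```python
def active_params(shared: int, expert_total: int, num_experts: int, top_k: int) -> int:
    """Parameters used per token in a toy MoE: shared weights plus top_k experts."""
    per_expert = expert_total // num_experts
    return shared + top_k * per_expert


# Dense model: every parameter participates in every token.
dense_active = 27_000_000_000

# Hypothetical MoE with the same 27B total (assumed split, not a real model):
# 3B shared weights + 24B divided across 64 experts, router picks 4 per token.
moe_active = active_params(
    shared=3_000_000_000,
    expert_total=24_000_000_000,
    num_experts=64,
    top_k=4,
)

print(f"dense active per token: {dense_active / 1e9:.1f}B")
print(f"MoE active per token:   {moe_active / 1e9:.1f}B")
print(f"compute ratio:          {dense_active / moe_active:.1f}x")
```

Note this only compares per-token compute. Both models would still need all 27B weights resident in memory, so the MoE advantage here is speed, not footprint.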