You should enable MTP now that it's available.
llama.cpp has had some massive updates in the last week or so.
Yes, Qwen 3.6 MoE is hitting around 80-90 t/s on Strix Halo. On the R9700 I got around 170 t/s, and I couldn't keep up with it. But the MoE model goes in circles very often. When that happens I switch to the dense model, which only gets 20-30 t/s but is able to solve quite a lot of tasks.