
intothemild • today at 9:59 AM • 1 reply

You should enable MTP now that it's available.

llama.cpp has had some massive updates in the last week or so.

Replies

Yes, Qwen 3.6 MoE is hitting around 80-90 t/s on Strix Halo. On an R9700 I got around 170 t/s, which was too fast to keep up with. But the MoE model goes in circles very often. When that happens I switch to the dense model, which only gets 20-30 t/s but is able to solve quite a lot of tasks.
