Qwen3.6-35B-A3B-UD-Q4_K_M runs at about 11 tokens/second on my poor old 1060. Absolutely nuts how far we've come.
Every model I've tried running on my 1070 instantly crashes my old tower. Probably time to get off Windows and run Linux on it.
Mind sharing your llama.cpp settings for that?