Hacker News

danielhanchen · yesterday at 4:06 PM

For those interested, I made some Dynamic Unsloth GGUFs for local deployment at https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF and wrote a guide on using Claude Code / Codex locally: https://unsloth.ai/docs/models/qwen3-coder-next


Replies

genpfault · yesterday at 6:17 PM

Nice! I'm getting ~39 tok/s at ~60% GPU utilization (~170 W out of 303 W per nvtop).

System info:

    $ ./llama-server --version
    ggml_vulkan: Found 1 Vulkan devices:
    ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
    version: 7897 (3dd95914d)
    built with GNU 11.4.0 for Linux x86_64
llama.cpp command-line:

    $ ./llama-server --host 0.0.0.0 --port 2000 --no-warmup \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --jinja --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --fit on \
    --ctx-size 32768
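Once llama-server is up, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so any plain HTTP client can talk to it. A minimal stdlib-only Python sketch (the `build_payload`/`ask` helpers are hypothetical names; the host/port and sampling values mirror the flags in the command above):

```python
import json
import urllib.request

BASE_URL = "http://localhost:2000"  # matches --host 0.0.0.0 --port 2000 above


def build_payload(prompt: str) -> dict:
    # Per-request sampling settings, mirroring the server-side flags above.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
    }


def ask(prompt: str) -> str:
    # POST to llama-server's OpenAI-compatible chat endpoint and
    # return the assistant's reply text.
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Pointing Claude Code / Codex at the same base URL works for the same reason: they only need an OpenAI-style endpoint.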
bityard · yesterday at 7:09 PM

Hi Daniel, I've been using some of your models on my Framework Desktop at home. Thanks for all that you do.

Asking from a place of pure ignorance here, because I don't see the answer on HF or in your docs: Why would I (or anyone) want to run this instead of Qwen3's own GGUFs?

MrDrMcCoy · yesterday at 10:38 PM

Still hoping IQuest-Coder gets the same treatment :)

ranger_danger · yesterday at 4:27 PM

What is the difference between the UD and non-UD files?

binsquare · yesterday at 4:38 PM

How did you do it so fast?

Great work as always btw!

CamperBob2 · yesterday at 9:08 PM

Good results with your Q8_0 version on a 96 GB RTX 6000 Blackwell. It one-shotted the Flappy Bird game and also wrote a good Wordle clone in four shots, all at over 60 tok/s. Thanks!

Is your Q8_0 file the same as the one hosted directly on the Qwen GGUF page?
