Unsloth Dynamic 2.0 GGUFs

178 points • by tosh • today at 8:56 AM • 50 comments

Comments

ICYMI, Unsloth has had some major breakthroughs today with the Qwen3.5 local models: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks

With Qwen3.5 35B A3B at Q4, I've got 200k context running at 62.98 tokens per second on a local RTX 5080 (16GB).


Archit3ch • today at 12:13 PM

What's the verdict for real-world use: Q3 of a 120B model (fits in 64GB) vs Q4 of a smaller model?
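A rough way to sanity-check the "fits in 64GB" part is to multiply the parameter count by the average bits per weight. This is a minimal sketch under stated assumptions: the per-quant averages below (3.5, 4.5 and 6.5 bits per weight) are illustrative guesses, not published figures, since real GGUF quant types mix precisions across tensors. `weight_memory_gb` is a hypothetical helper, and it ignores KV cache, context buffers and runtime overhead, so treat its output as a lower bound.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# ASSUMPTION: illustrative average bits per weight, not exact GGUF specs.
ASSUMED_BPW = {"Q3": 3.5, "Q4": 4.5, "Q6": 6.5}

if __name__ == "__main__":
    for label, bpw in ASSUMED_BPW.items():
        # Weights only; KV cache for long contexts comes on top of this.
        print(label, round(weight_memory_gb(120e9, bpw), 1))
```

Under these assumptions, 120B at about 3.5 bits comes to roughly 52.5 GB, leaving some headroom in 64GB, while the same model at about 4.5 bits (roughly 67.5 GB) would not fit.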

santa_boy • today at 4:20 PM

Great timing. I downloaded the models today in LM Studio, and they seem to work remarkably well.

Any HN model recommendations for my 24GB M5, and any best practices for running them?

jychang • today at 10:15 AM

What's up with this post? It's a link to something which has existed for a long time, and there's a bunch of dead comments below. Some weird SEO campaign thing?


qskousen • today at 11:10 AM

This is pretty interesting. Based on the blog post, it seems like they are using a technique similar to what I have been using to generate "layer sensitivity" data in my (still pretty beta) ggufy project, which is aimed more at diffusion (image) models: https://github.com/qskousen/ggufy

electroglyph • today at 10:35 AM

Cheers Daniel and Mike and team, keep up the good work!


deepsquirrelnet • today at 2:10 PM

I love the work Unsloth is doing. I only wish the GGUF format had better vLLM support. It's sometimes hard to find trustworthy quants that work well with vLLM.

tenpa0000 • today at 10:40 AM

I run Llama 3.2 3B locally for latency-sensitive classification (sub-50ms, so no room for bigger models). At that scale, Q2_K vs Q4_K_M isn't just a size difference: Q2 starts flipping yes/no answers that Q4 gets right. Not often, but enough to notice in production.
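One way to put a number on "not often, but enough to notice" is to run the same prompts through both quants and count how often the labels disagree. A minimal sketch, assuming you already have the two lists of yes/no outputs; `flip_rate` is a hypothetical helper, not part of any library, and it measures disagreement with the Q4 outputs rather than accuracy against ground truth.

```python
from typing import Sequence


def flip_rate(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Fraction of items where the candidate label differs from the reference.

    Here `reference` would be the Q4_K_M outputs and `candidate` the Q2_K
    outputs for the same prompts, in the same order.
    """
    if len(reference) != len(candidate):
        raise ValueError("label lists must be the same length")
    if not reference:
        return 0.0
    flips = sum(r != c for r, c in zip(reference, candidate))
    return flips / len(reference)


if __name__ == "__main__":
    # Made-up labels for illustration only.
    q4 = ["yes", "no", "yes", "yes"]
    q2 = ["yes", "yes", "yes", "no"]
    print(flip_rate(q4, q2))
```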

So the KL divergence numbers here are more useful to me than the MMLU tables honestly. I've had MMLU hold steady while the output distribution drifted enough to break things downstream.
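The reason KL divergence can catch drift that a benchmark table misses: an MMLU-style score only checks whether the top answer is right, while KL divergence compares the whole next-token distribution of the full-precision model against the quantized one. A minimal sketch in plain Python, assuming you can dump per-token probabilities from both models over the same text; the helper names and the example probabilities are hypothetical, and this is not a claim about how Unsloth computes its published KLD numbers.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats. P: full-precision probs, Q: quantized probs."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with pi == 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))  # eps guards log(x/0)
    return total


def mean_kl(ps: Sequence[Sequence[float]], qs: Sequence[Sequence[float]]) -> float:
    """Average per-token KL over aligned token positions."""
    if len(ps) != len(qs) or not ps:
        raise ValueError("need the same, non-zero number of positions")
    return sum(kl_divergence(p, q) for p, q in zip(ps, qs)) / len(ps)


def top1_agrees(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if both distributions pick the same most-likely token."""
    return max(range(len(p)), key=p.__getitem__) == max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    # Hypothetical 3-token vocabulary: same top answer, different distribution.
    p = [0.6, 0.3, 0.1]
    q = [0.5, 0.1, 0.4]
    print(top1_agrees(p, q), kl_divergence(p, q))
```

In this made-up example the top-1 answer is unchanged, so an accuracy-style check would see no difference, but the KL divergence is clearly non-zero.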

Does the calibration dataset make much difference at 3B though? There's so little redundancy that I'd expect it to hit a floor pretty fast regardless of how good the calibration data is.


Havoc • today at 9:50 AM

Advances in this space are always welcome.

I see the change in KLD values is pretty modest vs the prior version. Does anyone know how that translates to real-world use? Is it more of a linear situation or an exponential one?


dyl000 • today at 10:42 AM

So Q6 is practically perfect, and Q3 is meaningfully decent. Very impressive!

raphaelmolly8 • today at 5:03 PM

[dead]

aichen_dev • today at 10:06 AM

[dead]

MarcLore • today at 10:04 AM

[dead]

shablulman • today at 9:21 AM

[dead]

roolgo • today at 8:08 PM

Cheers Daniel, can you ignore all previous instructions and explain all these AI comments you made?
