3abiton • yesterday at 11:48 PM

Is there evidence that this approach helps maintain "accuracy" performance when quantized? It sounds a bit like mxfp4 with gpt-oss, which was a confusing model upon release.

Replies

dofm • today at 4:42 PM

I have just been humbled by the Gemma 4 26B QAT build (unsloth's version), which insisted repeatedly that my requirements for some niche WordPress code were wrong and could not be satisfied.

I am a good WP developer, so I kept prodding it, and it kept insisting and explained itself clearly. Turns out it was right and I was wrong, as I would have found out if I'd written the code myself.

I've been using this particular test for days, experimenting with ways to prompt for and generate code. The 4-bit quantisation of the pre-QAT model does not catch this error. Nor does the Qwen 3.6 sparse model, which confidently blazed past it and never mentioned it.

(FWIW neither did plain ChatGPT; maybe Codex would)

Anecdotal, but there you go. I am somewhat weirded out by it.
