Hacker News

fxwin · today at 8:08 AM

I'm very skeptical of the advantage they're claiming here. The whitepaper [0] only compares these models to full-precision baselines, when the more interesting (and probably more meaningful) comparison would be with other quantized models of a similar memory footprint.
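For a rough sense of what "similar memory footprint" means here, a back-of-the-envelope in Python (illustrative only: it assumes ~8B parameters, roughly Qwen3-8B scale, and counts weights alone, ignoring embeddings, activations, and KV cache):

    # Weight-only memory at different bit widths, ~8B params (rough, illustrative).
    params = 8e9
    for name, bits in [("FP16", 16), ("INT4", 4), ("ternary (~1.58-bit)", 1.58)]:
        print(f"{name:>20}: {params * bits / 8 / 1e9:5.1f} GB")

So the fair baseline for a ~1.6 GB model would be something like a heavily quantized Qwen3, not the FP16 original.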

Especially since these models seem to be more or less quantized variants of Qwen3 with custom kernels and other inference optimizations (?), rather than fine-tuned or trained from scratch with a new architecture, I'm very surprised (or rather suspicious) that they didn't do the obvious comparison against a quantized Qwen3.

Their new (to my knowledge) measure/definition of intelligence seems reasonable, but introducing something like this without thorough benchmarking and model comparisons is even more of a red flag to me.

[0] https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-b...


Replies

riedel · today at 8:19 AM

Actually, IMHO the promise would be going beyond standard FP4 quants: I think the goal is more where 1.58-bit (ternary) quants are heading. Having said that, it would be interesting to see performance on non-standard HW.
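For anyone unfamiliar with ternary quants: a minimal NumPy sketch of the BitNet-b1.58-style absmean scheme (simplified; real implementations apply this per-tensor or per-channel inside quantization-aware training, not as a post-hoc rounding pass like here):

    import numpy as np

    def ternary_quantize(w, eps=1e-8):
        # Absmean scaling: normalize by the mean absolute weight, then
        # round-and-clip every entry to {-1, 0, +1} (log2(3) ≈ 1.58 bits).
        scale = np.abs(w).mean() + eps
        q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32) * 0.02
    q, s = ternary_quantize(w)
    print(np.unique(q))                         # values drawn from {-1, 0, 1}
    print(np.abs(w - dequantize(q, s)).mean())  # mean quantization error

The appeal is that matmuls against {-1, 0, +1} weights reduce to additions and subtractions, which is where custom kernels (and non-standard HW) could actually pay off.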