The metric they're selling this on is intelligence per byte, not total intelligence. So if they had compared against quantized versions of the competing models, the intelligence-per-byte gap would shrink: most models hold up very well down to 6-bit quantization, and 4-bit is usually still pretty good, though quality definitely tends to drop off below 6-bit.
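To make the per-byte arithmetic concrete, here's a rough sketch. Every number in it is made up for illustration (the scores, parameter counts, and bit widths are not measurements of any real model), and `model_size_gb` and `score_per_gb` are just hypothetical helper names. It only shows how the same rival model looks much better per byte once you quantize it.

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring any file overhead."""
    return params_billions * bits_per_weight / 8


def score_per_gb(score: float, params_billions: float, bits_per_weight: float) -> float:
    """Benchmark score divided by approximate weight size in GB."""
    return score / model_size_gb(params_billions, bits_per_weight)


# All values below are hypothetical, chosen only to show the shape of the effect.
tiny = score_per_gb(score=50.0, params_billions=8.0, bits_per_weight=1.0)
rival_fp16 = score_per_gb(score=70.0, params_billions=8.0, bits_per_weight=16.0)
# Assume the rival loses only a little quality at 6-bit (an assumption, not a measurement).
rival_q6 = score_per_gb(score=69.0, params_billions=8.0, bits_per_weight=6.0)

gap_vs_fp16 = tiny / rival_fp16
gap_vs_q6 = tiny / rival_q6

print(f"per-byte advantage vs 16-bit rival: {gap_vs_fp16:.1f}x")
print(f"per-byte advantage vs 6-bit rival:  {gap_vs_q6:.1f}x")
```

With these invented numbers the per-byte advantage drops from roughly 11x to roughly 4x just by quantizing the rival. The tiny model still wins on this metric, but by a much smaller margin.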
Nonetheless, the Prism Bonsai models are impressive for their size. Where they fall apart is knowledge. They have good prose and logic for tiny models, and they're fast even on modest hardware, but they hallucinate a lot. Which makes sense: you can't fit the world's data in a couple of gigabytes. But as a base model for fine-tuning in use cases where size matters, one of these is probably a great choice.
Unfortunately, there doesn't seem to be a clear way to fine-tune these models yet. Excited for when that happens, though.