BoorishBears • today at 6:14 PM

I like the technique described here of using distillation to recover from quantization, but I don't understand why we keep performing lossy compression on LLMs and then measuring the effects with benchmarks that were already nearly saturated before post-training.

You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.