monster_truck, last Friday at 11:25 PM

Even if it weren't outright beneficial for decoding by itself, it would still let you connect a second machine running a smaller, more heavily quantized version of the model for speculative decoding, which can net you a >4x decoding speedup without quality loss.
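
For anyone unfamiliar with why speculative decoding doesn't cost quality, here is a rough sketch of the greedy variant. Everything in it is a made-up stand-in: `target_model` and `draft_model` are toy functions rather than real LLMs, and `k` is a hypothetical draft length. The small model guesses a few tokens ahead, the big model checks them, and only tokens the big model would have picked anyway are kept. Real implementations verify all guesses in one batched forward pass and usually use a rejection-sampling rule for non-greedy sampling; neither that nor the two-machine split is modeled here.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Toy stand-in for the large, slow model (hypothetical)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Toy stand-in for the small quantized model: usually agrees, sometimes not."""
    guess = target_model(prefix)
    return guess if len(prefix) % 4 != 0 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(tokens) < goal:
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks the proposal. A real system does this in
        #    one batched pass; here it is simulated position by position.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)        # draft guessed right, keep it
            else:
                accepted.append(expected)   # first miss: take the target's token
                break
        else:
            # Every guess matched, so the same pass also yields one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec_tokens, rounds = speculative_decode(target_model, draft_model, prompt, 20)
    print("identical to target-only decoding:",
          spec_tokens == greedy_decode(target_model, prompt, 20))
    print("verification rounds:", rounds, "for 20 new tokens")
```

Every token that ends up in the output is one the target model chose for that exact prefix, so the result matches target-only greedy decoding. The speedup comes from needing fewer target passes than tokens, and how much you actually get depends on how often the draft model agrees with the big one.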