Hacker News

dvdkon today at 12:23 PM

As far as I know, speculative decoding still verifies that the proposed tokens are what the "big" model would generate; it just uses the guesses to make that process faster. Setting the probability threshold too low then shouldn't affect correctness, only speed (time will be wasted verifying bad guesses).
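
To make that concrete, here is a toy sketch of the greedy case only. Everything in it is made up for illustration: `target_next` stands in for the "big" model, `make_draft` builds a small model that is wrong at a chosen rate, and `passes` counts verification rounds as a rough proxy for time. It is not how any particular inference engine implements this, and it leaves out the sampling case and the bonus token that real implementations usually emit after a fully accepted draft.

```python
def target_next(ctx):
    # Toy stand-in for the "big" model: a deterministic next token (hypothetical).
    return (sum(ctx) * 31 + len(ctx)) % 10


def make_draft(error_every):
    # Toy stand-in for the small model: it agrees with the target except at
    # positions where len(ctx) is a multiple of error_every.
    def draft_next(ctx):
        tok = target_next(ctx)
        return (tok + 1) % 10 if len(ctx) % error_every == 0 else tok
    return draft_next


def greedy_decode(prompt, n):
    # Baseline: the big model alone, one token per pass.
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out


def speculative_decode(prompt, n, draft_next, k=4):
    out = list(prompt)
    goal = len(prompt) + n
    passes = 0
    while len(out) < goal:
        # 1. The draft model guesses up to k tokens ahead.
        proposal, ctx = [], list(out)
        for _ in range(min(k, goal - len(out))):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. One verification round by the big model (batched in a real system).
        passes += 1
        for tok in proposal:
            expected = target_next(out)
            out.append(expected)  # the emitted token is always the target's
            if tok != expected:
                break  # first bad guess: discard the rest of the proposal
    return out, passes
```

With a draft that is always wrong (`make_draft(1)`), the output still matches `greedy_decode` exactly; it just takes one verification round per token, so nothing is gained. A draft that always agrees gets through the same text in a fraction of the rounds.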


Replies

lreeves today at 12:26 PM

But won't setting it to accept 100% of the proposed tokens skip the verification?