Klaus23 • today at 11:10 AM

In theory, you could do that and increase the speed at higher temperatures, but it would subtly skew your output toward the draft model's preferences. Rather than sampling from the main model's probabilities, you would have to accept a draft model's pick whenever it is close enough.

As far as I know, this is not used in practice. Currently popular implementations always match the main model output, and the draft model only affects the speed.

Replies

furyofantares • today at 1:46 PM

Here is the line in vLLM's source code that determines if a draft token is accepted:

    accepted = draft_prob > 0 and target_prob / draft_prob >= uniform_prob

It does have a branch that checks only token ID equality, which is used when the temperature is 0.
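
For anyone who wants to see why that comparison leaves the main model's distribution untouched, here is a minimal Python sketch. It is not vLLM's code: the function names are made up for illustration, and the residual resampling on rejection is the usual speculative sampling recipe, assumed here rather than taken from the quoted source.

```python
import random

def accept_draft_token(target_prob, draft_prob, uniform_prob):
    # Same comparison as the quoted line: with uniform_prob drawn from [0, 1),
    # this accepts with probability min(1, target_prob / draft_prob).
    return draft_prob > 0 and target_prob / draft_prob >= uniform_prob

def accept_greedy(draft_token_id, target_token_id):
    # Temperature 0 case: accept only if the draft picked the target's token.
    return draft_token_id == target_token_id

def sample_one(target, draft, rng):
    # Hypothetical single-token step; target and draft are probability lists
    # over the same small vocabulary.
    tokens = list(range(len(target)))
    tok = rng.choices(tokens, weights=draft)[0]  # the draft model proposes
    if accept_draft_token(target[tok], draft[tok], rng.random()):
        return tok
    # Assumed rejection step: resample from the normalized positive part of
    # (target - draft), which is what restores the target distribution.
    residual = [max(t - d, 0.0) for t, d in zip(target, draft)]
    return rng.choices(tokens, weights=residual)[0]
```

Run `sample_one` many times with a fixed `target` and the token frequencies should converge to `target` whatever `draft` is; a worse draft only means more rejections, so it costs speed rather than changing the output.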
