logoalt Hacker News

sva_today at 4:30 PM0 repliesview on HN

I don't think its mathematically equivalent or even close because the context/logprobs will be very different, since you only produce 1 token per pass. I'd say the token itself has a lot less information than the signal propagating through the residual stream of transformer blocks.