Block autoregressive generation can give you big speedups.
Consider that outputting two tokens at a time will be a (2 - epsilon)x speedup over running one token at a time. As your block size increases, you quickly get fast enough that it doesn't matter so much whether you're doing blocks or actual all-at-once generation. What matters, then, is the quality trade-off for moving to block-mode output. And here it sounds like they've minimized that trade-off.
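A minimal sketch of why the speedup tracks block size, assuming a hypothetical `propose_block` that stands in for one model forward pass emitting k tokens at once (not any specific model's API):

```python
def propose_block(prefix: list[int], k: int) -> list[int]:
    # Hypothetical model call: returns the next k tokens given the prefix.
    # A real block autoregressive model would run one forward pass here.
    return [len(prefix) + i for i in range(k)]

def generate(n_tokens: int, block_size: int) -> tuple[list[int], int]:
    out: list[int] = []
    calls = 0
    while len(out) < n_tokens:
        out += propose_block(out, min(block_size, n_tokens - len(out)))
        calls += 1  # count forward passes, the thing that dominates latency
    return out, calls

_, calls_1 = generate(128, block_size=1)  # 128 forward passes
_, calls_2 = generate(128, block_size=2)  # 64 passes: the (2 - epsilon)x case
_, calls_8 = generate(128, block_size=8)  # 16 passes: diminishing returns kick in
print(calls_1, calls_2, calls_8)
```

The "epsilon" is whatever per-pass overhead doesn't shrink with block size, which is why the curve flattens as blocks get large.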
Can it go back and use future blocks as context? That's what I'm most interested in here: fixing line 2 because of a change/discovery we made in the process of writing line 122. I think that problem is a big part of the shortsightedness of current coding models. A sketch of what I mean is below.
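Nothing in the announcement confirms this capability; purely as illustration, here's a hypothetical revision loop where `infill` stands in for a bidirectional/denoising model call that conditions on both sides of a gap (no real library API is implied):

```python
def infill(left: list[str], right: list[str]) -> list[str]:
    # Assumed model call: regenerate the gap given context on BOTH sides.
    # A left-to-right decoder only ever sees `left`; a diffusion-style
    # denoiser could in principle condition on `right` too.
    return [f"<line rewritten given {len(right)} later lines>"]

def revise(lines: list[str], start: int, end: int) -> list[str]:
    # Re-mask lines[start:end] and fill them back in with future context.
    return lines[:start] + infill(lines[:start], lines[end:]) + lines[end:]

doc = [f"line {i}" for i in range(1, 123)]
doc = revise(doc, start=1, end=2)  # redo line 2 in light of line 122
print(doc[1])
```

That re-mask-and-denoise step is exactly what plain autoregressive decoding can't do without regenerating everything after line 2.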