
mlmonkey, yesterday at 5:30 PM

I'm no expert (just a monkey... ;) ), but isn't diffusion supposed to generate ALL of the output at once? From their diagram, it looks like their I-LDM model uses previously generated context to generate the next tokens (or blocks).


Replies

sdenton4, yesterday at 6:20 PM

Block autoregressive generation can give you big speedups.

Consider that outputting two tokens at a time gives a (2-epsilon)x speedup over generating one token at a time. As the block size increases, you quickly get fast enough that it doesn't matter sooooo much whether you're doing blocks or actual all-at-once generation. What matters, then, is whether there's a quality trade-off for moving to block-mode output. And here it sounds like they've minimized that trade-off.
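
To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes a made-up cost model (not anything from the paper): one forward pass over a block of size b costs `1 + overhead * (b - 1)` single-token passes, where `overhead` plays the role of the epsilon above. The function names and the 0.05 default are purely illustrative.

```python
import math


def num_passes(total_tokens: int, block_size: int) -> int:
    """Sequential generation steps needed to emit total_tokens in blocks."""
    return math.ceil(total_tokens / block_size)


def block_speedup(block_size: int, overhead: float = 0.05) -> float:
    """Speedup over one-token-at-a-time decoding under the assumed cost model.

    Assumption: a block step costs 1 + overhead * (block_size - 1) times as
    much as a single-token step. With overhead = 0 the speedup is exactly
    block_size; any positive overhead gives the "(b - epsilon)x" behavior.
    """
    step_cost = 1.0 + overhead * (block_size - 1)
    return block_size / step_cost


if __name__ == "__main__":
    total = 1024  # hypothetical output length
    for b in (1, 2, 8, 32, total):
        print(f"block={b:5d}  passes={num_passes(total, b):5d}  "
              f"speedup={block_speedup(b):.2f}x")
```

The sequential pass count drops as 1/b, so most of the gap to all-at-once generation is already closed at fairly modest block sizes. How much per-block overhead you really pay depends on the model and hardware, so treat the 0.05 as a placeholder.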
