Hacker News

radarsat1, last Wednesday at 10:34 AM

This reminds me a lot of the tricks for turning BERT into a generative model. I guess the causal masking, which keeps it essentially autoregressive, is an important difference though. Kind of the best of both worlds.