logoalt Hacker News

GardenLetter27today at 8:16 AM0 repliesview on HN

It's not just the architecture but also the data - the decoder only approach lets you train in parallel over blocks of text (no RNN serial waiting), that allows you train on much, much more data.