Maybe impressive in one way, but I'm also pretty sure a simple n-gram Markov model (a la Niall on the Amiga) would have a lower loss on the test set.
Transformers don't scale down very well, in my experience - I used to train local models all the time as new ones were released, and as I recall transformers were the first architecture I couldn't get better results out of with my limited training data and GPU.
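For anyone curious what I mean by the n-gram baseline: a minimal sketch, assuming a character-level trigram model with add-one smoothing, scored by average negative log-likelihood (the same cross-entropy loss you'd compare against a small transformer). All names and parameters here are mine, not from any particular library.

```python
import math
from collections import Counter, defaultdict

def train_ngram(text, n=3):
    """Count n-gram continuations: context (n-1 chars) -> next-char counts."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        ctx, nxt = text[i:i + n - 1], text[i + n - 1]
        counts[ctx][nxt] += 1
    return counts

def avg_nll(counts, text, n=3, alpha=1.0, vocab_size=256):
    """Average negative log-likelihood (nats/char) on held-out text,
    with add-alpha smoothing so unseen continuations get nonzero mass."""
    total, m = 0.0, 0
    for i in range(len(text) - n + 1):
        ctx, nxt = text[i:i + n - 1], text[i + n - 1]
        c = counts.get(ctx, Counter())
        p = (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)
        total += -math.log(p)
        m += 1
    return total / m

# Toy data just to show the mechanics; real comparisons need real corpora.
train_text = "the quick brown fox jumps over the lazy dog " * 50
test_text = "the lazy fox jumps over the quick brown dog "
model = train_ngram(train_text, n=3)
print(round(avg_nll(model, test_text, n=3), 3))
```

With a tiny training set like this, the counts table is the whole model - no gradients, no GPU - which is exactly why it's hard for a small transformer to beat on limited data.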