Aren’t books massively outweighed by the crawled internet corpus?
I doubt it, because books are probably weighted as higher quality and more trustworthy than random Reddit posts
Especially if it's unsupervised training