For all we know, AI tech companies could theoretically have converted all of the "acquired" (ahem!) training set material into base64 and used it for training as well, just like you would encode say japanese romaji or hebrew written in the english alphabet.
'Yes, I know we already trained on all that data, but now I want you to convert to base64 and train it again! at enormous cost!'
Unlikely that every company would have bothered to do this.