How about requiring AI companies to pay creators for training rights? Alternatively, models trained on the commons must be owned by the commons. Right now these AI companies are trying to have it both ways: it’s The People’s Data for training on, comrade, but ownership is privatized.
Practically speaking, who is going to enforce such a regime? Do you really want to give Chinese companies such a huge competitive advantage, in that they aren't subject to the same costs as Western companies? How do you even sort out which "creators" are owed, and how much? It's next to impossible and would drown the legal system in litigation; it would likely cause more problems than it solves. On top of which, you can already find open weights trained on most, if not all, of the scraped material. If you make those illegal to use, or prohibitively expensive, you've just destroyed the legality of local LLMs and put the technology firmly in the hands of the monopolists.