How much did it cost to produce all the data on the internet and every book ever published? Surely even the most conservative calculations put it at multiple years of planetary GDP. The same argument can be made to say that letting the big labs get away with pirating it will disincentivize people to publish anything.
Not only publishing, it has already disincentivised a huge part of what made Web 2.0: public APIs for data access to platforms.
It was amazing to be able to create some toy projects using data from big platforms, now they're all afraid LLM trainers will scrape their contents and create a competitor to their moat, the data.
It just sucks at many different levels.
I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.