You seem like the type of person who will believe anything as long as someone cites a case, without looking into it yourself. Bartz v. Anthropic only looked at books, and there was still a $1.5 billion settlement that Anthropic paid out because it got those books from LibGen / Anna's Archive, and the ruling also said that the data has to be acquired "legitimately".
Whether data acquired under a licence that specifically forbids building a derivative work without also releasing that derivative under the same licence counts as a legitimate data-gathering operation is anyone's guess, since those circumstances are about as far from that case as they can be.
As long as they don't distribute the model's weights, even a strict interpretation of the GPL should be fine. It's the same reason Google doesn't have to release its changes to a Linux kernel it only deploys in-house.
Have you actually read the text of the GPL?
> This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.
It is legitimate to acquire GPL software. The license's requirements only apply if you're distributing the work AND fair use does not apply.
Training certainly doesn't count as distribution, so the buck passes to inference, which leaves us dealing with the substantial similarity test and, still, fair use.