Have you actually read the text of the GPL?
> This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.
It is legitimate to acquire GPL software. The license's requirements only kick in if you're distributing the work AND fair use does not apply.
Training certainly doesn't count as distribution, so the buck passes to inference, which leaves us dealing with the substantial similarity test and, again, fair use.
You and I are not fucking judges; our opinions on this don't matter one bit. We might as well print them on a piece of paper and wipe our asses with it.
There is the clean-room problem, though.
If a human reads GPL code and then uses what they learned to output a recreation of that code (a derivative work), that is illegal.
If an AI reads GPL code and uses what it "learned" to output a recreation of that code, that's not illegal?
If that is the case, then copyright holds no weight anymore. I should be allowed to train an LLM on decompiled firmware (say, PlayStation, Switch, iPhone) in countries where decompilation is legal, then have the LLM produce equivalent firmware that I later use to build an emulator (or competing open source firmware).