This is the very question under debate. Training LLMs on publicly available data is a novel situation, and neither law nor public opinion has settled on a consensus.
Copyright maximalists like to borrow unearned moral weight for their position by conflating copyright infringement with "stealing", but the two are not equivalent in any legal sense. It's not clear that training an AI on publicly available data even constitutes copyright infringement, much less "stealing".