Can you explain how you "know" two things are related? If I ask you for the similarities between a cat and a dog, is your answer based solely on an understanding of their genetic phylogeny and how those genes express traits?
Grouping vectors in concept space is exactly how you create semantic understanding. The proof is in how good LLMs are at creating semantically valid text. The fact that it took massive amounts of data is irrelevant. That just shows how much knowledge is encoded in all our language. It takes humans a ton of training to know things too.
> is exactly how
We don't know that. It seems like great hubris to declare we know how the human brain works. You are asking me to explain how we know things and then telling me we've already figured it out in the same breath, and that's hilarious.
It doesn't take massive amounts of language data to train a baby human. It is almost entirely just: "Look. Here's a cat. Can you say cat? Cats go meow." "Over here, your aunt has a dog. Dogs go woof."
There's generally a flood of non-linguistic contextual data in such moments (sights, smells, sounds, movements, touch), but that only further underscores how different LLM training is from anything we'd consider human learning. Our memories aren't just "conceptual spaces of linguistic topics"; they are complex sensory maps where a smell can remind you of the first dog you ever met. There is so much of our human knowledge that is not and never has been encoded in most of our languages.
The fact that LLMs take massive amounts of linguistic data is relevant, because it shows how far we still have to go: we are barely scratching the surface of how the human brain seems to work. (Of which, again, we know only the barest details. Anyone who tells you they know 100% of how the human brain operates tends to be a snake oil salesman.)