Embeddings are still mostly just vectors clustered in n-dimensional space. The model doesn't "know" two things are related and point to evidence; it guesses that two things are statistically likely to be related, based on trained patterns, and runs with that guess without evidence.
It has no "semantic understanding" as we would define it. It's just increasingly good at winning cluster lotteries because we've increased the amount of training data to incredible heights.
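To make concrete what "statistically likely to be related" means here: relatedness in an embedding model is just geometric proximity, typically measured by cosine similarity. A toy sketch (the 4-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions learned from data):

```python
import math

# Made-up toy "embeddings" for illustration only.
cat   = [0.9, 0.8, 0.1, 0.2]
dog   = [0.8, 0.9, 0.2, 0.1]
stone = [0.1, 0.1, 0.9, 0.8]

def cosine(a, b):
    """Cosine similarity: ~1.0 means pointing the same way, ~0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(round(cosine(cat, dog), 2))    # high: nearby in vector space
print(round(cosine(cat, stone), 2))  # low: distant in vector space
```

The model never consults evidence about cats or dogs at query time; "cat is like dog" falls out of the fact that their vectors ended up near each other during training.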
Can you explain how you "know" two things are related? If I ask you the similarities between a cat and a dog, is your answer based solely on an understanding of their genetic phylogeny and how those genes express traits?
Grouping vectors in concept space is exactly how you create semantic understanding. The proof is in how good these models are at producing semantically valid text. The fact that it took massive amounts of data is irrelevant; that just shows how much knowledge is encoded in our language. It takes humans a ton of training to know things too.