Hacker News

dinobones · yesterday at 4:17 PM · 1 reply

So this is basically an “embedding of embeddings”, an approximation of multiple embeddings compressed into one, to reduce dimensionality/increase performance.

All this tells me is that the “multiple embeddings” probably mostly overlap, and the marginal value of each additional one is low, if a single embedding can represent them.

I don’t otherwise see how you can keep comparable performance without breaking information theory.


Replies

kevmo314 · yesterday at 8:13 PM

> marginal value of each additional one is probably low

This is the point of the paper: single embedding vectors are sparse enough that information from additional vectors can be packed into one, improving retrieval performance.
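To make the idea concrete, here is a toy sketch (my own illustration, not the paper's method, and mean pooling is just the simplest possible compression): each document gets several token-level vectors; we compress them into a single vector and check that one-dot-product retrieval still finds the same document that multi-vector MaxSim scoring does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each document has several token-level embedding
# vectors; we compress them into one vector by mean pooling.
n_docs, n_tokens, dim = 100, 8, 64

docs_multi = rng.normal(size=(n_docs, n_tokens, dim))
docs_multi /= np.linalg.norm(docs_multi, axis=-1, keepdims=True)

# Single compressed embedding per document.
docs_single = docs_multi.mean(axis=1)
docs_single /= np.linalg.norm(docs_single, axis=-1, keepdims=True)

# Query built from a perturbed copy of document 42's vectors, so doc 42
# should rank first under both scoring schemes.
query_multi = query = docs_multi[42] + 0.1 * rng.normal(size=(n_tokens, dim))
query_single = query_multi.mean(axis=0)
query_single /= np.linalg.norm(query_single)

# Multi-vector scoring (MaxSim, as in late-interaction retrieval):
# for each query vector take its best-matching doc vector, then sum.
sims = np.einsum('qd,ntd->nqt', query_multi, docs_multi)
multi_scores = sims.max(axis=-1).sum(axis=-1)

# Single-vector scoring: one dot product per document.
single_scores = docs_single @ query_single

print(int(multi_scores.argmax()), int(single_scores.argmax()))  # both 42
```

The compressed vector wins on cost (one dot product vs. n_tokens² similarities per document); whether it loses accuracy depends on how redundant the per-token vectors are, which is exactly the overlap question raised above.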