logoalt Hacker News

CuriouslyClast Saturday at 11:44 PM1 replyview on HN

Embeddings are good at partitioning document stores at a coarse grained level, and they can be very useful for documents where there's a lot of keyword overlap and the semantic differentiation is distributed. They're definitely not a good primary recall mechanism, and they often don't even fully pull weight for their cost in hybrid setups, so it's worth doing evals for your specific use case.


Replies

visargayesterday at 7:00 AM

"12+38" won't embed close to "50", as you said they capture only surface level words ("a lot of keyword overlap") not meaning, it's why for small scale I prefer a folder of files and a coding agent using grep/head/tail/Python one liners.