Hacker News

13pixels, yesterday at 3:09 PM (2 replies)

The 'Vector DB vs Keyword Search' section caught my eye. In your testing for RAG pipelines, where do you draw the line?

We've found keyword search (BM25) often beats semantic search for specific entity names/IDs, while vectors win on concepts. Do you cover hybrid search patterns/re-ranking in the book? That seems to be where most production systems end up.
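To make the "BM25 beats vectors on IDs" point concrete, here is a minimal pure-Python sketch of Okapi BM25. An exact identifier is a rare token, so it gets a high IDF and dominates the score for the one document that contains it. Several details are assumptions: the naive lowercase whitespace tokenizer, the common default parameters k1=1.5 and b=0.75, the Lucene-style "+1" IDF variant, and the example documents, which are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption); real systems use proper analyzers.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document in `docs`."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF: the +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Saturating term frequency, normalized by document length.
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "error ERR-4012 raised by payment service",
    "payment failures and how to debug them",
    "service outage postmortem",
]
print(bm25_scores("ERR-4012", docs))
```

Only the first document scores above zero for the ID query, whereas an embedding model has no guarantee of preserving an opaque token like `ERR-4012` in its vector.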


Replies

eshaham78yesterday at 7:03 PM

Great question. In our production experience, the hybrid approach (BM25 + vector) typically wins in most use cases, with a roughly 70/30 split favoring keyword search for exact matches. The key insight is that reranking becomes critical: without it, you're just concatenating results and hoping. We typically use cross-encoder rerankers (like Cohere's, or custom fine-tuned models) to score the combined results. Pure semantic search usually only becomes competitive when queries are abstract and concept-heavy rather than entity-specific.
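A minimal sketch of the pipeline described above, under stated assumptions. The comment doesn't say how the 70/30 split is applied, so this sketch assumes a linear weighted sum of min-max-normalized scores. Reciprocal rank fusion is a common alternative that sidesteps score normalization. The reranking step takes `score_pair`, a hypothetical callable standing in for a cross-encoder such as Cohere's reranker or a fine-tuned model. No real reranker API is called here.

```python
from typing import Callable


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] so BM25 and cosine scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored equally; treat them as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    keyword: dict[str, float],
    vector: dict[str, float],
    keyword_weight: float = 0.7,  # assumed reading of the "70/30 split"
) -> list[tuple[str, float]]:
    """Weighted sum of normalized keyword and vector scores, best first.

    A document missing from one retriever's results contributes 0 from that side.
    """
    kw, vec = min_max(keyword), min_max(vector)
    fused = {
        doc_id: keyword_weight * kw.get(doc_id, 0.0)
        + (1 - keyword_weight) * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def rerank(
    query: str,
    candidates: list[tuple[str, float]],
    texts: dict[str, str],
    score_pair: Callable[[str, str], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Re-score fused candidates with a pairwise (query, passage) scorer.

    `score_pair` is a hypothetical hook standing in for a cross-encoder; the
    fused scores are discarded and only the reranker's ordering is kept.
    """
    rescored = [(doc_id, score_pair(query, texts[doc_id])) for doc_id, _ in candidates]
    return sorted(rescored, key=lambda item: item[1], reverse=True)[:top_k]
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities sit in a narrow range, so an unnormalized sum would let one side swamp the other regardless of the weight. In practice you would batch all candidate pairs into a single reranker call; the per-pair callable just keeps the sketch dependency-free.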

xx123122, yesterday at 3:34 PM

Thanks for the insight. We definitely plan to cover these patterns in future updates. Please excuse a slight delay, as our team is currently celebrating Chinese New Year. We'll be back to shipping code right after the holidays. OWO