logoalt Hacker News

eshaham78yesterday at 7:03 PM0 repliesview on HN

Great question. In our production experience, the hybrid approach (BM25 + vector) typically wins for most use cases around 70/30 split favoring keyword for exact matches. The key insight is that reranking becomes critical - without it, you're just concatenating results and hoping. We typically use cross-encoder rerankers (like Cohere or custom fine-tuned models) to score the combined results. The break-even point for pure semantic search is usually when queries are abstract concept-heavy, not entity-specific.