Someone simply assumed at some point that RAG must be based on vector search, and everyone followed.
We were given a demo of a vector-based approach, and it didn't work. They said our docs were too big, and their chunking process was failing for some reason. So we ended up using a good old-fashioned Elastic backend, because that's what we know, and simply forwarding a few of these giant documents to the LLM verbatim along with the user's question. The results have been great: not a single complaint about accuracy, responses are fast and cheap using OpenAI's smaller models, and Elastic is mature tech everyone understands, so it's easy to maintain.
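That pipeline is simple enough to sketch in a few lines. This is a minimal illustration, not the poster's actual code: the index name, field name, and prompt wording are all made up, and the network calls are left out so only the query and prompt construction are shown.

```python
# Sketch of a search-then-stuff pipeline: plain keyword (BM25) search in
# Elasticsearch, then the top documents are passed verbatim to the LLM
# together with the question. No embeddings, no chunking.
# All names here (index, field, sizes) are illustrative assumptions.

def build_es_query(question: str, size: int = 3) -> dict:
    """Build an ordinary full-text match query over a hypothetical 'body' field."""
    return {
        "size": size,
        "query": {"match": {"body": {"query": question}}},
    }

def build_prompt(question: str, docs: list[str]) -> str:
    """Forward the retrieved documents verbatim along with the user's question."""
    context = "\n\n---\n\n".join(docs)
    return (
        "Answer the question using only the documents below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

With a real cluster, the only extra steps are one Elasticsearch search call with the query dict and one chat-completion call with the prompt; the model never sees a vector.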
I think this turned out to be one of those lessons about premature optimization. It didn't need to be as complex as what people initially assumed. Perhaps with older models it would have been a different story.
I don't think this was a simple assumption. LLMs used to be much dumber! GPT-3-era LLMs were not good at grep, they were not that good at recovering from errors, and they were not good at making follow-up queries over multiple turns of search. Multiple breakthroughs in code generation, tool use, and reasoning had to happen on the model side before vector-based RAG started to look like unnecessary complexity.
It was the terminology that did that more than anything. The term 'RAG' just has a lot of consequential baggage. Unfortunately.
Certainly a lot of blog posts followed. Not sure that “everyone” was so blinkered.
It doesn't have to be, though. I've had great success letting an agent loose on an Apache Lucene instance. Turns out LLMs are great at building queries.
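The "agent + Lucene" setup amounts to exposing raw query syntax as a tool and letting the model write the query string. A hypothetical sketch of that, using the OpenAI-style function-calling schema plus a cheap sanity check before executing whatever the model wrote (the tool name, field names, and check are all my own inventions):

```python
# Expose Lucene query syntax as a tool the model can call. The schema below
# follows the common JSON-Schema function-calling format; the executor that
# would actually hit Lucene/Elasticsearch is omitted.

LUCENE_TOOL = {
    "type": "function",
    "function": {
        "name": "lucene_search",
        "description": "Run a raw Lucene query against the docs index.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": 'Lucene syntax, e.g. title:(rag OR retrieval) AND year:[2023 TO *]',
                },
            },
            "required": ["query"],
        },
    },
}

def looks_valid(query: str) -> bool:
    """Cheap guard before sending a model-written query to Lucene:
    parentheses must balance and quotes must come in pairs."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and query.count('"') % 2 == 0
```

The guard is deliberately crude; a real deployment would let Lucene's own parser reject bad queries and feed the error back to the model, which current models recover from well.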
RAG is like when you want someone to know something they're not quite getting so you yell a bit louder. For a workflow that's mainly search based, it's useful to keep things grounded.
It's less useful in other contexts, unless you move away from traditional chunked embeddings and toward things like graphs, where the relationships provide constraints as much as additional grounding.
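To make the graph point concrete: instead of grabbing the top-k chunks by embedding similarity, you can follow typed edges out from a matched entity, so the relationships themselves constrain what context reaches the LLM. A toy sketch with an invented graph and invented relation names:

```python
# Toy graph-grounded retrieval: the context handed to the LLM is the set of
# (subject, relation, object) triples reachable from a matched entity, not a
# pile of similarity-ranked chunks. Entities and relations are made up.

GRAPH = {
    "InvoiceService": [("depends_on", "AuthService"), ("owned_by", "TeamBilling")],
    "AuthService": [("depends_on", "UserDB")],
}

def expand(entity: str, depth: int = 1) -> list[tuple[str, str, str]]:
    """Return triples reachable from `entity` within `depth` hops."""
    triples, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, obj in GRAPH.get(node, []):
                triples.append((node, rel, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples
```

A one-hop expansion of "InvoiceService" yields only its direct dependencies and owner; the edge types act as the constraint, which is the grounding argument made above.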
It’s something of a historical accident
We started with LLMs when everyone in search was building question answering systems. Those architectures look like the vector DB + chunking we associate with RAG.
Agents' ability to call tools, using any retrieval backend, calls that into question.
We really shouldn't start RAG with the assumption that we need it. I'll be speaking about the subject in a few weeks:
https://maven.com/p/7105dc/rag-is-the-what-agentic-search-is...