We use both a virtual file system and RAG — they each excel in different areas. The trick with RAG is data quality: we use an LLM to chunk documents into semantically cohesive sections and to generate metadata (including fact triples and links to related chunks in the document) for every chunk and for the document as a whole. We then use Voyage contextual embeddings to embed each chunk together with the document- and chunk-level metadata. Works incredibly well. At retrieval time the agent can follow chunk links if needed, or drop down and analyze the raw file in the VFS. High-quality instruction-based reranking helps a lot too! We're often looking over tens of thousands of documents, and it'd be very inefficient to have our agents analyze just the VFS without RAG.
Our VFS is pretty powerful in its own right, though: everything is backed by Postgres and projected into files/directories for our agents. They get basic grep etc., but also optimized FTS tools for BM25, jq, and preview tools that show representative slices of large documents. All on top of Pydantic AI.
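A runnable sketch of the projection idea, with sqlite standing in for Postgres so it's self-contained (sqlite's FTS5 also happens to expose BM25 ranking). The table layout, paths, and tool functions (`ls`, `fts`) are all illustrative, not our actual schema or tool names.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (path TEXT PRIMARY KEY, body TEXT)")
# FTS5 virtual table gives BM25 ranking, analogous to a Postgres FTS index.
db.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(path, body)")

rows = [
    ("notes/storage.md", "postgres backs the virtual file system"),
    ("notes/retrieval.md", "bm25 ranking drives full text retrieval"),
]
db.executemany("INSERT INTO docs VALUES (?, ?)", rows)
db.executemany("INSERT INTO docs_fts VALUES (?, ?)", rows)

def ls(prefix: str) -> list[str]:
    """Project the rows into a directory listing for the agent."""
    cur = db.execute("SELECT path FROM docs WHERE path LIKE ?", (prefix + "%",))
    return [r[0] for r in cur]

def fts(query: str) -> list[str]:
    """BM25-ranked full-text search tool (lower bm25() score = better match)."""
    cur = db.execute(
        "SELECT path FROM docs_fts WHERE docs_fts MATCH ? ORDER BY bm25(docs_fts)",
        (query,),
    )
    return [r[0] for r in cur]

print(ls("notes/"))
print(fts("retrieval"))
```

The projection means the agent interacts with familiar file/directory semantics while every "file" is really a row, so the same rows can back grep-style tools, ranked search, and previews without duplicating storage.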