
We replaced RAG with a virtual filesystem for our AI documentation assistant

383 points by denssumesh last Thursday at 6:24 PM | 145 comments

Comments

softwaredoug last Friday at 5:41 PM

The real thing I think people are rediscovering with file system based search is that there’s a type of semantic search that’s not embedding based retrieval. One that looks more like how a librarian organizes files into shelves based on the domain.

We’re rediscovering forms of search we’ve known about for decades. And it turns out they’re more interpretable to agents.

https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...
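The librarian idea can be made concrete: file documents under hierarchical category paths, and let retrieval mean browsing the taxonomy rather than ranking embeddings. A minimal Python sketch (paths and contents are invented for illustration):

```python
# Taxonomy-based ("librarian") retrieval: documents live under hierarchical
# category paths, and search is descent through the tree, not embedding math.
shelves = {
    "databases/relational/postgres.md": "Tuning autovacuum and shared_buffers.",
    "databases/vector/chroma.md": "Storing embeddings with metadata filters.",
    "search/lexical/bm25.md": "Term-frequency ranking with length normalization.",
}

def browse(prefix: str) -> list[str]:
    """List every document filed under a category prefix (like `ls -R`)."""
    return sorted(path for path in shelves if path.startswith(prefix))

print(browse("databases/"))   # both database docs, grouped by the taxonomy
print(browse("search/"))
```

The point is that the organization itself carries semantics an agent can read directly.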

tensor last Friday at 8:38 PM

This is one of the most confusing claims I've seen in a long time. Grep and friends over files would be the equivalent of an old-fashioned keyword search, whereas most RAG uses vector search. But everything else they claim about a file system just suggests that they don't know anything about databases.

I'm not familiar with how most out-of-the-box RAG systems categorize data, but with a database you can index content in literally any way you want. You could do it like a filesystem with hierarchy, you could do it with tags, or any other design you can dream up.

The search can be keyword, like grep, or vector, like RAG, or use the ranking algorithms that traditional text search uses (tf-idf, BM25), or a combination of them. You don't have to use just the top X ranked documents; you could, just like grep, evaluate all results past whatever matching threshold you have.

Search is an extremely rich field with a ton of very good established ways of doing things. Going back to grep and a file system is going back to ... I don't know, the 60s level of search tech?
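For what it's worth, the lexical side of that toolbox fits in a few lines. Here is a bare-bones BM25 over a toy corpus (k1 and b are the usual defaults; the corpus is invented) to underline that ranking is an index design choice, not a filesystem feature:

```python
import math
from collections import Counter

# Minimal BM25: rank documents by term frequency, discounted by document
# frequency (idf) and normalized by document length. Toy corpus.
docs = ["grep searches plain text", "vector search uses embeddings",
        "bm25 ranks text by term frequency"]
tokened = [d.split() for d in docs]
avg_len = sum(len(t) for t in tokened) / len(tokened)
k1, b = 1.5, 0.75

def bm25(query: str) -> list[tuple[float, str]]:
    scores = []
    for doc, toks in zip(docs, tokened):
        tf = Counter(toks)
        score = 0.0
        for term in query.split():
            df = sum(term in t for t in tokened)   # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append((score, doc))
    return sorted(scores, reverse=True)

print(bm25("text search")[0][1])   # best match for a two-term query
```

Swapping this for vector scores, or summing both, is the "combination" the comment describes.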

sunir last Friday at 7:25 PM

I am really enjoying this renaissance in CLI-world applications. There's so much that's possible.

I'm working on a related challenge which is mounting a virtual filesystem with FUSE that mirrors my Mac's actual filesystem (over a subtree like ~/source), so I can constrain the agents within that filesystem, and block destructive changes outside their repo.

I have it so every repo has its own long-lived agent. They do get excited and start changing other repos, which messes up memory.

I didn't want to create a system user per repo because that's obnoxious, so I created a single claude system user, and I am using the virtual file system to manage permissions. My gmail repo's agent can for instance change the gmail repo and the google_auth repo, but it can't change the rag repo.

Edit: I'm publishing it here. It's still under development. https://github.com/sunir/bashguard

slp3r last Friday at 8:48 PM

This feels like massive overengineering just to bypass naive chunking. Emulating a POSIX shell in TS on top of ChromaDB to do hierarchical search is going to destroy your TTFT. Every ls and grep the agent decides to run is a separate inference cycle. You're just trading RAG's context loss for severe multi-step latency.

benlm yesterday at 2:31 PM

We use both a virtual file system and RAG - they each excel in different areas. The trick with RAG is the quality of the data: we use an LLM to chunk into semantically cohesive sections, and to generate metadata (including fact triples and links to other related chunks in the document) for every chunk as well as for the document as a whole. We use Voyage contextual embeddings to then embed each chunk along with the document and chunk metadata. Works incredibly well. At retrieval time the agent can follow chunk links if needed, as well as analyze the raw file in the vfs. High-quality instruction-based reranking helps a lot too! We are often looking over tens of thousands of documents, and it would be very inefficient to have our agents analyze just the vfs without RAG.
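As a rough illustration of that chunk-plus-metadata shape (field names here are my invention, not the commenter's actual schema):

```python
from dataclasses import dataclass, field

# A metadata-rich chunk: alongside the text it carries fact triples and links
# to related chunks, so a retriever (or the agent, at query time) can hop
# between them instead of relying on one embedding lookup.
@dataclass
class Chunk:
    chunk_id: str
    doc_id: str
    text: str
    fact_triples: list[tuple[str, str, str]] = field(default_factory=list)
    related_chunks: list[str] = field(default_factory=list)  # ids to follow later

billing = Chunk(
    chunk_id="pricing#limits",
    doc_id="pricing.md",
    text="Free plans are capped at 3 seats.",
    fact_triples=[("free plan", "max seats", "3")],
    related_chunks=["pricing#upgrade"],
)
print(billing.related_chunks)
```

The triples and links are what the embedding step then gets to see, in addition to the raw text.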

chelm yesterday at 3:57 PM

RAG provided me no way to read the content myself. I now integrate the knowledge into a static page that I can read and edit myself in Markdown, similar to MkDocs. After I edit the content or remove elements that are no longer true, I build a JSON file and tell the agent how to query this source.

    python -c "
    import json, wire, pathlib
    d = json.loads((pathlib.Path(wire.__file__).parent / 'assets/search_index.json').read_text())
    [print(e['title'], e['url']) for e in d if 'QUERY' in (e.get('body','') + e.get('title','')).lower()]
    "

    python -c "
    import json, wire, pathlib
    d = json.loads((pathlib.Path(wire.__file__).parent / 'assets/search_index.json').read_text())
    [print(e['body']) for e in d if e.get('url','') == 'PATH']
    "

https://wire.wise-relations.com/use-cases/replace-rag/

Galanwe last Friday at 6:21 PM

I am not familiar with the tech stack they use, but from an outsider's point of view I was sort of expecting some kind of FUSE solution. Could someone explain why they went with a fake shell instead? There has to be a reason.

nlawalker last Friday at 8:33 PM

Relative to making docs accessible to AI via filesystem tools, I've been looking around to see what kinds of patterns SDK authors are using to get AI coding agents to use the freshest documentation, and Vercel is doing something interesting with their AI SDK that I haven't seen elsewhere (although maybe I just haven't looked hard enough).

The "ai" npm package includes a root-level docs folder containing .mdx versions of the docs from their site, specific to the version of the package. Their intended AI-assisted developer experience is that people discover and install their ai-sdk skill (via their npx skills tool, which supports discovery and install of skills from most any provider, not just Vercel). The SKILL.md instructs the agent to explicitly ignore all knowledge that may have been trained into its model, and to first use grep to look for docs in node_modules/ai/docs/ before searching the website.

https://github.com/vercel/ai/blob/main/skills/use-ai-sdk/SKI...

petcat yesterday at 2:46 PM

> At 850,000 conversations a month, even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year

Am I crazy or is 850,000/month of anything...not really that much? Where are you spending all your CPU cycles and memory usage?

> ChromaFs is built on just-bash by Vercel Labs (shoutout Malte!), a TypeScript reimplementation of bash that supports grep, cat, ls, find, and cd

Oh.. I see.

pboulos last Friday at 6:04 PM

I think this is a great approach for a startup like Mintlify. I do have skepticism around how practical this would be in some of the “messier” organisations where RAG stands to add the most value. From personal experience, getting RAG to work well in places where the structure of the organisation and the information contained therein is far from hierarchical or partition-able is a very hard task.

seanlinehan last Friday at 5:42 PM

This is definitely the way. There are good use cases for real sandboxes (if your agent is executing arbitrary code, you'd better have it do so in an air-gapped environment).

But the idea of spinning up a whole VM just to use unix IO primitives is way overkill. It makes far more sense to let the agent spit out unix-like tool calls and then use whatever your prod stack uses to do the IO.

emson yesterday at 7:14 AM

This is interesting, as there is definitely a middle ground for agent memory. On one end (the openclaw side) you have a single MEMORY.md file; on the other you have RAG and GraphRAG. I wonder if agent memory should be more nuanced. When an agent learns something, how should it promote or degrade these memory blocks? You don't want a trading agent memorising a bad trading pattern, for example. Also, the agent might want to recall semantically similar memories, but it might also want to retrieve block relationships or groups of blocks for different purposes. We've been exploring all these concepts with "elfmem" (sELF improving MEMory): https://github.com/emson/elfmem Would love your feedback!
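One generic way to frame the promote/degrade question is score reinforcement with decay: a memory block earns score when it proves useful and decays otherwise, until it falls out of recall. A toy sketch (the update rule and constants are illustrative, not elfmem's):

```python
# Each memory block carries a score; useful blocks are boosted, unused ones
# decay, and anything below a floor is forgotten entirely.
memories = {"pattern-a": 1.0, "pattern-b": 1.0}
DECAY, BOOST, FLOOR = 0.9, 0.5, 0.3

def tick(useful: set[str]) -> None:
    for key in list(memories):
        memories[key] = memories[key] * DECAY + (BOOST if key in useful else 0.0)
        if memories[key] < FLOOR:
            del memories[key]          # degraded out of recall entirely

for _ in range(12):
    tick(useful={"pattern-a"})         # only pattern-a keeps earning its keep

print(sorted(memories))                # the bad pattern has been forgotten
```

A real system would of course judge "useful" per task, but the promote/decay shape is the same.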

ACCount37 yesterday at 11:42 AM

Traditional RAG is a poor fit for this generation of LLMs, because it doesn't fit the "agentic tool use" workflow at all.

Self-guided "grep on a filesystem" often beats RAG because it allows the LLM to run "closed loop" and iteratively refine its queries until it obtains results. A self-guided search loop is a superset of what methods like reranking try to do.

I don't think vector search and retrieval is dead, but old-fashioned RAG is. Vector search would have to be reengineered to fit into the new agentic workflows, so that the advantages of agentic LLMs can compound with those of vector search - because in current-day "grep vs RAG" matchups, the former is already winning on agentic merits.

"Optimize grep-centric search" is a surprisingly reasonable stopgap in the meanwhile.
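The closed loop described above can be sketched in a few lines, with a toy corpus and a fixed refinement schedule standing in for the LLM's judgment:

```python
# Closed-loop retrieval: try a query, inspect the result, and refine until
# evidence comes back -- versus accepting one shot of top-k chunks.
corpus = {
    "auth/oauth.md": "refresh tokens rotate every 24h",
    "auth/sessions.md": "session cookies expire after 7 days",
}

def grep(term: str) -> dict[str, str]:
    return {path: text for path, text in corpus.items() if term in text}

def search_loop(terms: list[str]) -> dict[str, str]:
    for term in terms:            # the "refinement schedule": try, inspect, retry
        hits = grep(term)
        if hits:                  # loop closes as soon as evidence comes back
            return hits
    return {}

print(search_loop(["jwt", "token"]))   # first query misses; refined query hits
```

In the real thing the next query is chosen by the model from what the last one returned, which is exactly what a single-pass retrieve-then-generate pipeline cannot do.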

kangraemin yesterday at 1:07 PM

This is essentially tool use with a filesystem interface — the LLM decides what to read instead of a retrieval pipeline choosing for it. Clean idea, and it sidesteps the chunking problem entirely.

Curious about the latency though. RAG is one round trip: embed query, fetch chunks, generate. This approach seems like it needs multiple LLM calls to navigate the tree before it can answer. How many hops does it typically take, and did you have to do anything special to keep response times reasonable?

jdthedisciple last Friday at 7:33 PM

But SQLite is notoriously 35% faster than the filesystem [0], so why not use that?

[0] https://news.ycombinator.com/item?id=14550060

kenforthewin last Friday at 6:34 PM

I don't get it - everybody in this thread is talking about the death of vector DBs and files being all you need. The article clearly states that this is a layer on top of their existing Chroma db.

tylergetsay last Friday at 6:33 PM

I don't understand the additional complexity of mocking bash when they could just provide grep, ls, find, etc. as tools to the LLM.

dangoldbj yesterday at 12:39 PM

I think the interesting bit here is that filesystems give the model something it can actually operate on (ls, grep, etc), not just query.

pwr1 last Friday at 8:15 PM

This mirrors something we ran into building an AI pipeline for audio content. The problem with traditional RAG is that chunking destroys the structure that actually matters — you end up retrieving fragments that are semantically similar but contextually useless.

The filesystem metaphor works because it preserves hierarchy. Documents have sections, sections have relationships, and those relationships carry meaning that gets lost when you flatten everything into embeddings.

Curious how this handles versioning though. Docs change constantly and stale context fed to an LLM is arguably worse than no context at all.

jiusanzhou yesterday at 7:01 AM

Clever use of just-bash to avoid the sandbox cold-start problem. The key insight here is that agents don't need a real filesystem — they need a familiar interface backed by whatever storage you already have. We're seeing the same pattern in coding agents: directory hierarchy turns out to be a surprisingly effective knowledge graph that LLMs navigate better than embedding-based retrieval, mostly because they've been heavily trained on shell interactions.

ahstilde yesterday at 6:07 PM

buy off the shelf: https://archil.com/

shaial last Friday at 9:34 PM

The title says you replaced RAG, but ChromaFs is still querying Chroma on every command — you replaced RAG's interface, not RAG itself. Which is actually the more interesting finding: the retrieval was never the bottleneck, the abstraction was. Agents don't need better search. They need `grep`.

namxam last Friday at 8:21 PM

And you did not teach it to access Chroma directly because there is no adapter? Or because it is so much better at using FS tooling?

In the end, though, I would expect that you could add a skill / instructions on how to use ChromaDB directly.

To be honest, I have no idea what ChromaDB is or how it works. But building an overlay FS seems like quite a lot of work.

dmix last Friday at 6:19 PM

This puts a lot of LLM in front of the information discovery. That would require far more sophisticated prompting and guardrails. I'd be curious to see how people architect an LLM->document approach with tool calling, rather than RAG->reranker->LLM. I'm also curious what the response times are like since it's more variable.

bluegatty last Friday at 6:35 PM

RAG should never have been represented as a context tool, but rather as just vector querying, a variation on search/query - and that's it.

We were bitten by our own nomenclature.

Just a small variation in chosen acronym ... may have wrought a different outcome.

Different ways to find context are welcome, we have a long way to go!

mandeepj last Friday at 6:08 PM

> even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM)

$70k?

How about we knock off a zero and call it $7,000? Even that number seems very high.

maille last Friday at 5:57 PM

Let's say I want a free, local or free-tier-LLM, simple solution to search information mostly from my emails and a little bit from text, doc, and pdf files. Are there any tools I should try to get Ollama or Gemini to reply using my own knowledge base?

zbyforgotpass last Friday at 8:32 PM

I don't know - we are discussing techniques - having information in files, in a semantic database, or in a relational database - as if there were one approach that could dominate all information access. But finding the right information is not one task. If the needed information is a summary of expenses over a period of time, then the best source is a relational database; if it is who heads the HR department in a particular company, then it can probably be found easily on the company intranet pages (which are a kind of graph database). It does not matter much whether the searcher is a human or an LLM - there are differences in speed, in usable context length, and in the fact that LLMs are amnesiac - but these are just parameters. The task is immensely complicated for humans, there is no one architecture for them, and there will not be one for LLMs.

I also vibed a brainstorming note with my knowledge base system. The initial prompt: """when I read "We replaced RAG with a virtual filesystem for our AI documentation assistant (mintlify.com)" title on HackerNews - the discussion is about RAG, filesystems, databases, graphs - but maybe there is something more fundamental in how we structure the systems so that the LLM can find the information needed to answer a question. Maybe there is nothing new - people had elaborate systems in libraries even before computers - but maybe there is something. Semantic search sounds useful - but knowing which page to return might be nearly as difficult as answering the question itself - and what about questions that require synthesis from many pages? Then we have distillation - a table of contents is a kind of distillation targeting the task of search. """ Then I added a few more comments and the LLM linked the note with the other pages in my KB. I am documenting this because there were many voices against posting LLM-generated content, arguing that the prompt alone would be enough. IMHO the prompt is not enough, because the thought was also grounded in the whole theory I gathered in the KB. And that is also kind of on topic here. Anyway - here is the vibed note: https://zby.github.io/commonplace/notes/charting-the-knowled...

dust42 last Friday at 6:33 PM

If grep and ls do the trick, then sure, you don't need RAG/embeddings. But you also don't need an LLM: a full-text search in a database will be a lot more performant and use fewer resources.
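For a concrete sense of that, SQLite ships a built-in full-text index (FTS5 - assuming your SQLite build includes it, which most Python distributions do): keyword search with ranking, no LLM or embeddings involved.

```python
import sqlite3

# Full-text search over documents with SQLite's FTS5 virtual table:
# insert rows, then query with MATCH and order by relevance rank.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
db.executemany("INSERT INTO docs VALUES (?, ?)", [
    ("install.md", "run the installer then restart the daemon"),
    ("faq.md", "the daemon logs to /var/log by default"),
])
rows = db.execute(
    "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", ("daemon",)
).fetchall()
print(rows)   # both documents contain the keyword
```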

nithril yesterday at 6:28 AM

The HN headline is misleading; it does not reflect the actual article title, which is much closer to what they truly did.

They did not replace RAG, because they are still using chunking and embeddings. What they changed is the interface.

znnajdla yesterday at 2:42 PM

> The obvious way to do this is to just give the agent a real filesystem. Most harnesses solve this by spinning up an isolated sandbox and cloning the repo. We already use sandboxes for asynchronous background agents where latency is an afterthought, but for a frontend assistant where a user is staring at a loading spinner, the approach falls apart. Our p90 session creation time (including GitHub clone and other setup) was ~46 seconds.

Am I the only one who read this and thought this is fucking insane? Who in their right mind would even consider spinning up a virtual machine and cloning a repo on every search query? And if all you need is a real filesystem why would you emulate a filesystem on top of a database (Chroma)? If you need a filesystem just use an actual filesystem! This sounds like insane gymnastics just to fit a “serverless” workflow. 850,000 searches a month (less than 1 request per second) sounds like something a single raspberry pi or Mac Mini could handle.

tschellenbach last Friday at 6:24 PM

I think generally we are going from vector based search, to agentic tool use, and hierarchy based systems like skills.

stuaxo last Friday at 11:27 PM

Oh that's funny, I just built a RAG system, and exposing the files inside the database as files seemed like the next logical step.

I would have used FUSE if it got to that point, as then it is an actual filesystem.

kjgkjhfkjf last Friday at 8:45 PM

Seems like it would be simpler to give the agent tools to issue ChromaDB (or SQL) queries directly, rather than giving the LLM unix-like tools that are converted into queries under the hood using a complicated proprietary setup.
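A direct-query tool along those lines is just one tool definition in the JSON-schema shape most chat-completion APIs accept (the tool name and fields here are invented):

```python
# One tool that takes the query directly, instead of a mocked shell whose
# commands get translated into queries under the hood.
query_tool = {
    "name": "query_docs",
    "description": "Run a semantic or keyword query against the docs index.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "search terms"},
            "mode": {"type": "string", "enum": ["keyword", "vector"]},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}
print(query_tool["name"])
```

The harness would dispatch calls to this tool straight to ChromaDB (or SQL), with no filesystem emulation in between.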

siliconc0w yesterday at 1:26 PM

I'm working on a filesystem for agents for similar reasons - https://clawfs.dev - lmk if your team would like an invite..

fudged71 yesterday at 3:31 AM

Since the solution here doesn't appear to be open source, I think you can get something similar by asking your agents to take AgentFS and replace the DB with ChromaDB.

HanClinto last Friday at 6:40 PM

> "The agent doesn't need a real filesystem; it just needs the illusion of one. Our documentation was already indexed, chunked, and stored in a Chroma database to power our search, so we built ChromaFs: a virtual filesystem that intercepts UNIX commands and translates them into queries against that same database. Session creation dropped from ~46 seconds to ~100 milliseconds, and since ChromaFs reuses infrastructure we already pay for, the marginal per-conversation compute cost is zero."

Not to be "that guy" [0], but (especially for users who aren't already in ChromaDB) -- how would this be different for us from using a RAM disk?

> "ChromaFs is built on just-bash ... a TypeScript reimplementation of bash that supports grep, cat, ls, find, and cd. just-bash exposes a pluggable IFileSystem interface, so it handles all the parsing, piping, and flag logic while ChromaFs translates every underlying filesystem call into a Chroma query."

It sounds like the expected use-case is that agents would interact with the data via standard CLI tools (grep, cat, ls, find, etc), and there is nothing Chroma-specific in the final implementation (? Do I have that right?).

The author compares the speeds against the Chroma implementation vs. a physical HDD, but I wonder how the benchmark would compare against a Ramdisk with the same information / queries?

I'm very willing to believe that Chroma would still be faster / better for X/Y/Z reason, but I would be interested in seeing it compared, since for many people who already have their data in a hierarchical tree view, I bet there could be some massive speedups by mounting the memory directories in RAM instead of HDD.

[0] - https://news.ycombinator.com/item?id=9224

devops000 last Friday at 7:44 PM

Why not a simple full-text search in Postgres?

jrm4 last Friday at 6:40 PM

Is this related to that thing where somehow the entire damn world forgot about the power of boolean (and other precise) searching?

bitwize yesterday at 1:31 AM

What if... each agent had its own virtual file system, and anything the agent needed to access was accessible as files in the filesystem?

Congratulations, you just reinvented Plan 9. I think we're going to end up reinventing a lot of things in computing that we discovered and then forgot about because Apple/Microsoft/Google couldn't monetize them, "because AI". And I don't know how to feel about that.

badgersnake last Friday at 7:56 PM

So you did GraphRAG but your graph is a filesystem tree.

yieldcrv last Friday at 7:48 PM

I love the multipronged attack on RAG

RIP RAG: it lasted one year as a skillset recruiters would list on job descriptions, collectively shut down by industry professionals.

ctxc last Friday at 6:25 PM

haha, sweet. One of the cooler things I've read lately
