tensor • last Friday at 8:38 PM • 6 replies

This is one of the most confusing claims I've seen in a long time. Grep and similar tools over files are the equivalent of old-fashioned keyword search, whereas most RAG uses vector search. But everything else they claim about a file system just suggests that they don't know anything about databases.

I'm not familiar with how most out-of-the-box RAG systems categorize data, but with a database you can index content in literally any way you want. You could do it like a filesystem with a hierarchy, you could do it with tags, or with any other design you can dream up.
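To make that concrete, here is a minimal sketch, assuming SQLite and a made-up schema (the table names, `build_index`, `under_dir`, and `with_tag` are all hypothetical, not from the article). The same documents are reachable by a filesystem-style path prefix and by tag:

```python
import sqlite3

def build_index(docs):
    """docs: iterable of (path, body, tags) tuples. Returns an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (id INTEGER PRIMARY KEY, path TEXT UNIQUE, body TEXT);
        CREATE TABLE tags (doc_id INTEGER REFERENCES docs(id), tag TEXT);
        CREATE INDEX tags_by_tag ON tags(tag);
    """)
    for path, body, tags in docs:
        cur = conn.execute(
            "INSERT INTO docs (path, body) VALUES (?, ?)", (path, body))
        conn.executemany(
            "INSERT INTO tags (doc_id, tag) VALUES (?, ?)",
            [(cur.lastrowid, t) for t in tags])
    conn.commit()
    return conn

def under_dir(conn, prefix):
    """Hierarchy-style lookup: every document whose path starts with prefix."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE substr(path, 1, ?) = ? ORDER BY path",
        (len(prefix), prefix))
    return [r[0] for r in rows]

def with_tag(conn, tag):
    """Tag-style lookup: every document carrying the given tag."""
    rows = conn.execute(
        "SELECT d.path FROM docs d JOIN tags t ON t.doc_id = d.id "
        "WHERE t.tag = ? ORDER BY d.path", (tag,))
    return [r[0] for r in rows]

# Illustrative data only.
index = build_index([
    ("notes/search/bm25.md", "ranking with BM25", ["search", "ranking"]),
    ("notes/search/grep.md", "keyword matching with grep", ["search", "cli"]),
    ("notes/db/sqlite.md", "embedded SQL database", ["database", "cli"]),
])
print(under_dir(index, "notes/search/"))
print(with_tag(index, "cli"))
```

Both views sit on the same rows, so adding a third way to slice the data is one more table or column rather than a reorganized directory tree.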

The search can be keyword, like grep, or vector, like RAG, or use the ranking algorithms that traditional text search uses (TF-IDF, BM25), or a combination of them. You don't have to use just the top X ranked documents; you could, just like grep, evaluate all results past whatever matching threshold you have.
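As a rough illustration of the ranking-plus-threshold idea, here is a small pure-Python BM25 sketch. The whitespace tokenizer, the `k1`/`b` defaults, and the function names are assumptions for the example. It covers only the BM25 part, not the vector or combined search, and a real engine would do proper tokenization and indexing:

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase/whitespace tokenizer; no stemming or punctuation handling.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document, in the same order as docs."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter()  # number of documents containing each term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def search(query, docs, threshold=0.0):
    """Return every (score, doc) above threshold, best first, with no top-k cut."""
    scored = zip(bm25_scores(query, docs), docs)
    return sorted(((s, d) for s, d in scored if s > threshold), reverse=True)

# Illustrative corpus only.
corpus = [
    "grep searches files by keyword",
    "vector search uses embeddings",
    "the file system stores files",
]
for score, doc in search("files", corpus):
    print(round(score, 3), doc)
```

Because `search` filters on a score threshold instead of slicing a fixed top X, it returns every match, the way grep would, while still putting the results in ranked order.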

Search is an extremely rich field with a ton of very good, established ways of doing things. Going back to grep and a file system is going back to ... I don't know, '60s-level search tech?

Replies

brap • last Friday at 8:44 PM

I get what you’re saying, and you’re right, however I can also see where they’re coming from:

Empirically, agents (especially the coding CLIs) seem to be doing so much better with files, even if the tooling around them is less than ideal.

With other custom tools they instantly lose 50 IQ points, if they even bother using the tools in the first place.

pjm331 • last Friday at 9:10 PM

Yeah I’ve had a lot of success with agentic search against a database.

The way I think of it, the main characteristic of agentic search is just that the agent can execute many types of ad hoc queries.

It’s not about a file system.

As I understood it, early RAG systems were all about performing that search for the agent. That’s what makes that approach “non agentic”.

But when I have a database that has both embeddings and full text, you can query against both of those things, and I let the agent execute whatever types of queries it wants, that’s “agentic search” in my book.
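A minimal sketch of that setup, assuming SQLite, hand-written toy embeddings stored as JSON, and a single hypothetical `run_query` tool handed to the agent (none of these names come from the thread). Keyword matching here is plain `LIKE`, standing in for a real full-text index:

```python
import json
import math
import sqlite3

def cosine(a_json, b_json):
    """Cosine similarity between two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def open_store(rows):
    """rows: iterable of (body, embedding) pairs. Returns a query-only connection."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine", 2, cosine)  # callable from SQL as cosine(a, b)
    conn.execute(
        "CREATE TABLE chunks (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")
    conn.executemany(
        "INSERT INTO chunks (body, embedding) VALUES (?, ?)",
        [(body, json.dumps(vec)) for body, vec in rows])
    conn.commit()
    # Ask SQLite to reject writes on this connection from here on.
    conn.execute("PRAGMA query_only = ON")
    return conn

def run_query(conn, sql, params=()):
    """The one tool the agent gets: run whatever SQL it writes, return the rows."""
    return conn.execute(sql, params).fetchall()

# Toy data; a real system would get embeddings from a model.
store = open_store([
    ("grep scans files for a keyword", [1.0, 0.0]),
    ("embeddings capture meaning", [0.0, 1.0]),
    ("BM25 ranks keyword matches", [0.8, 0.2]),
])

# Three query shapes an agent might choose on its own:
by_keyword = run_query(
    store, "SELECT body FROM chunks WHERE body LIKE ? ORDER BY id", ("%keyword%",))
by_vector = run_query(
    store, "SELECT body FROM chunks ORDER BY cosine(embedding, ?) DESC LIMIT 1",
    (json.dumps([0.1, 0.9]),))
hybrid = run_query(
    store, "SELECT body FROM chunks WHERE body LIKE ? "
           "ORDER BY cosine(embedding, ?) DESC",
    ("%keyword%", json.dumps([1.0, 0.0])))
```

The point of the design is that the agent, not the application, decides which of these shapes to run and how many times, which is the difference from a pipeline that performs one fixed retrieval on the agent's behalf.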

thefourthchime • last Friday at 9:16 PM

I didn't get into the details too much, but I kept thinking, why isn't he just having an agent discover things from various data sources? I've had much better success with that.

jimbokun • yesterday at 2:11 PM

Isn’t this the approach described in the article?

dboreham • yesterday at 4:03 PM

Also odd in that most filesystems implement directories and file names as... a database. You can use a filesystem as a database, but you're not being as clever as you thought.
