Hacker News

tensor | last Friday at 8:48 PM | 5 replies

Sorry, this still makes no sense. LLMs don't care about files. Most coding systems simply provide the whole file to the LLM rather than a subset of it. That's just a choice in how you implemented your RAG search system and database. In this case the "record" is big: a file. No doubt that works for code, but it's nonsensical outside that.

E.g. for Wikipedia the logical unit would likely be an article. For a book, maybe it's a chapter, or maybe it's a paragraph. You need to design the system around your content and feed the LLM an appropriate, logically related set of data.
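
A rough sketch of what "design around your content" could look like, assuming a toy indexing step where the only decision is the size of the record. The names `Record`, `split_into_records`, and the `unit` parameter are all made up for illustration and don't come from any particular RAG library:

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str  # hypothetical identifier, e.g. a file path or article title
    text: str    # the content handed to the LLM as one unit


def split_into_records(source: str, text: str, unit: str) -> list[Record]:
    """Split a document into retrieval records of a content-appropriate size.

    `unit` is an illustrative knob, not a real API:
      "whole"     keeps the document as one record (a code file, a wiki article)
      "paragraph" splits on blank lines (one option for a book)
    """
    if unit == "whole":
        return [Record(source, text)]
    if unit == "paragraph":
        # Drop empty chunks first so record indices stay contiguous.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        return [Record(f"{source}#p{i}", p) for i, p in enumerate(paragraphs)]
    raise ValueError(f"unknown unit: {unit}")
```

For code you'd probably call it with `unit="whole"`, so the record is the file; for a book, `unit="paragraph"` (or a chapter splitter you'd have to write yourself). Nothing in the retrieval step changes, only the record size.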


Replies

brap | last Friday at 10:19 PM

>LLMs don't care about files.

Oh but they do. These CLI agents are trained and specifically tuned to work with the filesystem. It’s not about the content or how it’s actually stored, it’s about the familiar access patterns.

I can’t begin to tell you how many times I’ve seen a coding agent figure out it can get some data directly from the filesystem instead of a dedicated, optimized tool it was specifically instructed to use for this purpose.

You basically can’t stop these things from messing with files; it’s in their DNA. You block one shell command, they’ll find another. Either revoke shell access completely or play whack-a-mole. You cannot believe how badly they want to work with files.

raincole | yesterday at 11:20 AM

> LLMs don't care about files

They do. I highly suggest not trying to derive LLMs' behaviors (in your mind) from first principles, but actually using them.

darkteflon | last Friday at 9:38 PM

Yeah, some of the uplift people are anecdotally seeing from “just using the filesystem” is, imo, on account of how difficult it is to take a principled approach to pre-chunking when implementing other approaches.

girvo | yesterday at 12:38 AM

They've been RLHF'd to the nth degree around working with *nix tools and filesystems, in practice.

pertymcpert | yesterday at 7:22 AM

They do care about files. They also care about how you express yourself, your tone, all sorts of seemingly unimportant details.