Hacker News

pussyjuice · today at 7:02 PM · 0 replies · view on HN

> The example I recently read was that the contexts are large enough for the entire "The Lord of the Rings" books.

Not really, though. At least not in practice, e.g. when writing code.

Paste a 200-line React component into your favorite LLM, ask it to fix/add/change something, and it will do it perfectly.

Paste a 2000-line one, though, and it starts omitting things, making mistakes and assumptions, rewriting what it already has, and so on.

So what's going on? It's supposed to be able to hold 1000s of lines in context, but in practice it's only like 200.

What happens is that accuracy and agency drop significantly as the model has to span larger and larger context windows.

And it's not that it's most accurate when the window is smallest, either - there's a sweet spot.

Outside that sweet spot, you will get "unacceptable responses" - slop you can't use.

That's what happens when you paste the 2000-line React component, for example. You get a response you can't quite use. Yet the 200-line one is typically perfect.

What would make the 2000-line one come out right every time?

We need a way to increase that "accurate window size" - let's call it "working memory" - so that we can generate more code, more writing, more pixels at acceptable levels of quality. You'd also have enough language space for agents to operate and collaborate without the amnesia they have today.

RAG is basically the interim workaround for all this: you put everything in a vector DB and search for what you need, pulling only the relevant pieces into the context when you need them.

So RAG is a great solution for today's problems. Say you have a bunch of Python files written in a certain style, and the main use case of your LLM is writing Python code in specified ways. With this setup you can probably deliver "better Python code" than your competitor, because RAG gives you a deterministic supplement to the LLM's output - it does the research and augments the prompt in predetermined ways every time it responds.
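A minimal sketch of that retrieval step (pure Python; a toy bag-of-words similarity stands in for real embeddings and a vector DB, and all the function names and the example corpus are made up for illustration):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector over word tokens.
    # Splitting on non-alphanumerics means "parse_config" matches "parse".
    # A real RAG setup would use a learned embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Only the retrieved chunks go into the context window,
    # not the whole corpus - that's the whole point of RAG.
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

# The "corpus" stands in for your Python files (or the LotR text).
corpus = [
    "def parse_config(path): opens and parses a YAML config file",
    "def render_page(user): builds the HTML dashboard for a user",
    "def parse_args(argv): reads command line flags",
]
print(build_prompt("how do we parse the config file?", corpus))
```

The trade-off the comment is pointing at lives in `retrieve`: the model only ever sees the top-k chunks, never "all of it."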

But eventually, if I don't have to upload "The Lord of the Rings" documents and vector-search to find the relevant passages - if I can just paste the entire txt into the input and have it generate the answer considering all of it, not just one little area - that would presumably be a better-quality response.