
The Road to a Billion-Token Context

25 points by pseudolus | last Friday at 10:28 PM | 30 comments

Comments

stephschie | today at 12:34 PM

Hmm, I'm not convinced that's the direction we want to go in. It's not like we have all the context of everything we ever learned present when making decisions. Heck, even for CPUs and GPUs we have a strict hierarchy of L1, L2, and shared L3 caches on top of larger memory units, with constant management between them. Feel free to surprise me, but I believe a similar stack is the better way to go for LLMs: short-term memory (system prompt, prompt, task), mid-term memory (session knowledge, preferences), long-term memory (project knowledge, tech/stack insights), and intuition memory (stemming from language, physics, rules). But right now we haven't developed best practices for what information should go into which layer at what times. Increasing the overall context window is nice, but IMHO it won't help us much.
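
A minimal sketch of what such a tiered stack might look like (all names, budgets, and the token estimate here are hypothetical illustrations, not an established API):

    # Hypothetical sketch of a tiered context stack for an LLM agent,
    # loosely analogous to a CPU cache hierarchy. All names illustrative.
    from dataclasses import dataclass, field

    @dataclass
    class MemoryTier:
        name: str
        budget_tokens: int                # share of the window this tier may use
        items: list[str] = field(default_factory=list)

    def assemble_prompt(tiers: list[MemoryTier]) -> str:
        """Fill the context window tier by tier; anything over a tier's
        budget is left out (in a real system: summarized or retrieved)."""
        parts = []
        for tier in tiers:
            used = 0
            for item in tier.items:
                cost = len(item.split())  # crude token estimate
                if used + cost > tier.budget_tokens:
                    break
                parts.append(item)
                used += cost
        return "\n\n".join(parts)

    stack = [
        MemoryTier("short-term", 4_000),   # system prompt, prompt, task
        MemoryTier("mid-term",   8_000),   # session knowledge, preferences
        MemoryTier("long-term", 16_000),   # project knowledge, stack insights
    ]
    stack[0].items.append("System: you are a coding assistant. Task: fix the build.")
    print(assemble_prompt(stack))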

Schlagbohrer | today at 10:34 AM

It's amazing that they're trying to solve this with hardware rather than with a new software architecture, but I suppose the current technology underlying LLMs must be far and away the best theoretically, or the most established, or the time it would take to find a new model isn't worth it for the big companies.

I know Yann LeCun is pursuing a completely different architecture, and I think that's expected to take 2-3 years before showing commercial results, right? Is that why they're finding it quicker to change the hardware?

Schlagbohrer | today at 10:32 AM

What does this mean: "In addition, because most AI models are not trained uniformly across their maximum context length, their reasoning quality tends to degrade gradually near the limit rather than fail abruptly."

Models aren't trained across their context; their context is their short-term memory at runtime, right? It has nothing to do with training. They're trained on a static dataset.
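
One plausible reading of the article's sentence (an interpretation, not from the thread): models are trained on sequences of many lengths up to the maximum, and if the length distribution is long-tailed, positions near the limit get comparatively little training signal. A toy illustration under an assumed Pareto length distribution:

    # Illustrative only: the sequence lengths and their distribution are
    # assumed here, not taken from any real training corpus.
    import random

    MAX_CONTEXT = 128_000
    lengths = [min(int(random.paretovariate(1.2) * 500), MAX_CONTEXT)
               for _ in range(100_000)]

    near_limit = sum(n > 0.9 * MAX_CONTEXT for n in lengths) / len(lengths)
    print(f"sequences exercising the last 10% of the window: {near_limit:.4%}")

Under this toy distribution, well under 1% of sequences ever touch the last stretch of the window, which would be consistent with gradual degradation near the limit rather than an abrupt failure.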

Havoc | today at 11:36 AM

Having it would be useful, but I'd say that long before you get there, one should think about structuring the data in a more meaningful way: breaking tasks out into subagents, etc.
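
A rough sketch of that idea (call_llm is a hypothetical stand-in for whatever model client you actually use):

    # Hypothetical: split the job so each subagent sees only its slice
    # of the data, and the coordinator sees only distilled results.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in a real model client here")

    def run_with_subagents(task: str, documents: dict[str, str]) -> str:
        # Each subagent gets one document plus the task, not the whole corpus.
        summaries = {
            name: call_llm(f"Task: {task}\n\nDocument:\n{doc}\n\n"
                           "Summarize only what matters for the task.")
            for name, doc in documents.items()
        }
        findings = "\n".join(f"[{name}] {s}" for name, s in summaries.items())
        return call_llm(f"Task: {task}\n\nSubagent findings:\n{findings}\n\n"
                        "Produce the final answer.")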

schnitzelstoat | today at 9:51 AM

Is such a large context window even desirable? It seems like it would consume an awful lot of tokens and, unless one was very careful to curate the context, could even result in worse performance.

__alexs | today at 10:14 AM

Does having 1 billion tokens mean more of the tokens in the context window are actually good quality, or do we just get more dumb tokens?

AureliusMA | today at 10:26 AM

How large would a 1-billion-token KV cache even be?!
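
Back of the envelope, assuming 70B-class dense-model dimensions (80 layers, 8 KV heads via GQA, head dim 128, fp16; all numbers assumed for illustration, not from the article):

    # KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
    #                  * dtype_bytes * tokens
    layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2
    tokens = 1_000_000_000

    bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
    total_tb = bytes_per_token * tokens / 1e12
    print(f"{bytes_per_token / 1024:.0f} KiB/token -> ~{total_tb:.0f} TB total")

Roughly 320 KiB per token under these assumptions, so on the order of a third of a petabyte of KV cache for a billion tokens.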
