Hacker News

giobox · last Sunday at 10:01 PM · 8 replies

> they don't have a used by date

For quite a lot of use cases, the current systems arguably do get worse over time if not continually updated. The knowledge cutoff date will start to hurt more and more as the weights age in a hypothetical scenario where you are stuck with them forever.

Coding, one of the most popular use cases today, would not be great if the model, say, only understood Java as of a version from years ago.

https://en.wikipedia.org/wiki/Knowledge_cutoff


Replies

throwyawayyyy · last Sunday at 10:35 PM

One solution is, of course, not to advance anything. I'm not even joking: is there going to be a successor to React? I suspect not. With the vast amount of React training data now available, it's going to look silly to move to something else with less support. What was the last new popular programming language, Rust? Will there be another one? I suspect not, for the same reason. The irony of all this AI acceleration talk is that it'll work best if we don't accelerate the underlying tech at all.

rrvsh · last Sunday at 10:26 PM

Nobody is unaware of the knowledge cutoff, and sharing the Wikipedia article is not helping anyone. Your point is easily rebutted by taking whatever open-weights/open-source model has an outdated cutoff and training or fine-tuning it on more data, which is always going to be viable given a modicum of compute.
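
As a minimal sketch of that rebuttal, continued pretraining on post-cutoff text with Hugging Face Transformers looks roughly like this (the model name and data file are placeholder assumptions, not a recommendation):

    # Minimal sketch: continue pretraining an open-weights model on newer
    # text. Model name and data file are illustrative placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    name = "some-org/open-weights-model"  # any stale-cutoff base model
    tokenizer = AutoTokenizer.from_pretrained(name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(name)

    # One JSON object per line, each with a "text" field of newer documents.
    data = load_dataset("json", data_files="post_cutoff.jsonl")["train"]
    data = data.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
        remove_columns=data.column_names)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="refreshed",
                               per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=data,
        # mlm=False selects the plain next-token (causal LM) objective.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

You'd obviously want far more data and compute than this toy setup implies, but the point stands: nothing about the weights themselves expires.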

tcp_handshaker · last Sunday at 10:28 PM

You could learn how to code... a whole generation did it before...

mrtesthah · last Sunday at 11:55 PM

> Coding, one of the most popular use cases today, would not be great if the model, say, only understood Java as of a version from years ago.

This LLM, trained entirely on pre-1930s texts, was able to write Python programs when given only a short example:

https://talkie-lm.com/introducing-talkie

nullc · last Sunday at 11:46 PM

Small models are more useful for "doing stuff" than "knowing stuff" to begin with. Add in an agentic harness and a small model can happily read more current information on demand (including from e.g. a local wikipedia snapshot).
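
A minimal sketch of such a harness, assuming an OpenAI-compatible local server (llama.cpp and Ollama both expose one) and a hypothetical lookup function over a local snapshot:

    # Sketch of an agentic loop: a small local model gets one tool that reads
    # articles from a local Wikipedia snapshot. The endpoint, model name, and
    # lookup stub are all illustrative assumptions.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    def lookup(title: str) -> str:
        # Placeholder: in practice, read from a ZIM/Kiwix snapshot on disk.
        snapshot = {"Linux kernel": "...article text from the snapshot..."}
        return snapshot.get(title, "article not found")

    tools = [{"type": "function", "function": {
        "name": "lookup",
        "description": "Fetch an article from a local Wikipedia snapshot",
        "parameters": {"type": "object",
                       "properties": {"title": {"type": "string"}},
                       "required": ["title"]}}}]

    messages = [{"role": "user",
                 "content": "What is the newest mainline Linux kernel series?"}]
    while True:
        reply = client.chat.completions.create(
            model="local-small-model", messages=messages, tools=tools
        ).choices[0].message
        if not reply.tool_calls:          # model answered directly
            print(reply.content)
            break
        messages.append(reply)            # keep the tool call in context
        for call in reply.tool_calls:
            args = json.loads(call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": lookup(**args)})

The weights stay frozen; freshness comes from whatever the snapshot contains.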

lowbloodsugar · today at 2:57 AM

Laughs in JDK8 code base.

AlienRobot · yesterday at 11:25 AM

I genuinely don't understand how this can possibly be a problem long term.

It feels very obvious that the solution is to have a smaller model that can be trained exclusively on Java information to augment the older model. If the architecture doesn't support it currently, then that's what the architecture will look like in the future.

Otherwise you'd be arguing that, to serve users who want an up-to-date LLM on topic X, you have to retrain the model on everything all over again.

It's simply ludicrous to have a coding LLM that needs to be retrained on the latest published poems and pastry recipes to generate Java.
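
A minimal sketch of that augmentation idea, using a LoRA adapter as the "smaller model" (all model and adapter names here are hypothetical): the base model stays frozen, and only the small adapter gets retrained when Java moves on.

    # Sketch: frozen base model + small domain adapter loaded at inference.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_name = "some-org/frozen-base-model"
    tokenizer = AutoTokenizer.from_pretrained(base_name)
    base = AutoModelForCausalLM.from_pretrained(base_name)

    # The adapter is tiny next to the base weights, so only this part needs
    # retraining when, say, a new Java version ships.
    model = PeftModel.from_pretrained(base, "some-org/java-adapter")

    prompt = "// Java 21 record pattern example\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))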

moffkalast · yesterday at 11:14 AM

Ha, yes, I used to think this wasn't a notable issue, but just today I was getting Qwen 3.5 to fix my network drivers and it immediately freaked out: "kernel 6.17, what the fuck? That doesn't exist yet!" It almost had a mental breakdown over that detail and derailed the conversation toward checking what was wrong with the kernel version reporting, lol.