Hacker News

AlienRobot · yesterday at 11:25 AM

I genuinely don't understand how this can possibly be a problem long term.

It feels very obvious that the solution is to have a smaller model, trained exclusively on Java information, that augments the older model. If current architectures don't support that, then that's what future architectures will look like.
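The shape of that idea already exists in adapter-style fine-tuning (e.g. LoRA): freeze the big old model and train only a tiny add-on module on the new data. Here is a minimal toy sketch of that principle; the dimensions, the rank-1 adapter, and all names are illustrative assumptions, not any real framework's API.

```python
# Toy sketch: frozen "old model" weights plus a tiny trainable adapter,
# in the spirit of LoRA-style fine-tuning. Everything here is illustrative.
import random

random.seed(0)
d = 4  # hidden size of the frozen base layer (assumed, tiny for the demo)

# Frozen base weights: a d x d matrix that is never updated.
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

# Rank-1 adapter: correction = B * (A . x). B starts at zero, so the
# adapter is initially a no-op and the old model's behavior is preserved.
A = [random.gauss(0, 0.3) for _ in range(d)]
B = [0.0] * d

def forward(x):
    base = [sum(W[i][j] * x[j] for j in range(d)) for i in range(d)]
    s = sum(A[j] * x[j] for j in range(d))         # down-project to rank 1
    return [base[i] + B[i] * s for i in range(d)]  # base + low-rank correction

x = [random.gauss(0, 1) for _ in range(d)]
target = [random.gauss(0, 1) for _ in range(d)]    # stands in for "new Java data"

def loss():
    y = forward(x)
    return sum((y[i] - target[i]) ** 2 for i in range(d))

before = loss()
lr = 0.02
for _ in range(500):
    y = forward(x)
    g = [2 * (y[i] - target[i]) for i in range(d)]  # dLoss/dy
    s = sum(A[j] * x[j] for j in range(d))
    gB = [g[i] * s for i in range(d)]
    gA = [sum(B[i] * g[i] for i in range(d)) * x[j] for j in range(d)]
    for i in range(d):
        B[i] -= lr * gB[i]
    for j in range(d):
        A[j] -= lr * gA[j]
    # Note: W is never touched -- only the small adapter learns.

print(before, "->", loss())  # loss drops while the old weights stay intact
```

The point of the sketch is only the division of labor: the frozen W plays the role of the expensive pretrained model, and the small A/B pair plays the role of the cheap topic-specific add-on that can be retrained on its own.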

Otherwise you'd be arguing that, to serve users who want an up-to-date LLM on topic X, you have to retrain the model on the entire ABC all over again.

It's simply ludicrous to have a coding LLM that needs to be retrained on the latest published poems and pastry recipes to generate Java.