Hacker News

2001zhaozhao · today at 6:18 AM

At some point we have to run into some inherent mathematical limit on knowledge compression, right? There's no way the knowledge benchmarks for these 8B models keep improving without overfitting to those benchmarks.
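A crude ceiling on how much an 8B model can store at all follows from counting bits. This assumes N = 8 × 10⁹ parameters held as b = 16-bit (e.g. bf16) weights. It says nothing about how much of that raw capacity ends up as usable knowledge, which is presumably far less:

$$
N \cdot b = (8 \times 10^{9}) \times 16\ \text{bits} = 1.28 \times 10^{11}\ \text{bits} = 1.6 \times 10^{10}\ \text{bytes} \approx 16\ \text{GB}
$$

Whatever the true per-parameter efficiency is, it cannot exceed b bits per parameter, so there is a hard upper bound of this kind for any fixed model size.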


Replies

yorwba · today at 6:51 AM

If you give the model access to specialized tools (e.g. web search for question answering), the knowledge doesn't have to be stored in the model weights, which leaves some room for improvement. You'd still be overfitting to benchmarks (since different tasks might require different tools), but not necessarily to specific benchmark questions, so within-domain generalization could be quite good.
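A toy sketch of the idea that the facts can live in the tool rather than in the weights. Everything here is hypothetical: `search_tool` is a word-overlap ranker standing in for a real web search API, and `answer_with_tool` stands in for a small model that has learned to call the tool instead of recalling facts from its parameters.

```python
import re
from functools import partial
from typing import Callable, List, Set

# Words ignored when matching queries to passages (a tiny, hand-picked list).
STOPWORDS = {"a", "an", "the", "of", "is", "was", "what", "who", "at", "by", "in"}


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def search_tool(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Hypothetical stand-in for a web search API.

    Ranks passages by how many query words they share and returns up to k
    passages that share at least one word with the query.
    """
    q = tokens(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)
    return [p for p in ranked[:k] if q & tokens(p)]


def answer_with_tool(question: str, search: Callable[[str], List[str]]) -> str:
    """Stand-in for a small model that stores no facts itself.

    It delegates recall entirely to the search tool and echoes the top hit.
    """
    hits = search(question)
    return hits[0] if hits else "No relevant source found."


corpus = [
    "Canberra is the capital city of Australia.",
    "At sea level, water boils at 100 degrees Celsius.",
]
search = partial(search_tool, corpus=corpus)
print(answer_with_tool("What is the capital of Australia?", search))
print(answer_with_tool("Who painted the Mona Lisa?", search))

# "Updating knowledge" means editing the corpus, not retraining anything.
corpus.append("The Mona Lisa was painted by Leonardo da Vinci.")
print(answer_with_tool("Who painted the Mona Lisa?", search))
```

The second question gets no answer until the corpus is extended, at which point the same unchanged "model" answers it. That is the sense in which tool use frees up the weights from having to memorize facts.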

As an example of a similar approach, Teapot AI has trained very small models (https://teapotai.com/models) to answer only questions whose answers can be found within the context window. Although not perfect, they do quite well at this compared with larger, more general models.
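The behaviour described above (answer only when the context supports it, otherwise decline) can be illustrated with a crude extractive heuristic. This is not how Teapot AI's models work, since those are trained models; `answer_from_context`, the `min_coverage` threshold, and the word-overlap scoring are hypothetical choices made purely for illustration.

```python
import re
from typing import Set

# Words ignored when comparing the question with context sentences.
STOPWORDS = {"a", "an", "the", "of", "is", "was", "what", "who", "at", "by", "in"}
ABSTAIN = "I don't know based on the given context."


def content_words(text: str) -> Set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def answer_from_context(question: str, context: str, min_coverage: float = 0.5) -> str:
    """Return the context sentence that best covers the question, or abstain.

    min_coverage is the fraction of the question's content words that the
    chosen sentence must contain; below that, the function declines to answer
    instead of guessing.
    """
    q = content_words(question)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not q or not sentences:
        return ABSTAIN
    best = max(sentences, key=lambda s: len(q & content_words(s)))
    coverage = len(q & content_words(best)) / len(q)
    return best if coverage >= min_coverage else ABSTAIN


context = "The Eiffel Tower was completed in 1889. Paris is the capital of France."
print(answer_from_context("When was the Eiffel Tower completed?", context))
print(answer_from_context("Who designed the Sydney Opera House?", context))
```

The abstention threshold is the design choice that matters here: a model restricted to the context trades coverage for fewer confident wrong answers, which is the trade-off the comment describes.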
