logoalt Hacker News

jason_ostertoday at 1:50 AM0 repliesview on HN

The goal of training data is not to collect and reproduce facts. It is to align the model with the statistical probability of producing the correct (expected) response for data outside of its knowledge domain. This is called generalization [1], and it has been explored in depth.

[1]: https://arxiv.org/abs/2312.17173