Hacker News

nearbuy, yesterday at 9:00 PM

The parent comment probably forgot about RLHF (reinforcement learning from human feedback), a training stage where predicting the next token of reference text is no longer the objective.

But even plain next-token prediction doesn't necessarily stop a model from also learning to give correct and satisfying answers, if doing so helps it better predict its training data.
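To make the difference between the two objectives concrete, here is a minimal Python sketch on a toy three-token vocabulary. The function names, the vocabulary and the reward values are hypothetical and chosen only for illustration. The RL side uses plain REINFORCE as a simplified stand-in; real RLHF setups typically use PPO with a learned reward model and a KL penalty, which this sketch does not model.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def next_token_loss(logits, reference_token):
    """Pretraining objective: negative log-likelihood of the token that
    actually comes next in the reference text."""
    probs = softmax(logits)
    return -math.log(probs[reference_token])

def policy_gradient_loss(logits, sampled_token, reward):
    """Simplified RL objective (REINFORCE): weight the log-probability of the
    model's OWN sampled output by a reward score. No reference text is used."""
    probs = softmax(logits)
    return -reward * math.log(probs[sampled_token])

if __name__ == "__main__":
    # Hypothetical model scores for the next token after "The capital of France is"
    logits = {"Paris": 2.0, "Lyon": 1.0, "banana": 0.0}

    # Next-token prediction: the target comes from the reference text.
    print("next-token loss:", next_token_loss(logits, "Paris"))

    # RL: the target is whatever the model sampled, scaled by a (made-up) reward.
    print("good sample, reward +1:", policy_gradient_loss(logits, "Paris", 1.0))
    print("bad sample, reward -1:", policy_gradient_loss(logits, "banana", -1.0))
```

The two losses coincide when the sampled token happens to match the reference and the reward is 1. They diverge when the reward signal scores outputs that never appear in any reference text, which is the sense in which the goal is "no longer" next-token prediction.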