HarHarVeryFunny, today at 3:10 PM

This is nothing new - these companies don't want their models' output to be useful for distillation/training, so they just give a "summary" of the model's thinking steps rather than the actual sequence.

RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning, given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as a form of sane-washing, making the model look more purposeful and directed than it really is!
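
To make the "reinforces all the steps" point concrete, here's a toy sketch. It assumes the simplest setup: a single outcome-level reward for the whole trajectory and no per-step credit assignment. The `outcome_credit` function and the example trajectory are made up for illustration, and real training pipelines may well differ.

```python
def outcome_credit(steps, reward):
    """Give every step the same trajectory-level reward.

    Hypothetical helper: models outcome-only reward, where nothing
    distinguishes a useful step from a misstep.
    """
    return [(step, reward) for step in steps]


# Made-up reasoning trace that stumbles but still reaches a correct answer.
trajectory = [
    "set up the equation",
    "use the wrong sign (misstep)",
    "notice and fix the sign",
    "state the final answer",
]

credits = outcome_credit(trajectory, reward=1.0)
for step, credit in credits:
    print(f"{credit:+.1f}  {step}")
```

Every line gets the same credit, misstep included, which is the crude part: under this assumption the reward signal can't tell which steps actually mattered.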