Hacker News

DimitriBouriez · today at 11:50 AM

Good point, and it's actually worse than that: the thinking tokens aren't affected by this at all (the model still reasons normally internally). Only the visible output gets compressed into caveman, and the model may actually need more thinking tokens to figure out how to rephrase its answer in caveman style.
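
One way to sanity-check that would be to compare hidden reasoning-token counts with and without the style instruction. A minimal sketch, assuming the OpenAI Python SDK and a reasoning model that reports reasoning_tokens in its usage; the question and the caveman wording are just placeholders:

    # Hedged sketch (not from the thread): compare hidden reasoning-token usage
    # with and without a "caveman" style instruction. Assumes the OpenAI Python SDK
    # and a reasoning model that reports reasoning_tokens; prompts are placeholders.
    from openai import OpenAI

    client = OpenAI()
    QUESTION = "Explain why TCP needs a three-way handshake."

    def reasoning_tokens(prompt: str) -> int:
        resp = client.chat.completions.create(
            model="o3-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        # reasoning_tokens counts the hidden thinking; the visible answer is counted separately
        return resp.usage.completion_tokens_details.reasoning_tokens

    plain = reasoning_tokens(QUESTION)
    caveman = reasoning_tokens(QUESTION + " Answer like caveman, short words only.")
    print(f"plain: {plain} reasoning tokens, caveman: {caveman} reasoning tokens")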


Replies

zozbot234 · today at 12:15 PM

Grug says you can tune how much each model thinks; not caveman, but similar. Also, the thinking is trained with RL, so it tends to be efficient and not fluffy. And the model (as seen when running it locally) always drafts its answer inside the thinking before the output repeats it, so switching the final answer to caveman style isn't really extra effort.
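
For the "tune how much each model thinks" part, a minimal sketch assuming the OpenAI Python SDK, where reasoning_effort caps the internal thinking (local runners expose similar thinking-budget knobs):

    # Hedged sketch: dial the model's internal thinking up or down via reasoning_effort.
    # Assumes the OpenAI Python SDK; the prompt text is a placeholder.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort="low",  # "low" | "medium" | "high" trades thinking for speed/cost
        messages=[{"role": "user", "content": "Explain the CAP theorem. Answer like caveman."}],
    )
    print(resp.choices[0].message.content)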