Hacker News

robertkarl, yesterday at 2:11 PM

https://arxiv.org/abs/2606.00206

In this paper they nerf an LLM's ability to emit waffling thinking tokens like "wait", "but", and "alternatively", and the models (old, small ones in the paper) terminate reasoning faster and perform better. I bet Anthropic is tuning this on their backend.
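For anyone wondering what that could look like mechanically, here is a minimal sketch of one way to do it at sampling time: subtract a fixed penalty from the logits of the waffling tokens before picking the next token. This is my assumption about the general idea, not necessarily the paper's exact method. The toy vocabulary, the logit values, the `suppress` helper, and the penalty of 5.0 are all made up for illustration.

```python
import math

# Words to discourage, taken from the comment above.
WAFFLE_WORDS = {"wait", "but", "alternatively"}


def suppress(logits, vocab, banned=WAFFLE_WORDS, penalty=5.0):
    """Return a copy of `logits` with `penalty` subtracted from banned tokens.

    `penalty` is a hypothetical knob; a very large value approximates a hard ban.
    """
    return [
        score - penalty if token.strip().lower() in banned else score
        for token, score in zip(vocab, logits)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy next-token distribution (invented numbers, not from any real model).
vocab = ["Wait", " so", " but", " the", " answer"]
logits = [3.0, 1.0, 2.5, 0.5, 2.0]

before = softmax(logits)
after = softmax(suppress(logits, vocab))

# Greedy pick before and after the penalty is applied.
best_before = vocab[before.index(max(before))]
best_after = vocab[after.index(max(after))]
```

With these numbers the greedy pick moves from "Wait" to " answer" once the penalty is applied. A real setup would work on token IDs from the model's tokenizer rather than on strings, and a word like "wait" can map to several different tokens.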


Replies

orbital-decay, yesterday at 10:11 PM

I imagine Anthropic would rather train a small control model than resort to sampling hacks.

meatmanek, yesterday at 5:24 PM

This is super cool. Do you know if any of the inference backends (llama.cpp, vLLM, etc.) support this technique?
