I have a philosophical problem with adaptive thinking. It’s a dumb guess at how much thinking budget to allocate before any thinking has happened. At least in the context of LLMs, there is probably no way of knowing in advance how much thinking (token generation) is needed. The problem space is infinitely vast, and the similarity of two prompts is not going to help any LLM decide how much thinking is needed. Models already stop thinking before hitting the thinking budget.
Why is so much effort going into making adaptive thinking happen, and why don’t we instead train models to produce the end-of-thinking token better?
Feels like a bandaid. We need models to be trained to do a reasonable amount of reasoning (no pun intended):
reason
estimate remaining uncertainty
continue?
reason more
repeat
At least there should be a tool call that's the equivalent of saying "wow, this is more complicated than I thought". Humans are also often prone to under-allocating reasoning time and coming to wrong conclusions because their reasoning ends up too shallow. But the best humans are great at mentally mapping the problem space and readjusting on the fly.
This is what I do in llm-consortium. An arbiter evaluates the response(s) and decides if more iterations are needed. You can also loop until a minimum confidence threshold, but self-reported confidence isn't a great metric.
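For what it's worth, the reason / estimate / continue loop with an arbiter can be sketched in a few lines. This is a minimal illustration only, not llm-consortium's actual code or API: `Verdict`, `iterate_until_confident`, `toy_reason`, and `toy_arbiter` are hypothetical names, and the two toy functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    confidence: float  # arbiter's estimate in [0, 1] (hypothetical scale)
    needs_more: bool   # arbiter's explicit "keep iterating" signal


def iterate_until_confident(
    prompt: str,
    reason: Callable[[str, List[str]], str],
    arbiter: Callable[[str, List[str]], Verdict],
    min_confidence: float = 0.8,
    max_iterations: int = 5,
) -> Tuple[str, int]:
    """Reason, let the arbiter judge, and repeat until it is satisfied.

    Returns the last draft and the number of rounds used. The hard cap
    keeps the loop finite even if the arbiter never signs off.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    history: List[str] = []
    rounds = 0
    for rounds in range(1, max_iterations + 1):
        history.append(reason(prompt, history))   # reason (more)
        verdict = arbiter(prompt, history)        # estimate remaining uncertainty
        if not verdict.needs_more and verdict.confidence >= min_confidence:
            break                                 # continue? no
    return history[-1], rounds


# Toy stand-ins (assumptions): a real setup would call models here.
def toy_reason(prompt: str, history: List[str]) -> str:
    return f"draft {len(history) + 1}"


def toy_arbiter(prompt: str, history: List[str]) -> Verdict:
    confidence = min(1.0, len(history) / 4)  # grows with each draft
    return Verdict(confidence=confidence, needs_more=confidence < 0.8)


answer, rounds = iterate_until_confident("hard question", toy_reason, toy_arbiter)
```

The stopping rule uses both the arbiter's explicit verdict and the threshold, so a high but unreliable confidence number alone does not end the loop, and `max_iterations` bounds the cost either way.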
Right now we have a LOT of band aids. You want to match compute and thinking to a particular problem, sort of like we do. Yes, you cannot perfectly predict this, but you can do decently well and save a ton of tokens, at the cost of this band aid being sort of leaky and gross.
But the larger problem is sound, and the answer is something jointly optimized (idk how they do the routing) but it’s hard to shoehorn it into the current paradigm.
Modern LLMs are nothing but band-aids, starting with the absurd bandwidth of HBM3 RAM that makes them possible.
I agree, adaptive thinking is a pest. Without a minimum thinking budget, Claude in particular currently defaults, for me, to not thinking at all, even on max effort.
Sequential-thinking was really a step in the right direction, and it works almost exactly how you've described. But both back when it was popular, before the reasoning models, and when I tried it again recently, I have never once seen it use its branching feature, and it also tends to have the RLHF urge to answer something "helpful" quickly instead.