logoalt Hacker News

mohsen1today at 8:21 AM5 repliesview on HN

I have a philosophical problem with adaptive thinking. It’s a dumb guess for how much thinking budget to allocate ahead of thinking. At least in the context of LLMs there is probably no way of knowing how much thinking (token generation) is needed. The problem space is infinity vast, similarly of two prompts is not going to help any LLM decide how much thinning is needed. Models already stop thinking before hitting the thinking budget.

Why there is so much effort in making adaptive thinking happen and don’t we train models to produce the end of thinning token better?

Feels like a bandaid. We need models to be trained to do a reasonable amount of reasoning (no pub intended):

    reason

    estimate remaining uncertainty

    continue?

    reason more

    repeat

Replies

CjHubertoday at 4:20 PM

I agree, adaptive thinking is a pest and without a minimum thinking budget especially Claude for me currently defaults to not think at all even on max effort.

Sequential-thinking was really a step in the right direction, and works almost exactly how you've described, though when it was popular before the reasoning models and even now when I tried it recently I have never once see it use its branching feature and it tends also to have the RLHF urge to answer something "helpful" quickly instead.

wongarsutoday at 12:46 PM

At least there should be a tool call that's the equivalent of saying "wow, this is more complicated than I thought". Humans are also often prone to under-allocating reasoning time and coming to wrong conclusions because their reasoning ends up too shallow. But the best humans are great at mentally mapping the problem space and readjusting on the fly

irthomasthomastoday at 1:10 PM

This is what I do in llm-consortium. An arbiter evaluates the response(s) and decides if more iterations are needed. You can also loop until a minimum confidence threshold, but self-reported confidence isn't a great metric.

aspenmartintoday at 12:05 PM

Right now we have a LOT of band aids. You want to optimize compute and thinking to a particular problem, sort of like we do. Yes you cannot perfectly predict this but you can do decently well and save a ton of tokens at the cost of this band aid being sort of leaky and gross.

But the larger problem is sound, and the answer is something jointly optimized (idk how they do the routing) but it’s hard to shoehorn it into the current paradigm.

UltraSanetoday at 2:43 PM

Modern LLMs are nothing but band-aids, starting with the absurd bandwidth of HBM3 RAM that makes them possible.