I'm finding the "adaptive thinking" thing very confusing, especially having written code against the previous thinking budget / thinking effort / etc modes: https://platform.claude.com/docs/en/build-with-claude/adapti...
Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output, you have to add "display": "summarized" to get that: https://platform.claude.com/docs/en/build-with-claude/adapti...
(Still trying to get a decent pelican out of this one but the new thinking stuff is tripping me up.)
> Still trying to get a decent pelican out of this one but the new thinking stuff is tripping me up
Wouldn't that be p-hacking where p stands for pelican?
Note that for Claude Code, it looks like they added a new undocumented command line argument `--thinking-display summarized` to control this parameter, and that's the only way to get thinking summaries back there.
VS Code users can write a wrapper script which contains `exec "$@" --thinking-display summarized` and set that as their claudeCode.claudeProcessWrapper in VS Code settings in order to get thinking summaries back.
Does this mean Claude no longer outputs the full raw reasoning, only summaries? At one point, exposing the LLM's full CoT was considered a core safety tenet.
yeah they took "i pick the budget" and turned it into "trust us".
"Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output, you have to add "display": "summarized" to get that"
I did not follow all of this, but wasn't there something about, that those reasoning tokens did not represent internal reasoning, but rather a rough approximation that can be rather misleading, what the model actual does?
The reasoning modes are really weird with 4.7
In my tests, asking for "none" reasoning resulted in higher costs than asking for "medium" reasoning...
Also, "medium" reasoning only had 1/10 of the reasoning tokens 4.6 used to have.
... here's the pelican, I think Qwen3.6-35B-A3B running locally did a better job! https://simonwillison.net/2026/Apr/16/qwen-beats-opus/
> Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output, you have to add "display": "summarized" to get that
That’s extremely bothersome because half of what helps teams build better guardrails and guidelines for agents is the ability to do deep analysis on session transcripts.
I guess we shouldn’t be surprised these vendors want to do everything they can to force users to rely explicitly on their offerings.
Don't look at "thinking" tokens. LLMs sometimes produce thinking tokens that are only vaguely related to the task if at all, then do the correct thing anyways.
https://github.com/anthropics/claude-agent-sdk-python/pull/8... - created PR for that cause hit it in their python sdk
If you do include reasoning tokens you pay more, right?
Prompts seem to need to evolve with every new model.
Claude Opus 4.6 has been hilarious for me so far: https://i.imgur.com/jYawPDY.png
[dead]
It's likely hiding the model downgrade path they require to meet sustainable revenue. Should be interesting if they can enshittify slowly enough to avoid the ablative loss of customers! Good luck all VCs!
[flagged]
[flagged]
[dead]
Its especially concerning / frustrating because boris’s reply to my bug report on opus being dumber was “we think adaptive thinking isnt working” and then thats the last I heard of it: https://news.ycombinator.com/item?id=47668520
Now disabling adaptive thinking plus increasing effort seem to be what has gotten me back to baseline performance but “our internal evals look good“ is not good enough right now for what many others have corroborated seeing