Boris from the Claude Code team here. We agree, and will be spending the next few weeks increasing our investment in polish, quality, and reliability. Please keep the feedback coming.
Sure, I've cancelled my Max 20 subscription because you guys prioritize cutting your costs/increasing token efficiency over model performance. I use expensive frontier labs to get the absolute best performance, else I'd use an Open Source/Chinese one.
Frontier LLMs still suck a lot, you can't afford planned degradation yet.
My biggest problem with CC as a harness is that I can't trust "Plan" mode. Long running sessions frequently start bypassing plan mode and executing, updating files and stuff, without permission, while still in plan mode. And the only recovery seems to be to quit and reload CC.
Right now my solution is to run CC in tmux and keep a 2nd CC pane with /loop watching the first pane and killing CC if it detects plan mode being bypassed. Burning tokens to work around a bug.
Can you please fix https://github.com/anthropics/claude-code/issues/43713 ?
Here's one person's feedback. After the release of 4.7, Claude became unusable for me in two ways: frequent API timeouts when using exactly the same prompts in Claude Code that I had run problem-free many times previously, and absurdly slow interface response in Claude Cowork. I found a solution to the first after a few days (add "CLAUDE_STREAM_IDLE_TIMEOUT_MS": "600000" to settings.json), but as of a few hours ago Cowork--which I had thought was fantastic, by the way--was still unusable despite various attempts to fix it with cache clearing and other hacks I found on the web.
hm. ml people love static evals and such, but have you considered approaches that typically appear in saas? (slow-rollouts, org/user constrained testing pools with staged rollouts, real-world feedback from actual usage data (where privacy policy permits)?
I am considering proving my feedback by not providing my money any longer.
> Please keep the feedback coming
if only there were a place with 9.881 feedbacks waiting to be triaged...
and that maybe not by a duplicate-bot that goes wild and just autocloses everything, just blessing some of the stuff there with a "you´ve been seen" label would go a long way...
Why ban third party wrappers? All of this could've been sidestepped had you not banned them.
And you didn't invest anything in polish, quality and reliability before... why? Because for any questions people have you reply something like "I have Claude working on this right now" and have no idea what's happening in the code?
A reminder: your vibe-coded slop required peak 68GB of RAM, and you had to hire actual engineers to fix it.
[dead]
Thanks, I have a lot of trust in and admiration for the team & respect for the work you guys have done and continue to do.
> investment in polish, quality, and reliability
For there to be any trust in the above, the tool needs to behave predictably day to day. It shouldn't be possible to open your laptop and find that Claude suddenly has an IQ 50 points lower than yesterday. I'm not sure how you can achieve predictability while keeping inference costs in check and messing with quantization, prompts, etc on the backend.
Maybe a better approach might be to version both the models and the system prompts, but frequently adjust the pricing of a given combination based on token efficiency, to encourage users to switch to cheaper modes on their own. Let users choose how much they pay for given quality of output though.