Hacker News

misnome yesterday at 6:58 PM

Yes, I’ve been evaluating it since the start of the year, and since 4.6 the most innocuous requests will suddenly sit there “thinking” for 5+ minutes. If I can get it to show me the thinking, it’s just going round in circles.

Or it decides it needs to get API documentation out and spends tens of thousands of tokens fetching every file in a repo with separate tool calls instead of reading the documentation.

Profitable, if you are charging for token usage, I suspect.

But I’m reaching the point where I can’t recommend Claude to people who are interested in skeptically trying it out, because of the default model.


Replies

JohnMakin today at 1:02 AM

I guess I engineered around this before 4.6. I did notice a regression in it wanting to search deeper than I wanted and had specified, but I just restricted it with tooling I wrote that would enforce what I wanted. In that respect, I feel comfortable running 4.6 with the guardrails I already have, but I did notice some squirreliness I didn't anticipate in my utility scripts.

It is clever. It's its best and worst feature.
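
For anyone wondering what "tooling that would enforce what I wanted" might look like: the comment doesn't describe the actual scripts, so this is only a rough sketch of one possible guardrail. It assumes you control the function the agent calls to read files and can wrap it. Every name here (`ToolBudget`, `BudgetExceeded`, `fake_read_file`) is hypothetical and not part of any real SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class BudgetExceeded(RuntimeError):
    """Raised when a request asks for more tool use than the limits allow."""


@dataclass
class ToolBudget:
    """Hypothetical guardrail: caps tool calls and path depth per request."""
    max_calls: int  # total tool calls allowed
    max_depth: int  # deepest path level a call may touch
    calls: int = 0
    log: List[str] = field(default_factory=list)

    def guard(self, tool: Callable[[str], str]) -> Callable[[str], str]:
        """Wrap a path-taking tool so every call is checked against the limits."""
        def wrapped(path: str) -> str:
            # Depth = number of non-empty path components.
            depth = len([part for part in path.strip("/").split("/") if part])
            if depth > self.max_depth:
                raise BudgetExceeded(f"path too deep: {path}")
            if self.calls >= self.max_calls:
                raise BudgetExceeded("tool-call budget spent")
            self.calls += 1
            self.log.append(path)
            return tool(path)
        return wrapped


def fake_read_file(path: str) -> str:
    """Stand-in for a real file-reading tool (assumption for illustration)."""
    return f"<contents of {path}>"


if __name__ == "__main__":
    budget = ToolBudget(max_calls=2, max_depth=2)
    read = budget.guard(fake_read_file)
    print(read("docs/api.md"))
    print(read("README.md"))
    try:
        read("docs/other.md")  # third call goes over the budget
    except BudgetExceeded as err:
        print("blocked:", err)
```

The point of enforcing this in the wrapper rather than in the prompt is that the limit holds even when the model ignores the instruction to stay shallow.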