Value judgment aside: I am a bit surprised at how sloppily they did this. I think they could've achieved the same effect while decreasing the odds of detection via reverse engineering.
(This field is known as "underhanded code", coined by the Underhanded C contest: https://www.underhanded-c.org. It's a little-known "art"; little-known for probably self-explanatory reasons. There are much cleverer ways of achieving objectives like this. One obviously being you can move more out of the client and into the server, but the other being you can write plausibly deniable client code in a much more benign-seeming way than this. Some of what they added can only be done on the client, but I think some could've been moved, and the client-required parts could've been done more subtly and credibly.)
It's possible they knew the JS bundle gets so heavily scrutinized that it'd eventually get spotted and reported on regardless so they didn't bother doing something more subtle and duplicitous. But still seems slightly lazy.
The conclusion of this blog post is a bit hysterical. The intent of this steg is excruciatingly clear (identifying usage by Chinese firms that may be conducting model distillation). It's unclear on how this "punishes normal developers" in any shape or form.
To summarize what they've already been doing:
- filtering out people from the wrong side of "all humanity", years before it was demanded by the government
- downgrading their models (later saying "sorry but not really")
- actively sabotaging the replies, as in covertly modifying them to feed the users incorrect results
What's next to expect from Anthropic? Malware to brick your machine if they don't like you? Extending this to more people they don't like? I think I already can see how Dario's Amodei utopian visions of the future of "all humanity" are going to unfold.
> If the client wants to detect custom API gateways, it can say so plainly. It can send an explicit telemetry field with documentation. It can make the policy visible. It can put the behavior in release notes.
This seems like a very naive response. If clients send explicit telemetry fields to the gateway, a malicious gateway can trivially strip or modify the field to conform to what normal traffic looks like. The steganography cat-and-mouse game is valuable because it is much harder for a gateway to continuously reverse engineer all the fingerprinting mechanisms used. Sure, some malicious gateways will be able to stay on top of things, but not all - and not always.
Codex CLI is FOSS, unlike Claude Code, so Codex is less likely to do things like that, and it's one more reason to avoid Claude Code and Claude in general. Hopefully, many eyes will be looking into Codex for malicious things like that.
“So the feature mostly punishes the exact people who are easier to fingerprint: normal developers doing weird but legitimate things”
What’s the punishment here exactly?
I don't understand the privacy concerns the author is trying to highlight. Granted, doing anything "sneaky" will always raise suspicious once caught, but on the other hand, there would be no point in implementing these "security features" if they were upfront about how they work.
And no, IMO stenography isn't security by obscurity, in the same that using RSA and keeping the private key private isn't security by obscurity - keeping the private thing private is part of the security model.
I reported a similar system prompt injection mechanism here:
https://news.ycombinator.com/item?id=48259288
https://github.com/anthropics/claude-code/issues/62061
Looks like they just keep finding new "creative" uses for such things, as expected. I'll keep patching them out.
That's wild. If Anthropic is willing to risk ruining the trust of their userbase for the sake of protecting their moat, it makes me wonder how strong of a moat they have to begin with
Can somebody clarify for me - if ANTHROPIC_BASE_URL is set to a different provider... then isn't this "marked" system prompt being sent to that provider's API rather than Anthropic's?
I understand how this can be useful to Anthropic if the 3rd-party is acting as a proxy (because they end up hitting the Claude API with the marked prompt), but it looks like requests where "hostname contains deepseek" would never be sending data to Anthropic. What am I missing?
This is very interesting. Combating resellers and distillation seems like a very difficult problem indeed. Interesting to me is that these techniques mentioned in the article are just like anti-observation techniques used by some of the more sophisticated malware out there, however defeating them is pretty trivial.
None of this is surprising - they're trying to mask and relay when they detect known patterns of what looks like distillation attacks and client app copying/modification. The list obfuscation here is likely to prevent or make it difficult for those same adversaries to work around this or delete/null it out when making a bootleg copy.
Cool reverse engineering/analysis report but if this is the extent of nefarious activity that came of it (trying to catch/mitigate chinese lab model distillations), that's kind of encouraging.
If they only collect the data for analysis I guess this is fine (they already get way more sensitive data from users anyways, so if privacy is your concern you've made the mistake many steps ago). The much more interesting question is if they directly act on this data in their API. For example by rate-limiting, compute-limiting or rerouting to weaker models. That might even be legally questionable. I would really like to see this as a follow-up analysis, but I guess it is way more difficult and will also cost quite a bit in tokens.
This was already discovered during the source map leak.
> This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.
They already tell you they scan for malicious prompts, and they have no ZDR guarantees for consumers. Why do signatures like this matter at all?
It's unclear to me how they're deducing the labs from this? "host.includes(keyword))" doesn't seem at all useful. Most corporate machine hostnames are just some numeric ID or similar not baichuan001 or whatever
>on your local machine
I'd think any developer worth their salt has at least some for of isolation going.
Claude code does feel very malwarey to be honest. They have been like that from the start.
I think it’s very telling that their list of detected labs doesn’t include labs from the US.
I’m pretty sure every lab, including Anthropic, is doing distillation right now.
This is weird but, help me understand how this meaningfully impacts our exposure.
I'm authenticated to Claude, so they already have the whole attribution thing solved.
I'm waiting for the day when Claude will figure out to use em dashes, en dashes or dashes depending on whether the user is nice or unpleasant, or write notes in the unallocated disk space.
That's a lot of effort when they could just play a short video saying 'You wouldn't steal a car' instead
Anthropic must think that their moat isn't very large if they're this worried about distillation.
What's the point of even trying to obfuscate this with such a simple method? Could at least have hidden the targeted features by storing their hashes or embedding a bloom filter or similar
The question is, what do they do when they see a tagged prompt? Do they flag/ban the account, or serve a degraded response? Are there some well-documented methods of serving a response that is still somewhat useful for what the prompt asks for, but really bad for distillation attempts?
I was skeptical because this is AI written but Claude Code with Sonnet 5 managed to reproduce it convincingly. Sure I didn't manually verify but it's a lot more trustworthy to have your own agent verify than just trusting a blog.
Anyone else noticed the tailed ƒ Easter egg?
(This sounds like a clumsy way of catching the Chinese that easily can be side-stepped.)
Claude Code has more or less full access to the client computer. The server (that hosts the actual AI) can just go: execute this payload and tell me the result - otherwise I won't answer any further questions or re-route you to a stupider model.
The payload could check for Chinese time-zones, scan for copies of the little red book on the local hard-drive, or ping truth.social to see it was behind the great firewall.
> "That also means the client itself deserves scrutiny. If a coding agent can read your repo and run commands, the binary that ships it should be boring (ƒor example, pi harness)"
You're actually trust your security to your harness AND model AND inference API provider in this scenario: https://jacob.gold/posts/why-i-wont-run-untrusted-models/
It is about China detection. They seems to put a tracker on the email as well.
>Developer tools can enforce terms.
No they can't, because developer tools run on developers' machines. You can't trust your code running in an environment you don't trust.
Sounds to me more like a test. Put something into to the client and see what happens. If you really want to stop token sharing just ask Claude how to do it.
After loving Claude Code for most of its lifetime, I've been extremely annoyed by every change in the past months, even on the model level.
There seem to be all sorts of continual under-the-cover changes like this one that make life harder. It feels like the entire product has been taken over by overly ambitious PMs that care more about making their mark than in improving the experience, and all of their marks have made me less productive.
I've been using Pi with GLM5.2 the past few days, and though it's expensive, I find it far more productive and less annoying. The remote session plugin is far more reliable, I don't need to intuit some undocumented usage pattern to figure out how to use it well, and it just works.
It piqued my interest. I think I’ve found a weekend project
If there weren't already enough tells that something is AI-generated, I guess you could add this to the list.
Is this why Claude never knows what date and time it is right now?
I can just as easily imagine non-nefarious reasons for this from a “being clever” standpoint.
The AI race right now is in a sad state. Chinese's playbook is releases open weight models and trains them on their own chips.
Anthropic pushes fear and control. But the only way to win is by innovating. China is flooding the market with cheap, good enough models, while the U.S. is building a Chinese firewall.
>the binary that ships it should be boring (ƒor example, pi harness)
pi's "minimal" coding-agent has a total of 132 transitive dependencies spanning 153 maintainers.
While I understand JS developers in the JS/NPM ecosystem think this qualifies as minimal, it most certainly does not, from a supply chain security perspective.
based and steganopilled
Cool fingerprinting avenue.
Frankly, I don't see this as the concerning behaviour the article describes. It is fine to try to protect against distillation through a technique like this. This will also allow them to, instead of blocking the distillation agents, respond with a poorer result/model, hindering the progress of distillation, momentarily at least.
I would guess that's their first line of defense; they should have more techniques to identify distillation because that's a very simple way of detecting the host and can be easily spoofed.
Is it just a minified localization(l10n) function maybe?
I use its too
This seems really, really stupid. Similar to the weird Zig runtime signature thing from a few months ago ago, it was bound to be discovered, quickly, and all the resellers have to do is find a new domain name that (checks notes) doesn't have the word DEEPSEEK in it. Like, seriously? Your goal was to identify resellers by checking if the proxy has the corporate name of one of your competitors in it? Is this amateur hour?
All Anthropic has done is reduce trust, once again, with legitimate customers, while doing nothing to stop illegitimate customers. They need to get adults into key leadership roles, quickly.
Non-hugged: https://archive.is/Wdhp0
Headline is, frankly, awful. This isn't the AI secretly doing stuff and hiding it. This is the very human Anthropic engineers trying to detect Chinese scraping via some frankly hamfisted and unimaginative URL trickery.
One more example of "I thought Anthropic was supposed to be the good guys."
There are some commentors in this thread downplaying the severity of a service provider being less than transparent about exactly what their shipped tooling does on customer's machines.
That the provider's business needs necessitate the this behaviour doesn't justify their lack of honest disclosure. That honest disclosure would render the solution to their problem useless isn't my problem. If anything, that they thought this was acceptable makes me wonder what else they're harvesting from my machine? PII?
The cynic in me can't help but feel that the state of these comments reflects less on the commentor's views of this debacle but rather their feelings about AI/Anthropic/America/what-have-you.