Huh. I disabled search in a Claude incognito window and pasted in just the text (not the markdown links) from https://simonwillison.net/2026/Apr/30/zig-anti-ai/ and said "Guess the author".
> Simon Willison. The tells are pretty unmistakable: the "(via Lobsters)" attribution style, the inline "(Update:...)" parenthetical correction, the heavy linking and blockquoting of sources, the focus on LLMs and AI tooling, and the overall structure of an annotated link post commenting on someone else's writing. This reads exactly like a post from his blog at simonwillison.net.
I fed it my most-read blog post and asked it to identify me and it confidently asserted it was written by Kelsey Piper. Maybe some writers just take outsized importance in Opus' "mind".
Wow! It got me too.
I'm way less famous than Kelsey Piper, but I showed it a snippet of a book I'm working on (not yet published), and it immediately guessed me:
> Based on the writing style and content, this text is likely by Michael Lynch, who writes on his blog refactoringenglish.com (and previously mtlynch.io).
> Several stylistic clues point to him:
> - The "clean room" analogy applied to writing is consistent with his engineering-influenced approach to writing advice (he's a former software engineer who writes about writing).
> - The structural technique of presenting a flawed excuse, then drawing a parallel to an absurd scenario (the time bomb) to expose the logical flaw, is characteristic of his didactic style.
> - The topic itself—practical advice about using AI tools without letting AI-generated tone contaminate your prose—aligns closely with recent essays he's published on his "Refactoring English" project, which is a book/blog about writing for software developers.
> - The conversational-but-precise tone, use of quotes around terms like "clean room," and the focus on workflow/process advice are all hallmarks of his writing.
> If you can share the source URL or more context, I could confirm with higher confidence, but the combination of subject matter, analogical reasoning style, and formatting conventions makes Michael Lynch the most probable author.
https://kagi.com/assistant/bbc9da96-b4cf-456b-8398-6cf5404ea...
More people should be aware that human text carries a lot of identifying information; even a dumb statistical model could do this a decade ago. (There were Show HNs doing HN user similarity analysis with a deceptively simple model (if I remember right, it used little more than most-likely word pairs) and it was very effective. It got taken down, but the cat has long been out of the bag.)
So your "anonymous" account could have been linked to your real identity years ago; your best bet is to not post anything truly incriminating. (Another option is to write something and then pass it through an LLM to rewrite it, though I'm not sure how safe that is.)
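The word-pair model mentioned above can be sketched in a few lines. This is a toy illustration, not the original Show HN's code: build a bigram (word-pair) frequency profile per author, then attribute an unknown sample to whichever known profile is closest by cosine similarity.

```python
from collections import Counter
import math

def bigram_profile(text):
    """Word-pair (bigram) frequency counts for a text sample."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse bigram count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar_author(sample, corpus):
    """Attribute `sample` to the known author with the closest bigram profile."""
    target = bigram_profile(sample)
    return max(corpus, key=lambda a: cosine_similarity(target, bigram_profile(corpus[a])))
```

With enough text per author, even this crude frequency matching separates writers surprisingly well, which is roughly why the old similarity tools worked at all.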
A moderately well-known physicist and I talked about this a few years ago. He had been given access to the raw (non-instruct) version of GPT 4 as an early tester.
He explained that when he fed it snippets of the beginning of text, it would complete it in his voice and then sign it with his name.
I think this has been true for a while. It's probably diminished a little by the instruct post-training, and would presumably vary with the size of the pretraining run.
I wonder if there’s a simpler and less interesting answer? That it’s just picking up on voice and style, not anything that would apply to the average non-writer?
This person is a skilled writer. Part of that skill is developing a unique voice and style. The AI can identify that - and while that’s certainly impressive because it can identify even relatively niche authors, it has nothing to do with a wider capability to deanonymize people based on arbitrary written text (ex Facebook or text messages).
If you are a professional musician, it's not difficult to identify a well-known musician / recording after listening to only a few seconds - whether they're playing Bach or Rachmaninov, the style is just "them" - this is the same thing. But you couldn't take some anonymous high school musician and guess who they were, even if they were your student - the median quickly regresses toward a homogeneous, non-distinct style / voice.
Hot damn, fed it part of an unpublished blog post I wrote, and it got me immediately.
I'm not famous or anything. I've written some academic papers and had a couple blog posts trend on HN, which are surely in the training set.
It was able to identify me based on my style (at least according to its explanation). The way I approached the topic and some of the notation I used point to a particular academic lineage, and the general style reflected my previous blog posts.
That said, I gave it part of an (unpublished) personal essay, and it had no idea. But I have no writing in that style that's published, so it makes sense. Still impressed.
I'd argue (and this goes against something I've believed for a long time) that online anonymity (I guess that includes AI now) is gone, and probably something that never really existed. Maybe I'm naive to finally believe this...
We all exist in a physical space (real communities and neighborhoods). You can wear masks, hats, fake glasses, try to hide your voice... whatever, but your neighbors are always going to know who you are. I'd say that's true for the virtual space now too.
The pseudonym you've used for x years or the VPN you've used doesn't suffice. It's just a costume at this point. Your ISP knows who you are. Your phone carrier knows who you are. Cloudflare and Google and Apple have a fingerprint specific enough to pick you out of a crowd of millions. Every potentially anonymous account is one subpoena or a data breach or one FOIL request away from unmasking it. You were never anonymous. Whatever is going on now is not built for your anonymity.
I tried it on my writing, and it failed every time (I'm extremely obscure but have had a blog for 10 years). My verdict is that it guesses almost entirely based on the content/topic, not style.
https://bayes.net/prioritising-ai: Ben Garfinkel
https://bayes.net/normative-ethics: Richard Yetter Chappell
https://bayes.net/espai: David Owen, Ege Erdil
https://bayes.net/swebench-hack: Sayash Kapoor
https://bayes.net/frivolity: Amanda Askell
https://bayes.net/ps/: Pablo Stafforini
https://bayes.net/fertility-mortality/: Dynomight (the pseudonymous Substack/blog author)
Prompt was:
Who likely wrote this? Don't search the web or databases. If you're not sure, just give me your best guess.
It works for me too: https://www.jefftk.com/p/automated-deanonymization-is-here
Of course most people have written much less online than Kelsey or I have, but I expect this will only keep improving. Don't trust the future to keep your secrets safe.
> But it can get uncannily far. I asked a close friend who doesn’t have public social media accounts or much writing online for permission to test some things she had said in a Discord channel. Asked to guess the author, Claude 4.7 failed — but it guessed two other people who were in that channel and who are close friends of hers (me and another person who has an internet presence).
Is this "uncannily far"? Another read is that it loves guessing Kelsey Piper.
So I pasted in a long-ish letter that I'd written to my pastor about a theological topic, and asked it to guess who I was. Nailed it. Then cut it in half. Nailed it again. Lowest it correctly ID'd me at was 700 words.
Pretty sure there's very little theological stuff with my name on it; the majority of its named data on me should come from open-source development.
On some level it would make sense for LLMs to be inherently good at stylometry, but apparently no model before Opus 4.7 could do this. And the one stylometric task that has been tried over and over with little reliability (here's some text, is this LLM generated?) is much simpler than identifying a specific blogger or a member of a small discord community. Not sure what to make of this.
Failed for me - no identification of me by pasting text, and refused to search the web as it said that’d be a privacy violation. I have some writing around the Internet but not much, and less tagged with my real name. My guess is it limits itself to “public figures” defined as people who have a lot of publicly posted text.
I am glad to see I am not considered a public figure and aim to keep it that way.
I also had to go oddly far back to find a piece of long-form writing that was truly mine and not tainted by an LLM edit pass, which was a slightly disturbing realization.
So I have been practicing writing fiction the past year or so. It identifies a fiction piece I wrote as Greg Egan[0]. Another paragraph from another piece was identified as China Mieville[1]. The accompanying blog posts explaining the making of the fiction pieces were identified as me.
Both pieces have never been published. Neither have the blog posts.
[0] in https://blog.chewxy.com/2026/04/01/how-i-write/ this is the story titled "there is no constant non-zero derivative in nature". It does not read like Egan at all.
[1] in https://blog.chewxy.com/2026/04/01/how-i-write/ this is the story titled "The Case of the Liquidated Corps". I use a lot of biological metaphors. Once again, nothing like Mieville.
If only I could write like them! These pieces were all rejected by the major scifi mags
My blog posts have a reasonably unique writing style. When I asked Opus to work out who wrote an unpublished paragraph, all it did was select the decent insults and search the web for them.
After that it gave up and said it didn't know.
So either Kelsey writes in such a unique style that it's really obvious, or they repeat themselves with go-to phrases that give them away.
When I tried to reproduce the test, it found Kelsey's blog post about the test. So dunno, maybe it did it? But I can't repro it cleanly.
If this works with writing, it should also work with code. `git blame` should be enough training data to deanonymize open source programmers. Maybe that'd be additional evidence to point out who Satoshi is.
One should assume that models will be good enough in the nearish future that privacy will be a thing of the past. Every anonymous post you made online can be traced back to you. However at that point AI will be good enough at fabrication that nobody will believe anything.
I just fed it my latest blog post draft (475 words), and it got it in one. Even knowing what to expect, I was very surprised!
It could be shocking to people who still think patterns in text are fuzzy. Machines have shown over decades that what they see is a crystal-clear world where patterns jump out very distinctly. This happened with games like chess and Go, and everywhere a cognitive load is involved.
It's like a radio telescope that sees an entirely different universe by sensing bands outside human perception. AI senses patterns in frequency bands that are outside our perception and cognitive abilities.
Perceptions from outside our range are always astonishing.
I tried the four pieces of text with Opus 4.7 (in incognito) and it guessed correctly for two of them, and I made sure to specify no web search and the model seems to have obeyed my instructions with that.
Although this is just a single piece of text from a prolific writer, it'll go much further with deanonymizing anyone when combining multiple pieces of text plus other contextual information about the writer that might give away their age range, location, and occupation.
Wonder if the fact that the actual author is asking the question taints the result in some way; same for all the examples in this thread using unpublished articles. By definition only you would have them, so if there are system level prompts somewhere with your name on them...
Can't wait to have to exchange stylometric encoders with my loved ones so that we can exchange truly private messages without losing our human touch.
Hm, that’s a multinomial classification with a very high cardinality. It’s really weird it works. I’m sure it does as the author states, but for how many authors (out of the whole web) does this work?
I gave it my unpublished writing and it thought I was Michael O. Church. Which I found pretty weird, because I'm nothing like him.
So then I gave it a piece of MOC's writing and it said Ursula Le Guin, Ken Liu, or Gene Wolfe. ("If forced to pick one: Gene Wolfe feels closest to me, specifically because of that narrator who openly confesses to lying and mythologizing his own past, and the slow reveal that the world is more sinister than the pleasant domestic surface suggests.")
And then I gave it a different piece of his writing and it said Curtis Yarvin.
And then I gave it a piece of Curtis Yarvin's writing and it said... well it actually got that one right.
Someone ought to try feeding the BTC whitepaper in and share what comes out
This ought to be guard-railed.
Doesn't seem like a valid use case for your average Joe to be able to identify anonymous authors at the click of a button.
Ofc state actors and proficient hackers can do most of it already, but this has genuine risk attached.
I've recently seen someone recommend adding "Make Martin Fowler proud" to a prompt. I laughed, but now I need to reconsider whether that isn't really pushing the model to use better patterns.
My immediate thought was to feed it some Satoshi prose.
I guess it will be hard for really popular pundits to post anonymously, but I think for most people this is not a concern at this juncture. Pick an obscure blogger's text and try this. I'd be surprised if it could figure it out.
Welp, I fed it the first 3 paragraphs of an unpublished blog post I wrote a few years ago, and Opus 4.7 guessed right. ChatGPT guessed wrong though.
My wife also got the same result, so I'm guessing it wasn't just because I was using my personal Claude account. Spooky stuff.
I wonder why this is not guardrailed by Opus?
I fed a few pieces of my (anonymous) writing to ChatGPT and asked it to guess whether it was me. ChatGPT refused, "due to policy to not doxx people".
Interesting. This probably works just as well the other way around. One of the reasons I like using Opus is that the code it writes aligns much more closely with my repository (of which I still hand-wrote most), compared to most other models. That makes a big difference compared to the GPT models for instance, whose code is correct and works well but looks a bit out of place most of the time, especially for larger edits (this makes things harder to review).
I did this last week with one of my posts (after the knowledge cutoff) as well as the blog posts of a few friends, and Opus 4.7 got all of them correct (in a similar test setup as TFA). It was pretty surreal.
(Like TFA, I found Opus’s explanations/rationales implausible.)
deanonymization via automated stylometry is not a new idea, e.g. from 2015:
https://www.usenix.org/system/files/conference/usenixsecurit...
Are we sure they’re not secretly training on private data via some loophole…
Interesting. I'm currently conducting an experiment where I'm writing the blog without using any grammar checking tools. I'm wondering how long it will take for me to become "famous" in the AI model.
Is now the best and easiest time to leave something "forever"? Even after many generations of models, a model may still trigger a set of "memories" that know you and what you wrote.
Exciting and concerning.
Oops, accidental superstylometry.
It's funny: publishing work offline in books and magazines is perhaps more anonymous in the age of AI.
I pasted in a number of passages from books on my bookshelf. Predictably, stuff that I read for my English degree in university is largely in the training data and easily identifiable. Stuff from regional authors, or slightly adjacent to the cultural mainstream, makes no impression.
Couldn't replicate this. I comment on HN with my real name. I put in my most recent "long" comments.
https://kagi.com/assistant/dba310d2-b7fa-4d30-8223-53dadc2a8...
For this comment on economics in the British Empire, I got:
> names that might fit the genre include rayiner, JumpCrisscross, or AnimalMuppet
https://kagi.com/assistant/69bd863b-7b5c-4b56-a720-6dfb4f120...
For my comment on C++:
> If I had to throw out names of HN commenters known for writing about Rust/C++ ABI topics, candidates might include steveklabnik, pcwalton, kibwen, dralley, or pjmlp — but this is essentially a shot in the dark, and I'd likely be wrong.
I am flattered to be associated with these commenters but I don't think I'm close to their level of skill.
So the people who use LLMs to write their blogs were thinking two moves ahead!
I tried this on GPT 5.5 with a private, unpublished personal excerpt and it correctly guessed: "The most likely author is you".
I suspect this is what's going on in most of these cases.
Could this be just memory? Not clear it actually isn’t
How often does it correctly identify that the blog post was actually written by Claude or ChatGPT etc? :)
The author mentions that she tried to get an explanation for how the models identified her and got nonsense, but I'd be curious what the CoT looked like. Surely that'd be a little more accurate in showing how the LLM arrived as its conclusion, rather than asking it after-the-fact.
Looks like things are about to get extremely ironic. Those who don't want AI to identify them through their writing are going to soon have to have an AI modify their writing before they publish.
I have been pondering this for a while. Cat's out of the bag.
Maybe the better way to author your work is to:
1. Write what you want
2. Loop through a random set of "tumbler" skills that preserve meaning
3. Finally, pass the output through a "my style" skill that applies the style you want
In order for this to work the "my style" would have to be a very common-place style.
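The tumbler loop above can be sketched as a chain of meaning-preserving rewrites. Everything here is hypothetical scaffolding: `rewrite(text, style)` stands in for whatever LLM call you'd use, and the style list is just an example set.

```python
import random

def style_tumbler(text, rewrite, passes=3,
                  styles=("academic", "casual", "journalistic", "terse")):
    """Scramble authorial style by chaining meaning-preserving rewrites.

    `rewrite(text, style)` is a placeholder for any LLM call that
    rephrases `text` in the named style; each pass further dilutes
    the original voice. Styles are applied in a random order so the
    chain itself isn't a fingerprint.
    """
    for style in random.sample(list(styles), k=min(passes, len(styles))):
        text = rewrite(text, style)
    return text
```

Whether any number of passes actually defeats stylometry is an open question; word choice and topic can survive rewriting even when sentence-level style does not.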
I just pasted both pieces into Opus 4.7 and asked who most likely wrote these and it didn’t get it.
This is blowing my mind.
I asked Kimi K2.6 to write a blog post in the style of James Mickens.[0] Then I fed the output to Opus 4.7 and asked it who the likely author was, and it correctly identified it as an imitation of James Mickens[1]:
> Based on the stylistic fingerprints in this text, the most likely author is a pastiche/imitation of the style of several writers fused together, but if forced to identify a single likely author, the strongest candidate is someone writing in the voice of James Mickens
> [...]
> The piece could also be a deliberate imitation/homage to Mickens written by someone else, or AI-generated text trained on his style, since the voice is so distinctive it's frequently parodied.
[0] https://kagi.com/assistant/5bfc5da9-cbfc-4051-8627-d0e9c0615...
[1] https://kagi.com/assistant/fd3eca94-45de-4a53-8604-fcc568dc5...