It's been known for a long while that one model's outputs can be used as training data to teach another model to copy the original's behavior, a technique known as distillation.
What I didn't know is that the three groups mentioned "created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models." There's some irony in that, given that Anthropic and all the other established AI shops have been criticized for using copyrighted materials without permission to train their own models. I wouldn't be shocked if we subsequently find out that every major AI shop has secretly engaged in distillation at some point in the past.
Still, wow, 24,000 accounts. I can't help but wonder, how many other AI shops have surreptitious accounts with other AI shops right now?
It also makes you wonder how much of the reported user growth could simply be distillation attempts by one model vendor against another.
This reads like AI slop.
So they did pay to distill a pirated model.
More than can be said for Anthropic et al.'s leeching of a substantial proportion of human culture.