This is why I don’t understand the concerns about “our AI overlords” monopolizing all the gains from AI. It doesn’t seem like there’s much of a moat around the models themselves, so the race is mainly about compute. But compute is subject to power law effects. I remember Intel building the first teraflop computer (ASCI Red) in 1996. It was the size of a house. By 2014 you had more compute and 50% more memory in an off-the-shelf dual-processor server system.
Hey, Alanis Morissette, this one is ironic.
Wait so they're upset that people used their IP to train a model without their consent or paying them anything?
or is this just about the token reselling?
To quote an infamous cop in the UK, I don't think you are mate.
[Braille-art image: what appears to be a surprised Pikachu face]
It's not fair when others do it.
I mean, I believe in protecting your company's IP, but IP and patent law are absurd these days, designed to protect investors and their fake money rather than actual inventors (who usually get no proceeds or are shafted).
They trained from the internet, so if someone trains from them it's fair game. Their clever tech should be in the mechanism it uses to produce an answer, not the answer itself.
Alibaba did research on Anthropic's capabilities? Interesting.
Please. These AI companies scraped everything under the sun. It's only fair that they get distilled into open weights models. Their own models should have been open weight from the start.
Call the wambulance: a company that stole all of humanity's public data to train a model is mad that someone used its model to train another model.
Give me a break. Every employee of Anthropic is going to have $20m or more at the IPO.
I found out today that an employee of the home care agency I own is homeless. We are trying to figure out how to help her but it's shockingly common in the industry and there are limited resources to solve the reality of working homelessness.
Oh gee, I've misplaced my world's smallest violin.
Something something about benefiting all humanity
People prefer Chinese models to US models. Looks like it is a counterattack.
How do I donate my logs?
If true then Alibaba is doing us a public service, good job, I hope this extraction was successful.
What goes around comes around!
What goes around, comes around.
Oh, Alibaba distilled data without consent from Anthropic's models, which were trained on data from the internet without consent? Who cares?!
What goes around, comes around.
Has anyone else noticed that DeepSeek V4 running in Claude Code will try to read, list, and tail as many files/logs/... as it can for even the simplest tasks?
Booohooo the people who stole everything they have want to cry about having what they produced stolen???
This is like a gardener complaining that you watch him as he works to learn his craft. My dude, you do not have to take the job, but most people just accept it as the way the world works. If they feel like they do not want to serve the Chinese, they can do that on their own. Why do they need the government?
Haha cry us a river Anthropic.
ooohhh nooo... anyway...
As an open source contributor who was never asked by Anthropic or OpenAI if they could use my work in their training datasets, this sounds so deliciously ironic.
Why is it called "distillation" when it seems to be "scraping"? (as in web scraping)
When bots open the same board 1 million times per day, it's web scraping to train an AI model and that's OK. When someone asks 150 thousand questions, it's suddenly distilling.
On an unrelated note, 150k queries feels like nothing?
Scrapers seem to account for 50% of total internet traffic.
Do they use different methodology since it is suddenly bad when scraping happens to them?
Thieves complaining about theft and then gaslighting the victims; rich, but not smooth.
Is there a good recipe or guide on doing a successful distillation these days?
Where did Anthropic get all their training data? Funny that these companies care about the sanctity of IP all of a sudden.
Why would it not be fair use?
If it's out there on the internet it's ok to use it for training, independently of what the licenses or the TOS say.
If not, then we should look at Alibaba, but we should look at Anthropic as well.
Can we finally just nope out of this closed model of AI development?
It should all be open source with each gain shared and celebrated by all.
It's so funny how LLMs, which were trained on millions of stolen books (pirated from online pirate sites like libg and annas, and even if they weren't stolen, which they were, there was no consent for the VAST majority of them), plus stolen code, stolen comments, etc., now complain about their stuff getting "stolen"... lol.
Perhaps this is related to the "Mythos is too dangerous and cannot be exported" movements? It'd be a fairly effective way to justify extreme actions in combating it.
One could even wonder if they requested it, as a tactic to support their eventual IPO valuation.
Which is part of the problem of such an obviously-corrupt government: conspiracy theories are somewhat reasonable, as they keep getting validated.
Karma is a thing.
Please, honor among thieves!
So let me get this straight, a company which built its whole business on ignoring IP is all of a sudden upset that somebody is not respecting their IP?
AI is awesome tech but it’s also to some extent mass piracy. The models are trained on huge amounts of material with dubious or nonexistent rights.
I have a hard time being concerned about “you pirated my piracy.”
I hold the view that many of these models should not be copyrightable. Anthropic and all the others talk about “safety” but you never hear them bring up attribution of the data that trained the model or compensation of anyone for it.
It's hard to sympathize with Anthropic over this or the export ban. The hype over model capabilities probably fuels both (in some ways). "Training data for me, but not for thee" (at any scale) doesn't seem like a tenable position. If anything, Claude's constitutional outputs should be trained on more rather than less.
And Alibaba is releasing the full model weights open source under Apache 2.0, Anthropic… fuck that company.
Anthropic trained their models on loads of copyrighted data, so?
Model makers need to get off their high horse and face the reality that they are selling a commodity.
More distillation please. This is only good for me.
The biggest irony of the 21st century.
If they’re paying for the tokens, what’s the problem?