If scraping content is legal, model distillation should be legal too.
I suppose model distillation is technically legal in terms of copyright, because LLM output is automatically in the public domain.
It's only "illegal" in the sense of breach of contract, since it's against the terms of use/service, which is to say it's not illegal at all; there's no criminality there.
> If scraping content is legal, model distillation should be legal too.
No, because legality should be determined by what's in the best interests of Anthropic's and OpenAI's business models.
Hopefully they're using RLHF to train their models to insert clauses making that reality clear into any legislation the models generate or review. That way it's only a matter of time until the confusion is cleared up.