logoalt Hacker News

armcattoday at 7:25 PM2 repliesview on HN

This is such an insane rabbit hole. AI labs distill weights from the entirety of the internet knowledge, (mostly) without anyone's consent, which (technically) amounts to theft. However the chinchilla law dictates you need to expend X amount of energy to make this knowledge useful. Then the data law dictates that you need to shift the weights to a more useful latent space by paying maths, coding and domain experts lots of money. So you have "stolen" the data, but then paid billions to make it useful. And useful it is!

Then another lab comes, and "steals" from you - that beautiful, refined dataoil - by distilling your weights using inferior equipment but with a toolbox of ingenuity and low-level hacking tricks. They reach 90% of your performance at 20x cost reduction.

What happens when another lab distills from the distilled lab?

Who is the thief? How far will the Alice go?


Replies

roborovskistoday at 7:36 PM

What would you define as 'distillation' versus 'learning'? How do you know that what a LLM is doing is 'distillation' vs a process closer to a human reading a book?

From my perspective, pretraining is pretty clearly not 'distilling', as the goal is not to replicate the pretraining data but to generalize. But what these companies are doing is clearly 'distilling' in that they want their models to exactly emulate Claude's behavior.

show 1 reply
joquarkytoday at 8:06 PM

> which (technically) amounts to theft

Why bother writing so many words when you lack the discipline to choose the words with correct semantics?