Hacker News

ACCount37 yesterday at 11:22 AM

What the fuck. Are you a literal, honest to god distillation denier? Straight up "wake up sheeple, model distillation isn't real"?

I've seen plenty of things in the dumpsters of AI discourse, but this has got to be among the most baffling.

Yes, there are "giant returns" on distilling from a more capable model into a less capable model. And even more so when the more capable model was trained for something you want and lack. Like: better coding performance.

Someone like OpenAI had to RLVR for it the hard way (and if you think "distillation is expensive", wait till you hear how many bits per rollout hardcore RLVR gets you), but you get to peek into the results of their work and copy them for yourself.

Also, Anthropic didn't redact model reasoning until Mythos. OpenAI started redacting with o1, but Claude had reasoning chains accessible for a long time. Which is why Anthropic was more targeted than OpenAI.


Replies

HarHarVeryFunny yesterday at 11:50 AM

So we're meant to believe that only US companies have the intelligence and/or access to manpower to generate their own reasoning data? Does China have a population deficit? Maybe China has too high wages to pay people to generate reasoning data?

The US companies bootstrapped themselves from one model generation to the next, partly by using the previous generation to generate synthetic data, etc., and partly by paying people to hand-generate training data for them. Why do you apparently assume that the Chinese can't do the exact same thing?!

Surely "coding performance" is by far the easiest thing to generate your own RLVR data for, since it has trivially verifiable rewards: does the code compile and do what you want?

epolanski yesterday at 11:26 AM

If your claim is so solid, you'll have no problem pointing out data or evidence.
