Hacker News

red2awn, yesterday at 7:37 PM

The "distillation attacks" are mostly using Claude as an LLM-as-a-judge. They are not training on the reasoning chains in an SFT fashion.


Replies

zozbot234, yesterday at 7:45 PM

So they're paying expensive input tokens to extract at best a tiny amount of information ("judgment") per request? That's even less like "distillation" than the other claim of them trying to figure out reasoning by asking the model to think step by step.
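To put rough numbers on "tiny amount of information", here is a minimal sketch. Everything in it is an assumption for illustration: the function names, the label counts, the 500-token chain length, and the 50,000-entry vocabulary are all hypothetical, and the text bound is a very loose ceiling (real text carries far fewer bits per token than a uniform draw from the vocabulary).

```python
import math


def judgment_bits(num_labels: int) -> float:
    """Upper bound on the information in one verdict picked from num_labels options."""
    return math.log2(num_labels)


def text_bits_upper_bound(num_tokens: int, vocab_size: int) -> float:
    """Loose ceiling: treats every token as a uniform draw from the vocabulary."""
    return num_tokens * math.log2(vocab_size)


# Hypothetical figures, not measurements of any real model or pipeline.
pairwise_bits = judgment_bits(2)    # "A is better than B"
rating_bits = judgment_bits(10)     # a 1-10 score
chain_bits = text_bits_upper_bound(num_tokens=500, vocab_size=50_000)

print(f"pairwise verdict: at most {pairwise_bits:.2f} bits")
print(f"1-10 rating:      at most {rating_bits:.2f} bits")
print(f"500-token chain:  at most {chain_bits:.0f} bits (loose ceiling)")
```

Even with that ceiling discounted heavily, a single verdict carries a few bits at most, while the request still pays for every input token the judge has to read.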
