nonethewiser • today at 1:21 AM • 2 replies

Distilled models are necessarily behind so long as models are progressing, and models are progressing. Maybe that will stop being true at some point in the future.

And Berkeley’s “False Promise of Imitating Proprietary LLMs” found that imitation quickly closes the style gap but leaves a large capability gap.

https://arxiv.org/abs/2305.15717

Replies

lebovic • today at 1:29 AM

Curiously, this isn't always true.

For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].

Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.

[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...

➕ show 1 reply

Gigachad • today at 4:36 AM

I'm OK with having last month's model at a tiny fraction of the price.
