noosphr • today at 12:38 PM

>We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.

Replies

smallerize • today at 2:53 PM

Does this mean that if your model is "overfitting", the solution is to train for even more epochs?

ForceBru • today at 1:43 PM

Right, isn't double descent one of the reasons why modern Extremely Large Language Models work at all? I think I heard somewhere that basically all of today's "smart" LLMs (the ones that reason, solve math problems, etc.) are trained in "double descent" territory (whatever that means; I'm not entirely sure).
