Hacker News

cherryteastain today at 2:52 PM

A related viewpoint is that overparametrization is good because the model only gets stranded at a critical point where the Hessian's eigenvalues are all positive or zero, i.e. a local minimum rather than a saddle point. If we treat the sign of each Hessian eigenvalue as a Bernoulli trial, the chance of all eigenvalues being positive or zero decreases exponentially as the parameter count increases [1].

[1] https://arxiv.org/abs/1406.2572
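
A rough sketch of that Bernoulli argument, in case the numbers help. It assumes each eigenvalue's sign is independent and that each is positive/zero with the same probability p; both are simplifications, and p = 0.5 is an arbitrary illustrative value, not something taken from [1]. The function names are made up for this example.

```python
import random


def prob_all_nonnegative(p: float, n: int) -> float:
    """Chance that n independent eigenvalues are all positive/zero.

    p is the assumed per-eigenvalue probability of being positive/zero.
    """
    return p ** n


def simulate(p: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One trial: draw n eigenvalue signs, count it if none is negative.
        if all(rng.random() < p for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = 0.5  # assumed value, for illustration only
    for n in (1, 10, 100, 1000):
        print(f"n={n:5d}  P(all positive/zero) = {prob_all_nonnegative(p, n):.3e}")
```

Under these assumptions the probability is just p**n, so it shrinks geometrically with the parameter count. Real Hessian eigenvalues are correlated, so this is a toy model of the intuition and not a claim about any actual network.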


Replies

david-gpu today at 3:04 PM

You don't need billions of parameters for that, precisely because the risk of being stuck at a local minimum decreases exponentially with the number of parameters. Right?
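
To put a rough number on that, under the same independent-Bernoulli toy model (p is a hypothetical per-eigenvalue probability of being positive/zero and ε a hypothetical tolerance, neither taken from the thread), the parameter count n needed to push the probability below ε is:

```latex
p^{n} \le \varepsilon
\iff
n \ge \frac{\ln \varepsilon}{\ln p},
\qquad 0 < p < 1,\ 0 < \varepsilon < 1.

% Example with assumed values p = 1/2 and \varepsilon = 10^{-9}:
n \ge \frac{\ln 10^{-9}}{\ln \tfrac{1}{2}} = \frac{9 \ln 10}{\ln 2} \approx 29.9
\quad\Longrightarrow\quad n = 30 .
```

So in this simplified picture a few dozen parameters already make the all-positive/zero case extremely unlikely, which is the point of the question above.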