Hacker News

appplication · today at 4:57 AM

Not a mathematician, so I’m immediately out of my depth here (and butchering terminology), but intuitively it seems that the presence of a massive number of local minima wouldn’t really matter for gradient descent. A given local minimum would need a “well” at least as large as your step size to reasonably capture your descent.

E.g., you could land perfectly on a local minimum, but you won’t stay there unless your step size is minute or the minimum is quite substantial.
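
A minimal sketch of that intuition in Python, using a made-up 1D loss f(x) = x^2 + 0.01*sin(200x): a broad bowl covered in ripples roughly 0.03 wide, each a shallow local minimum (the function and all constants are arbitrary illustrative choices, not anything from the thread):

    import math

    # Toy loss: a broad bowl (x^2) plus narrow, shallow ripples; each ripple
    # is a local minimum whose well is only ~0.03 wide.
    def f(x):
        return x * x + 0.01 * math.sin(200 * x)

    def grad(x):
        return 2 * x + 2 * math.cos(200 * x)

    def descend(x, lr, steps):
        for _ in range(steps):
            x -= lr * grad(x)
        return x

    # Steps wider than a ripple skate over the wells and follow the bowl;
    # steps much narrower than a ripple are captured by the first well met.
    x_big = descend(3.0, lr=0.1, steps=2000)      # ends well inside the global basin
    x_small = descend(3.0, lr=1e-4, steps=50000)  # stuck in a ripple near x ~ 1
    print(f"lr=0.1:  x={x_big:.3f}, f={f(x_big):.4f}")
    print(f"lr=1e-4: x={x_small:.3f}, f={f(x_small):.4f}")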


Replies

sdenton4 · today at 2:42 PM

The randomness (and exploration) encouraged by batch training also helps avoid 'real' minima, if they exist.
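
A minimal sketch of that effect, standing in for minibatch noise with additive gradient noise that decays over training (the loss and all constants below are made up for illustration): on a tilted double well, plain descent from x = 1 settles into the shallower minimum, while the noisy runs usually hop the barrier into the deeper one.

    import math
    import random

    # Toy loss f(x) = x^4 - 2x^2 + 0.3x: a tilted double well with a shallow
    # local minimum near x ~ 0.96 and a deeper global minimum near x ~ -1.04.
    def grad(x):
        return 4 * x**3 - 4 * x + 0.3

    def descend(x, lr, sigma, steps=5000, seed=0):
        rng = random.Random(seed)
        for t in range(steps):
            # Gradient noise that anneals over training, mimicking the
            # shrinking effect of minibatch noise late in optimization.
            noise = sigma * rng.gauss(0, 1) / math.sqrt(1 + 0.01 * t)
            x -= lr * (grad(x) + noise)
        return x

    plain = descend(1.0, lr=0.01, sigma=0.0)
    noisy = [descend(1.0, lr=0.01, sigma=8.0, seed=s) for s in range(20)]
    print(f"plain GD: x = {plain:.2f}")  # stuck in the shallower well
    print(f"noisy GD: {sum(x < 0 for x in noisy)}/20 runs reached the deeper well")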

fc417fc802 · today at 5:57 AM

I believe what was meant was that, assuming local minima of sufficient size to capture your probe, a sufficiently high density of them makes you extremely likely to get stuck. A counterpoint regarding dimensionality is made by the comment adjacent to yours.