https://en.wikipedia.org/wiki/Universal_approximation_theore...
the better question is why does gradient descent work for them
Interestingly, there exist problems that provably can't be learned by neural networks via gradient descent.
I don't follow. Why wouldn't it work? It seems to me that a biased random walk down a gradient is about as universal as it gets. A bit like asking why walking uphill eventually results in you arriving at the top.
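One wrinkle in the walking-uphill intuition: gradient descent finds *a* local optimum, but which one depends on where you start. A minimal sketch (plain Python, hypothetical 1-D function chosen for illustration):

```python
# Plain gradient descent on a non-convex 1-D function.
# "Walking downhill" always reaches *a* minimum, but which one
# depends on the starting point -- the hill-walking intuition alone
# doesn't guarantee you reach the best (global) optimum.
def grad_descent(df, x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * df(x)  # step against the gradient
    return x

# f(x) = x^4 - 3x^2 + x has two local minima (near -1.30 and +1.13);
# its derivative is f'(x) = 4x^3 - 6x + 1.
df = lambda x: 4 * x**3 - 6 * x + 1

print(grad_descent(df, x0=-2.0))  # settles near the left minimum
print(grad_descent(df, x0=+2.0))  # settles near the right minimum
```

Starting on either side of the central hump yields a different answer, so the interesting question is why, for neural networks in practice, the optima gradient descent reaches tend to be good ones.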
The properties that the universal approximation theorem proves are not unique to neural networks.
Any model operating in an infinite-dimensional Hilbert space, such as an SVM with an RBF or polynomial kernel, Gaussian process regression, or gradient-boosted decision trees, has the same property (though proven via a different theorem, of course).
So the universal approximation theorem tells us nothing about why we should expect neural networks to perform better than those models.
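To make the point concrete, here's a hedged sketch (numpy only; the target function and hyperparameters are arbitrary choices for illustration) of kernel ridge regression with an RBF kernel, a non-neural-network model, approximating a smooth function to high accuracy:

```python
# Kernel ridge regression with an RBF kernel: a non-NN model that can
# approximate smooth functions arbitrarily well, just like a neural net.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(2 * X).ravel()  # smooth target to approximate

# Fit: solve (K + lam*I) alpha = y, so predictions are sums of RBF bumps
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)

X_test = np.linspace(-3, 3, 100).reshape(-1, 1)
pred = rbf_kernel(X_test, X) @ alpha
max_err = np.max(np.abs(pred - np.sin(2 * X_test).ravel()))
print(f"max approximation error: {max_err:.4f}")
```

The fit here needs no gradient descent at all, it's a single linear solve, which underlines that expressive power and trainability are separate questions.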