"why do neural networks work better than other models?" That sounds really interesting - any references (for a non-specialist)?
https://en.wikipedia.org/wiki/Universal_approximation_theore...
the better question is why does gradient descent work for them