This is a fascinating mathematical framework, but the post title might be a bit of an overreach. I ...

prideout • yesterday at 7:17 PM • 2 replies • view on HN

This is a fascinating mathematical framework, but the post title might be a bit of an overreach. I often wonder if "a theory of deep learning" could exist that could be stated succinctly and that could predict (1) scaling laws and (2) the surprising reliability of gradient descent.

Note that I said "predict" not "describe". It feels like we're still in the era of Kepler, not Newton.

Replies

sdenton4 • yesterday at 11:19 PM

I dunno... gradient descent is only really reliable with a big bag of tricks. Knowing good initializations is a starting point, but recurrent connections and batch/layer normalization go a very long way towards making it reliable.

➕ show 1 reply

jldugger • today at 4:26 AM

> but the post title might be a bit of an overreach.

Really? An essay that leads off with a Borges anecdote skewed grandiose. Oh my, how unprecedented!

alt Hacker News

Replies