Hacker News

dwrodri yesterday at 7:12 PM

Admittedly there's probably some aggrandized boasting here, but I think empirical verification of that Adam modification alone would be a meaningful contribution, unless that's prior work?


Replies

317070 today at 4:57 AM

A theory that skips the parameter space and understands grokking comes up with an unexplained update rule, which notably works at the per-parameter level by dropping the updates for most parameters.

I suspect there is going to be a lot of handwaving to actually go from eNTK to that new update rule.

I also doubt it helps in the non-grokking regime, given the focus of the theory, and that regime is where all the practical applications I have ever heard of live.

Don't get me wrong, I did enjoy reading this essay. It's well written and reasonably argued without going into details.
