Hacker News

yorwba, today at 7:12 AM

The only handwaving required is to assume a diagonal preconditioner. The optimal preconditioner under that constraint corresponds to the new update rule (see Section F of the paper). And of course a diagonal preconditioner works at the per-parameter level.
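To make the "per-parameter" point concrete, here is a minimal sketch of a generic preconditioned gradient step, theta <- theta - lr * P @ g. It is not the paper's update rule. It only shows that when P is diagonal, the full matrix product collapses to an independent scaling of each parameter's own gradient. The function names, the scales in `p_diag`, and the learning rate are hypothetical, and the optimal diagonal derived in Section F is not reproduced here.

```python
from typing import List


def matvec(P: List[List[float]], g: List[float]) -> List[float]:
    """Dense matrix-vector product P @ g (a general, full preconditioner)."""
    return [sum(P[i][j] * g[j] for j in range(len(g))) for i in range(len(P))]


def preconditioned_step(theta: List[float], P: List[List[float]],
                        g: List[float], lr: float) -> List[float]:
    """One step theta <- theta - lr * P @ g; each coordinate may mix all gradients."""
    step = matvec(P, g)
    return [t - lr * s for t, s in zip(theta, step)]


def diagonal_step(theta: List[float], p_diag: List[float],
                  g: List[float], lr: float) -> List[float]:
    """Same step when P = diag(p_diag): parameter i uses only p_i and g_i."""
    return [t - lr * (p * gi) for t, p, gi in zip(theta, p_diag, g)]


theta = [1.0, -2.0, 0.5]
g = [0.4, -1.0, 2.0]
p_diag = [0.5, 2.0, 0.25]  # hypothetical per-parameter scales
# Build the equivalent full matrix with zeros off the diagonal.
P = [[p_diag[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
lr = 0.1

full = preconditioned_step(theta, P, g, lr)
diag = diagonal_step(theta, p_diag, g, lr)
print(full, diag)
```

Both paths give the same parameters, but `diagonal_step` never looks at other coordinates' gradients, which is why any rule derived under a diagonal constraint can be applied one parameter at a time.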