Hacker News

kleiba2 · today at 9:05 AM

> This exact characterization is possible because in output space, training dynamics can be understood through a locally linear differential equation along the realized path, where dominant eigenmodes of the evolving kernel equilibrate exponentially fast. Forcing an optimizer to slowly step through these solved directions is highly inefficient and suggests a path to analytically jump to the final network state.
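The quoted claim can be illustrated with a toy construction (my own sketch, not the paper's actual method): assume linearized "lazy" dynamics with a fixed positive-definite kernel `K` standing in for the evolving kernel, so outputs follow the linear ODE dy/dt = -K(y - y*). Each eigenmode of `K` then decays like exp(-λᵢt), and the state at any time can be computed by an analytic jump rather than many small optimizer steps. All names here (`K`, `y_star`, `eta`) are illustrative:

```python
import numpy as np

# Toy "lazy training" dynamics in output space: with a fixed PSD kernel K
# (a stand-in for an NTK-like kernel), outputs evolve as dy/dt = -K (y - y*).
# Each eigenmode of K decays like exp(-lambda_i * t), so dominant
# (large-lambda) modes equilibrate exponentially fast, and the end state
# can be reached analytically instead of via many small steps.
rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
K = A @ A.T + 0.1 * np.eye(n)      # random positive-definite kernel
y0 = rng.standard_normal(n)        # initial outputs
y_star = rng.standard_normal(n)    # target outputs

# Analytic "jump" via eigendecomposition: y(t) = y* + V exp(-w t) V^T (y0 - y*)
w, V = np.linalg.eigh(K)
t = 200.0
y_jump = y_star + V @ (np.exp(-w * t) * (V.T @ (y0 - y_star)))

# The slow alternative: explicit small gradient steps on the same dynamics
eta, steps = 1e-2, 20_000          # eta * steps = t; eta * lambda_max << 2
y = y0.copy()
for _ in range(steps):
    y -= eta * K @ (y - y_star)

print(np.allclose(y_jump, y_star, atol=1e-6))  # analytic jump reaches the target
print(np.allclose(y, y_jump, atol=1e-6))       # 20k iterations agree with the jump
```

The inefficiency the quote describes shows up here as the 20,000 explicit steps needed just to reproduce what one eigendecomposition gives in closed form; whether that saving survives for a real network, where the kernel evolves along the path, is exactly the open question.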

But at what computational cost?