Hacker News

kleiba2 · today at 9:05 AM

> This exact characterization is possible because in output space, training dynamics can be understood through a locally linear differential equation along the realized path, where dominant eigenmodes of the evolving kernel equilibrate exponentially fast. Forcing an optimizer to slowly step through these solved directions is highly inefficient and suggests a path to analytically jump to the final network state.
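The quoted claim can be illustrated with a toy construction (my own sketch, not the paper's actual method): assume linearized "lazy" dynamics with a fixed positive-definite kernel `K` standing in for the evolving kernel, so outputs follow the linear ODE dy/dt = -K(y - y*). Each eigenmode of `K` then decays like exp(-λᵢt), and the state at any time can be computed by an analytic jump rather than many small optimizer steps. All names here (`K`, `y_star`, `eta`) are illustrative:

```python
import numpy as np

# Toy "lazy training" dynamics in output space: with a fixed PSD kernel K
# (a stand-in for an NTK-like kernel), outputs evolve as dy/dt = -K (y - y*).
# Each eigenmode of K decays like exp(-lambda_i * t), so dominant
# (large-lambda) modes equilibrate exponentially fast, and the end state
# can be reached analytically instead of via many small steps.
rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
K = A @ A.T + 0.1 * np.eye(n)      # random positive-definite kernel
y0 = rng.standard_normal(n)        # initial outputs
y_star = rng.standard_normal(n)    # target outputs

# Analytic "jump" via eigendecomposition: y(t) = y* + V exp(-w t) V^T (y0 - y*)
w, V = np.linalg.eigh(K)
t = 200.0
y_jump = y_star + V @ (np.exp(-w * t) * (V.T @ (y0 - y_star)))

# The slow alternative: explicit small gradient steps on the same dynamics
eta, steps = 1e-2, 20_000          # eta * steps = t; eta * lambda_max << 2
y = y0.copy()
for _ in range(steps):
    y -= eta * K @ (y - y_star)

print(np.allclose(y_jump, y_star, atol=1e-6))  # analytic jump reaches the target
print(np.allclose(y, y_jump, atol=1e-6))       # 20k iterations agree with the jump
```

The inefficiency the quote describes shows up here as the 20,000 explicit steps needed just to reproduce what one eigendecomposition gives in closed form; whether that saving survives for a real network, where the kernel evolves along the path, is exactly the open question.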

But at what computational cost?