big-chungus4 • today at 4:22 PM

You can take the output of the matrix LSTM, which is a matrix for each token, and compute its SVD. To get better storage, we want U and V to be the same for all tokens, so that we only have to store and operate on the diagonal S matrix per token. But the LSTM is likely highly nonlinear, so U and V will probably be vastly different for different tokens.
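A rough NumPy sketch of how you could check this, not a real matrix LSTM: random matrices stand in for the per-token outputs, and `shared_basis_error` and `basis_from_first` are made-up helper names. It takes U and V from the first token's SVD, keeps only a diagonal per token, and measures how much of each matrix is lost.

```python
import numpy as np

def shared_basis_error(mats, U, V):
    """Relative Frobenius error when each M is approximated as U @ diag(s) @ V.T."""
    errs = []
    for M in mats:
        # For orthonormal U and V, the best diagonal is the diagonal of U.T @ M @ V.
        s = np.diag(U.T @ M @ V)
        approx = U @ np.diag(s) @ V.T
        errs.append(np.linalg.norm(M - approx) / np.linalg.norm(M))
    return np.array(errs)

def basis_from_first(mats):
    """Use the SVD of the first token's matrix as the candidate shared basis."""
    U, _, Vt = np.linalg.svd(mats[0])
    return U, Vt.T

rng = np.random.default_rng(0)
d, num_tokens = 8, 5

# Case A (assumption): every token's matrix really does share U and V.
Q1, _ = np.linalg.qr(rng.standard_normal((d, d)))
Q2, _ = np.linalg.qr(rng.standard_normal((d, d)))
shared = [Q1 @ np.diag(rng.uniform(0.1, 1.0, d)) @ Q2.T for _ in range(num_tokens)]

# Case B (assumption): unrelated matrices, a stand-in for a highly nonlinear model.
unrelated = [rng.standard_normal((d, d)) for _ in range(num_tokens)]

err_shared = shared_basis_error(shared, *basis_from_first(shared))
err_unrelated = shared_basis_error(unrelated, *basis_from_first(unrelated))

print("shared basis, per-token error:   ", err_shared)
print("unrelated basis, per-token error:", err_unrelated)
```

If the errors on real LSTM outputs look like case A, storing only the diagonal per token would work; if they look like case B, the shared U and V idea breaks down the way I expect.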

alt Hacker News