logoalt Hacker News

big-chungus4today at 4:22 PM0 repliesview on HN

You can take the output of the matrix LSTM, which is going to be matrix for each token, and compute the SVD. To get better storage, we want U and V to be the same for all tokens, so that we can operate on the diagonal S matrix. But LSTM is likely highly nonlinear, U and V will be vastly different for different tokens.