phkahler • today at 1:04 PM

If it can be made orthogonal, can you go a step further and diagonalize it? The storage and performance improvement from that would be huge.

Replies

big-chungus4 • today at 4:22 PM

You can take the output of the matrix LSTM, which is a matrix for each token, and compute its SVD. To get better storage, we want U and V to be the same for all tokens, so that we can operate on the diagonal S matrix alone. But the LSTM is likely highly nonlinear, so U and V will be vastly different for different tokens.
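A minimal pure-Python sketch of the storage argument, assuming (hypothetically) that every token's matrix shares the same U and V. In that case Uᵀ M_t V is exactly diagonal, so each token only needs its singular values. The 2×2 rotations, the token values, and the helper names (`project_to_shared_basis`, etc.) are made up for illustration; nothing here comes from an actual matrix LSTM.

```python
import math

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def rotation(theta):
    """2 x 2 orthogonal matrix, used here as a stand-in for U or V."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def project_to_shared_basis(M, U, V):
    """Return (diagonal, off_diagonal_energy) of U^T M V.

    If M = U diag(s) V^T for this same U and V, the diagonal is s and the
    off-diagonal energy is zero up to rounding, so s alone describes M.
    """
    C = matmul(matmul(transpose(U), M), V)
    diagonal = [C[i][i] for i in range(len(C))]
    off = sum(C[i][j] ** 2 for i in range(len(C)) for j in range(len(C[0])) if i != j)
    return diagonal, off

# Hypothetical shared bases and per-token singular values (illustrative numbers only).
U, V = rotation(0.3), rotation(-1.1)
token_singular_values = [[2.0, 0.5], [1.0, 3.0]]
shared_tokens = [matmul(matmul(U, diag(s)), transpose(V)) for s in token_singular_values]

for M in shared_tokens:
    d, off = project_to_shared_basis(M, U, V)
    print("stored per token:", [round(x, 6) for x in d], "off-diagonal energy:", off)

# A token matrix whose singular vectors differ from the shared U, V
# (the situation the comment expects for a strongly nonlinear model).
other = matmul(matmul(rotation(1.2), diag([2.0, 0.5])), transpose(rotation(0.4)))
d, off = project_to_shared_basis(other, U, V)
print("mismatched token, off-diagonal energy:", round(off, 4))
```

In the shared case each token stores 2 numbers instead of 4 (for an n×n state, n instead of n², with U and V stored once). The mismatched token leaves a large off-diagonal residual, so keeping only its diagonal would throw information away.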

bee_rider • today at 2:44 PM

I don’t know AI, but weight matrices aren’t square in general, right? My first guess for something like this would be to take the SVD instead, since you can always do that, but I’m sure that’s been tried already.
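To make the "you can always do that" point concrete, here is a small pure-Python sketch of a one-sided (Hestenes) Jacobi SVD. It is an illustration only, not what any particular library or model uses; `svd_jacobi` and `reconstruct` are hypothetical names. It factors a 3×2 matrix, which has no eigendecomposition because it isn't square, into U·diag(σ)·Vᵀ.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def svd_jacobi(A, tol=1e-12, max_sweeps=60):
    """One-sided Jacobi SVD: returns (U, sigma, V) with A = U @ diag(sigma) @ V^T.

    Works for any m x n matrix; U is m x k, V is n x k, k = min(m, n).
    Singular values are not sorted.
    """
    m, n = len(A), len(A[0])
    if m < n:
        # A^T = U' S V'^T implies A = V' S U'^T, so swap the roles of U and V.
        V, sigma, U = svd_jacobi(transpose(A), tol, max_sweeps)
        return U, sigma, V
    U = [list(map(float, row)) for row in A]  # holds A @ V; columns get orthogonalized
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(U[i][p] ** 2 for i in range(m))
                beta = sum(U[i][q] ** 2 for i in range(m))
                gamma = sum(U[i][p] * U[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Plane rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to U (= A V) and V, keeping U = A V.
                for M, rows in ((U, m), (V, n)):
                    for i in range(rows):
                        a, b = M[i][p], M[i][q]
                        M[i][p] = c * a - s * b
                        M[i][q] = s * a + c * b
        if not rotated:
            break
    # Column norms of A V are the singular values; normalize to get U.
    sigma = [math.sqrt(sum(U[i][j] ** 2 for i in range(m))) for j in range(n)]
    for j in range(n):
        if sigma[j] > 0.0:
            for i in range(m):
                U[i][j] /= sigma[j]
    return U, sigma, V

def reconstruct(U, sigma, V):
    """Compute U @ diag(sigma) @ V^T."""
    m, n, k = len(U), len(V), len(sigma)
    return [[sum(U[i][r] * sigma[r] * V[j][r] for r in range(k)) for j in range(n)]
            for i in range(m)]

A = [[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]]  # 3 x 2: not square, so no eigendecomposition
U, sigma, V = svd_jacobi(A)
print("singular values:", sigma)
print("reconstruction:", reconstruct(U, sigma, V))
```

The squared singular values are the eigenvalues of AᵀA, which is always square and symmetric; that is why the SVD exists even when A itself cannot be diagonalized.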