Hacker News

kylemaxwell today at 4:21 PM (1 reply)

From the abstract, it looks like it's actually doing something deeper, updating weights in part of the model?


Replies

samsartor today at 7:07 PM

The abstract and method sections only mention updating the SSM state during "sleep" (i.e., the same vectors that change after each token in stock Mamba), not any of the actual weight matrices. AFAICT this is just another attention compaction paper with a misleading title? It is not very clearly written.
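To make the state-vs-weights distinction concrete, here is a toy sketch. It is not the paper's code and not real Mamba: `ToyWeights` and `step` are made-up names, and a fixed diagonal linear recurrence stands in for Mamba's selective SSM, whose parameters also depend on the input. The only point is that the recurrent state changes on every token while the learned parameters stay untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyWeights:
    # Hypothetical learned parameters; frozen, so inference cannot modify them.
    a: tuple = (0.9, 0.5)   # per-channel decay
    b: tuple = (1.0, 2.0)   # input projection
    c: tuple = (1.0, -1.0)  # output projection


def step(weights, state, x):
    """One recurrent step: h_t = a * h_{t-1} + b * x_t, y_t = c . h_t."""
    new_state = tuple(
        a * h + b * x for a, h, b in zip(weights.a, state, weights.b)
    )
    y = sum(c * h for c, h in zip(weights.c, new_state))
    return new_state, y


weights = ToyWeights()
state = (0.0, 0.0)          # the recurrent state, analogous to the SSM state
for x in (1.0, 0.0, -1.0):  # three scalar "tokens"
    state, y = step(weights, state, x)

print("state after 3 tokens:", state)
print("weights unchanged:", weights == ToyWeights())
```

Under this reading, anything done during "sleep" would rewrite `state` and leave `weights` alone, which is different from a weight update in the training sense.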