
in-silico • yesterday at 8:44 PM

Neither of these strikes me as particularly groundbreaking.

The first idea (which, as I understand it, retrieves token ids rather than hidden states) is going to really struggle with compositional reasoning and contextual recall.

The second idea has been done a million times, with Linear Attention being maybe the first modern example. Hyena, state-space models, DeltaNet, and LaCT also sit at different points on the performance-parallelizability spectrum of fixed-size models.
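For anyone unsure what "fixed-size" means here, a rough sketch of causal linear attention written as a recurrence. This is my own illustration, not code from any of the works named above: the elu(x)+1 feature map is just one common choice, and the function names are made up. The point is that the state (`S` and `z`) has the same shape no matter how long the sequence gets, and that it gives the same output as the quadratic, masked form.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map (assumed here; other choices exist).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(q, k, v):
    """Causal linear attention with a fixed-size state.

    q, k: arrays of shape (T, d_k); v: array of shape (T, d_v).
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))  # running sum of outer(phi(k_t), v_t)
    z = np.zeros(d_k)         # running sum of phi(k_t), used to normalize
    out = np.empty((T, d_v))
    for t in range(T):
        phi_k = feature_map(k[t])
        S += np.outer(phi_k, v[t])
        z += phi_k
        phi_q = feature_map(q[t])
        out[t] = (phi_q @ S) / (phi_q @ z)
    return out

def linear_attention_quadratic(q, k, v):
    """Same computation as an explicit T x T causal score matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = np.tril(phi_q @ phi_k.T)  # zero out future positions
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 3))
recurrent_out = linear_attention_recurrent(q, k, v)
quadratic_out = linear_attention_quadratic(q, k, v)
```

The recurrent form is what you would run at inference time (constant memory per step); the quadratic form is the easy-to-parallelize one, which is roughly the trade-off the models above handle differently.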
