logoalt Hacker News

aesthesiayesterday at 9:38 PM0 repliesview on HN

Not quite an answer to your question, but you might find this interesting. The Olmo Hybrid paper has some results on relative complexity of problems that can be solved by transformers and RNNs. They don't look at size, just solvability, and find that the sets of problems solvable by the two architectures are incomparable. They actually use these results to inform their architecture design, which includes both attention and state space layers. (Specifically, they choose gated delta-nets with negative eigenvalues, which they show have greater expressivity than those without.)

https://arxiv.org/abs/2604.03444