Hacker News

cookiengineer · today at 4:10 AM

In my opinion current research should focus on revisiting older concepts to figure out if they can be applied to transformers.

Transformers are superior "database" encodings, as the hype around LLMs demonstrates. But there have been promising ML models focused on the memory side for their niche use cases, and those concepts could be worth revisiting if we could make them work with attention matrices and/or apply the frequency projection idea to their neuron weights.

The way RNNs evolved into LSTMs, GRUs, and eventually DNCs was pretty interesting to me. In my own implementations and use cases I wasn't able to reproduce DeepMind's claims for the DNC's memory-related parts. Back then the "seeking heads" idea behind attention matrices didn't exist yet; maybe there's now a way to build better read/write/access/etc. gates.
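For reference, the DNC's read gates were built on content-based addressing: cosine similarity between a learned key and each memory row, sharpened by a strength parameter and normalized with a softmax. A minimal numpy sketch of that mechanism (function and variable names here are my own, not from the DNC paper or the linked repo):

```python
import numpy as np

def content_addressing(memory, key, beta):
    """DNC-style content-based read weighting: cosine similarity
    between a read key and each memory slot, sharpened by strength
    beta, then normalized with a softmax."""
    eps = 1e-8  # avoid division by zero for empty slots
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps
    )
    logits = beta * sims
    logits -= logits.max()          # numerical stability for the softmax
    w = np.exp(logits)
    return w / w.sum()              # read weights sum to 1

# toy example: 3 memory slots, a key close to slot 0
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])
key = np.array([1.0, 0.1])
w = content_addressing(M, key, beta=5.0)
read_vector = w @ M                 # soft read: weighted sum of slots
```

Squint at it and the kinship with scaled dot-product attention is obvious, which is why I suspect the modern attention machinery could replace the DNC's hand-designed gating.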

[1] a fairly good implementation I found: https://github.com/joergfranke/ADNC