Hacker News

kgeist · today at 12:36 AM

>Experiments at Cactus showed that MLPs can be completely dropped from transformer networks, as long as the model relies on an external knowledge source.

Heh, what a coincidence: just today one of my students presented research results that also confirmed this. He removed the MLPs from Qwen, and the model could still perform transformation tasks on its input, but it lost its knowledge.
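For anyone curious what "removing the MLPs" means structurally: a standard transformer block is attention + MLP, each wrapped in a residual connection, so ablating the MLP just means the residual path passes the attention output straight through. Here's a toy numpy sketch (random weights, single head, no layer norm; nothing to do with the actual Qwen ablation setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy model dimension; real models use thousands

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# random weights standing in for a pretrained layer (hypothetical, for illustration)
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))
W1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)       # MLP up-projection
W2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)   # MLP down-projection

def block(x, use_mlp=True):
    # self-attention sublayer with residual connection
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v @ Wo
    x = x + attn
    # MLP sublayer; when ablated, the residual simply carries x through unchanged
    if use_mlp:
        x = x + np.maximum(x @ W1, 0.0) @ W2  # ReLU MLP
    return x

x = rng.normal(size=(5, d))          # 5 tokens of dimension d
y_full = block(x, use_mlp=True)      # full block
y_ablated = block(x, use_mlp=False)  # attention-only block, same shape
```

The ablated block still mixes information across tokens (that's the attention), which is consistent with input-transformation abilities surviving while the MLP-stored associations are gone.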


Replies

mlperson · today at 8:09 AM

Sounds very interesting!