Hacker News

kgeist · today at 12:36 AM

>Experiments at Cactus showed that MLPs can be completely dropped from transformer networks, as long as the model relies on an external knowledge source.

Heh, what a coincidence: just today one of my students presented research results that also confirmed this. He removed the MLPs from Qwen, and the model could still perform transformation tasks on its input, but it lost its knowledge.
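For anyone curious what "removing the MLPs" means structurally: a standard transformer block is attention + MLP, each wrapped in a residual connection, so ablating the MLP just means the residual path passes the attention output straight through. Here's a toy numpy sketch (random weights, single head, no layer norm; nothing to do with the actual Qwen ablation setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy model dimension; real models use thousands

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# random weights standing in for a pretrained layer (hypothetical, for illustration)
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))
W1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)       # MLP up-projection
W2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)   # MLP down-projection

def block(x, use_mlp=True):
    # self-attention sublayer with residual connection
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v @ Wo
    x = x + attn
    # MLP sublayer; when ablated, the residual simply carries x through unchanged
    if use_mlp:
        x = x + np.maximum(x @ W1, 0.0) @ W2  # ReLU MLP
    return x

x = rng.normal(size=(5, d))          # 5 tokens of dimension d
y_full = block(x, use_mlp=True)      # full block
y_ablated = block(x, use_mlp=False)  # attention-only block, same shape
```

The ablated block still mixes information across tokens (that's the attention), which is consistent with input-transformation abilities surviving while the MLP-stored associations are gone.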


Replies

mlperson · today at 8:09 AM

Sounds very interesting!