
dnhkng · today at 3:54 PM

No worries, happy to discuss anyway :)

MoE (mixture of experts) is an architecture that enforces sparsity (not all 'neurons' are active during the forward pass).
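
Rough sketch of that idea (illustrative PyTorch only, not any particular model's code): a router picks the top-k experts per token, so the remaining experts never run for that token.

    import torch
    import torch.nn.functional as F

    def moe_forward(x, gate, experts, k=2):
        # x: (tokens, d_model); gate: nn.Linear(d_model, num_experts);
        # experts: list of small MLP modules. Only k experts fire per token.
        scores = F.softmax(gate(x), dim=-1)             # routing probabilities
        topk_scores, topk_idx = scores.topk(k, dim=-1)  # sparse selection
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(experts):
                mask = topk_idx[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += topk_scores[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out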

This is pretty much orthogonal to that; it works with both dense and MoE models, by repeating 'vertical' sections of the transformer stack.
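
A rough sketch of what I mean by repeating a vertical section (layer indices made up): the same contiguous block of layers is applied more than once in the forward pass, and it doesn't matter whether those layers are dense or MoE inside.

    def forward(hidden, layers, repeat_start=8, repeat_end=16, n_repeats=2):
        # layers: the ordinary transformer blocks, in order
        for layer in layers[:repeat_start]:
            hidden = layer(hidden)
        for _ in range(n_repeats):                  # the repeated 'vertical' section
            for layer in layers[repeat_start:repeat_end]:
                hidden = layer(hidden)
        for layer in layers[repeat_end:]:
            hidden = layer(hidden)
        return hidden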