
fraywing, yesterday at 8:05 PM

So is this basically a task-specific MoA transformer arch with a DNN that helps make routing decisions? Trying to understand this.


Replies

yoeven, yesterday at 11:31 PM

The other way round: task-specific DNNs adapted to share the same vector space as omni-transformers with generalized vision and audio encoders.

E.g., for an OCR task, the first pass is handled by the CNN. Its output is converted to shared tokens that the transformer can consume, so the transformer can correct any issues if needed, and a single decoder handles both the DNN and transformer output.
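
To make that data flow concrete, here is a rough sketch in plain Python. Everything in it is a toy stand-in I made up for illustration (the function names, `SHARED_DIM`, the random projection, the averaging "transformer"), not the actual model. The only point is the shape of the pipeline: task-specific features are projected into one shared token space, the transformer optionally refines them, and the same decoder reads either stream.

```python
import random

SHARED_DIM = 4  # hypothetical width of the shared token space


def make_projection(in_dim, out_dim, seed=0):
    """Hypothetical adapter: a fixed random linear map into the shared space."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cnn_first_pass(image_patches):
    """Stand-in for the task-specific CNN: one 3-dim feature vector per patch."""
    return [[sum(p) / len(p), max(p), min(p)] for p in image_patches]


def to_shared_tokens(features, projection):
    """Map task-specific features to tokens in the shared vector space."""
    return [project(projection, f) for f in features]


def transformer_refine(tokens):
    """Stand-in for the transformer pass: blend each token with the sequence mean."""
    n = len(tokens)
    mean = [sum(t[i] for t in tokens) / n for i in range(SHARED_DIM)]
    return [[0.5 * t[i] + 0.5 * mean[i] for i in range(SHARED_DIM)] for t in tokens]


def shared_decoder(tokens, vocab):
    """One decoder for both streams: emit the symbol at each token's argmax index."""
    return "".join(
        vocab[max(range(len(t)), key=lambda i: t[i]) % len(vocab)] for t in tokens
    )


patches = [[0.1, 0.9, 0.4], [0.7, 0.2, 0.3]]        # made-up "image" patches
proj = make_projection(in_dim=3, out_dim=SHARED_DIM)
dnn_tokens = to_shared_tokens(cnn_first_pass(patches), proj)
refined = transformer_refine(dnn_tokens)            # optional correction pass

# The same decoder accepts the raw DNN tokens and the refined tokens.
print(shared_decoder(dnn_tokens, "abcd"), shared_decoder(refined, "abcd"))
```

Because both streams live in the same space, the transformer pass can be skipped when the first pass is already good enough, which is how I read "correct any issues if needed".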