fraywing • yesterday at 8:05 PM • 1 reply

So is this basically a task-specific MoA transformer arch with a DNN that helps make routing decisions? Trying to understand this.

Replies

yoeven • yesterday at 11:31 PM

The other way round: task-specific DNNs adapted to share the same vector space as omni-transformers with generalized vision and audio encoders.

E.g., for an OCR task, the first pass is handled by the CNN. Its output is converted to shared tokens that the transformer can consume, correcting any issues if needed, and a decoder that can handle both the DNN and the transformer output produces the final result.
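A rough sketch of that flow, to make the data path concrete. This is not the actual implementation: every function name, dimension, and weight here is a hypothetical stand-in (random, untrained NumPy matrices), and a single self-attention pass stands in for the whole transformer.

```python
import numpy as np

# All sizes are hypothetical; they only illustrate the shapes involved.
H, D_CNN, D_MODEL, VOCAB = 8, 32, 64, 100
rng = np.random.default_rng(0)

# Random, untrained weights standing in for the real learned parameters.
w_cnn = rng.standard_normal((H * 4, D_CNN)) / np.sqrt(H * 4)
w_adapter = rng.standard_normal((D_CNN, D_MODEL)) / np.sqrt(D_CNN)
wq, wk, wv = (rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL) for _ in range(3))
w_out = rng.standard_normal((D_MODEL, VOCAB)) / np.sqrt(D_MODEL)

def cnn_first_pass(image):
    """Stand-in for the task-specific OCR CNN: one feature vector per 4-pixel-wide strip."""
    h, w = image.shape
    k = w // 4
    strips = image[:, : k * 4].reshape(h, k, 4).transpose(1, 0, 2).reshape(k, h * 4)
    return np.tanh(strips @ w_cnn)  # (k, D_CNN)

def to_shared_tokens(features):
    """Adapter: project CNN features into the transformer's embedding space."""
    return features @ w_adapter  # (k, D_MODEL)

def transformer_correct(tokens):
    """One self-attention pass with a residual, standing in for the transformer's correction step."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return tokens + weights @ v

def decode(tokens):
    """Shared decoder: works on adapter output and transformer output alike."""
    return (tokens @ w_out).argmax(axis=-1)

image = rng.random((H, 40))                # fake grayscale text line
shared = to_shared_tokens(cnn_first_pass(image))
refined = transformer_correct(shared)

ids_from_cnn = decode(shared)              # decode the DNN path directly
ids_from_transformer = decode(refined)     # decode after the transformer pass
```

The point of the shared vector space is visible in the last two lines: the same `decode` accepts either the adapted CNN tokens or the transformer-refined tokens, so the transformer pass can be skipped when the first pass is already good enough.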
