Curious if this similarity comes more from the training data or the model architecture itself. Did they look into that?
Yes. The opening paragraph says both are important, and the paper looks into both.