As another poster above linked, it’s been shown to be effective since 2022: https://arxiv.org/abs/2203.05482
It works because Nex N2 is also a derivative of the original base Qwen model. If they were two completely unrelated models, it wouldn't work.
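
For anyone wondering what merging looks like mechanically, here's a rough sketch of plain weight averaging, assuming both checkpoints expose the same parameter names and shapes (which is the case when they are fine-tuned from the same base). The function name `average_weights` and the toy parameter dicts are made up for illustration, and plain Python lists stand in for tensors. This is not the exact recipe from the paper or from any particular merge tool.

```python
def average_weights(state_dicts, coeffs=None):
    """Linearly combine parameter dicts that share identical keys and shapes.

    Hypothetical helper for illustration: each dict maps a parameter name
    to a flat list of floats standing in for a tensor.
    """
    if not state_dicts:
        raise ValueError("need at least one set of weights")
    n = len(state_dicts)
    if coeffs is None:
        coeffs = [1.0 / n] * n  # uniform average by default
    if len(coeffs) != n:
        raise ValueError("one coefficient per model is required")

    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        # Unrelated architectures fail here: the parameter names differ.
        if sd.keys() != keys:
            raise ValueError("models do not share the same parameter names")

    merged = {}
    for k in keys:
        tensors = [sd[k] for sd in state_dicts]
        length = len(tensors[0])
        if any(len(t) != length for t in tensors):
            raise ValueError(f"shape mismatch for parameter {k!r}")
        # Element-wise weighted sum across the models.
        merged[k] = [
            sum(c * t[i] for c, t in zip(coeffs, tensors))
            for i in range(length)
        ]
    return merged


# Toy stand-ins for two fine-tunes of the same base model.
finetune_a = {"layer.weight": [1.0, 3.0]}
finetune_b = {"layer.weight": [3.0, 5.0]}
merged = average_weights([finetune_a, finetune_b])
```

Note the key and shape checks only catch the obvious failure. Two unrelated models with matching shapes would pass them and still produce garbage, because their weights don't sit in a compatible region of parameter space. That is the shared-base requirement in practice.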