For a model that claims to focus on many languages, it's quite unreliable when it comes to simple questions like "how to say X in language Y" or "how to conjugate verb X in language Y". It keeps hallucinating words that do not exist, and when corrected, it only hallucinates a new lie.
It probably doesn't know which language each set of words belongs to.
I doubt they include much training data labeled by language.
"how to say X in language Y" is a different task from actually saying X in language Y.