Many languages (with a sizable speaker population) do not have machine translation to or from any other language.
The technique makes sense, though mostly at the training-data stage. BART-style translation models already represent concepts in a latent space regardless of the input and output languages, sidestepping English entirely, so you get something like:
`source lang --encoded into--> latent space --decoded into--> target lang`
This works well for getting translation support for arbitrary language pairs.
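As a rough sketch of what that looks like in practice (assuming Hugging Face `transformers` and the pretrained many-to-many checkpoint `facebook/mbart-large-50-many-to-many-mmt`; the example sentence is just a placeholder), here is a direct Hindi-to-French translation with no English pivot anywhere in the pipeline:

```python
# Sketch: direct Hindi -> French with a many-to-many encoder-decoder model.
# Assumes the facebook/mbart-large-50-many-to-many-mmt checkpoint.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Encode the source sentence; the source language is set on the tokenizer.
tokenizer.src_lang = "hi_IN"
encoded = tokenizer("यह एक उदाहरण वाक्य है।", return_tensors="pt")  # "This is an example sentence."

# Decode into the target language by forcing the target-language token as the
# first generated token -- the encoder's latent states are the same no matter
# which target you pick.
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_FR"],
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

Because the encoder output is shared, the same encoded sentence can be decoded into any language the model was trained on just by swapping the forced target token.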