It is well known and well established that tokenizer efficiency varies by language, by as much as 15x.
https://www.google.com/search?q=tokenizer+efficiency+by+language
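
For anyone who wants to sanity-check the direction of the effect without installing a tokenizer, here is a rough sketch. It is not a real tokenizer: it assumes a byte-level tokenizer with no learned merges, where every UTF-8 byte becomes one token, so the byte count is only an upper bound on the token count. The function names and sample sentences are made up for illustration, and the ratios it prints are not the 15x figure above, which comes from real tokenizers with learned vocabularies.

```python
def byte_token_upper_bound(text: str) -> int:
    """Token count for a hypothetical byte-level tokenizer with no merges.

    Assumption: one token per UTF-8 byte. Real BPE tokenizers merge
    frequent byte sequences, so their counts are at most this number.
    """
    return len(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per Unicode code point in `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    return byte_token_upper_bound(text) / len(text)


# Illustrative sample sentences (hypothetical, not a benchmark corpus).
SAMPLES = {
    "English": "Hello, how are you?",
    "Russian": "Привет, как дела?",
    "Hindi": "नमस्ते, आप कैसे हैं?",
}

if __name__ == "__main__":
    for language, sentence in SAMPLES.items():
        # Report the no-merge upper bound and the per-character byte cost.
        print(language, byte_token_upper_bound(sentence),
              round(bytes_per_char(sentence), 2))
```

To measure the actual gap, swap `byte_token_upper_bound` for the length of a real tokenizer's encoded output and run it over parallel text in each language.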