Interesting, it doesn't seem intuitive at all to me.
My (wrong?) understanding was that there was a positive correlation between how "good" a tokenizer is in terms of compression and the downstream model performance. Guess not.