Then that means you need at least 4x the compute to achieve the same results as the state of the art. In other words, if I can train my frontier model with a normal tokenizer in 3 months, it will take you a year. When major releases across all competing providers are measured in months, there's simply no incentive to accept that just to capture these fringe edge cases.
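To give a rough sense of where a factor like 4x comes from (a minimal sketch, not from the original comment: it assumes a byte-level model compared against a standard BPE vocabulary such as cl100k_base, and the exact ratio depends on the corpus):

```python
# Rough illustration: how many units a byte-level model processes
# versus a BPE-tokenized model for the same text. A ~4x longer
# sequence means at least ~4x the compute per training pass.
import tiktoken

text = "Tokenizers compress text: one token usually covers several bytes."

enc = tiktoken.get_encoding("cl100k_base")  # assumption: OpenAI's BPE vocab as the baseline
num_bytes = len(text.encode("utf-8"))       # units seen by a byte-level model
num_tokens = len(enc.encode(text))          # units seen by a tokenizer-based model

print(f"bytes:  {num_bytes}")
print(f"tokens: {num_tokens}")
print(f"ratio:  {num_bytes / num_tokens:.1f}x longer sequence without a tokenizer")
```

On typical English text this ratio lands around 4, which is the gap the comment is pointing at; for attention-heavy architectures the compute penalty can be worse than linear in sequence length.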
Yes, OK. But all the tutorials start by explaining how a tokenizer works. That isn't necessary, and in fact it obscures the message of why a tokenizer is needed in the first place.