
arjie, yesterday at 8:54 PM

Character density and token efficiency are different things. The latter is data- and therefore tokenizer-specific: e.g., take GPT-5's tokenizer, o200k_base, and run Mandarin text and its English translation through it. Some of the time en will beat zh. I just tested this with news articles and Wikipedia.

After all, `def func():` is only 3 tokens on o200k_base.