logoalt Hacker News

mort96today at 10:24 AM1 replyview on HN

Hm? UTF-8 encodes all of ASCII with one byte per character, and is pretty efficient for everything else. I think the only advantage UTF-16 has over UTF-8 is that some ranges (such as Han characters I believe?) are often 3 bytes of UTF-8 while they're 2 bytes of UTF-16. Is that your use case? Seems weird to describe that as "all text files" though?


Replies

thaumasiotestoday at 10:26 AM

UTF-8 encodes European glyphs in two bytes and oriental glyphs in three bytes. This is due to the assumption that you're not going to be using oriental glyphs. If you are going to use them, UTF-8 is a very poor choice.

show 2 replies