Hacker News

kbolino, yesterday at 9:35 PM

Thanks to UTF-16, which came out after UTF-8, there are 2048 wasted 3-byte sequences in UTF-8.

And unlike the short-sighted authors of the first version of Unicode, who thought the whole world's writing systems could fit in just 65,536 distinct values, the authors of UTF-8 made it possible to encode up to 2^31 (about 2.1 billion) distinct values in the original design.
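For anyone who wants to check both numbers, here is a rough Python sketch. It assumes CPython's strict `utf-8` codec, which refuses to encode the surrogate code points U+D800..U+DFFF, and it assumes the original 1-to-6-byte UTF-8 layout (a lead byte carrying `7 - length` payload bits, followed by 6-bit continuation bytes). The function names are made up for illustration.

```python
def count_rejected_three_byte_code_points() -> int:
    """Count code points in the 3-byte UTF-8 range that the strict codec rejects."""
    rejected = 0
    # U+0800..U+FFFF is the range that UTF-8 encodes with three bytes.
    for cp in range(0x0800, 0x10000):
        try:
            chr(cp).encode("utf-8")
        except UnicodeEncodeError:
            # Only the UTF-16 surrogates U+D800..U+DFFF end up here.
            rejected += 1
    return rejected


def original_utf8_payload_bits(length: int) -> int:
    """Payload bits of a `length`-byte sequence in the original 1..6 byte design."""
    if not 1 <= length <= 6:
        raise ValueError("the original design used 1 to 6 bytes")
    if length == 1:
        return 7  # 0xxxxxxx
    lead_bits = 7 - length  # 2 bytes -> 5 bits, ..., 6 bytes -> 1 bit
    return lead_bits + 6 * (length - 1)  # each continuation byte adds 6 bits


wasted = count_rejected_three_byte_code_points()
max_values = 2 ** original_utf8_payload_bits(6)
print(wasted)      # 2048
print(max_values)  # 2147483648
```

The 6-byte form carries 1 + 5 × 6 = 31 bits, which is where the "2 billion" comes from.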


Replies

xigoi, today at 10:14 AM

Thanks to UTF-8, there are 13 wasted 1-byte sequences in UTF-8 :P
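The 13 can be checked by brute force, if "wasted 1-byte sequences" is read as byte values that never occur anywhere in well-formed UTF-8 (that reading is an assumption on my part). This sketch encodes every Unicode scalar value with Python's built-in codec and collects the bytes that show up. It takes a second or so to run.

```python
def unused_utf8_bytes() -> list[int]:
    """Return the byte values that never appear in any well-formed UTF-8 encoding."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and cannot be encoded
        used.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - used)


unused = unused_utf8_bytes()
print(len(unused))                       # 13
print([f"{b:02X}" for b in unused])      # C0, C1, then F5 through FF
```

C0 and C1 could only start overlong 2-byte encodings of ASCII, and F5 through FF would start sequences beyond U+10FFFF or are not lead bytes at all.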
