moefh • today at 4:54 AM • 1 reply

It's not 2 million, it's a little over 1 million.

The exact number is 1112064 = 2^16 - 2048 + 16*2^16: in UTF-16, 2 bytes can encode 2^16 - 2048 code points, and 4 bytes can encode 16*2^16. The 2048 surrogates are not counted because they can never appear by themselves; they're used purely for UTF-16 encoding.
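If you want to check the arithmetic, here is a rough Python sketch that brute-forces it: it walks every code point from U+0000 to U+10FFFF, skips the surrogate range, and groups the rest by how many bytes the UTF-16 encoding takes. The names `SURROGATES` and `count_by_encoded_length` are made up for illustration, and it assumes CPython's built-in `utf-16-le` codec.

```python
# Sketch: count the code points UTF-16 can encode, grouped by encoded length.
SURROGATES = range(0xD800, 0xE000)  # the 2048 code points reserved for pairs


def count_by_encoded_length():
    """Return {byte_length: count} over every encodable code point."""
    counts = {}
    for cp in range(0x110000):  # U+0000 through U+10FFFF
        if cp in SURROGATES:
            continue  # a lone surrogate cannot be encoded on its own
        # "utf-16-le" is used so no 2-byte BOM is prepended to the output
        n = len(chr(cp).encode("utf-16-le"))
        counts[n] = counts.get(n, 0) + 1
    return counts


counts = count_by_encoded_length()
total = sum(counts.values())
print(counts, total)
```

The 2-byte bucket should come out as 2^16 - 2048 and the 4-byte bucket as 16*2^16 (1024 high surrogates times 1024 low surrogates), which add up to the 1112064 above.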

Replies

chuckadams • today at 3:49 PM

Even with just 1 million codepoints, why did they feel the need for CJK unification? Was it so it would all fit in UCS-2 or something?
