
layer8 · yesterday at 8:39 PM

The use of 8-bit extensions of ASCII (like the ISO 8859-x family) was ubiquitous for a few decades, and arguably still is to some extent on Windows (the standard Windows code pages). If ASCII had been 8-bit from the start, but with the most common characters all within the first 128 values (a likely design choice), then UTF-8 would still have worked out pretty well.
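A quick way to see why: UTF-8 encodes every code point below 128 as a single byte identical to its ASCII value, and everything above as a multi-byte sequence, so any encoding that packed the common characters into 0-127 would keep the same byte compatibility. A minimal Python illustration:

    for ch in "Aé€":
        # code points below 128 encode as one ASCII-identical byte;
        # higher code points become multi-byte sequences
        print(ch, ord(ch), ch.encode("utf-8").hex(" "))
    # A 65 41
    # é 233 c3 a9
    # € 8364 e2 82 ac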

The accident of history is not so much that ASCII happens to be 7 bits, but that the relevant phase of computer development happened to occur primarily in an English-speaking country, and that English text happens to be well representable with 7-bit units.


Replies

necovek · today at 2:18 AM

Most languages are well representable with 128 characters (7 bits) if you don't have to include the English letters among them (e.g., by replacing those 52 characters and some of the control/punctuation/symbol codes).

This is easily proven by the success of the ISO-8859-* family, the Windows and IBM CP-* code pages, and the various *SCII extensions (ISCII, YUSCII, ...): they fit one or more languages into the upper 128 positions.
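For example (a Python sketch; the byte value 0xE9 is arbitrary), the same high byte decodes to a different letter under each code page:

    b = bytes([0xE9])
    print(b.decode("iso-8859-1"))  # é (Latin-1, Western European)
    print(b.decode("iso-8859-5"))  # щ (Cyrillic)
    print(b.decode("iso-8859-7"))  # ι (Greek)
    print(b.decode("cp437"))       # Θ (original IBM PC code page)

The lower 128 positions stay plain ASCII in all four; only the upper half is repurposed per language.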

Among major languages, it's mostly CJK that fails to fit within 128 characters as a whole (though there are smaller languages that don't fit either).

cryptonector · today at 5:20 AM

Many of the extended characters in ISO 8859-* can be produced in pure ASCII with overstriking: type the base letter, backspace (BS, 0x08), then strike the accent over it. ASCII was designed to support overstriking for this purpose; it was how one typed many of those characters on typewriters.
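So é could be entered as e, backspace, apostrophe. A sketch of how a display layer might interpret such sequences; the overstrike() helper and its mark table are illustrative assumptions, not part of ASCII or any standard:

    import unicodedata

    # Assumed mapping from ASCII overstrike marks to Unicode
    # combining accents (illustrative, not from any spec).
    MARKS = {"'": "\u0301",   # acute
             "`": "\u0300",   # grave
             "^": "\u0302",   # circumflex
             "~": "\u0303",   # tilde
             '"': "\u0308"}   # diaeresis

    def overstrike(s):
        out = []
        i = 0
        while i < len(s):
            # <letter> BS <mark> collapses into an accented letter
            if s[i] == "\b" and out and i + 1 < len(s) and s[i + 1] in MARKS:
                out[-1] += MARKS[s[i + 1]]
                i += 2
            else:
                out.append(s[i])
                i += 1
        # NFC turns letter + combining accent into precomposed form
        return unicodedata.normalize("NFC", "".join(out))

    print(overstrike("re\b'sume\b'"))  # résumé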