
layer8 · yesterday at 8:39 PM

The use of 8-bit extensions of ASCII (like the ISO 8859-x family) was ubiquitous for a few decades, and arguably still is to some extent on Windows (the standard Windows code pages). If ASCII had been 8-bit from the start, but with the most common characters all within the first 128 values (a likely design choice), then UTF-8 would still have worked out pretty well.
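A quick way to see why: UTF-8 encodes every code point below 128 as a single byte identical to its ASCII value, and everything above as a multi-byte sequence, so any encoding that packed the common characters into 0-127 would keep the same byte compatibility. A minimal Python illustration:

    for ch in "Aé€":
        # code points below 128 encode as one ASCII-identical byte;
        # higher code points become multi-byte sequences
        print(ch, ord(ch), ch.encode("utf-8").hex(" "))
    # A 65 41
    # é 233 c3 a9
    # € 8364 e2 82 ac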

The accident of history is not so much that ASCII happens to be 7 bits, but that the relevant phase of computer development happened to occur primarily in an English-speaking country, and that English text happens to be well representable with 7-bit units.


Replies

necovek · today at 2:18 AM

Most languages are well representable with 128 characters (7 bits) if you don't have to include the English letters among them (e.g., by replacing those 52 characters and some of the control/punctuation/symbol codes).

This is easily proven by the success of the ISO-8859-* family, the Windows and IBM CP-* code pages, and the various *SCII extensions (ISCII, YUSCII, ...): they fit one or more languages into the upper 128 positions.
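For example (a Python sketch; the byte value 0xE9 is arbitrary), the same high byte decodes to a different letter under each code page:

    b = bytes([0xE9])
    print(b.decode("iso-8859-1"))  # é (Latin-1, Western European)
    print(b.decode("iso-8859-5"))  # щ (Cyrillic)
    print(b.decode("iso-8859-7"))  # ι (Greek)
    print(b.decode("cp437"))       # Θ (original IBM PC code page)

The lower 128 positions stay plain ASCII in all four; only the upper half is repurposed per language.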

Among major languages, it's mostly CJK that fails to fit within 128 characters as a whole (though there are smaller languages that don't fit either).

cryptonector · today at 5:20 AM

Many of the extended characters in ISO 8859-* can be produced in pure ASCII with overstriking: type the base letter, backspace (BS, 0x08), then strike the accent over it. ASCII was designed to support overstriking for this purpose; it was how one typed many of those characters on typewriters.
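So é could be entered as e, backspace, apostrophe. A sketch of how a display layer might interpret such sequences; the overstrike() helper and its mark table are illustrative assumptions, not part of ASCII or any standard:

    import unicodedata

    # Assumed mapping from ASCII overstrike marks to Unicode
    # combining accents (illustrative, not from any spec).
    MARKS = {"'": "\u0301",   # acute
             "`": "\u0300",   # grave
             "^": "\u0302",   # circumflex
             "~": "\u0303",   # tilde
             '"': "\u0308"}   # diaeresis

    def overstrike(s):
        out = []
        i = 0
        while i < len(s):
            # <letter> BS <mark> collapses into an accented letter
            if s[i] == "\b" and out and i + 1 < len(s) and s[i + 1] in MARKS:
                out[-1] += MARKS[s[i + 1]]
                i += 2
            else:
                out.append(s[i])
                i += 1
        # NFC turns letter + combining accent into precomposed form
        return unicodedata.normalize("NFC", "".join(out))

    print(overstrike("re\b'sume\b'"))  # résumé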