Hacker News

graemep yesterday at 10:47 AM

What does "every character" mean? Did it really need to include emojis, for example? Domino tiles? Alchemical symbols? A much smaller number of characters would have been sufficient for all but a tiny number of cases.


Replies

chmod775 yesterday at 4:26 PM

> What does "every character" mean? Did it really need to include emojis, for example?

You may be too young to remember, but there was a time when a lot of software had its own way of encoding emoji, if it supported them at all. This sucked for interoperability, especially over common protocols like SMS.

Some of these implementations were essentially find/replace and would turn various strings of characters commonly occurring in code into emoji. Someone reading your mail containing code on their portable device or other weird client would see parts of that code replaced by emoji. Maybe you had to format your code a certain way, inserting spaces tactically, to avoid accidentally ending up with an emoji. I'm glad we put that behind us for the most part.
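To make the failure mode concrete, here is a minimal sketch of that kind of find/replace. The shortcode table and the `naive_emojify` helper are hypothetical, made up for illustration and not taken from any real client; the point is only that a context-blind replacement cannot tell code from a smiley.

```python
# Hypothetical shortcode table, for illustration only (not from any real client).
SHORTCODES = {
    ":)": "\U0001F642",  # slightly smiling face
    "8)": "\U0001F60E",  # smiling face with sunglasses
    ":P": "\U0001F61B",  # face with tongue
}


def naive_emojify(text: str) -> str:
    """Replace every shortcode occurrence, with no awareness of context."""
    for code, emoji in SHORTCODES.items():
        text = text.replace(code, emoji)
    return text


# A line of code that happens to contain the "8)" shortcode.
code_line = "total = sum(range(0, 8))"
mangled = naive_emojify(code_line)

# The "tactical space" workaround: break up the shortcode by hand.
spaced_line = "total = sum(range(0, 8 ))"
untouched = naive_emojify(spaced_line)

# With emoji as ordinary Unicode code points, no substitution pass is needed:
# the text survives an encode/decode round trip unchanged.
message = "looks good \U0001F642"
round_tripped = message.encode("utf-8").decode("utf-8")

print(mangled)
print(untouched)
print(round_tripped == message)
```

Here `mangled` ends up with the closing `8)` swallowed by an emoji, while the spaced variant passes through untouched, which is exactly the formatting dance described above.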

Living in a world where you can just copy-paste some text containing emoji (or not) from one program into another is honestly great. Same for all these other symbols that may be embedded into text.

If every piece of software has to come up with its own text-embeddable encoding to represent symbols (to allow for copy-paste or sharing), things often end up less than optimal.

wongarsu yesterday at 11:59 AM

I take "every character" to mean "anything that was represented in a reasonably common pre-unicode code page or character encoding, as well as anything that might come up in OCR output of text documents".

Emojis obviously got in from Japanese character encodings, and imho the world is better off for that. Though many of the extensions of the emoji set really don't seem to get what emojis are used for. Similarly, chess and shogi pieces as well as symbols from Western playing cards got in through previous encodings, and domino tiles got accepted based on being conceptually similar. A bit questionable imho.

On the other hand, the Azimuth sign seems to satisfy the "would appear in OCR scans" criterion, based on being published in font catalogues. Even if nobody has come forward with a book it appears in, I don't think they made and advertised lead type characters for fun. It has to have had some use in printed publications of some type (probably scientific, from the surrounding context).