WalterBright • today at 5:11 PM • 10 replies

Unicode should be for visible characters. Invisible characters are an abomination. So are ways to hide text by using Unicode so-called "characters" to cause the cursor to go backwards.

Things that vanish on a printout should not be in Unicode.

Remove them from Unicode.

Replies

pvillano • today at 5:34 PM

Unicode is "designed to support the use of text in all of the world's writing systems that can be digitized"

Unicode needs tab, space, form feed, and carriage return.

Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.

Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.

Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.

Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.

Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
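For anyone who wants to check how the code points listed above are classified, here is a minimal Python sketch using the standard unicodedata module. The helper name describe and the choice of U+180B as a representative Mongolian free variation selector are my own assumptions, not something from the comment.

```python
import unicodedata

# Code points named in the comment above. U+180B is picked here as one
# representative Mongolian free variation selector (an assumption).
CODE_POINTS = [0x0009, 0x0020, 0x000C, 0x000D, 0x200E, 0x200F,
               0x115F, 0x1160, 0x200C, 0x200B, 0x180B]

def describe(cp):
    """Return (U+XXXX label, general category, character name) for a code point."""
    ch = chr(cp)
    # ASCII control characters have no formal name, so supply a fallback.
    name = unicodedata.name(ch, "<control>")
    return (f"U+{cp:04X}", unicodedata.category(ch), name)

for cp in CODE_POINTS:
    print(*describe(cp))
```

The second column is the general category: Cc is a control character, Cf a format character, Zs a space separator. None of these draw ink on a printout, yet each carries information the renderer needs.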

luke-stanley • today at 5:54 PM

So we need a new standard because of the complexity of the last standard? Isn't Unicode supposed to be a superset of ASCII, which already has control characters like space, CR, and newlines? xD

tetha • today at 7:43 PM

That ship has sailed. I consider Unicode a good thing, but I think it is problematic to support Unicode in every domain.

I should be able to use Ü as a cursed smiley in text, and many more writing systems supported by Unicode support even more funny things. That's a good thing.

On the other hand, if technical file names and display file names (shown to GUI users) were separate, my need for crazy characters in file names, code bases and such would be very limited. Lower ASCII for the actual file names consumed by technical people is sufficient for me.
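A rough sketch of what such a restriction could look like in Python, assuming "lower ASCII" means printable 7-bit ASCII only. The function name is hypothetical, and a real policy would probably also exclude path separators and other characters the file system reserves.

```python
def is_plain_ascii_name(name):
    """True if name is non-empty and made only of printable 7-bit ASCII."""
    # str.isascii() rejects anything above U+007F;
    # str.isprintable() rejects control characters such as tab or newline.
    return bool(name) and name.isascii() and name.isprintable()

for candidate in ["report_2024.txt", "\u00dc.txt", "a\tb", "a\u200bb.txt"]:
    print(repr(candidate), is_plain_ascii_name(candidate))
```

The display name shown to GUI users could then be stored separately and carry whatever Unicode it likes.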

WalterBright • today at 5:21 PM

Another dum dum Unicode idea is having multiple code points with identical glyphs.

Rule of thumb: two Unicode sequences that look identical when printed should consist of the same code points.
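To make the rule of thumb concrete, here is a small Python sketch comparing sequences that usually render identically. It assumes NFKC normalization as the yardstick for "the same code points", which is my choice for illustration; the helper names are hypothetical.

```python
import unicodedata

def code_points(s):
    """List the code points of s as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]

def same_after_nfkc(a, b):
    """True if a and b become identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

latin_a, greek_alpha, cyrillic_a = "\u0041", "\u0391", "\u0410"
precomposed, decomposed = "\u00e9", "e\u0301"  # both display as e with acute

print(code_points(precomposed), code_points(decomposed))
print(same_after_nfkc(precomposed, decomposed))  # normalization unifies these
print(same_after_nfkc(latin_a, cyrillic_a))      # but not cross-script lookalikes
print(same_after_nfkc(latin_a, greek_alpha))
```

Normalization covers the composed versus decomposed case, but lookalike capitals from different scripts stay distinct code points, which is the situation the rule of thumb objects to.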

eviks • today at 7:39 PM

So you'd remove space and tab from Unicode?

moritzruth • today at 5:14 PM

greatidea,whoneedsspacesanyway

bawolff • today at 7:53 PM

Good luck with that, given there are invisible characters in ASCII.

Also, this attack doesn't seem to use invisible characters, just characters that don't have an assigned meaning.
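Both kinds of character can be flagged by general category. A rough Python sketch, assuming "no assigned meaning" is read as private-use (Co) or unassigned (Cn) code points; that reading is my interpretation, not something the comment states, and the function name is hypothetical.

```python
import unicodedata

# General categories worth a second look in source text:
# Cf = invisible format characters, Co = private use, Cn = unassigned.
SUSPECT = {"Cf": "format", "Co": "private use", "Cn": "unassigned"}

def flag_suspect(text):
    """Return (index, U+XXXX label, reason) for each suspect character in text."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat in SUSPECT:
            hits.append((i, f"U+{ord(ch):04X}", SUSPECT[cat]))
    return hits

print(flag_suspect("a\u200bb\ue000c"))
```

Which code points count as unassigned depends on the Unicode version bundled with the Python build, so results for Cn can differ between interpreters.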

abujazar • today at 5:14 PM

Invisible characters are there for visible characters to be printed correctly...
