Hacker News

dpc_01234 yesterday at 8:09 PM

UTF-8 is undeniably a good answer, but to a relatively simple bit-twiddling / variable-length integer encoding problem in a somewhat specific context.

I realize that hindsight is 20/20, and times were different, but let's face it: "how to use an unused top bit to best encode larger numbers representing Unicode" is not that much of a challenge, and the space of practical solutions isn't even all that large.
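
To make the "unused top bit" idea concrete, here is a minimal sketch of the UTF-8 bit layout in Python. The function name `utf8_encode` is made up for illustration. In practice you would just call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit layout.

    Hypothetical helper for illustration only.
    """
    # Reject values outside the Unicode range and the UTF-16 surrogate block.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 0xxxxxxx: plain ASCII, top bit clear.
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

The lead byte's run of high 1 bits gives the sequence length, and every continuation byte starts with `10`, so a decoder dropped into the middle of a stream can find the next character boundary.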


Replies

Tuna-Fish yesterday at 8:20 PM

Except that there were many different solutions before UTF-8, all of which sucked really badly.

UTF-8 is the best kind of brilliant. After you've seen it, you (and I) think of it as obvious, and clearly the solution any reasonable engineer would come up with. Except that it took a long time for it to be created.

ivanjermakov yesterday at 8:13 PM

I just realised that all Latin text is wasting 12.5% of storage/memory/bandwidth with the MSB always zero. At least it compresses well. Is there any technology that uses the 8th bit for something useful, e.g. error checking?
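
One classic use of the 8th bit for error checking is a parity bit on 7-bit ASCII, as used on many serial links. Below is a minimal sketch of even parity in Python. The helper names `add_even_parity` and `check_even_parity` are made up for illustration.

```python
def add_even_parity(byte7: int) -> int:
    """Put an even-parity bit in bit 7 of a 7-bit ASCII value."""
    if not 0 <= byte7 < 0x80:
        raise ValueError("expected a 7-bit value")
    # Parity bit is 1 when the 7 data bits contain an odd number of 1s,
    # so the full 8-bit byte always has an even number of 1s.
    parity = bin(byte7).count("1") & 1
    return byte7 | (parity << 7)


def check_even_parity(byte8: int) -> bool:
    """Return True if the 8-bit value has an even number of 1 bits."""
    return bin(byte8 & 0xFF).count("1") % 2 == 0
```

This detects any single flipped bit in a byte but cannot correct it, and two flipped bits in the same byte go unnoticed.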
