AlienRobot • yesterday at 10:29 PM • 4 replies

>The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.

That isn't really a problem.

The problem with null-terminated strings is specifically what happens when you reach the end of the allocated array and there ISN'T a NULL character.

Every string function is designed to keep going until it finds the NULL character, so if a hacker gets rid of the NULL character, he can exploit pretty much any standard string manipulation function being used elsewhere in the program to manipulate whatever memory comes AFTER the string data structure.
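For illustration, a missing terminator does not even need an attacker. A minimal sketch, assuming an 8-byte buffer: `strncpy` writes no NUL when the source fills the buffer, so anything that later walks the result with `strlen` or `strcpy` would run past the end. The helper names `is_terminated` and `strncpy_terminates` are made up for this example, and the check uses `memchr` with an explicit bound so the sketch itself never overreads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..cap) contains a NUL terminator. */
static bool is_terminated(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}

/* Copies src into an 8-byte buffer with strncpy and reports whether the
 * result is terminated. strncpy pads short sources with NULs, but writes
 * no NUL at all when strlen(src) >= 8. */
static bool strncpy_terminates(const char *src)
{
    char buf[8];
    strncpy(buf, src, sizeof buf);
    return is_terminated(buf, sizeof buf);
}
```

With a 5-character source the buffer ends up terminated; with an 8-character source it does not, and the buffer is no longer a valid C string.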

No other data structure works like this. You can't mess this up in an array, because no function that manipulates arrays is just going to keep going until there is a null. That would be stupid because it would require users of the function to add a NULL to the end of their arrays before passing them to the function, so instead we just pass the size of the array to everything. Strings are the only data structure that assumes there will be a NULL at the end.
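The "pass the size" convention described here looks roughly like this in C. It is only a sketch, and `sum_ints` is a made-up name: the loop bound comes from the caller, so a zero inside the data is just data and never ends the walk early or late.

```c
#include <stddef.h>

/* Sums exactly n elements. The bound comes from the caller, so the
 * function never reads past a[n - 1], whatever values the array holds. */
static long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The safety still depends on the caller passing an `n` that matches the real allocation.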

By the way, I read once that if you use UTF-32 every code point is always exactly 4 bytes, but even then a single code point isn't necessarily a single character (a base letter followed by a combining accent is two code points). Text is just complicated.
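To make the bytes versus code points versus characters distinction concrete, here is a small sketch that counts code points in a string assumed to be valid UTF-8. The function name `utf8_code_points` is hypothetical, and the code does no validation.

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated string that is assumed to be
 * valid UTF-8: every byte that is not a continuation byte (10xxxxxx)
 * starts a new code point. */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For `"e\xCC\x81"` (the letter e followed by U+0301, combining acute accent) `strlen` reports 3 bytes and this function reports 2 code points, yet a reader sees one character. The precomposed form `"\xC3\xA9"` (U+00E9) is 2 bytes and 1 code point.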

Replies

tredre3 • yesterday at 11:10 PM

> No other data structure works like this.

In C most data structures work like this: you keep going until you find NUL (character) or NULL (pointer). E.g. strings, arrays of pointers, linked lists, etc. Of course you can add a length to most of those, but it isn't the canonical/traditional way of doing things.
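A sketch of the two pointer cases mentioned here, with made-up names (`count_ptrs`, `struct node`, `list_length`): a NULL-terminated array of strings in the style of `argv`, and a linked list that ends at a NULL `next` pointer. Neither function receives a length.

```c
#include <stddef.h>

/* Counts entries in a NULL-terminated array of strings (the argv/envp
 * convention). There is no length parameter; the NULL pointer is the end. */
static size_t count_ptrs(const char *const *v)
{
    size_t n = 0;
    while (v[n] != NULL)
        n++;
    return n;
}

struct node {
    int value;
    struct node *next;
};

/* Walks a singly linked list until it reaches the NULL next pointer. */
static size_t list_length(const struct node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Both loops have the same failure mode as `strlen`: if the sentinel is missing, they keep reading.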

dare944 • today at 1:39 PM

> No other data structure works like this. You can't mess this up in an array, because no function that manipulates arrays is just going to keep going until there is a null.

This is patently false. Sentinel markers are used widely in array types. Consider GNU's getopt_long() function, a mainstay in GNU tools:

> The argument longopts must be an array of [struct option] structures, one for each long option. Terminate the array with an element containing all zeros.
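In practice that looks something like the sketch below. The `--verbose` and `--output` options and the `parse_args` wrapper are invented for illustration; `getopt_long`, `struct option`, `no_argument`, and `required_argument` come from `<getopt.h>`.

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

/* Parses a hypothetical "--verbose / --output=FILE" command line.
 * Returns 0 on success, -1 on an unrecognized option. */
static int parse_args(int argc, char **argv, int *verbose, const char **output)
{
    static const struct option longopts[] = {
        { "verbose", no_argument,       NULL, 'v' },
        { "output",  required_argument, NULL, 'o' },
        { 0, 0, 0, 0 }  /* all-zero sentinel: no array length is passed */
    };
    int c;

    *verbose = 0;
    *output = NULL;
    optind = 1;  /* restart scanning in case of an earlier call */
    opterr = 0;  /* keep getopt_long from printing its own diagnostics */
    while ((c = getopt_long(argc, argv, "vo:", longopts, NULL)) != -1) {
        switch (c) {
        case 'v': *verbose = 1;     break;
        case 'o': *output = optarg; break;
        default:  return -1;
        }
    }
    return 0;
}
```

If the final all-zero element is left out, `getopt_long` has no way to know where `longopts` ends, which is the same hazard as an unterminated string.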

lelanthran • today at 8:39 AM

> Every string function is designed to keep going until it finds the NULL character, so if a hacker gets rid of the NULL character,

What sort of situation are you envisioning where a hacker can remove the sentinel (in the case of nul-termination) but not modify the length bytes (in the case of fat pointers)?

imtringued • today at 10:17 AM

If I zero out the destination buffer of a strcpy, and the string is longer than the destination buffer, I will run into a buffer overflow problem even though every byte of the destination started out as zero. The absence or presence of the zero byte doesn't seem to be the deciding factor.
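The deciding factor in this scenario is the destination's capacity, which `strcpy` never sees. A hedged sketch of a checked copy follows; `copy_checked` is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical checked copy: copies src into dst only if it fits,
 * terminator included. Returns 0 on success, or -1 if src is too long,
 * in which case dst is left as an empty string when cap > 0. */
static int copy_checked(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src);  /* still trusts src to be terminated */
    if (len >= cap) {
        if (cap > 0)
            dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* len bytes plus the NUL */
    return 0;
}
```

Note that this only bounds the write. The `strlen` call still relies on the source being properly terminated, so it addresses the overflow described here rather than the missing-terminator case.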
