Hacker News

jerf, today at 1:23 AM (2 replies)

I wasn't a programmer in those days, so I don't know if there's some other major concern that would kill this, but I sometimes wonder whether we could have / should have used variable-length integers for the length. That is, something like: strings of 0-127 bytes get one byte of length prefix, 128-16383 get two bytes of prefix, and the probably-rare 16384-2097151 strings would end up with three, though proportionally by that point it's hardly anything. Or you could use the UTF-8 mechanism for packing the bytes, though that costs more and probably doesn't get anything we'd have cared about in the 1980s or 1990s.
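A minimal sketch of that prefix scheme, assuming a little-endian base-128 layout (7 length bits per byte, high bit set when more bytes follow). The function names are hypothetical and the byte order is an assumption; the comment only specifies the size thresholds.

```python
def encode_length(n):
    """Encode n as a base-128 varint: 7 payload bits per byte, low bits first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: another prefix byte follows
        else:
            out.append(low7)         # high bit clear: last prefix byte
            return bytes(out)


def decode_length(buf):
    """Return (length, number of prefix bytes consumed)."""
    n = 0
    shift = 0
    for i, b in enumerate(buf):
        n |= (b & 0x7F) << shift
        if not (b & 0x80):
            return n, i + 1
        shift += 7
    raise ValueError("truncated length prefix")


def pack_string(data):
    """Prefix the raw bytes with their varint-encoded length."""
    return encode_length(len(data)) + data


def unpack_string(buf):
    """Read one length-prefixed string from the start of buf."""
    n, used = decode_length(buf)
    if len(buf) - used < n:
        raise ValueError("buffer shorter than declared length")
    return buf[used:used + n]
```

With this layout the prefix is one byte up to 127, two bytes up to 16383, and three bytes up to 2097151, matching the ranges above.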

It's a bit of extra code, yes. Not necessarily all that much, but some. On average it is only slightly more expensive than null termination, and considered as a proportion of the size of the strings themselves it's hardly anything. It's probably better than the strings getting hard-limited to 0-255, though, which was quite frequently a user-visible quirk.


Replies

Parodper, today at 2:12 PM

You could start the encoding with two bytes, so that if the most significant bit of the first byte is 0, the length is held in the remaining 15 bits of that byte and the next one. That gives you 32 KiB strings at the cost of just one extra byte. Short strings might suffer, but I think the overhead is reasonable.

The next level (110x xxxx) would give you 8MiB strings, which are going to be fine for most things.
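A rough sketch of the two-byte base level only, assuming the 15 length bits are stored big-endian (high bits in the first byte). The names are hypothetical, and the longer forms are deliberately left out because their exact bit layout isn't spelled out here.

```python
MAX_TWO_BYTE_LENGTH = 0x7FFF  # 15 usable bits, so lengths 0..32767


def encode_two_byte_length(n):
    """Encode n in two bytes with the top bit of the first byte left at 0."""
    if not 0 <= n <= MAX_TWO_BYTE_LENGTH:
        raise ValueError("length needs a longer prefix form")
    return bytes([n >> 8, n & 0xFF])


def decode_two_byte_length(buf):
    """Decode a two-byte prefix; reject prefixes that signal a longer form."""
    if len(buf) < 2:
        raise ValueError("truncated length prefix")
    if buf[0] & 0x80:
        # Top bit set would select a longer form, which this sketch omits.
        raise ValueError("longer prefix form not handled in this sketch")
    return (buf[0] << 8) | buf[1]
```

Under these assumptions even an empty string carries two prefix bytes, which is the short-string cost mentioned above.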

senfiaj, today at 1:25 PM

A 32-bit length prefix isn't too much overhead: just 3 bytes more than a null terminator. I bet it's almost always better than C-style strings. In the vast majority of situations the price isn't that bad, considering you make strings much more secure and potentially faster to manipulate.
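To make the byte count concrete, here is a small sketch comparing a fixed 32-bit prefix with null termination. The little-endian byte order and the function names are assumptions made for illustration.

```python
import struct


def pack_u32_string(data):
    """Prefix the raw bytes with a 4-byte little-endian unsigned length."""
    return struct.pack("<I", len(data)) + data


def pack_c_string(data):
    """Null-terminated form; the payload itself cannot contain a zero byte."""
    if b"\x00" in data:
        raise ValueError("embedded NUL cannot be represented")
    return data + b"\x00"


def u32_string_length(buf):
    """Read the length from the prefix without scanning the payload."""
    return struct.unpack_from("<I", buf, 0)[0]


def c_string_length(buf):
    """Find the length by scanning for the terminator, as strlen does."""
    return buf.index(b"\x00")
```

For any payload, the prefixed form is 3 bytes longer than the terminated one, and its length is read from the first 4 bytes rather than by scanning.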