Hacker News

mrlonglong (yesterday at 9:30 PM)

The zero-terminated string is, I think, computing's biggest mistake. Pascal-style strings were much safer.


Replies

layer8 (today at 8:47 AM)

There is a middle ground that Visual Basic (and then COM) took, with the BSTR type: It’s still a pointer to a zero-terminated char array, but there is a length field immediately preceding the first pointed-to byte. This is still compatible with a C string (assuming no embedded null characters), but BSTR-typed functions can take advantage of the length value.
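A rough sketch of a BSTR-style layout in plain C, for illustration only. The helper names (`lstr_new`, `lstr_len`, `lstr_free`) are hypothetical, and the sketch stores 8-bit `char` data behind a 32-bit byte count. The real COM BSTR holds 16-bit `OLECHAR` data and is managed through `SysAllocString`/`SysFreeString`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical BSTR-like helper: a 32-bit byte count is stored immediately
   before the returned pointer, and the data is still NUL-terminated, so the
   result can also be passed to ordinary C string functions. */
char *lstr_new(const char *src, uint32_t len)
{
    /* room for the length prefix, the bytes, and the trailing NUL */
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);
    char *data = (char *)(block + sizeof(uint32_t));
    memcpy(data, src, len);
    data[len] = '\0';
    return data;
}

/* O(1) length: read the prefix stored just before the first character. */
uint32_t lstr_len(const char *s)
{
    uint32_t len;
    memcpy(&len, s - sizeof(uint32_t), sizeof len);
    return len;
}

/* Free the whole block, starting from the hidden prefix. */
void lstr_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

Because `lstr_new` copies exactly `len` bytes, an embedded `'\0'` is still counted by `lstr_len` but cuts `strlen` short. That is the "no embedded null characters" caveat in practice.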

BobbyTables2 (today at 4:30 AM)

Partly agree, but there would have been squabbling over the data type of the size field, unless it was variable-length. The latter would have had other issues too.

For a while, 16-bit would probably have seemed too extravagant. Now 32-bit would probably seem too small.
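One way a variable-length size could work, sketched under an assumption: the count is written as an unsigned LEB128-style varint (7 bits per byte, with the high bit meaning "more bytes follow"), the same scheme DWARF and Protocol Buffers use. The function names are hypothetical. The sketch also shows one of the "other issues": the prefix is 1 to 10 bytes long, so the character data no longer starts at a fixed offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes len as an unsigned LEB128-style varint: 7 bits per byte,
   high bit set on every byte except the last. Returns the number of
   bytes written; out must have room for at least 10 bytes. */
size_t varint_put(uint64_t len, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = len & 0x7F;
        len >>= 7;
        if (len != 0)
            byte |= 0x80; /* more bytes follow */
        out[n++] = byte;
    } while (len != 0);
    return n;
}

/* Reads a varint written by varint_put, stores the value in *len, and
   returns the number of prefix bytes consumed. No bounds checking:
   this is a sketch, not a hardened decoder. */
size_t varint_get(const uint8_t *in, uint64_t *len)
{
    uint64_t value = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = in[n++];
        value |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    *len = value;
    return n;
}
```

Short strings pay only one byte of overhead, and long ones are never capped. The cost is that finding the data requires decoding the prefix first.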

For a “strongly typed” language, C is pretty damn loose where it would have mattered.

smackeyacky (today at 2:44 AM)

Zero terminated strings were the basis for an awful lot of useful software. Calling them the biggest mistake in computing is a bit OTT.

I haven’t programmed anything Pascal-related for 30+ years, but I dimly remember thinking at the time that I wished the string system wasn’t so hard to use.

dmazzoni (today at 4:05 AM)

255 characters ought to be enough for everybody, right?

layer8 (today at 2:14 AM)

Almost as bad as newline-terminated lines. ;)

bsder (yesterday at 11:43 PM)

The zero-terminated string is a special case of sentinel-value termination.

And sentinel value terminations make a lot of sense when you have punch cards and fixed length records that you need to carve into pieces.
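To make the "special case" concrete, here is a small sketch: `strlen` is a scan for the `'\0'` sentinel, and the same loop works for any element type once a value is reserved as the terminator. The function names are hypothetical, and the `int` version assumes the chosen sentinel never appears as real data.

```c
#include <stddef.h>

/* strlen-style scan: walk forward until the '\0' sentinel,
   counting the elements before it. */
size_t sentinel_len_char(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* The same idea for an int sequence. The caller picks the sentinel,
   which must never occur as a real value. */
size_t sentinel_len_int(const int *a, int sentinel)
{
    size_t n = 0;
    while (a[n] != sentinel)
        ++n;
    return n;
}
```

The trade-off is visible in both loops. Finding the length is O(n), and nothing stops the scan if the sentinel is missing.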

Nobody expected any decisions they were making in the 1960s and 1970s to have any bearing on computing a half-century later. They all expected to have their mistakes long papered over by smarter people at some point.

But we ALL make the mistake of underestimating inertia.

Conscat (today at 6:39 AM)

Clang and GCC both let you use Pascal strings in C if you would like (with `\p`). But Pascal strings aren't that useful today because the maximum length is too short.
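Without relying on the `\p` extension, a classic one-byte-length Pascal string can be built by hand in standard C. The `pstr` type and `pstr_set` are hypothetical names. The sketch shows where the hard 255-byte ceiling comes from: a `uint8_t` length field.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Classic Pascal string: one length byte followed by up to 255 bytes
   of data, with no terminator. */
typedef struct {
    uint8_t len;
    char data[255];
} pstr;

/* Copies a C string into dst. Returns false (leaving dst unchanged)
   if src does not fit in the one-byte length field. */
bool pstr_set(pstr *dst, const char *src)
{
    size_t n = strlen(src);
    if (n > 255)
        return false;
    dst->len = (uint8_t)n;
    memcpy(dst->data, src, n);
    return true;
}
```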

jackbucks (yesterday at 9:35 PM)

It was definitely an interesting way to allocate pointers. I once worked on a very large project where the devs didn't understand this, and we fixed hundreds, if not more, of off-by-one errors and memory overwrites in C caused by this feature.

But at the same time, I think blaming the software was kind of a cop-out. Devs were in a hurry and simply didn't respect the rules. Given today's software engineers at large, nerfing programming languages so they can't destroy things might not be a bad idea. But AI will nerf everything.
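The typical off-by-one bug in question looks like `malloc(strlen(src))`, which leaves no room for the terminator. A minimal corrected sketch follows. `dup_string` is a hypothetical name, and POSIX `strdup` does the same job.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicates a C string. The classic bug is malloc(strlen(src)),
   which has no room for the terminating NUL, so the copy writes one
   byte past the end of the allocation. */
char *dup_string(const char *src)
{
    size_t n = strlen(src);
    char *copy = malloc(n + 1);   /* +1 for the '\0' terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n + 1);     /* copies the terminator too */
    return copy;
}
```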

sourcegrift (today at 4:01 PM)

Rust has "Pascal-style strings" (in quotes because the concept is slightly different), so it's not a done deal.
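For comparison, a Rust `&str` is a pointer plus a byte length, and it is guaranteed to be valid UTF-8. The length lives next to the pointer rather than in front of the data. A rough C analogue might look like this. `str_view` and its functions are hypothetical names, and no UTF-8 checking is done.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of a Rust &str: a pointer plus a byte length.
   No terminator is needed, and the data may contain '\0'. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

str_view sv_from_cstr(const char *s)
{
    str_view v = { s, strlen(s) };
    return v;
}

/* Sub-slicing only adjusts the pointer and length; nothing is copied.
   The caller must ensure start <= end <= v.len. */
str_view sv_slice(str_view v, size_t start, size_t end)
{
    str_view out = { v.ptr + start, end - start };
    return out;
}

/* Equality compares lengths first, then bytes. */
bool sv_eq(str_view a, str_view b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```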

lelanthran (today at 7:25 AM)

> The zero-terminated string is, I think, computing's biggest mistake.

No. They had trade-offs to make, and sentinel-based sequences are a needed thing, even outside of strings.

The mistake was that ISAs never looked at what HLLs needed and then added the necessary instructions (I posted more about this below).

Even NULL is not a big mistake, when looked at in context of the time in which it was developed.

msla (yesterday at 10:25 PM)

In addition to having to pick a size for the length counter and then, later, having to differentiate between lengths in bytes, codepoints, and glyphs, you can't subdivide a Pascal string using pointer arithmetic. To pass just the end of a string into a function, you have to either copy the tail of one Pascal-style string to another with a smaller size value, or your string has to be a struct with an integer and a pointer to the actual data instead of just an integer stuck on the beginning of the string. The first is a lot of copying in some cases, the second raises the specter of structs with invalid pointers. That's not to mention the potential problems that would cause with caches.
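A small sketch of that asymmetry, assuming a hypothetical length-prefixed layout where byte 0 holds the length. With a C string, the tail is just `s + k`. With a prefixed buffer, `s + k` lands in the middle of the data with no valid length byte in front of it, so the tail has to be copied out under a new prefix. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* C string: the tail starting at offset k is just s + k,
   with no allocation and no copy. Caller ensures k <= strlen(s). */
const char *c_tail(const char *s, size_t k)
{
    return s + k;
}

/* Length-prefixed buffer (src[0] = length, data follows): the tail
   must be copied into dst under its own length byte. dst must hold
   at least 256 bytes; caller ensures k <= src[0]. */
void pascal_tail(const unsigned char *src, size_t k, unsigned char *dst)
{
    uint8_t n = (uint8_t)(src[0] - k);
    dst[0] = n;
    memcpy(dst + 1, src + 1 + k, n);
}
```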

fragmede (today at 12:05 AM)

Compared to von Neumann versus Harvard architecture for LLMs? I think that's a far bigger mistake.

themafia (yesterday at 10:01 PM)

> Pascal-style strings were much safer.

The limitations were brutal. Initially you could only have 255 bytes in a string. The length of a string and the size of the allocation are now separate and you may need to think about that unused memory in your design. The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.
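To illustrate the bytes-versus-characters split, here is a sketch that counts code points in a UTF-8 string by skipping continuation bytes (those of the form `10xxxxxx`). `utf8_codepoints` is a hypothetical name. It assumes valid UTF-8 and counts code points, not user-perceived characters (grapheme clusters), which would need more work again.

```c
#include <stddef.h>
#include <string.h>

/* Counts UTF-8 code points: every byte except a continuation byte
   (top two bits 10) starts a new code point. Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "naïve" encoded as UTF-8, `strlen` reports 6 bytes while this function reports 5 code points, because "ï" takes two bytes.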

If you want to create an array of strings you either need to specify the length of all strings and accept the memory overhead or have an array of pointers to strings. If you use an array of pointers you may end up choosing to use the 'nil' value as a sentinel that means "end of list." So we're right back where we started.
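The "array of pointers with a sentinel" case is exactly how C passes `argv`: the standard guarantees `argv[argc]` is a null pointer. A minimal sketch of walking such a list, with `count_strings` as a hypothetical name:

```c
#include <stddef.h>

/* Counts entries in a NULL-terminated array of string pointers,
   the same convention C uses for argv. */
size_t count_strings(const char *const *list)
{
    size_t n = 0;
    while (list[n] != NULL)
        ++n;
    return n;
}
```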

--

Because someone decided to downvote this, HN has limited the speed at which I can reply. This site is tragic and I'm fully done with it now. You can spread propaganda and poorly sourced zeitgeist and be among friends, but if you try to have a genuine conversation about programming languages you are made to feel unwelcome immediately. Screw this.

--

> No other data structure works like this.

The linked list.

> You can't mess this up in an array

C happily decays arrays into pointers. You can erase your length information from the type. This was an intentional decision.
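A small sketch of what decay costs. The element count is only recoverable while the true array type is in scope. Once the array is passed to a function, the parameter is just a pointer, so the length has to travel separately. `ARRAY_LEN` and `sum` are hypothetical names.

```c
#include <stddef.h>

/* Element count of a true array. Only valid where the array type is
   still visible; applied to a pointer it silently gives a wrong answer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* A parameter written as `const int a[]` is adjusted to `const int *a`:
   the length is gone from the type, so the caller must pass it. */
long sum(const int a[], size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```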

> Strings are the only data structure that assume there will be a NULL at end.

Which is why almost every string API has a version that allows you to specify the maximum length. The fact that you can use a NUL doesn't mean you have to. Which is why the concept of "sentinel values" is broadly used in many types of applications you haven't considered here.
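As an example of the "maximum length" style, here is a bounded length function that never looks past `max` bytes, so a buffer missing its terminator cannot send the scan off the end. It follows the same contract as POSIX `strnlen` but uses standard `memchr` for portability. `bounded_len` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first '\0' within the first max bytes,
   or max if there is none. Never reads beyond s[max - 1]. */
size_t bounded_len(const char *s, size_t max)
{
    const char *nul = memchr(s, '\0', max);
    return nul ? (size_t)(nul - s) : max;
}
```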

mikewarot (today at 5:09 AM)

[dead]

dietr1ch (yesterday at 9:52 PM)

I think it was NULL itself. It was a long way until we realised we don't want invalid values and could use the type system to help us use special values safely.
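A rough C approximation of that idea: instead of overloading a value such as `-1` or `NULL` to mean "not found", the result carries an explicit flag. This is only a sketch with hypothetical names. Unlike option types in Rust or Haskell, C's type system cannot force the caller to check `found` before reading `index`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Explicit "maybe an index" result instead of a sentinel value. */
typedef struct {
    bool found;
    size_t index;   /* meaningful only when found is true */
} find_result;

/* Linear search that reports absence through the flag,
   not through a reserved index value. */
find_result find_int(const int *a, size_t n, int target)
{
    for (size_t i = 0; i < n; ++i) {
        if (a[i] == target)
            return (find_result){ true, i };
    }
    return (find_result){ false, 0 };
}
```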
