Hacker News

zahlman, yesterday at 7:56 PM

I've always understood "length" to mean what the author calls "count", and would never expect it to refer to byte size; as far as I can tell, it never did. Size is a design-time consideration; caring about it in the code is an exceptional case, for applications like (as you mention) serialization. So that's what deserves the dedicated term. "Length" refers specifically to a total number of elements in many languages preceding Rust.

For that matter, many languages, especially "object-oriented" ones, treat heterogeneous containers as the default. They might not even offer native containers that can store everything inline in a single contiguous allocation, except perhaps for strings. In which case, "number of bytes" is itself ambiguous; are you including the indirected objects or not?
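
A quick Python sketch of that ambiguity, assuming CPython (the variable names are purely illustrative). `sys.getsizeof` reports only the list's own allocation, so "number of bytes" gives two different answers depending on whether the referenced strings are included:

```python
import sys

# A list stores references; the strings themselves live in separate allocations.
payload = ["x" * 10_000, "y" * 10_000]

# Size of the list object alone (header plus pointer slots).
shallow = sys.getsizeof(payload)

# The same, plus the size of each string the list points to.
deep = shallow + sum(sys.getsizeof(s) for s in payload)

print(len(payload))    # element count, independent of either byte figure
print(shallow, deep)   # exact values depend on the interpreter build
```

The exact byte figures vary between interpreter versions and platforms; the point is only that the element count is unambiguous while the byte size is not.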

"Count" is also overloaded — it commonly means, and I normally only understand it to mean, the number of elements in a collection meeting some condition. Hence the `.count` method of Python sequences, as well as the jargon "population count" referring to the number of set bits in an integer. Today, Python's integers have both a `.bit_count` and a `.bit_length`, and it's obvious what both of them do; calling either `.bit_size` would be confusing in my mental framework, and a contradiction in terms in the OP's.

I would disagree that even C's `strlen` refers to byte size. C comes from a pre-Unicode world; the type is called `char` because that was naively considered sufficient at the time to represent a text character. (Unicode is still in that sense naive, but it at least allows for systems that are acutely aware of the distinction between "characters" and graphemes.) But notice: C's "strings" aren't proper objects; they're null-terminated sequences, i.e. their length is signaled in-band. So that metadata is also just part of the data, in a single allocation with no indirection; the "size" of a string could only reasonably be interpreted to include that null terminator. Yet the result of `strlen` excludes it! Further, if `strlen` is used on a string that was placed within some allocated buffer, it knows nothing about that buffer.
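
The three numbers in play can be seen from Python through `ctypes`, as a rough stand-in for the C situation (a sketch only: `len(buf.value)` is used here in place of calling `strlen` itself, since `.value` reads up to the first NUL just as `strlen` does):

```python
import ctypes

# A 16-byte buffer holding "hello", zero-padded, as a C program might allocate it.
buf = ctypes.create_string_buffer(b"hello", 16)

text_len = len(buf.value)       # bytes before the first NUL, what strlen reports: 5
with_nul = text_len + 1         # bytes the string occupies including its terminator: 6
buf_size = ctypes.sizeof(buf)   # size of the enclosing buffer: 16

print(text_len, with_nul, buf_size)
```

Only the first of those is what `strlen` returns, and it matches neither candidate for "size".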

(Similarly, Rust `str::len` is properly named by this scheme. It gives the number of valid 1-byte-sized elements in a collection, not the byte size of the buffer they're stored within. It's still ambiguous in a sense, but that's because of the convention of using UTF-8 to create an abstraction of "character" elements of non-uniform size. This kind of ambiguity is properly resolved either with iterators, like the `Chars` iterator in Rust, or with views.)
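
A Python analogue of the same distinction, offered as a sketch rather than a statement about Rust internals: `len` of the UTF-8 bytes corresponds to what Rust's `str::len` returns, and `len` of the `str` corresponds to counting the items of `chars()`. The sample strings are arbitrary.

```python
# Escapes are used so the exact code points are unambiguous.
s = "na\u00efve caf\u00e9"        # precomposed i-diaeresis and e-acute

utf8 = s.encode("utf-8")
byte_len = len(utf8)   # 1-byte elements in the encoded form: 12
char_len = len(s)      # code points: 10

# Code points are still not graphemes: a decomposed e-acute is two code points.
decomposed = "cafe\u0301"
decomposed_chars = len(decomposed)                   # 5
decomposed_bytes = len(decomposed.encode("utf-8"))   # 6

print(byte_len, char_len, decomposed_chars, decomposed_bytes)
```

Each answer is a length of some view of the data; none of them is the size of a buffer.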

Also consider: C has a `sizeof` operator, influencing Python's `.__sizeof__()` methods. That's because the concept of "size" equally makes sense for non-sequences; neither "count" nor "length" does. So of course "length" cannot mean what the author calls "size".
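
In Python terms (CPython assumed; the reported size is implementation-dependent), a float has a size but no length:

```python
import sys

value = 3.14

# Every object has a size in bytes, sequence or not.
size = sys.getsizeof(value)
own_size = value.__sizeof__()

# Length is only defined for containers; a float has none.
try:
    len(value)
    has_len = True
except TypeError:
    has_len = False

print(size, own_size, has_len)
```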