Interestingly, the article doesn't mention two-dimensional arrays, and they're curious because they bring a certain asymmetry with them. They always tripped me up the most in C, because I otherwise find the language very "symmetrical". It often feels like, in the design of this language, the beauty of expressing certain things took priority over readability or safety, which I admire in a way. But somehow not in the case of two-dimensional arrays.
If you see a[i][j] it could mean two completely different things:
1) "a" is a contiguous chunk of memory of N*M bytes, so it behaves as char*; a[i][j] == *(a + i*M + j)
2) "a" is an array of char* pointers that point to N completely distinct memory chunks of size M, so it behaves as char**; a[i][j] == *(*(a + i) + j)
With flat arrays the difference between an array as a variable and a pointer to the first element is literally negligible because you won't even see the difference in the assembly. This is why the automatic decay-to-pointer makes a lot of sense.
But that breaks completely with multiple dimensions. You definitely see the difference in the assembly because the memory layout is so different.
"breaks completely"
I would rather say it works nicely: the compiler auto-generates the complex indexing operation for n-dimensional arrays, which makes such code a lot more convenient and less error-prone to write. The compiler may also flatten a loop.
The array-of-pointers hack previously used to simulate 2D arrays should not be used outside of special algorithms, as it is error-prone and slow.
As I recall, C# supports this in a completely sensible way by distinguishing a[i,j] and a[i][j]. If I understand right, in C, a[i][j] means what C# would spell a[i,j], which does seem rather surprising and inconsistent.
For 1), you can just write (&a[i])[j] .
And just in case you have not come across this, C++ allows you to overload all the relevant operators here: [], *, ->
So, you really can't tell what's going on behind the scenes.
I wanted to pull my hair out seeing some 'enterprise' code use
state[i] = foo;
for some kind of logging where i was the severity level. There were even instances of state[i++], where the severity was incremental. I hope someone has rewritten that codebase with AI by now.
I mean, just like with 1-dimensional arrays, it depends on the context.
Array memory is on the stack. The size of that array is actually not known at run time; it's only known at compile time, where any reference to that length gets resolved by the compiler.
If your 2D array sits on the stack, then inferring the memory layout is pretty easy. If you are dealing with a pointer that was passed to a function, then you can't assume anything about data size or limits, which is why many functions that take pointers take a size parameter as well.
> If you see a[i][j] it could mean two completely different things:
> 1) ... a[i][j] == *((char*)a + i*M + j) // I added the char* cast to make it correct
> 2) ... a[i][j] == *(*(a + i) + j)
You may already understand this, but even in case (1), you still have
a[i][j] == *(*(a + i) + j)
(It has to - that's what operator[] means in C.)
It's just that, in this case, `a + i` is applying pointer arithmetic to a pointer to char[M] (spelled `char (*)[M]` in C), so it adds M * i bytes to a's address.
This is similar to how `a + i`, if a is int32_t*, will give you an address 4 * i bytes bigger than a.
Really the confusing part of this is that *(a + i), which is an array value i.e. has type char[M], decays to char* when you add an integer to it (or dereference it). This is a pretty crazy hack really. Imagine if, in C++, you could do this
Yuck.