for (int i = 0; i < 1000; i++) {
small_numbers[smlen] = numbers[i];
smlen += (numbers[i] < 500);
}
is much faster than the conventional version with a conditional branch: for (int i = 0; i < 1000; i++) {
if (numbers[i] < 500) {
small_numbers[smlen] = numbers[i];
smlen += 1;
}
}
Been staring at this for a bit, but my brain is not working properly today: could someone please explain how these to loops compute the same value for small_numbers[smlen]?They don't. After running, for the values in small_numbers from 0 to smlen-1 they are equivalent.
But if the last value of numbers[] is not smaller than 500, small_numbers[smlen] will contain that value for the first version whereas the second version does not write to small_numbers[smlen].
> "these two loops compute the same value"
At what sequence point? The branchless version writes to small_numbers[smlen], for any given value of smlen, potentially more than once; so there are observable points of time during the loop where the behavior is different. But after the loop, both contain the final write to small_numbers[i] for all 0 <= I < smlen; and the transient writes both don't change observed external behavior, and are apparently cheaper than fewer but conditional writes.
Writing to array[n] and not incrementing n means that the value just written is outside the "useful" range (from 0 to n-1) and will not be considered (it will be overwritten the next iteration).
I am rather thinking, if one is so much faster, and they are truly equal, why is the compiler too stupid to convert one into the other?
It only increments if the number was less than 500, effectively just saving the ones less than 500.
First version has a side effect of writing to small_numbers[0] always.
The compiler probability can't optimize that in the second version.
If it wrote unconditionally and incremented only in the if then I'd guess they would compile to the same thing.
numbers[i] < 500
is a conditional (true or false) that evaluates to 1 or 0 (in C)
Therefore smlen has either a 0 or a 1 added to it's value .. equivilent to only adding 1 if True.
Here is another perspective:
- the first one (branchless) use the condition to SAVE the correct value (< 500): it temporarily writes any current value to the same index i, always overwriting the previous value, effectively saving it (by moving forward to i+1) only when the value is right (small number). Downside of this simple function: the last value may be bigger than 500
- the second one use the condition to ADD the value, when it is 100% sure it is a correct small number