> "these two loops compute the same value"
At what sequence point? The branchless version writes to small_numbers[smlen], for any given value of smlen, potentially more than once; so there are observable points of time during the loop where the behavior is different. But after the loop, both contain the final write to small_numbers[i] for all 0 <= I < smlen; and the transient writes both don't change observed external behavior, and are apparently cheaper than fewer but conditional writes.
I think the small_numbers array would differ after the end of the loop if, for instance, numbers contained only numbers >= 500. Am I wrong?