That makes sense. LLVM could probably do better here by using the memory operand version:
https://godbolt.org/z/jeqbaPsMz
The memory operand version tends to be as slow as or slower than the manual implementation, so LLVM is right to avoid it.