It's one more instruction only if you don't fuse those instructions in the decoder stage, but as the pattern is the one expected to be generated by compilers, implementations that care about performance are expected to fuse them.