Implementing rotate through carry like that was a really bad decision IMO. It's almost never used to rotate by more than one bit left or right at a time, and a one-bit rotate could be done much more efficiently than with the constant-time code, which is only faster when the count is > 6.
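To make the trade-off concrete, here is a rough Python sketch of the two approaches for rotate left through carry. This is not the actual microcode; the function names `rcl_loop` and `rcl_wide` and the default 8-bit width are made up for illustration, and the sketch ignores x86 details such as the 5-bit count mask and the overflow flag.

```python
def rcl_loop(value, carry, count, width=8):
    """Rotate left through carry one bit per step (cost grows with count)."""
    mask = (1 << width) - 1
    for _ in range(count % (width + 1)):
        new_carry = (value >> (width - 1)) & 1   # top bit falls into carry
        value = ((value << 1) | carry) & mask    # old carry enters at bit 0
        carry = new_carry
    return value, carry


def rcl_wide(value, carry, count, width=8):
    """Same result in a fixed number of steps: treat carry:value as one
    (width + 1)-bit quantity and rotate it as a whole."""
    n = width + 1
    count %= n
    wide = (carry << width) | value
    wide = ((wide << count) | (wide >> (n - count))) & ((1 << n) - 1)
    return wide & ((1 << width) - 1), wide >> width
```

For a count of 1 the loop version does a single step, while the fixed-cost version pays for the wide rotate regardless of the count, which is the kind of overhead the comment above is complaining about.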
Is the full microcode available anywhere?
Since the shifter is also used for bit tests, the assumption that most operations are 1-bit shifts might not hold. Perhaps they did the analysis and it made sense.
I haven't published it yet as there are still some rough edges to clear up, but if you email me ([email protected]) I'll send you the current work-in-progress (the same one that nand2mario is working from).