It is definitely true across different chips. The best kernel to use will vary with what chip it is running on, which often implies that the underlying operations will be executed in a different order. For example, with floating point addition, adding up the same values in a different order can return a different result because floating point addition is not associative due to rounding.