I had a very similar setup. Really happy with the xArm Lite 6. I played around with the experiments from the Diffusion Policy paper and was thinking of buying a webcam as a top camera as well, but I ended up buying two Intel RealSense cameras because of timestamp drift issues. How did you solve that? Or is syncing the camera feeds not necessary for your intended projects?
I timestamp everything twice: once with the hardware clock (when available, e.g. for the RealSense camera) and once in my robot stack when the frame is read from the device (using `time.monotonic_ns()`). Both are stored, so alignment can use either timestamp. I think the second timestamp is actually more meaningful: ultimately I want to reconstruct the state the policy would have seen at inference time, so if one modality arrives late, training should include that delay too.
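A minimal sketch of that dual-timestamp idea, assuming each sensor driver hands you a frame plus an optional device timestamp. The names `Frame`, `stamp`, `latest_before` and `align` are hypothetical, and the "newest frame received at or before t" lookup is my assumption of how you'd mimic what the policy actually saw; it is not necessarily how the setup above does it.

```python
import bisect
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Frame:
    data: Any
    host_ns: int                  # time.monotonic_ns() when our stack read the frame
    hw_ns: Optional[int] = None   # device-clock timestamp, if the sensor provides one


def stamp(data: Any, hw_ns: Optional[int] = None) -> Frame:
    """Attach the host-side receive time; keep the hardware time if available."""
    return Frame(data=data, host_ns=time.monotonic_ns(), hw_ns=hw_ns)


def latest_before(frames: List[Frame], t_ns: int) -> Optional[Frame]:
    """Return the newest frame whose host timestamp is <= t_ns, or None.

    Mimics what a policy running at time t_ns would have had available,
    so per-modality delays are preserved in the training data.
    Assumes `frames` is sorted by host_ns.
    """
    keys = [f.host_ns for f in frames]
    i = bisect.bisect_right(keys, t_ns)
    return frames[i - 1] if i > 0 else None


def align(streams: Dict[str, List[Frame]], t_ns: int) -> Dict[str, Optional[Frame]]:
    """Build one synchronized observation from several streams at time t_ns."""
    return {name: latest_before(frames, t_ns) for name, frames in streams.items()}
```

For example, with a wrist stream received at host times 100 and 200 and a top-camera frame at 150, `align(..., t_ns=180)` picks the wrist frame from 100 and the top frame from 150. Since `hw_ns` is stored alongside, you could swap in the device clock for alignment later without re-recording.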
That being said, I might switch to a RealSense for the static tabletop camera as well; the RealSense wrist camera is clearly much more reliable than the cheap Logitech C920 I currently use.