I timestamp everything twice: once with the hardware clock (if available, as with the RealSense camera) and once inside my robot stack, at the moment the sample is read from the device (using `time.monotonic_ns()`). Both are stored, and alignment can use either timestamp. I think the second timestamp is actually more meaningful, since ultimately I want to reconstruct the state that the policy would've seen; if one modality is delayed, I should include that effect during training.
That being said, I might switch to a RealSense for the static tabletop camera as well; the wrist RealSense is clearly much more reliable than the cheap Logitech C920 that I currently use.
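To make the double-stamping concrete, here is a minimal sketch of what that could look like. The `Stamped` record, `stamp()`, and `latest_at()` are hypothetical names for illustration, not my actual stack; the only real API used is `time.monotonic_ns()`. It assumes samples are kept sorted by whichever timestamp you align on, and that every sample has that timestamp.

```python
import time
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Stamped:
    """One sample carrying both timestamps (hypothetical record layout)."""
    payload: Any
    hw_ns: Optional[int]  # device clock; None if the device does not provide one
    host_ns: int          # time.monotonic_ns() taken when the sample was read


def stamp(payload: Any, hw_ns: Optional[int] = None) -> Stamped:
    """Attach the host-side timestamp as soon as the sample is read."""
    return Stamped(payload, hw_ns, time.monotonic_ns())


def latest_at(samples: List[Stamped], t_ns: int, key: str = "host_ns") -> Optional[Stamped]:
    """Return the newest sample whose `key` timestamp is <= t_ns, else None.

    Assumes `samples` is sorted by `key` and that `key` is set on every sample.
    """
    times = [getattr(s, key) for s in samples]
    i = bisect_right(times, t_ns)
    return samples[i - 1] if i else None
```

Calling `latest_at(frames, tick_ns)` with the default `key="host_ns"` gives the frame that had actually arrived by the policy tick, delays included; passing `key="hw_ns"` aligns on capture time instead.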
Both timestamps are useful in different ways. The early-as-possible hardware stamp is best for reasoning about causality, while the later-and-full-o-jitter middleware stamp is good for compensating for that inevitable jitter.
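One hedged way to put the pair of stamps to work is to estimate per-sample delivery jitter from them. The sketch below uses a simple minimum-offset trick: treat the smallest observed (host minus hardware) difference as the fixed clock offset plus best-case latency, and read everything above it as extra delay. The function name is made up for illustration, and it assumes both clocks are in nanoseconds and do not drift noticeably over the window.

```python
from typing import List, Sequence


def relative_delays_ns(hw_ns: Sequence[int], host_ns: Sequence[int]) -> List[int]:
    """Extra delivery delay of each sample relative to the fastest one seen.

    Assumption: the two clocks tick at the same rate over this window, so the
    minimum (host - hw) difference approximates offset + best-case latency.
    """
    if len(hw_ns) != len(host_ns) or not hw_ns:
        raise ValueError("need two equally long, non-empty timestamp sequences")
    offsets = [host - hw for hw, host in zip(hw_ns, host_ns)]
    base = min(offsets)
    return [o - base for o in offsets]
```

For longer recordings, clock drift will leak into these numbers, so the baseline would need to be re-estimated over a sliding window or replaced with a proper drift fit.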
Time is one of the hard problems in robots, because they are inevitably but non-obviously distributed systems.
Robots are annoyingly, wonderfully difficult.