CUDA hardware was originally designed without a specific memory model. After C++11, NVIDIA undertook a multi-year effort to redesign the hardware to match the semantics of the C++ memory model.
CppCon 2017: "Designing (New) C++ Hardware"
https://www.youtube.com/watch?v=86seb-iZCnI