https://nvlabs.github.io/cuda-oxide/gpu-safety/the-safety-mo...
> A GPU kernel runs thousands of threads that all see the same memory at the same time. On a CPU, Rust prevents data races through ownership and borrowing – one mutable reference, no aliases, enforced at compile time. On a GPU, you have 2048 threads per SM, all launched from the same function, all pointing at the same output buffer. The borrow checker was not designed for this.
> cuda-oxide solves the problem in layers. The common case – one thread writes one element – is safe by construction, no unsafe required. The uncommon cases – shared memory, warp shuffles, hardware intrinsics – require unsafe with documented contracts. And the frontier cases – TMA, tensor cores, cluster-level communication – are fully manual, matching the complexity of the hardware they control.
That's... not really Rusty. In Rust, we create new safe abstractions when the existing ones don't quite map to the problem at hand. See, for example, what's done in Rust for Linux.
If it's not safe... what's the point of Rust?
(It's okay to offer unsafe APIs for people who need to squeeze out the last bit of performance, but this shouldn't be the baseline.)
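To make "safe abstraction" concrete, here's a rough CPU-side sketch using only the standard library. It is not cuda-oxide's API, and `fill_disjoint` is a name made up for illustration. Each worker gets an exclusive, non-overlapping chunk of the output buffer, so the borrow checker can prove there are no aliased writes and the caller never touches `unsafe`:

```rust
use std::thread;

/// Hypothetical helper (not from cuda-oxide): fill `out[i] = f(i)` using
/// several worker threads, each owning a disjoint chunk of `out`.
fn fill_disjoint<F>(out: &mut [u32], workers: usize, f: F)
where
    F: Fn(usize) -> u32 + Sync,
{
    if out.is_empty() {
        return; // chunks_mut panics on a chunk size of 0
    }
    let workers = workers.max(1);
    // Round up so every element lands in some chunk.
    let chunk_len = (out.len() + workers - 1) / workers;
    let f = &f;

    thread::scope(|s| {
        // chunks_mut hands out non-overlapping &mut slices, so each
        // thread has exclusive access to its part of the buffer.
        for (chunk_idx, chunk) in out.chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                let base = chunk_idx * chunk_len;
                for (i, slot) in chunk.iter_mut().enumerate() {
                    *slot = f(base + i);
                }
            });
        }
    });
}
```

Calling it looks like `fill_disjoint(&mut buf, 4, |i| (i * i) as u32)`. Whether the same shape can be made to cover shared memory or warp shuffles is exactly the open question, but the point stands that the disjointness proof lives inside the abstraction rather than in every caller.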
I'd compare this with userspace libs for APIs like io_uring and Vulkan. Designing safe APIs for that kind of thing is hard (there have even been some unsound attempts).