Hacker News

the__alchemist · yesterday at 6:13 PM

Does anyone know if this will let you share structs between host and device? That is the big thing missing so far with existing rust/CUDA workflows. (Plus the serialization/bytes barrier between them)


Replies

nihalpasham · today at 2:12 AM

Yes, absolutely. That is one of the advantages of cuda-oxide being single-source Rust: the host and device code can refer to the same Rust types, and the compiler has enough information to make the device-side layout match what rustc chose on the host.

So the intended workflow is not “define a Rust struct on the host, define a matching CUDA C++ struct for the device, then serialize bytes between them.” It is much closer to “define `MyStruct` once in Rust, put a `DeviceBuffer<MyStruct>` on the GPU, and write kernels that take `&[MyStruct]`, `*const MyStruct`, etc.”

There are two important pieces under the hood:

1. At the kernel boundary, cuda-oxide scalarizes aggregate parameters where needed. For example, slices become pointer + length, and simple structs can be flattened into fields for launch ABI purposes.

2. For actual struct layout, we use rustc’s computed layout rather than assuming declaration order or a C ABI. That matters because Rust is allowed to reorder/pad `repr(Rust)` structs. The device lowering carries those offsets/padding through so field access on the GPU matches the host-side layout.
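Point 2 can be demonstrated in plain host Rust, without any GPU involved. The sketch below uses a made-up `Particle` type (not part of cuda-oxide) to show why declaration order cannot be trusted for `repr(Rust)` types, which is exactly why the device side has to reuse rustc's computed offsets:

```rust
use std::mem::{offset_of, size_of};

// Illustrative only: `Particle` is a hypothetical example type.
// With the default repr(Rust), the compiler may reorder fields
// to minimize padding, so declaration order says nothing about layout.
#[derive(Clone, Copy)]
struct Particle {
    active: bool, // 1 byte
    pos: f64,     // 8 bytes
    id: u32,      // 4 bytes
}

// Same fields with repr(C): layout is fixed to declaration order,
// which forces padding after `active` and after `id`.
#[repr(C)]
#[derive(Clone, Copy)]
struct ParticleC {
    active: bool,
    pos: f64,
    id: u32,
}

fn main() {
    // repr(C) layout is guaranteed: 1 + 7 pad + 8 + 4 + 4 pad = 24 bytes.
    assert_eq!(size_of::<ParticleC>(), 24);

    // repr(Rust) layout is unspecified; rustc is free to reorder fields
    // (in practice it packs this type into 16 bytes). Whatever offsets
    // rustc chose here are the ones the device lowering must reproduce.
    println!("repr(Rust) size = {}", size_of::<Particle>());
    println!(
        "pos @ {}, id @ {}, active @ {}",
        offset_of!(Particle, pos),
        offset_of!(Particle, id),
        offset_of!(Particle, active),
    );
    assert!(size_of::<Particle>() <= size_of::<ParticleC>());
}
```

If the device code instead assumed declaration order (as a hand-written matching CUDA C++ struct would), every field access past the first reordered field would read the wrong bytes.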

So for plain data structs, nested structs, numeric fields, arrays, etc., yes, this is very much the goal: share the type directly instead of maintaining a separate CUDA representation or crossing a bytes/serialization boundary.
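The scalarization in point 1 is also visible in plain Rust: a `&[f32]` is already just a (pointer, length) pair, and those two scalars are what a launch ABI would pass instead of the fat pointer itself. This is the standard slice representation, not the cuda-oxide launch API:

```rust
fn main() {
    let data: Vec<f32> = vec![1.0, 2.0, 3.0];
    let s: &[f32] = &data;

    // Decompose the slice into the two scalars a kernel launch
    // would actually receive.
    let ptr: *const f32 = s.as_ptr();
    let len: usize = s.len();

    // The receiving side reassembles the slice from the pair.
    // (Here both sides share one address space; on a GPU the pointer
    // would have to be a device-visible address.)
    let rebuilt: &[f32] = unsafe { std::slice::from_raw_parts(ptr, len) };
    assert_eq!(rebuilt, s);
    println!("pointer + length round-trip ok: len = {}", len);
}
```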

The caveat is the usual one: this does not make arbitrary host-owned Rust heap graphs GPU-addressable. A `Vec`, `String`, `Box`, trait object, or host pointer still contains an address, and that address has to refer to memory the GPU can actually access. For those cases you still need device allocation, unified/HMM memory, or a GPU-friendly representation.
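As one concrete GPU-friendly representation: a nested `Vec<Vec<f32>>` holds host pointers the GPU cannot follow, but it can be collapsed into flat arrays plus offsets (a CSR-style layout) before upload. A minimal sketch, where `flatten` is a hypothetical helper and not a cuda-oxide API:

```rust
// Hypothetical helper: collapse a nested host structure into flat
// values plus row offsets, so no host pointers cross to the GPU.
fn flatten(rows: &[Vec<f32>]) -> (Vec<f32>, Vec<u32>) {
    let mut values = Vec::new();
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    offsets.push(0u32);
    for row in rows {
        values.extend_from_slice(row);
        offsets.push(values.len() as u32);
    }
    (values, offsets)
}

fn main() {
    let rows = vec![vec![1.0, 2.0], vec![3.0], vec![]];
    let (values, offsets) = flatten(&rows);
    assert_eq!(values, [1.0, 2.0, 3.0]);
    assert_eq!(offsets, [0, 2, 3, 3]);
    // A kernel can then read row i as values[offsets[i]..offsets[i+1]]
    // from two plain device buffers, with no host pointers involved.
    println!("flattened {} rows into {} values", rows.len(), values.len());
}
```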

But for the common “I have a Rust data type and want kernels to consume/update arrays of it” case: yes, that is exactly the kind of friction cuda-oxide is meant to remove.