I'm quite interested in how they dealt with Rust's memory model, which might not map neatly onto CUDA's semantics. Curious what the differences are compared to CUDA C++, and whether Rust's type system can actually bring more safety to CUDA. (I do think writing GPU kernels is inherently unsafe; it's just too hard to create a safe language given how the hardware works, and given that you're hyper-optimizing all the time.)
I wonder what it means for Slang[0]. Presumably the point is that people want to do GPU programming with a more modern language. But now you can just use Rust...
(Disclaimer: I like Slang a lot.)
Re: Rust (and "safe" programming languages).
Does anyone have more details on NVIDIA's use of SPARK/Ada?
All I can find is what's listed below:
https://www.adacore.com/case-studies/nvidia-adoption-of-spar...
> directly to PTX
Weird. There's a recent NVIDIA MLIR that is quite good and fast. Or they could target the even easier and more recent/fashionable tile IR [1] used by CuTile [2] (a little higher level but significantly easier to target; it only loses on epilogue fusion and the like).
> (em dash) no DSLs, no foreign language bindings, just Rust.
An official CUDA port, and they couldn't even be bothered to write the introductory paragraph.
Okay, I'll try to ignore it and read the docs. Hey, a custom IR, this sounds interesti-
> MLIR’s implementation, however, is C++ with a side of TableGen, a build system that requires you to compile all of LLVM, and debugging sessions that make you question your career choices.
I can't take this industry seriously anymore.
TileLang https://github.com/tile-ai/tilelang and stuff like Tile Kernels https://github.com/deepseek-ai/TileKernels will make CUDA obsolete one day.
Does anyone know if this will let you share structs between host and device? That's the big thing missing so far from existing Rust/CUDA workflows (plus the serialization/bytes barrier between them).
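Not cuda-oxide-specific, but for anyone wondering what the workaround looks like today: give the struct a stable #[repr(C)] layout and view it as raw bytes with bytemuck for the copy. A minimal sketch; the Params struct and its fields are made up for illustration.

    // A #[repr(C)] struct whose layout both the Rust host and a CUDA C kernel
    // can agree on. bytemuck (with its "derive" feature) reinterprets it as
    // bytes for the host-to-device copy.
    use bytemuck::{Pod, Zeroable};

    #[repr(C)]
    #[derive(Clone, Copy, Pod, Zeroable)]
    struct Params {
        scale: f32,
        offset: f32,
        len: u32,
        _pad: u32, // explicit padding keeps the layout identical on both sides
    }

    fn main() {
        let p = Params { scale: 2.0, offset: 0.5, len: 1024, _pad: 0 };
        // Crossing the host/device boundary today means going through bytes:
        let bytes: &[u8] = bytemuck::bytes_of(&p);
        assert_eq!(bytes.len(), core::mem::size_of::<Params>());
        // ...copy `bytes` to the GPU, where a CUDA C `struct Params` with the
        // same field order reads them back. First-class shared structs would
        // make this round trip unnecessary.
    }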
One thing I've been wary about with Rust for CUDA is the bit of overhead Rust adds that is usually negligible but might matter here, like bounds checks on arrays. Could it cause additional registers to be used, lowering a kernel's occupancy?
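For what it's worth, the usual mitigations carry over. A sketch in plain Rust (not any particular GPU crate's API) of the three options for an output buffer and a computed thread index:

    // Three ways the bounds-check cost can play out.
    fn checked(out: &mut [f32], i: usize, v: f32) {
        out[i] = v; // costs a compare + branch (and registers) per access
    }

    fn elided(out: &mut [f32], v: f32) {
        // Iterator form: the compiler knows every index is in range,
        // so no check is emitted.
        for slot in out.iter_mut() {
            *slot = v;
        }
    }

    unsafe fn unchecked(out: &mut [f32], i: usize, v: f32) {
        // Explicit opt-out when the index is proven in range elsewhere.
        *out.get_unchecked_mut(i) = v;
    }

Whether the checked version actually costs registers (and what a panic even lowers to on the device) is exactly the kind of thing worth inspecting in the generated PTX.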
Hell yea! I have been doing this with cudarc (kernels) and FFI (cuFFT), using manual [de]serialization between byte arrays and Rust structs. I hope this makes it lower friction!
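For anyone curious what the FFI half of that looks like: hand-rolled cuFFT bindings are a few extern declarations against cufft.h. A trimmed sketch with error handling reduced to a success check; the constant and signatures follow cufft.h, but double-check against your header version.

    use std::os::raw::c_int;

    // cufftHandle is a plain integer handle in cufft.h.
    type CufftHandle = c_int;
    const CUFFT_C2C: c_int = 0x29; // complex-to-complex transform type

    #[link(name = "cufft")]
    extern "C" {
        fn cufftPlan1d(plan: *mut CufftHandle, nx: c_int, kind: c_int, batch: c_int) -> c_int;
        fn cufftDestroy(plan: CufftHandle) -> c_int;
    }

    fn make_plan(nx: c_int) -> Option<CufftHandle> {
        let mut plan: CufftHandle = 0;
        // CUFFT_SUCCESS is 0; anything else is an error code.
        let rc = unsafe { cufftPlan1d(&mut plan, nx, CUFFT_C2C, 1) };
        (rc == 0).then_some(plan)
    }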
https://nvlabs.github.io/cuda-oxide/gpu-safety/the-safety-mo...
> A GPU kernel runs thousands of threads that all see the same memory at the same time. On a CPU, Rust prevents data races through ownership and borrowing – one mutable reference, no aliases, enforced at compile time. On a GPU, you have 2048 threads per SM, all launched from the same function, all pointing at the same output buffer. The borrow checker was not designed for this.
> cuda-oxide solves the problem in layers. The common case – one thread writes one element – is safe by construction, no unsafe required. The uncommon cases – shared memory, warp shuffles, hardware intrinsics – require unsafe with documented contracts. And the frontier cases – TMA, tensor cores, cluster-level communication – are fully manual, matching the complexity of the hardware they control.
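To picture the common case they describe: each thread derives a unique index and owns exactly one output element, so no two threads alias. A hypothetical sketch in plain Rust; the intrinsic names are stand-ins, not cuda-oxide's actual API.

    // Host-side stubs so the sketch type-checks; on the device these would be
    // hardware intrinsics giving each thread its coordinates.
    fn thread_idx_x() -> usize { 0 }
    fn block_idx_x() -> usize { 0 }
    fn block_dim_x() -> usize { 1 }

    // "One thread writes one element": the index -> element map is injective,
    // so a framework can expose this pattern as safe by construction.
    fn saxpy(a: f32, x: &[f32], y: &mut [f32]) {
        let i = block_idx_x() * block_dim_x() + thread_idx_x();
        if i < x.len().min(y.len()) {
            y[i] = a * x[i] + y[i]; // thread i touches only element i
        }
    }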
That's... not really Rusty. In Rust, we create new safe abstractions when the existing ones don't quite map to the problem at hand. See, for example, what's done in Rust for Linux.
If it's not safe... what's the point of Rust?
(It's okay to offer unsafe APIs for people who need to squeeze out the last bit of performance, but this shouldn't be the baseline.)
I compare this with userspace libs for APIs like io_uring and Vulkan; designing safe APIs for that stuff is kind of hard (there have even been some unsound attempts).
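To make the "build a safe abstraction" point concrete: on the CPU, the standard-library move is to carve one buffer into disjoint &mut views so each worker provably owns its slice; a GPU launch API could hand each thread the one-element equivalent. A plain-Rust sketch:

    use std::thread;

    // `chunks_mut` proves to the borrow checker that each worker gets a
    // disjoint &mut view of a single buffer: no aliasing by construction.
    fn parallel_fill(buf: &mut [f32], workers: usize) {
        let chunk = buf.len().div_ceil(workers.max(1)).max(1);
        thread::scope(|s| {
            for (w, part) in buf.chunks_mut(chunk).enumerate() {
                s.spawn(move || {
                    for slot in part.iter_mut() {
                        *slot = w as f32;
                    }
                });
            }
        });
    }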
So, we have stainless, which means Linux code that never rusted. Now we need someone to make phosphorus so that we can turn rusty code into old iron. Then GPL fans can run Rust boxes, Stainless machines, or future-proofed Iron workhorses.
All software can come in three editions: Stainless drivers that were never rusty, Oxidized drivers that used Rust on existing code, and Iron editions, where someone converted the Rust back to C using the new phosphoric tool...
Diversity can be our strength.
Making Iron C/C++ code can be called acid washing if it was rusted.
Here's the repo link: https://github.com/NVlabs/cuda-oxide
Oh lord. If this is the trend, I probably can't avoid improving my Rust language knowledge in the long term. I hate reading Rust so much right now. I guess I just have to get over that hump.
AWESOME!
Personally, I really don't want new GPU languages that don't have AD (automatic differentiation) as a first-class citizen. I mean, Rust is an improvement over C++ CUDA, but still.
This is good for Rust if you want to use the language with CUDA. The problem is, it still doesn't really move the needle if you don't like running closed-source drivers and runtime binaries and you care about open source.
Continuing from this discussion [0], this only makes it a Rust or CUDA problem, rather than a Python, CUDA, and PyTorch one, if there's a bug in one of them.
Yet at the end of the day, it still uses NVIDIA's closed-source CUDA compiler, nvcc, which they will never open source. At least Mojo promises to open source its own compiler, which compiles to different accelerators with multiple backend support.
Unlike this... though this one uses Rust.
This is already solved by Mojo; they must be rushing something out to compete, since Mojo is at version 1.0beta1.
Why do we bother with programming languages today? Why not have the LLMs just write assembly code and skip the human-readable part? We aren't reviewing it anymore anyway.
This is amazing... I've been working with custom CUDA kernels and https://crates.io/crates/cudarc for a long time, and this honestly looks like it could be a near drop-in replacement.
I'm especially curious how build times would compare. Most Rust CUDA crates obviously rely on calling CMake or nvcc, which can make compilation painfully slow. Coincidentally, just last week I was profiling build times and found that tools like sccache can dramatically reduce rebuild times by caching artifacts, but you still end up paying for expensive custom nvcc invocations (e.g., Candle by Hugging Face calls a custom nvcc command in its kernel compilation): https://arpadvoros.com/posts/2026/05/05/speeding-up-rust-whi...
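For context, the expensive step is usually a build.rs that shells out to nvcc for every kernel, roughly like the sketch below (paths and flags illustrative). Because the child process is invoked directly rather than through the RUSTC_WRAPPER, sccache never sees it the way it sees rustc calls.

    // build.rs: the shape of the custom nvcc invocation paid on rebuilds.
    use std::{env, path::PathBuf, process::Command};

    fn main() {
        let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
        let ptx = out_dir.join("kernel.ptx");

        let status = Command::new("nvcc")
            .args(["--ptx", "kernels/kernel.cu", "-o"])
            .arg(&ptx)
            .status()
            .expect("failed to run nvcc");
        assert!(status.success(), "nvcc failed");

        // Rerun only when the kernel source changes; without this, every
        // cargo build repeats the nvcc call even when nothing changed.
        println!("cargo:rerun-if-changed=kernels/kernel.cu");
    }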