NVIDIA Releases CUDA-Oxide 0.1 For Experimental Rust-To-CUDA Compiler - Phoronix

NVIDIA Labs debuts CUDA-Oxide 0.1, an experimental open-source compiler that translates standard Rust code directly to NVIDIA PTX assembly for GPU kernels, removing the need for C++ interop or custom domain-specific languages in CUDA development.

Written by Michael Larabel in NVIDIA on 8 May 2026 at 10:17 AM EDT.

NVIDIA Labs rolled out the inaugural 0.1 alpha release of CUDA-Oxide on 8 May 2026, a project that significantly expands native Rust support for writing CUDA kernels targeting NVIDIA GPUs. This experimental tool allows developers to write standard, idiomatic Rust code that compiles directly to NVIDIA PTX (Parallel Thread Execution) format, the same intermediate assembly used by NVIDIA's official nvcc compiler for C++ CUDA workloads, as documented at https://docs.nvidia.com/cuda/parallel-thread-execution/index.html.

Before CUDA-Oxide, Rust developers targeting NVIDIA GPUs faced limited, clunky options. The most common approach involved using FFI (Foreign Function Interface) bindings to call C++ CUDA kernels from Rust, which required maintaining separate C++ and Rust codebases, dealing with unsafe memory interop, and adding significant complexity to build pipelines. Alternative Rust-specific CUDA DSLs (Domain-Specific Languages) like rust-cuda added custom syntax that broke compatibility with core Rust tooling, limited access to full CUDA feature sets, and often introduced abstraction overhead that hurt performance. CUDA-Oxide eliminates these pain points entirely. It requires no DSL extensions, no C++ code, and no external language bindings. Pure Rust code goes in, PTX assembly comes out.

The project is designed for single-source compilation, meaning developers can write host (CPU) and device (GPU) code in the same Rust file, using lightweight annotations to mark functions for GPU execution, similar to how CUDA C++ uses the __global__ and __device__ qualifiers. A custom compiler backend called rusc handles translation from Rust's Mid-level Intermediate Representation (MIR) directly to PTX, skipping the LLVM or C intermediaries that add build steps and potential points of failure. NVIDIA also provides device-side abstractions for CUDA-specific concepts like thread blocks, shared memory, and warp-level operations, all wrapped in Rust's type system to deliver "safe(ish)" kernel development. The project's documentation notes that while Rust's ownership rules eliminate many common CUDA memory bugs, such as dangling pointers and out-of-bounds accesses, GPU memory safety remains a work in progress, so some unsafe blocks may still be required for low-level operations.
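The announcement does not show CUDA-Oxide's actual device API, but the index arithmetic that a canonical vector-add kernel performs can be sketched in plain Rust. In the sketch below, a CPU loop stands in for the GPU grid, and the block_idx/block_dim/thread_idx values are illustrative stand-ins for CUDA's built-in blockIdx/blockDim/threadIdx; the real CUDA-Oxide abstractions may differ.

```rust
// CPU-side sketch of the indexing a CUDA vector-add kernel performs.
// The grid/block loop variables are stand-ins for CUDA's built-in
// blockIdx/blockDim/threadIdx; CUDA-Oxide's actual device API may differ.
fn vector_add_grid(a: &[f32], b: &[f32], out: &mut [f32], block_dim: usize) {
    let n = a.len();
    // Ceiling division, as in a real kernel-launch grid calculation.
    let grid_dim = (n + block_dim - 1) / block_dim;
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            // The canonical CUDA global index: blockIdx.x * blockDim.x + threadIdx.x
            let i = block_idx * block_dim + thread_idx;
            // Bounds guard, exactly as a real kernel must include when
            // n is not a multiple of the block size.
            if i < n {
                out[i] = a[i] + b[i];
            }
        }
    }
}

fn main() {
    let a = vec![1.0f32; 10];
    let b: Vec<f32> = (0..10).map(|x| x as f32).collect();
    let mut out = vec![0.0f32; 10];
    vector_add_grid(&a, &b, &mut out, 4); // 3 blocks of 4 threads cover 10 elements
    println!("{:?}", out);
}
```

On a GPU, the two nested loops collapse into parallel hardware threads; only the body with its global-index computation and bounds guard would live in the annotated kernel function.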

CUDA-Oxide is fully open-source, with the code hosted on NVIDIA Labs' GitHub repository at https://github.com/NVLabs/cuda-oxide. The 0.1 release is explicitly labeled alpha, with NVIDIA warning users to expect bugs, incomplete feature coverage, and API breaks as the project evolves. The team is actively soliciting community feedback to guide development priorities, particularly from developers working on GPU compute workloads in Rust.

Performance and Compatibility

As this is an early alpha release, NVIDIA has not published official benchmarks for CUDA-Oxide 0.1. However, the tool's architecture allows for clear performance and power-consumption projections. Since CUDA-Oxide emits the same standard PTX format that nvcc produces for C++ code, the NVIDIA driver compiles it to final machine code (SASS) through the same path as official CUDA kernels. This means there is no performance overhead from language bindings or DSL abstractions, and power consumption should match nvcc-compiled kernels for equivalent code. For homelab builders who track every watt of power draw and microsecond of kernel runtime, this is a critical advantage: you can write safer, more maintainable Rust code without sacrificing GPU performance or adding unexpected power costs.

The table below compares CUDA-Oxide to existing CUDA development options for Rust and C++ developers:

Feature                    CUDA-Oxide 0.1      nvcc (C++ CUDA)   rust-cuda (DSL)
Language                   Standard Rust       C++               Rust DSL
FFI Overhead               None                None              Low
Abstraction Overhead       None (direct PTX)   None              5-10%
Single-source Compilation  Yes                 Yes               No
Open Source                Yes                 No                Yes
Alpha Status               Yes                 No                No

Existing Rust CUDA DSLs often introduce 5-10% runtime overhead for complex workloads due to abstraction layers, which also increases power consumption by extending kernel run times. CUDA-Oxide avoids this entirely by outputting raw PTX. The only potential performance variable is the quality of the rusc backend's PTX generation, which is not yet optimized for all workloads in the 0.1 release. NVIDIA notes that backend optimizations are a priority for future releases.

Compatibility is limited to NVIDIA GPUs, as PTX is an NVIDIA-proprietary format. The 0.1 release targets GPUs supported by CUDA Toolkit 12.x, which includes all NVIDIA GPUs with compute capability 5.0 (Maxwell) and above. This covers consumer GPUs from the GTX 900 series to the RTX 40 series, as well as server-grade GPUs like the A100, H100, and L40S. The tool requires a nightly Rust toolchain to support the rusc backend, which relies on unstable compiler features. Users must also have CUDA Toolkit 12.x installed to allow the NVIDIA driver to compile PTX to SASS, though CUDA-Oxide itself does not depend on nvcc, simplifying build pipelines for homelab CI/CD setups.
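The Maxwell-or-newer requirement reduces to a simple version comparison. The sketch below checks a compute-capability string of the form recent nvidia-smi builds report for the compute_cap query field (e.g. "8.6") against the 5.0 minimum; the helper function itself is ours for illustration, not part of CUDA-Oxide.

```rust
// Checks whether a compute-capability string (e.g. "8.6", as printed by
// `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent
// drivers) meets the Maxwell (5.0) minimum that CUDA Toolkit 12.x targets.
// Returns None if the string does not parse as "major.minor" or "major".
fn meets_minimum_cc(cc: &str, min_major: u32, min_minor: u32) -> Option<bool> {
    let mut parts = cc.trim().splitn(2, '.');
    let major: u32 = parts.next()?.parse().ok()?;
    // A bare "5" is treated as "5.0".
    let minor: u32 = parts.next().unwrap_or("0").parse().ok()?;
    // Tuple comparison orders by major first, then minor.
    Some((major, minor) >= (min_major, min_minor))
}

fn main() {
    for cc in ["3.5", "5.0", "8.6", "9.0"] {
        println!("{cc}: supported = {:?}", meets_minimum_cc(cc, 5, 0));
    }
}
```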

Build Recommendations for Homelab Users

CUDA-Oxide 0.1 is not ready for production use, but it is stable enough for experimental homelab projects. It is best suited for developers already familiar with Rust who want to write GPU compute kernels without switching to C++. Common homelab use cases include custom inference kernels for local LLM deployments, scientific computing workloads like weather modeling or protein folding simulations, GPU-accelerated data processing pipelines, and custom rendering tools.

To test the toolchain, start with a system running an NVIDIA GPU (GTX 900 or newer), install the nightly Rust toolchain via rustup, and install CUDA Toolkit 12.x from NVIDIA's official documentation at https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html. Clone the CUDA-Oxide repository from GitHub, follow the setup instructions in the README, and run the included example vector addition kernel to verify compatibility with your hardware. For power testing, compare the GPU draw of the Rust-compiled kernel to an equivalent nvcc-compiled C++ kernel using a tool like nvidia-smi: the numbers should match exactly, confirming no overhead from the Rust toolchain.
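The power comparison above can be scripted rather than eyeballed. This sketch parses two power.draw readings in the format nvidia-smi prints for its CSV output (e.g. "87.43 W") and checks that they agree within a tolerance; actually sampling the GPU while each kernel runs is left out, and the sample readings in main are made-up illustrative numbers.

```rust
// Parses a power reading of the form nvidia-smi prints for
// `--query-gpu=power.draw --format=csv,noheader`, e.g. "87.43 W".
fn parse_watts(reading: &str) -> Option<f64> {
    reading.trim().trim_end_matches('W').trim().parse().ok()
}

// True if the Rust-compiled and nvcc-compiled kernels drew power within
// `tol_watts` of each other; None if either reading failed to parse.
fn within_tolerance(rust_kernel: &str, cpp_kernel: &str, tol_watts: f64) -> Option<bool> {
    let a = parse_watts(rust_kernel)?;
    let b = parse_watts(cpp_kernel)?;
    Some((a - b).abs() <= tol_watts)
}

fn main() {
    // Illustrative readings; real numbers come from polling nvidia-smi
    // while each kernel is running on the GPU.
    println!("{:?}", within_tolerance("87.43 W", "87.51 W", 1.0));
}
```

In practice you would poll nvidia-smi several times during each kernel's run and compare averages, since instantaneous power readings fluctuate.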

Avoid using CUDA-Oxide for mission-critical workloads until the project reaches a stable release, as API breaks and bugs are expected. For now, it serves as a valuable experimental tool for homelabbers who want to push Rust into their GPU compute stacks without compromising on performance or power efficiency.

This news was first reported by Phoronix at https://www.phoronix.com/news/CUDA-Oxide-0.1-Rust-CUDA.
