The open‑source RADV Vulkan driver now implements the VK_KHR_shader_fma extension, giving Radeon GPUs correctly‑rounded fused‑multiply‑add operations for scientific and machine‑learning workloads.
RADV Driver Gains VK_KHR_shader_fma Support in Mesa 26.2
The Radeon open‑source Vulkan driver (RADV) has merged support for the VK_KHR_shader_fma extension. The change landed in the Mesa 26.2 development branch and is slated for the next quarterly release. The patch, authored by Georg Lehmann, adds a native path for correctly‑rounded fused‑multiply‑add (FMA) instructions on AMD GPUs.
Technical specifications
What VK_KHR_shader_fma provides
- Correct rounding – The extension guarantees that the result of `a * b + c` is rounded only once, matching IEEE‑754 fma semantics.
- SPIR‑V integration – Applications compile to SPIR‑V using the `SPV_KHR_fma` extension; the driver then maps this to hardware‑level FMA where available.
- Performance impact – On GPUs that already expose an FMA‑type ALU, the operation executes in a single cycle, typically 1‑2 ns on GCN‑4 and RDNA‑3 silicon. Without the extension, drivers must emulate the operation with separate multiply and add instructions, incurring roughly a 30‑45 % latency increase.
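The difference between one rounding and two can be shown on the CPU. The sketch below is illustrative only and is not taken from the RADV patch: it emulates a fused operation in double precision by computing `a * b + c` exactly with rational arithmetic and rounding once at the end. The helper names `fused_multiply_add` and `separate_multiply_add` are hypothetical, and the emulation assumes finite inputs.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a correctly rounded FMA: exact a*b + c, then a single rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding happens here
    return float(exact)  # the only rounding step


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Unfused sequence: the product is rounded before the addition."""
    product = a * b     # first rounding
    return product + c  # second rounding


# a*a is exactly 1 + 2^-29 + 2^-60; the 2^-60 term does not fit in a double.
a = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

fused = fused_multiply_add(a, a, c)       # keeps the 2^-60 residual
unfused = separate_multiply_add(a, a, c)  # residual lost when a*a was rounded

print(f"fused   = {fused!r}")
print(f"unfused = {unfused!r}")
```

In this case the unfused sequence returns exactly zero while the fused result preserves the small residual, which is the kind of cancellation where a single rounding matters most.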
Hardware readiness on Radeon GPUs
| GPU family | Native FMA unit | Approx. latency (ns) |
|---|---|---|
| GCN‑4 (Polaris) | Yes (scalar) | 1.2 |
| RDNA‑2 (Navi) | Yes (vector) | 0.9 |
| RDNA‑3 (RX 7900 XT) | Yes (vector, wider) | 0.8 |
| CDNA‑2 (MI200) | Yes (scalar, high‑throughput) | 0.7 |
All current Radeon GPUs expose a hardware FMA unit, but the Vulkan API previously treated the OpFma SPIR‑V opcode as potentially non‑fused. The new driver path checks the VK_KHR_shader_fma enable flag and, when present, routes the opcode directly to the hardware unit, eliminating the intermediate rounding step.
Compiler and driver changes
- Mesa shader compiler (ACO backend) now emits the `fma` instruction instead of separate `mul` and `add` when the extension is active.
- Driver validation – A new test suite verifies that the result of `fma(1e20, 1e-20, -1.0)` matches the IEEE‑754 reference within 0.5 ULP, confirming correct rounding.
- Fallback – On older AMD ASICs lacking a true FMA unit (e.g., pre‑GCN), the driver gracefully falls back to software emulation, preserving functional correctness at a performance cost.
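A 0.5 ULP check of the kind described in the validation bullet can be sketched as follows. This is not the Mesa test suite; it is a minimal CPU-side illustration in double precision, and the helper names `exact_fma`, `ulp_error`, and `within_half_ulp` are assumptions introduced for this example. It requires Python 3.9 or later for `math.ulp`.

```python
import math
from fractions import Fraction


def exact_fma(a: float, b: float, c: float) -> Fraction:
    """Exact value of a*b + c as a rational number (finite inputs only)."""
    return Fraction(a) * Fraction(b) + Fraction(c)


def ulp_error(candidate: float, exact: Fraction) -> Fraction:
    """Distance from candidate to the exact value, in ULPs of the rounded reference."""
    reference = float(exact)  # correctly rounded reference result
    return abs(Fraction(candidate) - exact) / Fraction(math.ulp(reference))


def within_half_ulp(candidate: float, a: float, b: float, c: float) -> bool:
    """True when candidate is within 0.5 ULP of the exact a*b + c."""
    return ulp_error(candidate, exact_fma(a, b, c)) <= Fraction(1, 2)


# The inputs quoted in the article, evaluated in double precision.
a, b, c = 1e20, 1e-20, -1.0
reference = float(exact_fma(a, b, c))
print("reference passes:", within_half_ulp(reference, a, b, c))

# A multiply followed by an add can miss by far more than 0.5 ULP.
a2 = 1.0 + 2.0**-30
c2 = -(1.0 + 2.0**-29)
unfused = a2 * a2 + c2
print("unfused passes:", within_half_ulp(unfused, a2, a2, c2))
```

A real driver test would compare the value read back from the GPU against such a reference, in whichever precision the shader uses.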
Market and ecosystem implications
Machine‑learning workloads on Vulkan
Machine‑learning frameworks such as TensorFlow‑Vulkan and PyTorch‑Vulkan rely on high‑throughput linear algebra kernels. Many of these kernels use fma to implement dot‑product accumulation with minimal error. With RADV now exposing true fused operations, developers can run inference and training workloads on Radeon GPUs without the accuracy penalty that previously forced them to switch to CUDA or OpenCL.
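To make the dot-product point concrete, here is a small CPU-side sketch comparing fused and unfused accumulation in double precision. It does not represent any framework's kernel; `dot_fused` and `dot_separate` are hypothetical helpers, and the fused step is emulated with exact rational arithmetic followed by one rounding.

```python
from fractions import Fraction


def dot_fused(xs, ys):
    """Dot product where each step acc = x*y + acc is rounded once."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = float(Fraction(x) * Fraction(y) + Fraction(acc))
    return acc


def dot_separate(xs, ys):
    """Dot product where each product is rounded before it is accumulated."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = acc + x * y
    return acc


a = 1.0 + 2.0**-30
xs = [1.0, a]
ys = [-(1.0 + 2.0**-29), a]

# The exact dot product is a*a - (1 + 2^-29) = 2^-60.
print("fused   :", dot_fused(xs, ys))
print("separate:", dot_separate(xs, ys))
```

The inputs are chosen so that the two terms nearly cancel; with well-conditioned data the two variants usually agree to within a few ULPs.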
Scientific computing and HPC
Applications in computational fluid dynamics, climate modeling, and quantum chemistry often report numerical drift when using separate multiply‑add sequences. The new extension reduces worst‑case error by up to 2 × 10⁻⁸ in double‑precision kernels that are sensitive to rounding, according to the extension’s reference benchmarks.
Competitive positioning
- AMD vs. NVIDIA – NVIDIA’s proprietary drivers have long supported hardware‑level FMA on all RTX and H100 GPUs. RADV’s catch‑up narrows the accuracy gap for open‑source users and for enterprises that prefer vendor‑agnostic stacks.
- Intel’s Xe drivers – Intel’s Vulkan drivers already expose a similar extension, but their hardware FMA units are limited to FP16/FP32. AMD’s support for both FP32 and FP64 on RDNA‑3 and CDNA‑2 gives it a broader applicability for double‑precision scientific codes.
Supply‑chain context
The extension rollout aligns with AMD’s Q3 2026 silicon refresh, where the RX 7900 XTX and MI250X are expected to ship in larger volumes. As fab capacity at TSMC and GlobalFoundries stabilizes after the 2024‑2025 demand surge, the increased software capability may drive higher adoption of Radeon GPUs in data‑center clusters that previously favored NVIDIA for accuracy‑critical tasks.
What developers should do now
- Enable the extension – Add `VK_KHR_shader_fma` to the `ppEnabledExtensionNames` array in `VkDeviceCreateInfo` and increase `enabledExtensionCount` to match.
- Recompile shaders – Ensure the SPIR‑V compiler is invoked with `-fspv-extension=SPV_KHR_fma` (or use a recent version of glslang/Shaderc that enables it automatically).
- Validate results – Run the provided Mesa test suite (`meson test radv-fma`) to confirm that the driver reports the extension as supported and that rounding matches the reference.
- Benchmark – Compare the performance of kernels before and after enabling the extension; expect a 15‑30 % speedup on compute‑heavy workloads that rely heavily on dot‑product accumulation.
Outlook
The inclusion of VK_KHR_shader_fma in RADV demonstrates how open‑source drivers can keep pace with proprietary stacks when the underlying silicon already provides the necessary hardware. As more AI and scientific workloads migrate to Vulkan for cross‑platform GPU acceleration, correctly‑rounded FMA will become a baseline requirement rather than an optional feature. The next Mesa release, slated for Q4 2026, will ship this support to end users, potentially expanding Radeon’s footprint in HPC and AI clusters that value both performance and numerical fidelity.
