Valve contractor Natalie Vock's latest Mesa contribution cuts RT pipeline compilation time by more than 90% in Unreal Engine 4 titles, eliminating stutter and delivering measurable frame rate gains on AMD Radeon GPUs under Linux.
The AMD Radeon Vulkan driver (RADV) is about to get a major ray-tracing boost in Unreal Engine 4 games. A new merge request from Valve contractor Natalie Vock demonstrates a more than 10x improvement in ray-tracing pipeline compilation time, cutting a 4-minute 20-second Fossilize replay to just 20 seconds in Ghostwire Tokyo, with UE4 titles like The Callisto Protocol seeing similar gains.
The Problem: Inlined Shaders in Hot Loops
For the past year, RADV's ray-tracing implementation has been catching up to both AMD's now-defunct AMDVLK driver and NVIDIA's mature RT stack. While raw performance improved, compilation stutter remained a persistent issue. The root cause was architectural: UE4 games compile hundreds of ray-tracing shaders at runtime, and RADV was inlining every any-hit and intersection shader directly into the hot ray-traversal loop, producing one enormous shader that is slow to compile and slow to run.
Vock sums up the penalty: "Who knew that inlining hundreds of shaders into an incredibly hot loop might be bad for performance?!" The merge request addresses this by calling any-hit and intersection shaders as functions instead of inlining them, which keeps the hot loop lean and efficient.
Real-World Impact: From Stutter to Smooth
The effect on the user experience is immediate. In Ghostwire Tokyo, a Fossilize capture that previously took 4 minutes 20 seconds to replay now completes in 20 seconds. More importantly, the stuttering that plagued these titles whenever a new RT pipeline compiled is eliminated.
The benefits don't stop at compilation speed. Runtime performance improves as well. Vock's testing on a Radeon RX 7900 XTX shows Ghostwire Tokyo frame rates rising from ~30 FPS to ~40 FPS, a 33% improvement. This brings RADV's performance in the tested scene roughly in line with Windows driver results.
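As a sanity check, the arithmetic behind these figures can be reproduced in a few lines of Python. The helper names are invented for illustration; the inputs are the numbers quoted above. By these figures the replay speedup works out to 13x, so "10x" is best read as a conservative round number.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the 'after' duration is than 'before'."""
    return before / after


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a duration shrank."""
    return (before - after) / before * 100


def percent_gain(before: float, after: float) -> float:
    """Percentage by which a rate (such as FPS) grew."""
    return (after - before) / before * 100


# Figures quoted in the article for Ghostwire Tokyo.
replay_before_s = 4 * 60 + 20  # 4 min 20 s = 260 s
replay_after_s = 20

print(speedup(replay_before_s, replay_after_s))                      # 13.0
print(round(percent_reduction(replay_before_s, replay_after_s), 1))  # 92.3
print(round(percent_gain(30, 40), 1))                                # 33.3
```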

Game-Specific Results and Limitations
The improvements are title-dependent. Games that rely heavily on multiple any-hit and intersection shaders see the biggest benefits. Titles like Cyberpunk 2077, however, remain unaffected because they use at most a single any-hit shader, which makes the inlining overhead negligible.
This specificity highlights the targeted nature of the optimization. It's not a universal ray-tracing boost, but rather a surgical fix for a specific bottleneck that happens to plague many popular UE4 titles.
Technical Implementation: Function Calls Over Inlining
The core change has the traversal code reach any-hit and intersection shaders through distinct function calls rather than inlining them all into one monolithic shader. This approach:
- Reduces loop overhead: Each shader compilation becomes a discrete operation
- Improves cache locality: Smaller, focused compilation steps use CPU caches more efficiently
- Enables parallelization: Function boundaries make it easier to optimize compilation scheduling
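To see why inlining scales badly, here is a rough toy model. It is not RADV code, and every name and number in it (the instruction counts, the call-site cost, the shader count) is a made-up assumption chosen only to show the shape of the problem: an inlined loop grows with every shader added, while a call-based loop stays the same size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shader:
    name: str
    instructions: int  # hypothetical size, not measured from any real game


# Hypothetical constants for the toy model.
LOOP_BASE_INSTRUCTIONS = 200  # the traversal logic itself
CALL_SITE_INSTRUCTIONS = 4    # assumed cost of one indirect call in the loop


def inlined_loop_size(shaders: list[Shader]) -> int:
    """Loop body size when every shader is pasted into the loop."""
    return LOOP_BASE_INSTRUCTIONS + sum(s.instructions for s in shaders)


def call_based_loop_size(shaders: list[Shader]) -> int:
    """Loop body size when shaders live outside the loop behind a call."""
    # Assumes one shared indirect call site, so the size does not
    # depend on how many shaders exist.
    return LOOP_BASE_INSTRUCTIONS + CALL_SITE_INSTRUCTIONS


# "Hundreds of shaders", each with an invented size of 150 instructions.
shaders = [Shader(f"any_hit_{i}", 150) for i in range(300)]

print(inlined_loop_size(shaders))     # 45200
print(call_based_loop_size(shaders))  # 204
```

In this model the total amount of shader code does not shrink in the call-based version; it simply moves out of the hot loop.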
Beyond the initial compilation speedup, the code is already generating "really cool stuff", with Vock noting that the function call architecture opens doors for additional optimizations.
Release Timeline and Community Impact
The merge request is currently under review. With Mesa 26.0's feature freeze imminent, this code may miss the Q1 release window and instead land in Mesa 26.1 during Q2 2026. For Linux gamers running AMD hardware, this represents a significant validation of Valve's ongoing investment in the open-source graphics stack.
The broader pattern here is clear: RADV is rapidly closing the gap with proprietary drivers through targeted, community-driven optimization. Each contribution like Vock's represents another piece of the puzzle, transforming Linux from a second-class gaming platform into a competitive alternative for cutting-edge graphics workloads.
For homelab builders and performance enthusiasts, this underscores the importance of measuring everything. The difference between inlined and function-call-based compilation wasn't obvious until someone profiled the actual bottleneck and measured the impact of the fix. That's the kind of data-driven approach that turns "good enough" into "competitive with Windows."
The 10x speedup isn't just a number: it's the difference between a playable experience and a frustrating one, and it demonstrates that open-source drivers can match proprietary performance through careful analysis and optimization.
