The Mesa NVK Vulkan driver has disabled larger page support after encountering MMU faults during Vulkan conformance testing, with a kernel patch already in development to address the race condition.
The Mesa NVK open-source Vulkan driver has temporarily disabled support for larger memory pages after encountering critical issues during Vulkan conformance testing. This development affects users running the latest Nouveau kernel driver with Linux 6.19, which had recently gained support for larger pages and associated compression capabilities.
The Larger Page Performance Promise
Larger memory pages were introduced to the Nouveau kernel driver in Linux 6.19, bringing with them the potential for significant performance improvements through memory compression. The Mesa NVK driver quickly adopted this feature, as larger pages can reduce translation lookaside buffer (TLB) pressure and improve memory access patterns for graphics workloads.
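To put the TLB argument in numbers, here is a rough back-of-the-envelope sketch. The 256 MiB buffer size is an assumption chosen for illustration, not a figure from NVK or Nouveau. It simply counts how many page mappings a buffer needs at each page size.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_needed(size_bytes, page_size):
    """Number of pages required to cover size_bytes (rounded up)."""
    return -(-size_bytes // page_size)


# Hypothetical buffer size, for illustration only.
buffer_size = 256 * MIB

small = pages_needed(buffer_size, 4 * KIB)
large = pages_needed(buffer_size, 64 * KIB)

print(small, large, small // large)  # 65536 4096 16
```

Covering the same buffer with 64KB pages takes one sixteenth as many mappings, which is where the reduced TLB pressure comes from.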
However, this promising enhancement has hit a roadblock. A bug report filed three weeks ago showed that Vulkan Conformance Test Suite (CTS) runs were hitting MMU faults when larger device pages were in use. The faults occurred at valid memory addresses, which pointed to a synchronization problem in the page table handling rather than to bad accesses by the tests themselves.
The Root Cause: A Race Condition
The issue stems from a race condition that occurs when transitioning between 64KB pages and 4KB pages. David Airlie, a prominent Linux graphics developer, has identified the specific problem and posted a patch for the Nouveau kernel driver.
The race shows up in the following sequence: 4KB pages are unmapped, a 64KB page is mapped, and only then are the 4KB pages unreferenced. When that final unref runs, the dual page table handling can set the large page table entry (LPTE) back to the SPARSE or INVALID state. If a valid LPTE has been mapped in the meantime, the reset overwrites that valid entry.
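The ordering problem can be illustrated with a small toy model. This is a simplified sketch and not the Nouveau code: the class, its fields and its method names are all hypothetical, and a single object stands in for one 64KB slot that can also be covered by 4KB entries.

```python
from enum import Enum


class Lpte(Enum):
    INVALID = 0
    SPARSE = 1
    VALID = 2


class DualPageTableSlot:
    """Toy model of one 64KB slot that can also be covered by 4KB PTEs."""

    def __init__(self, sparse=False):
        self.sparse = sparse
        self.lpte = Lpte.SPARSE if sparse else Lpte.INVALID
        self.small_refs = 0  # outstanding references to 4KB PTEs

    def ref_small(self, count):
        self.small_refs += count

    def map_large(self):
        self.lpte = Lpte.VALID

    def unref_small(self, count):
        self.small_refs -= count
        if self.small_refs == 0:
            # Unconditional reset: this is the step that loses the race.
            self.lpte = Lpte.SPARSE if self.sparse else Lpte.INVALID


slot = DualPageTableSlot()
slot.ref_small(16)    # sixteen 4KB pages cover the 64KB slot
# The 4KB pages are unmapped here, but their references are still held.
slot.map_large()      # a 64KB mapping lands in the same slot
slot.unref_small(16)  # the delayed unref of the 4KB pages arrives last

print(slot.lpte)  # Lpte.INVALID, although a 64KB page is mapped
```

In this model the large mapping is still in use, but its entry has been reset, so the next access through it would fault at an address that should be valid.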
The Proposed Fix
Airlie's patch addresses this by implementing tracking to determine if an LPTE has been validly referenced. The solution involves:
- Widening the tracking state to 32 bits so that valid LPTE references can be recorded
- Preventing reset operations when a valid LPTE has been referenced
- Handling cases where unref operations can be delayed, potentially leaving many outstanding references
Airlie notes in the patch message that the tracking had to grow to 32 bits because unref operations can be delayed, which can leave many references outstanding at once and lead to unusual behavior.
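The idea behind the fix can be sketched in the same toy style. This is a minimal illustration under assumptions, not Airlie's patch: the class, the `lpte_valid_refs` counter and the method names are hypothetical, and Python integers do not model the 32-bit width of the real tracking field.

```python
class TrackedSlot:
    """Toy 64KB slot that remembers whether its LPTE is validly referenced."""

    def __init__(self, sparse=False):
        self.sparse = sparse
        self.lpte = "SPARSE" if sparse else "INVALID"
        self.small_refs = 0       # outstanding references to 4KB PTEs
        self.lpte_valid_refs = 0  # hypothetical counter of valid LPTE references

    def _reset_lpte(self):
        self.lpte = "SPARSE" if self.sparse else "INVALID"

    def ref_small(self, count):
        self.small_refs += count

    def map_large(self):
        self.lpte_valid_refs += 1
        self.lpte = "VALID"

    def unmap_large(self):
        self.lpte_valid_refs -= 1
        if self.lpte_valid_refs == 0:
            self._reset_lpte()

    def unref_small(self, count):
        self.small_refs -= count
        # Only reset the LPTE if nothing has validly referenced it since.
        if self.small_refs == 0 and self.lpte_valid_refs == 0:
            self._reset_lpte()


slot = TrackedSlot()
slot.ref_small(16)
slot.map_large()
slot.unref_small(16)  # delayed unref no longer clobbers the mapping
print(slot.lpte)      # VALID

slot.unmap_large()
print(slot.lpte)      # INVALID
```

The late unref still drops the 4KB references, but the reset is skipped while the large entry is in use and only happens once that mapping is itself released.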
Current Status and Timeline
While the patch provides a technical solution, its integration timeline remains uncertain. The Linux 6.19 release is expected in roughly a week and a half, which leaves a tight window for review and inclusion. The patch carries a Cc to the stable kernel maintainers so that it can be back-ported to stable kernel releases once it lands in the mainline tree.
In the meantime, NVK has disabled larger page usage entirely to prevent the MMU faults from affecting users. This means that until the fix is deployed, users won't benefit from the performance improvements that larger pages and compression support could provide.
Impact on Nouveau Performance
For users curious about how Nouveau and NVK perform with current upstream code, recent benchmarks comparing open-source Nouveau/Mesa drivers against NVIDIA's proprietary 580 Linux drivers across the GTX 980 through RTX 5080 range provide valuable context. These benchmarks, conducted in December, show the current state of open-source driver performance before the larger page enhancements.
Looking Forward
The temporary disabling of larger page support highlights the challenges in developing robust graphics drivers, particularly when dealing with complex memory management features. While the performance benefits of larger pages are significant, ensuring stability and correctness in the MMU handling is paramount.
Airlie's rapid identification of the bug and his proposed fix demonstrate how quickly the open-source graphics community can respond. Once the patch is integrated and larger page support is re-enabled, users can expect the promised performance improvements to return to NVK.
For now, NVK users should expect stable but potentially suboptimal performance until the larger page support is restored with the proper synchronization fixes in place.
