NVIDIA engineers have identified a severe ~2x performance regression in CPU workloads on their Vera Rubin platform, prompting urgent Linux kernel scheduler patches to fix SMT-aware asymmetric CPU capacity handling.
NVIDIA engineers have identified a performance regression of roughly 2x in CPU-intensive workloads on the upcoming Vera Rubin platform, and Linux scheduler patches to address it are now under review. The issue stems from how the Linux kernel handles Simultaneous Multi-Threading (SMT) on systems that it treats as having asymmetric CPU capacity.
The problem was discovered by NVIDIA Linux engineer Andrea Righi, who found that when SMT is enabled on Vera Rubin, the kernel's scheduler does not account for cores whose SMT siblings are already busy. This leads to poor thread placement decisions that can severely impact performance. According to Righi's analysis, the firmware exposes small frequency variations of approximately ±5% as differences in CPU capacity, which causes the kernel to set the SD_ASYM_CPUCAPACITY sched-domain flag and apply its capacity-aware placement logic. That logic does not consider the state of SMT siblings, so a thread can land on an idle CPU whose sibling is busy even when a fully idle core is available, resulting in the observed performance degradation.
To address this, Righi has proposed a series of patches that modify the scheduler's behavior when SMT is active. The key change is that the scheduler will now prefer fully-idle cores over partially-idle ones, avoiding the scenario where threads are placed on cores that share execution units with busy siblings. This approach aims to maximize the available computational resources for each thread.
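The difference between the two placement strategies can be illustrated with a small model. This is only a sketch, not the kernel's code: the `Cpu` class, both `pick_*` functions, and the capacity numbers are hypothetical and exist purely to show why ranking idle CPUs by capacity alone can choose a CPU whose SMT sibling is busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    cpu_id: int
    core_id: int    # SMT siblings share the same core_id
    capacity: int   # relative capacity (hypothetical values)
    busy: bool


def core_is_fully_idle(cpu, cpus):
    """True if no hardware thread on this CPU's core is busy."""
    return all(not c.busy for c in cpus if c.core_id == cpu.core_id)


def pick_capacity_only(cpus):
    """Model of capacity-only placement: highest-capacity idle CPU wins."""
    idle = [c for c in cpus if not c.busy]
    return max(idle, key=lambda c: c.capacity, default=None)


def pick_smt_aware(cpus):
    """Model of SMT-aware placement: fully idle cores first, then capacity."""
    idle = [c for c in cpus if not c.busy]
    return max(
        idle,
        key=lambda c: (core_is_fully_idle(c, cpus), c.capacity),
        default=None,
    )


# Core 0 has slightly higher capacity but one busy sibling;
# core 1 is marginally "slower" on paper but completely idle.
cpus = [
    Cpu(cpu_id=0, core_id=0, capacity=1024, busy=True),
    Cpu(cpu_id=1, core_id=0, capacity=1024, busy=False),
    Cpu(cpu_id=2, core_id=1, capacity=980, busy=False),
    Cpu(cpu_id=3, core_id=1, capacity=980, busy=False),
]

print("capacity-only picks CPU", pick_capacity_only(cpus).cpu_id)
print("SMT-aware picks CPU", pick_smt_aware(cpus).cpu_id)
```

In this toy topology the capacity-only model selects CPU 1, which shares a core with busy CPU 0, while the SMT-aware model selects CPU 2 on the idle core.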
Righi evaluated several alternative approaches before settling on the SMT-aware solution. These included equalizing CPU capacities by exposing uniform values via firmware (ACPI/CPPC), normalizing capacities in the kernel by grouping CPUs within a small capacity window (±5%), and enabling asymmetric packing (SD_ASYM_PACKING). Adding explicit SMT awareness to the SD_ASYM_CPUCAPACITY policy showed the best results in testing.
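To make the capacity-window alternative concrete, here is a minimal sketch of what such a normalization could look like. The article does not describe the exact grouping rule that was tried, so this assumes the simplest variant: any CPU within 5% of the highest capacity is treated as equal to it. The function names and sample values are hypothetical.

```python
def normalize_capacities(capacities, window=0.05):
    """Treat every capacity within `window` of the maximum as the maximum.

    Assumed rule for illustration only; not taken from the patches.
    """
    if not capacities:
        return []
    top = max(capacities)
    threshold = top * (1 - window)
    return [top if c >= threshold else c for c in capacities]


def looks_asymmetric(capacities):
    """A system looks asymmetric if more than one distinct capacity remains."""
    return len(set(capacities)) > 1


raw = [1024, 990, 975, 1010]          # small, frequency-driven differences
print(looks_asymmetric(raw))                        # True
print(looks_asymmetric(normalize_capacities(raw)))  # False
```

Under this assumption the small variations collapse into a single capacity value, so the system would no longer look asymmetric, while a genuinely smaller core (for example a value of 512 next to 1024) would stay distinct.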
The patches are particularly important because they're not just a fix for Vera Rubin but represent a general improvement to the Linux scheduler. Righi notes that other platforms in the future may enable SMT with asymmetric CPU topologies, making this enhancement broadly applicable. The work demonstrates NVIDIA's commitment to optimizing Linux performance on their hardware, even at the scheduler level.
These patches are currently out for review on the Linux Kernel Mailing List (LKML) and could be included in an upcoming mainline kernel release, possibly Linux v7.1. If that timeline holds, the fixes would be integrated before Vera Rubin hardware reaches data centers at scale, heading off widespread performance issues.
For Linux users and data center operators planning to deploy NVIDIA Vera Rubin systems, these scheduler improvements will be essential for achieving optimal CPU performance. The ~2x performance gap represents a significant difference in computational efficiency, and having these patches in place before hardware availability ensures a smoother transition to the new platform.
This development highlights the complex interplay between hardware design, firmware implementation, and operating system scheduling. As CPUs continue to evolve with features like asymmetric capacities and SMT, the Linux kernel must adapt to handle these scenarios optimally. NVIDIA's proactive approach in identifying and addressing these issues demonstrates the importance of hardware-software co-optimization in modern computing platforms.
The Vera Rubin platform, named after astronomer Vera Rubin, represents NVIDIA's continued investment in high-performance computing hardware. By addressing these scheduler issues early, NVIDIA is ensuring that Linux users will be able to extract maximum performance from their Vera Rubin systems when they become available, maintaining Linux's competitiveness in high-performance computing environments.
For developers and system administrators, this situation underscores the importance of staying current with kernel updates, especially when deploying new hardware platforms. The scheduler changes may require testing to ensure they provide the expected benefits in specific workloads and configurations.
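As a starting point for that kind of testing, the following sketch reads the SMT sibling layout and, where the kernel exposes it, the per-CPU capacity from sysfs. It assumes the standard `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` file and the `cpu_capacity` file, which is only present on some architectures; the helper names are hypothetical, and missing files are reported as empty or `None` rather than treated as errors.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-3,8' into a list of integers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_topology(root="/sys/devices/system/cpu"):
    """Return {cpu_id: {"siblings": [...], "capacity": int or None}}."""
    topology = {}
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        suffix = cpu_dir.name[3:]
        if not suffix.isdigit():
            continue
        siblings_file = cpu_dir / "topology" / "thread_siblings_list"
        capacity_file = cpu_dir / "cpu_capacity"
        siblings = (
            parse_cpu_list(siblings_file.read_text())
            if siblings_file.exists() else []
        )
        capacity = (
            int(capacity_file.read_text())
            if capacity_file.exists() else None
        )
        topology[int(suffix)] = {"siblings": siblings, "capacity": capacity}
    return topology


if __name__ == "__main__":
    for cpu_id, info in sorted(read_topology().items()):
        print(cpu_id, info["siblings"], info["capacity"])
```

A system where CPUs list more than one sibling and report differing capacity values matches the combination of SMT and asymmetric capacity that the patches target.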
