Linux Kernel Fix Corrects Five-Year-Old Page Fault Handling Bug That Could Cause Interrupt State Corruption

A subtle but critical bug in the x86 page fault handling code, present since Linux 5.8 in 2020, has been corrected. The issue, traced by Intel engineer Cedric Xing, involved an interrupt-disabling step that ran on only one branch of the handler, which could leave the kernel with inconsistent interrupt state. The fix simplifies the logic by disabling interrupts unconditionally, untangling a confusion between address range checks and execution context.

A five-year-old bug in the Linux kernel's x86 page fault handling mechanism has been corrected, addressing a subtle issue that could lead to interrupt state corruption under specific conditions. The fix, developed by Intel engineer Cedric Xing and merged into the Linux 6.19 kernel, simplifies the interrupt disabling logic and resolves a fundamental misunderstanding in the original code's design.

The Core Problem: Interrupt State Asymmetry

The issue originated in the do_page_fault() function for the x86 architecture, where the kernel handles memory access violations. Because fault handling may re-enable interrupts along the way, the code contained logic to disable them again before the handler returned. The implementation had a critical flaw: it only restored the disabled state for faults on user-space addresses, not for faults on kernel-space addresses.

The original comment in the code stated: "User address page fault handling might have re-enabled interrupts. Fixing up all potential exit points of do_user_addr_fault() and its leaf functions is just not doable without creating an unholy mess or turning the code upside down."

However, as Xing explained in his patch, this reasoning was "subtly wrong." The confusion stemmed from conflating two separate concepts:

Address range (user vs. kernel addresses)
Execution context (user vs. kernel mode)

These two dimensions are independent. A user-space process accessing a kernel address (such as through a malicious pointer) would trigger __bad_area_nosemaphore(), which could re-enable interrupts during error handling. The original code only attempted to re-disable interrupts for user address faults, leaving kernel address faults with potentially corrupted interrupt state.
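The independence of the two dimensions can be sketched in a small user-space C model. Everything here is an illustrative assumption rather than kernel code: the MODEL_KERNEL_START boundary, the struct fault type, and the function names are hypothetical stand-ins for the kernel's real address and context checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split point between user and kernel addresses (model only). */
#define MODEL_KERNEL_START 0xffff800000000000ULL

enum fault_path { PATH_USER_ADDR, PATH_KERN_ADDR };

struct fault {
    uint64_t address;  /* the address whose access faulted */
    bool from_user;    /* true if the CPU was executing in user mode */
};

/* Dimension 1: which part of the address space the address lies in. */
static bool in_kernel_range(uint64_t address)
{
    return address >= MODEL_KERNEL_START;
}

/* The handler branch is chosen from the address alone. */
static enum fault_path classify(const struct fault *f)
{
    return in_kernel_range(f->address) ? PATH_KERN_ADDR : PATH_USER_ADDR;
}

/*
 * Dimension 2 (execution context) is not consulted by classify(), so a
 * user-mode access to a kernel address lands on the kernel-address branch.
 */
static bool user_mode_hits_kernel_path(const struct fault *f)
{
    return f->from_user && classify(f) == PATH_KERN_ADDR;
}

/* Walks the combinations that matter for the bug described above. */
static void check_independence(void)
{
    struct fault user_bad_ptr = { 0xffff888000001000ULL, true };
    struct fault user_normal  = { 0x00007f0000001000ULL, true };
    struct fault kernel_self  = { 0xffff888000001000ULL, false };

    assert(user_mode_hits_kernel_path(&user_bad_ptr));
    assert(!user_mode_hits_kernel_path(&user_normal));
    assert(!user_mode_hits_kernel_path(&kernel_self));
}
```

Calling check_independence() from a test program exercises the three cases. The first one, a user-mode access to a kernel address, is the combination that the original branch-specific logic did not account for.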

The Technical Details

The page fault handler follows this general flow:

Entry: The handler starts with interrupts disabled
Handle fault: Process the page fault, which may involve memory allocation, I/O, or other operations, and may re-enable interrupts along the way
Exit: Disable interrupts again before returning, so the state on exit matches the state on entry

The problem occurred because certain code paths within the fault handler could re-enable interrupts, particularly during error handling. The original implementation attempted to track which paths might have done this and only re-disable interrupts for user address faults. This approach was incomplete and error-prone.

Xing's solution is elegantly simple: disable interrupts unconditionally before returning from the page fault handler, regardless of whether the fault was for a user or kernel address. This ensures consistent interrupt state and eliminates the need for complex tracking logic.
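The difference between the two approaches can be modelled in plain C. This is a simplified user-space sketch, not the kernel source: the irqs_enabled flag stands in for the CPU interrupt flag, and every function name below is hypothetical. The only behaviour borrowed from the description above is that fault handling for a user-mode access may re-enable interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the CPU interrupt flag; true means interrupts are enabled. */
static bool irqs_enabled;

static void model_irq_enable(void)  { irqs_enabled = true; }
static void model_irq_disable(void) { irqs_enabled = false; }

/*
 * Model of fault processing on either branch: when the access came from
 * user mode, handling (including the error path) may re-enable interrupts.
 */
static void model_handle_fault(bool from_user)
{
    if (from_user)
        model_irq_enable();
}

/* Before the fix: interrupts are re-disabled only on the user-address branch. */
static bool handler_before(bool kernel_address, bool from_user)
{
    irqs_enabled = false;               /* handler is entered with IRQs off */
    if (kernel_address) {
        model_handle_fault(from_user);  /* no re-disable on this branch */
    } else {
        model_handle_fault(from_user);
        model_irq_disable();
    }
    return irqs_enabled;                /* interrupt state seen by the caller */
}

/* After the fix: interrupts are re-disabled unconditionally. */
static bool handler_after(bool kernel_address, bool from_user)
{
    (void)kernel_address;               /* branch choice no longer matters */
    irqs_enabled = false;
    model_handle_fault(from_user);
    model_irq_disable();
    assert(!irqs_enabled);              /* invariant the fix guarantees */
    return irqs_enabled;
}
```

In this model, handler_before(true, true) returns with the flag still set, which corresponds to a user-mode access to a kernel address, while handler_after() returns with interrupts disabled for every combination of arguments.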

Historical Context and Impact

The problematic code was introduced in Linux 5.8 during the 2020 merge window through commit ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code"). This commit attempted to improve interrupt handling symmetry but inadvertently created the asymmetry it sought to prevent.

While the bug is subtle, its potential impact is significant. Inconsistent interrupt state could lead to:

Race conditions: Interrupts being enabled when they should be disabled, allowing unexpected hardware events to interrupt kernel execution
State corruption: Interrupts firing during sensitive operations that assume a particular interrupt context
Debugging complexity: Intermittent failures that are difficult to reproduce and diagnose

The fix has been backported to stable kernel series, ensuring that distributions and users running older kernels will receive the correction.

Broader Implications for Kernel Development

This fix highlights several important aspects of kernel development:

Comment accuracy: Comments that explain "why" code exists are crucial, but they must be accurate. The original comment's assertion that fixing all cases was "not doable" led to an incomplete solution.
Simplicity over complexity: The most robust solution often involves simplifying the logic rather than adding more complexity to handle edge cases. The unconditional interrupt disabling is simpler and more correct than the original tracking approach.
Cross-architecture considerations: While this fix is x86-specific, similar issues could exist in other architectures' fault handling code, suggesting a need for systematic review.
Long-lived bugs: This bug persisted for over five years, demonstrating how subtle issues can evade detection in complex systems. The fact that it was discovered by an Intel engineer suggests that hardware vendors are increasingly scrutinizing kernel code that interacts with their silicon.

Verification and Testing

The patch has been tested and reviewed by kernel maintainers, including Linus Torvalds, who merged it into the mainline kernel. The fix's simplicity makes it low-risk, as it doesn't introduce new logic but rather corrects an existing oversight.

For developers and system administrators, this serves as a reminder to:

Keep kernels updated, especially for long-running systems
Pay attention to kernel changelogs for subtle fixes that may affect stability
Consider that even mature, stable code can contain latent bugs

The Linux kernel's continuous improvement process, where engineers from various organizations (including hardware vendors like Intel) contribute fixes, ensures that such issues are eventually identified and corrected, maintaining the kernel's reliability for the millions of systems that depend on it.
