A patch queued for Linux 6.20/7.0 will restrict modern CPU architectures to only two preemption modes - full and lazy - effectively retiring PREEMPT_NONE and PREEMPT_VOLUNTARY for x86, ARM64, RISC-V, POWER, LoongArch, and s390. This change aims to simplify the kernel scheduler and eliminate legacy workarounds that have accumulated over years of development.
The Linux kernel's scheduler is getting a fundamental cleanup that will affect servers, workstations, and embedded devices running on current hardware. A patch from Intel's Peter Zijlstra, now sitting in the sched/core branch of the TIP tree, is poised to restrict preemption models on up-to-date CPU architectures to just two options: full preemption (PREEMPT_FULL) and lazy preemption (PREEMPT_LAZY).

What's Actually Changing
Currently, the Linux kernel offers four preemption models:
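- PREEMPT_NONE: No forced preemption of kernel code. A task running in the kernel gives up the CPU only when it blocks, returns to user space, or reaches an explicit reschedule point. Aimed at throughput-oriented workloads
- PREEMPT_VOLUNTARY: Adds further explicit preemption points to kernel code, the historical default for many distributions
- PREEMPT_LAZY: Uses the same preemption machinery as full preemption, but a reschedule request for an ordinary task is deferred until the next timer tick or the return to user space, while real-time tasks still preempt immediately
- PREEMPT_FULL: Kernel code is preemptible everywhere outside critical sections, lowest latency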
The patch effectively deprecates the first two for modern architectures. According to Zijlstra's commit message, PREEMPT_NONE will remain available only for "architectures that do not support preemption at all" - think legacy embedded platforms or specialized hardware. PREEMPT_VOLUNTARY will be limited to architectures that haven't yet implemented PREEMPT_LAZY support, with the explicit goal of eventually removing it entirely.
This leaves x86/x86_64, ARM64, RISC-V, POWER, LoongArch, and s390 with only PREEMPT_LAZY and PREEMPT_FULL.
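To see which of these models a given kernel was built with, one option is to inspect its build configuration. The sketch below is illustrative only. It assumes the Kconfig symbols CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT_LAZY, and CONFIG_PREEMPT (the symbol behind what this article calls PREEMPT_FULL), and it assumes the config file lives at /boot/config-<release>, which is a distribution convention rather than a guarantee. The function name preemption_model is made up for this example.

```python
import os
import re

# Kconfig symbol -> short label. Assumption: CONFIG_PREEMPT is the
# "full preemption" choice that the article calls PREEMPT_FULL.
CHOICES = (
    ("PREEMPT_NONE", "none"),
    ("PREEMPT_VOLUNTARY", "voluntary"),
    ("PREEMPT_LAZY", "lazy"),
    ("PREEMPT", "full"),
)


def preemption_model(config_text):
    """Return (model, dynamic) parsed from the text of a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        # Only lines of the form CONFIG_PREEMPT...=y count as enabled;
        # "# CONFIG_X is not set" comment lines are skipped.
        match = re.fullmatch(r"CONFIG_(PREEMPT(?:_\w+)?)=y", line.strip())
        if match:
            enabled.add(match.group(1))
    model = next((label for sym, label in CHOICES if sym in enabled), "unknown")
    return model, "PREEMPT_DYNAMIC" in enabled


if __name__ == "__main__":
    # Assumed location; some systems expose /proc/config.gz instead.
    path = f"/boot/config-{os.uname().release}"
    try:
        with open(path) as handle:
            print(preemption_model(handle.read()))
    except OSError as err:
        print(f"could not read {path}: {err}")
```

If the second value is True, the kernel was built with PREEMPT_DYNAMIC, so the build-time model is only a default and the running model may have been changed at boot.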
Why This Matters: The Technical Debt
Zijlstra's patch addresses years of accumulated complexity. Lazy preemption originated as a fix for a specific PREEMPT_RT (real-time) problem: "over-scheduling," where aggressive preemption hurt performance compared to non-RT kernels. The real-time model preempted too often, causing excessive context switches and the cache misses that come with them.
More critically, the kernel has been accumulating "horrible hacks" to work around the limitations of non-preemptible code paths. Zijlstra specifically mentions:
- folio_zero_user(): One of a growing number of features that rely on preemption. It can perform large memset() operations without preemption checks. In the same context, Zijlstra points to the hacks Xen needed to cope with long-running hypercalls.
- Uncontrolled cond_resched() sprinkling: Developers have been adding conditional reschedule points throughout the kernel, often "cargo cult" style - copying patterns without understanding why they're needed, or as quick fixes for hard-to-reproduce workload issues.
The fundamental problem is that the kernel has been trying to support both throughput-optimized and latency-optimized scenarios through a patchwork of special cases rather than a coherent model.
The Lazy Preemption Compromise
PREEMPT_LAZY represents a middle ground that Zijlstra and the scheduler team believe is optimal for most modern workloads:
- Critical sections remain atomic: Code paths that must not be preempted (such as interrupt handlers, scheduler internals, and regions that hold spinlocks or explicitly disable preemption) stay non-preemptible.
- Everything else is preemptible: Unlike PREEMPT_NONE or PREEMPT_VOLUNTARY, there's no need for manual cond_resched() points in most code.
- Reduced overhead: Compared to PREEMPT_FULL, an ordinary task is not switched out the moment another task becomes runnable. The switch waits for the next tick or the return to user space, which avoids many context switches while still bounding latency.
For server workloads, this means better throughput than full preemption while avoiding the latency spikes that can occur when a CPU gets stuck in a long-running non-preemptible kernel path.
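The tradeoff described above can be made concrete with a toy model. The sketch below is not kernel code and not how the scheduler is implemented. It assumes a single CPU stuck in one long kernel path, a second ordinary task that wakes up partway through, a fixed timer tick, and a hypothetical list of explicit reschedule points standing in for cond_resched() calls. The function name, parameters, and millisecond units are all invented for illustration, and preempt-disabled regions and real-time tasks are ignored.

```python
import math


def preemption_delay(model, wake_time, path_end, tick_period=4.0,
                     resched_points=()):
    """Toy estimate (in ms) of how long a woken task waits for the CPU.

    wake_time:      when the waiting task becomes runnable
    path_end:       when the running kernel path returns to user space
    resched_points: explicit reschedule points inside the kernel path
    """
    if not 0 <= wake_time <= path_end:
        raise ValueError("wake_time must fall inside the kernel path")

    if model == "full":
        # Switch right away (short non-preemptible regions are ignored).
        switch_at = wake_time
    elif model == "lazy":
        # The request is deferred to the next tick or return to user space.
        next_tick = (math.floor(wake_time / tick_period) + 1) * tick_period
        switch_at = min(next_tick, path_end)
    elif model in ("none", "voluntary"):
        # Only explicit reschedule points or the end of the path help.
        later = [p for p in resched_points if wake_time <= p <= path_end]
        switch_at = min(later + [path_end])
    else:
        raise ValueError(f"unknown model: {model}")
    return switch_at - wake_time


if __name__ == "__main__":
    for name, points in (("full", ()), ("lazy", ()),
                         ("voluntary", (10.0, 30.0)), ("none", ())):
        delay = preemption_delay(name, wake_time=1.0, path_end=50.0,
                                 resched_points=points)
        print(f"{name:>9}: waits {delay} ms")
```

In this model the lazy delay is bounded by the tick period no matter how long the kernel path runs, while the none and voluntary delays depend entirely on where someone remembered to place a reschedule point.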
Impact on Distributions and Users
The patch notes that "Lazy has been the recommended setting for a while, not all distributions have managed to make the switch yet." This is a polite way of saying that many distributions have been slow to adopt PREEMPT_LAZY despite its availability.
For distribution maintainers: This change essentially forces the issue. If you're building for modern hardware, you'll need to choose between lazy and full preemption. The "safe" default of voluntary preemption is going away.
For server deployments: Most will want PREEMPT_LAZY. It provides the latency characteristics needed for responsive systems without sacrificing the throughput that matters for data center economics. Only specialized low-latency trading or real-time applications should need PREEMPT_FULL.
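For developers: Once the kernel can assume preemptibility on these architectures, most manual cond_resched() calls become candidates for removal, which should simplify code over time. The patch itself is deliberately minimal, so the benefit is that new code should no longer need manual preemption points or special-case handling for long-running operations.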
What Could Go Wrong
Zijlstra explicitly states the patch is being kept "minimal in case of hard to address regressions that might pop up." This acknowledges that, despite the theoretical benefits, unexpected behavior is always possible:
- Hidden dependencies: Some code might have been relying on the timing characteristics of voluntary preemption.
- Performance regressions: Workloads that benefited from the specific scheduling patterns of PREEMPT_NONE might see changes.
- Hardware-specific issues: While the supported architectures are modern, there could be edge cases in specific CPU models or configurations.
Timeline and Next Steps
The patch is currently in the TIP tree's sched/core branch. Barring objections or discovered regressions, it's targeted for the Linux 6.20 cycle. Given the historical naming pattern, 6.20 might actually be branded as Linux 7.0, representing a significant milestone.
This isn't just version number theater - it represents a fundamental shift in how the kernel approaches the throughput-versus-latency tradeoff. By removing options that have become technical debt, the kernel team is betting that PREEMPT_LAZY can serve the vast majority of use cases while simplifying maintenance and reducing complexity.
For homelab builders and performance enthusiasts, this is something to watch. When 6.20/7.0 hits your distribution's kernel packages, you'll want to verify that your workloads perform as expected. Most should see either improvements or no change, but the elimination of PREEMPT_NONE and VOLUNTARY means there's no going back if you were one of the few who needed those specific configurations.
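One rough way to compare kernels before and after such an upgrade is to measure how late a sleeping process wakes up. The sketch below is a crude user-space proxy, not a replacement for a purpose-built tool such as cyclictest or for benchmarking the real workload. The function names, sample count, and interval are arbitrary choices for illustration, and results will vary with system load, CPU frequency scaling, and the Python runtime itself.

```python
import time


def sleep_overshoot_us(samples=200, interval_s=0.001):
    """Sleep repeatedly and record how far each wakeup overshoots, in microseconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(interval_s)
        elapsed_ns = time.perf_counter_ns() - start
        overshoots.append((elapsed_ns - interval_s * 1e9) / 1000.0)
    return overshoots


def summarize(values):
    """Return median, approximate 99th percentile, and maximum of the samples."""
    ordered = sorted(values)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "median": ordered[len(ordered) // 2],
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    stats = summarize(sleep_overshoot_us())
    for key, value in stats.items():
        print(f"{key:>6}: {value:8.1f} us")
```

Running it on the old and new kernel under the same background load, and comparing the p99 and max figures rather than the median, gives a first hint of whether tail latency has moved.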
The kernel is getting simpler, but it's also getting more opinionated about what "good" scheduler behavior looks like on modern hardware.
