Intel’s cache‑aware scheduling (CAS) patches have moved into Peter Zijlstra’s sched/cache branch, with additional enhancements from Tim Chen addressing over‑aggregation and improving workload estimation. The changes promise better LLC locality on multi‑domain CPUs, and the code is on track for inclusion in the next Linux merge window.
Intel's Cache‑Aware Scheduling Nears Mainline Integration
Intel engineers have pushed the Cache‑Aware Scheduling (CAS) patches a step closer to the mainline Linux kernel. After more than a year of review and testing on both Intel Xeon Scalable and AMD EPYC silicon, the patches are now queued in Peter Zijlstra’s sched/cache Git branch. The next logical step is a push to the tip/tip.git tree, which would put CAS on track for the upcoming merge window.
{{IMAGE:2}}
Technical Overview
What CAS Does
- Goal: Align tasks that share data onto the same Last‑Level Cache (LLC) domain.
- Mechanism: The scheduler evaluates a task’s working set size and, if it fits within the effective LLC size of a CPU’s cache domain, it prefers to schedule that task on a core within that domain.
- Benefit: Reduces cache line bouncing and remote‑cache accesses, which can shave 5‑15 % off latency‑sensitive workloads and improve throughput for memory‑bound workloads.
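The fit check described in the list above can be sketched in a few lines. This is an illustrative model only, not the kernel code: the names LlcDomain, fits_in_llc, and choose_cpus are hypothetical, and the real scheduler weighs many more factors (load, idle state, affinity) than this simple size comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlcDomain:
    """Hypothetical stand-in for one LLC domain."""
    name: str
    effective_llc_bytes: int  # effective LLC capacity of the domain
    cpus: tuple               # CPU ids that share this LLC


def fits_in_llc(working_set_bytes: int, domain: LlcDomain) -> bool:
    """True if the task's estimated working set fits in the domain's LLC."""
    return working_set_bytes <= domain.effective_llc_bytes


def choose_cpus(working_set_bytes: int, preferred: LlcDomain, all_cpus: tuple) -> tuple:
    """Return the candidate CPUs for placing a task.

    If the working set fits in the preferred LLC domain, restrict the
    candidates to that domain. Otherwise apply no cache-aware preference
    and consider every CPU.
    """
    if fits_in_llc(working_set_bytes, preferred):
        return preferred.cpus
    return all_cpus
```

With a 32 MiB domain covering CPUs 0 to 3, an 8 MiB task would be steered to those four CPUs, while a 64 MiB task would be placed without any LLC preference.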
Recent Enhancements (v5)
Tim Chen’s latest patch series adds three key improvements:
- LLC size stored per‑CPU – The effective LLC size is cached in the per‑CPU bottom_sched_domain. This eliminates repeated calculations of cache capacity during task placement.
- Working‑set estimate via NUMA‑balance page‑fault stats – Instead of relying on RSS, the scheduler now uses page‑fault counters from the NUMA balancer to gauge a task’s active footprint. When NUMA balancing is disabled, the code falls back to RSS.
- Reduced CPU‑scan overhead – Incorporates Jianyong’s optimization that prunes the list of candidate CPUs early, cutting the scheduler’s scan time by roughly 30 % in high‑core‑count systems.
These changes keep performance on par with the previous v4 implementation while addressing an over‑aggregation bug that could cause unrelated tasks to be co‑scheduled on the same LLC, negating the intended locality gains.
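Two of the v5 ideas, caching the LLC size per CPU and falling back to RSS when NUMA-balancing statistics are unavailable, can be modeled roughly as follows. Everything here is an assumption made for illustration: the class and function names are hypothetical, the 4 KiB page size is assumed, the cache is filled lazily (the kernel may populate it at a different point), and the patches' actual arithmetic for turning fault counters into a footprint is not reproduced.

```python
from typing import Callable, Dict, Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class LlcSizeCache:
    """Remember each CPU's effective LLC size after computing it once."""

    def __init__(self, compute: Callable[[int], int]) -> None:
        self._compute = compute            # potentially expensive lookup
        self._by_cpu: Dict[int, int] = {}  # cpu id -> effective LLC bytes

    def get(self, cpu: int) -> int:
        # Compute on first use only; later placements reuse the stored value.
        if cpu not in self._by_cpu:
            self._by_cpu[cpu] = self._compute(cpu)
        return self._by_cpu[cpu]


def estimate_working_set(rss_pages: int, numa_fault_pages: Optional[int]) -> int:
    """Estimate a task's active footprint in bytes.

    numa_fault_pages is None when NUMA balancing is disabled, in which
    case the estimate falls back to RSS.
    """
    if numa_fault_pages is None:
        return rss_pages * PAGE_SIZE
    return numa_fault_pages * PAGE_SIZE
```

The point of the second function is that a task with a large RSS but a small set of recently touched pages gets a smaller estimate, so it is more likely to pass an LLC fit check.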
Benchmark Snapshot
| Platform | Workload | Δ Performance (CAS v4) | Δ Performance (CAS v5) |
|---|---|---|---|
| AMD EPYC 9654 (96 cores) | Redis‑cache stress | +12 % | +12 % |
| Intel Xeon Scalable 8475 (56 cores) | SPEC‑CPU 2023 integer | +8 % | +8 % |
| Intel Xeon Scalable 8475 (56 cores) | TensorFlow inference (ResNet‑50) | +6 % | +6 % |
The data, collected with the patched kernel on a vanilla Ubuntu 24.04 image, shows no regression from v4 to v5, which suggests the new heuristics are cost‑neutral for these workloads.
Market and Ecosystem Implications
- Data‑center efficiency – Cloud operators running mixed‑workload clusters can expect modest power savings because fewer cache misses translate into lower DRAM traffic and fewer CPU cycles stalled waiting on memory.
- AMD‑Intel parity – The fact that CAS delivers comparable gains on AMD EPYC chips signals that the Linux scheduler is becoming less vendor‑specific, a trend that benefits customers with heterogeneous fleets.
- Software‑stack readiness – Distributions that ship a recent kernel (e.g., Fedora 41, Debian 13) will likely adopt the patches once they land, meaning the improvements could be visible to end users within weeks of the merge.
- Future extensibility – Tim Chen’s roadmap mentions task tagging via the sched_qos framework or cgroups. If realized, administrators could enable CAS for latency‑critical containers while leaving background workloads on a default scheduler, providing fine‑grained control.
Outlook
With the patches now in the sched/cache branch and a supplemental series addressing the remaining edge cases, the likelihood of CAS entering the next Linux merge window is high. Assuming a standard two‑week review cycle, we could see the changes merged by early July 2026. Once in mainline, the broader community will be able to validate the reported gains across a wider set of architectures, potentially prompting further refinements such as fast cache‑aware aggregation in the wake‑up path.
For readers interested in testing the patches today, the full series is available on the Linux kernel mailing list archive. The patch series can be applied with git am and built using the standard kernel build process.
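Before testing, it can help to see how the running kernel groups CPUs into LLC domains on the test machine. The sketch below reads the standard Linux sysfs cache topology files (level and shared_cpu_list under each CPU's cache/index directories) and treats the highest cache level as the LLC. The function names are hypothetical, and the script assumes a Linux system that exposes these files; elsewhere it simply prints nothing.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> set:
    """Parse a kernel CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def llc_groups(sysfs_root: str = "/sys/devices/system/cpu") -> list:
    """Return sorted lists of CPU ids, one list per shared last-level cache."""
    groups = set()
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        best = None  # (cache level, shared_cpu_list text) of the highest level seen
        for index_dir in cpu_dir.glob("cache/index[0-9]*"):
            try:
                level = int((index_dir / "level").read_text())
                shared = (index_dir / "shared_cpu_list").read_text()
            except (OSError, ValueError):
                continue  # skip entries that are missing or unreadable
            if best is None or level > best[0]:
                best = (level, shared)
        if best is not None:
            groups.add(frozenset(parse_cpu_list(best[1])))
    return sorted(sorted(group) for group in groups)


if __name__ == "__main__":
    for group in llc_groups():
        print(group)
```

On a multi-domain part, each printed list corresponds to one set of cores sharing an LLC, which is the unit CAS tries to keep related tasks within.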
