A significant restructuring of the Linux kernel's swap subsystem replaces complex XArray-based tracking with streamlined swap tables, yielding 5-20% performance gains while paving the way for future optimizations.
The Linux kernel's memory-management subsystem relies heavily on its swap mechanism to handle anonymous memory: pages holding data such as heap and stack contents, which, unlike file-backed pages, have no file on disk to be written back to. When RAM runs short, the kernel writes infrequently used anonymous pages out to swap files on slower storage devices. Over the years, the code behind this critical function has accumulated increasingly intricate layers of complexity. Kernel developers Kairui Song and Chris Li recently led an effort to modernize it, and its first phase, the introduction of swap tables, landed in Linux 6.18.
Before Linux 6.18, the swap subsystem tracked the state of swap slots through two overlapping layers. Each swap file was divided into 64MB chunks, each managed by its own address_space structure containing an XArray, a radix-tree-like data structure that stored pointers to pages in the swap cache or shadow entries for pages that had already been swapped out. Separately, the swap file was divided into swap clusters (typically 2MB each), which let each CPU allocate slots from its own cluster and so reduced contention on global locks. Looking up a page's status therefore required several indirections: first finding the right address_space in the swap file's array of them, then querying that structure's XArray.
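To make the two-step lookup concrete, here is a simplified userspace model of it. This is a sketch, not kernel code: the struct and function names (old_space, old_swapfile, old_lookup) are hypothetical, 4KB pages are assumed (so a 64MB chunk covers 2^14 slots), and a flat pointer array stands in for the real XArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumes 4 KiB pages: a 64 MiB chunk then covers 2^14 swap slots. */
#define OLD_SPACE_SHIFT 14
#define OLD_SLOTS_PER_SPACE (1UL << OLD_SPACE_SHIFT)

/* Hypothetical stand-in for one address_space; a flat pointer array
 * replaces the real XArray (a radix tree) to keep the sketch small. */
struct old_space {
    void **entries; /* NULL until something is cached in this chunk */
};

/* Hypothetical per-swap-file state: one old_space per 64 MiB chunk. */
struct old_swapfile {
    struct old_space *spaces;
    size_t nr_spaces;
};

/* First indirection: which 64 MiB chunk holds this slot? */
static size_t old_space_index(unsigned long offset)
{
    return offset >> OLD_SPACE_SHIFT;
}

/* Second indirection: look the slot up inside that chunk's index.
 * Returns the cached page pointer, or NULL if nothing is recorded. */
static void *old_lookup(const struct old_swapfile *sf, unsigned long offset)
{
    size_t idx = old_space_index(offset);
    assert(idx < sf->nr_spaces);

    const struct old_space *sp = &sf->spaces[idx];
    if (sp->entries == NULL)
        return NULL;
    return sp->entries[offset & (OLD_SLOTS_PER_SPACE - 1)];
}
```

Every lookup pays for both steps, and the chunk boundaries (64MB) have nothing to do with the cluster boundaries (2MB) that the allocator actually works in.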
The new architecture removes this indirection by moving the tracking into the existing swap clusters. Each swap_cluster_info structure now points to a swap table: a dynamically allocated array with one entry per slot in the cluster (512 entries for a 2MB cluster of 4KB pages, which fits in a single page on 64-bit systems). These tables replace the XArrays entirely. Each entry holds the same kind of information the XArray used to: a pointer to the folio when the page is in the swap cache, a shadow entry when it has been swapped out, or nothing when the slot is empty. A swp_entry_t value (which encodes the swap file index and the slot offset) now leads directly to the right cluster and to the right entry in its table. The 64MB chunks and their address_space arrays are gone, replaced by a single swap_space structure.
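A minimal userspace sketch of the per-cluster table follows, assuming 4KB pages and 512-slot clusters. The names (tb_cluster, tb_entry_t, tb_store, tb_load) are hypothetical, and the entry encoding (zero for empty, an even pointer for a folio, an odd tagged value for a shadow) is only loosely modelled on the XArray value-entry convention; it is not the kernel's actual swap-table code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumes 4 KiB pages and 2 MiB clusters: 512 slots per cluster. */
#define TB_SLOTS_PER_CLUSTER 512

/* One table entry: 0 = empty slot, even value = folio pointer,
 * odd value = shadow entry (value shifted left, low bit set). */
typedef uintptr_t tb_entry_t;

/* Hypothetical, stripped-down swap_cluster_info. */
struct tb_cluster {
    tb_entry_t *table; /* allocated on first use, NULL otherwise */
};

static tb_entry_t tb_mk_shadow(unsigned long value)
{
    return ((tb_entry_t)value << 1) | 1;
}

static int tb_is_shadow(tb_entry_t e) { return (int)(e & 1); }

static unsigned long tb_shadow_value(tb_entry_t e) { return (unsigned long)(e >> 1); }

/* Record an entry for a swap offset: offset / 512 picks the cluster,
 * offset % 512 picks the slot in its table. Returns 0, or -1 if the
 * table could not be allocated. */
static int tb_store(struct tb_cluster *clusters, unsigned long offset, tb_entry_t e)
{
    struct tb_cluster *ci = &clusters[offset / TB_SLOTS_PER_CLUSTER];

    if (ci->table == NULL) {
        ci->table = calloc(TB_SLOTS_PER_CLUSTER, sizeof(*ci->table));
        if (ci->table == NULL)
            return -1;
    }
    ci->table[offset % TB_SLOTS_PER_CLUSTER] = e;
    return 0;
}

/* Look up an entry; a cluster without a table has only empty slots. */
static tb_entry_t tb_load(const struct tb_cluster *clusters, unsigned long offset)
{
    const struct tb_cluster *ci = &clusters[offset / TB_SLOTS_PER_CLUSTER];

    return ci->table ? ci->table[offset % TB_SLOTS_PER_CLUSTER] : 0;
}
```

Compared with the previous model, a lookup is now one division into the cluster array plus one array index, and a cluster that has never held anything costs only a NULL pointer.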
This consolidation yields concrete benefits. Because each CPU already allocates slots from its own swap cluster, tying swap-cache state to clusters keeps most updates local to one cluster and serialized by that cluster's own lock, which cuts cross-core synchronization and removes XArray traversals from the lookup path. Benchmarks show throughput improvements of 5-20% for I/O-heavy workloads and build tasks, attributed largely to reduced lock contention and simpler data access. Memory overhead also decreases: a cluster's table is allocated only when it is needed, avoiding fixed allocations for underutilized swap areas.
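The contention argument can be illustrated with per-cluster locking in a small pthreads model. This is an assumption-laden sketch rather than the kernel's locking scheme: lc_cluster, lc_store and lc_fill_cluster are hypothetical names, the tables are fixed-size arrays for brevity, and each thread stands in for a CPU working through its own cluster.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define LC_SLOTS_PER_CLUSTER 512
#define LC_NR_CLUSTERS 4

/* Hypothetical cluster with its own lock guarding its own table. */
struct lc_cluster {
    pthread_mutex_t lock;
    uintptr_t table[LC_SLOTS_PER_CLUSTER]; /* fixed-size here for brevity */
};

static struct lc_cluster lc_clusters[LC_NR_CLUSTERS];

static void lc_init(void)
{
    for (int i = 0; i < LC_NR_CLUSTERS; i++)
        pthread_mutex_init(&lc_clusters[i].lock, NULL);
}

/* Only the lock of the cluster that owns 'offset' is taken, so writers
 * working in different clusters never wait for each other. */
static void lc_store(unsigned long offset, uintptr_t entry)
{
    unsigned long idx = offset / LC_SLOTS_PER_CLUSTER;
    assert(idx < LC_NR_CLUSTERS);

    struct lc_cluster *ci = &lc_clusters[idx];
    pthread_mutex_lock(&ci->lock);
    ci->table[offset % LC_SLOTS_PER_CLUSTER] = entry;
    pthread_mutex_unlock(&ci->lock);
}

/* Thread body: fill every slot of one cluster, as a CPU working through
 * "its" cluster might. arg points to the cluster's first offset. */
static void *lc_fill_cluster(void *arg)
{
    unsigned long first = *(const unsigned long *)arg;

    for (unsigned long i = 0; i < LC_SLOTS_PER_CLUSTER; i++)
        lc_store(first + i, ((uintptr_t)(first + i) << 1) | 1);
    return NULL;
}
```

With a single global lock, two threads filling clusters 0 and 2 would serialize on every store; here they only ever touch different mutexes.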
The change exemplifies incremental but impactful kernel evolution. Swap tables build on the existing swap-cluster infrastructure rather than reinventing it, showing how targeted refactoring can extract significant performance from a mature subsystem. Substantial as it is, this is only the first phase of Song's broader initiative; follow-up patches that further simplify swap handling are already circulating for future kernel versions. The work shows that even long-established, foundational parts of the kernel still have room for meaningful optimization.