Experimental Linux Kernel Code Shows 34% Memory Access Boost with 1GB PUD-Level THPs


Hardware Reporter

New experimental Linux kernel patches implementing 1GB Page Upper Directory (PUD) Transparent Huge Pages (THP) demonstrate significant performance improvements, reducing memory access times by 34% compared to current 2MB huge pages, though upstream developers urge caution due to complexity concerns.

The patches, posted by Usama Arif as a request-for-comments (RFC) series, aim to give applications the reduced Translation Lookaside Buffer (TLB) pressure of 1GB pages without requiring hugetlbfs. In the accompanying benchmarks, memory access was 34% faster than with the 2MB huge pages THP provides today.
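
For context, applications already opt into THP through madvise(2) rather than through a special filesystem. A minimal sketch of that existing interface (the 4GB size mirrors the benchmark below; the program itself is illustrative, not from the patch series):

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4UL << 30;   /* 4 GiB anonymous region */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    /* A hint, not a guarantee: the kernel uses 2MB THPs where it can
     * and transparently falls back to 4KB pages where it cannot. The
     * RFC series would let this same path install 1GB PUD mappings. */
    madvise(buf, len, MADV_HUGEPAGE);

    memset(buf, 0, len);      /* touch the region to fault pages in */
    munmap(buf, len);
    return 0;
}
```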

The Problem with Current 1GB Huge Pages

While hugetlbfs already provides 1GB huge pages in Linux, it comes with significant limitations that make it unsuitable for many workloads:

  • Static Reservation: Requires pre-allocating huge pages at boot or at runtime, removing that memory from the general pool for as long as it is reserved
  • No Fallback: If a 1GB huge page cannot be allocated, hugetlbfs fails outright rather than falling back to smaller pages (see the sketch after this list)
  • No Splitting: Pages cannot be split when only part of the range is actually used, leading to memory waste
  • Separate Accounting: hugetlbfs memory is accounted apart from regular memory and cannot easily be shared with normal memory pools
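
A hedged illustration of the no-fallback point: explicitly mapping a single 1GB huge page through the hugetlbfs interface fails outright unless pages were reserved in advance (for example with the hugepagesz=1G hugepages=N boot parameters):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

/* MAP_HUGE_1GB encodes log2(1GB) = 30 in the flag bits; the constants
 * come from the kernel uapi and may be absent from older libc headers. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

int main(void)
{
    size_t len = 1UL << 30;   /* exactly one 1GB huge page */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                     -1, 0);

    if (buf == MAP_FAILED) {
        /* No fallback: without a reserved 1GB page this simply fails. */
        perror("mmap(MAP_HUGETLB|MAP_HUGE_1GB)");
        return 1;
    }
    munmap(buf, len);
    return 0;
}
```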

How PUD THP Solves These Issues

The proposed PUD THP approach integrates 1GB pages into the existing THP infrastructure, solving these limitations by:

  1. Allowing dynamic allocation without pre-reservation
  2. Providing fallback mechanisms when 1GB pages aren't available (the verification sketch after this list shows how an application can observe what it actually received)
  3. Enabling page splitting for partial access scenarios
  4. Integrating memory accounting with regular memory pools
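
Because the fallback is transparent, mmap() alone does not tell an application what page size it actually received. One way to check is the per-process accounting in /proc/self/smaps_rollup; the AnonHugePages counter below covers today's PMD-mapped 2MB THP, and how 1GB PUD mappings would be reported is an open detail of the series (nothing in this sketch is taken from the patches):

```c
#include <stdio.h>

/* Return the total KB of PMD-mapped anonymous THP backing this process,
 * or -1 on error. A region that silently fell back to 4KB pages shows
 * up as a smaller-than-expected count. */
long anon_huge_kb(void)
{
    FILE *f = fopen("/proc/self/smaps_rollup", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```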

Benchmark Results

Testing on an Intel Xeon Platinum 8321HC processor, using a pointer-chasing workload with truly random accesses over a 4GB memory region, showed impressive results:

Metric            PUD THP (1GB)   PMD THP (2MB)   Change
Memory access     88 ms           134 ms          34% faster
Page fault time   898 ms          331 ms          2.7x slower

While faulting in a 1GB page is 2.7x slower, reflecting the difficulty of allocating such a large contiguous region, the developers note this is a one-time cost for long-running workloads, with the 34% improvement in access latency providing an ongoing benefit.
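
The article does not include the benchmark source; the following is a minimal sketch of the pointer-chasing pattern it describes, in which every load depends on the previous one so that neither the hardware prefetcher nor out-of-order execution can hide TLB misses:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>

#define REGION (4UL << 30)                   /* 4 GiB, as in the tests */
#define SLOTS  (REGION / sizeof(uint64_t))

int main(void)
{
    uint64_t *slot = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return 1;
    /* On a kernel with the RFC applied, this is where a 1GB PUD THP
     * could be requested (an assumption about the final interface). */
    madvise(slot, REGION, MADV_HUGEPAGE);

    /* Sattolo's algorithm builds one random cycle: slot[i] holds the
     * index of the next element to visit. */
    for (uint64_t i = 0; i < SLOTS; i++)
        slot[i] = i;
    for (uint64_t i = SLOTS - 1; i > 0; i--) {
        uint64_t j = (uint64_t)random() % i;     /* j in [0, i) */
        uint64_t tmp = slot[i];
        slot[i] = slot[j];
        slot[j] = tmp;
    }

    /* Dependent-load walk: each iteration's address comes from memory. */
    struct timespec t0, t1;
    uint64_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t n = 0; n < SLOTS; n++)
        idx = slot[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("walk: %.1f ms (end %llu)\n",
           (t1.tv_sec - t0.tv_sec) * 1e3 +
           (t1.tv_nsec - t0.tv_nsec) / 1e6,
           (unsigned long long)idx);             /* keep idx live */
    munmap(slot, REGION);
    return 0;
}
```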

Upstream Concerns and Next Steps

Despite the promising benchmark results, upstream kernel developers have expressed caution about the patch series. Oracle engineer Lorenzo Stoakes highlighted several concerns:

  • The work appeared unexpectedly without prior discussion in the THP community
  • PUD THP requires pages that the page allocator can't easily provide, likely involving CMA (Contiguous Memory Allocator)
  • Questions about interaction with existing features like khugepaged, MADV_COLLAPSE, and mTHP (a minimal MADV_COLLAPSE example follows below)
  • The THP codebase needs significant rework before adding major new features

Stoakes emphasized proceeding with caution, suggesting the series keep its RFC tag until those concerns are addressed and describing the THP code base as being in "dire need of rework".
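
Of the interaction questions above, MADV_COLLAPSE is the most concrete: since Linux 6.1 it lets a process synchronously collapse an eligible range into PMD-sized (2MB) huge pages, and reviewers are asking whether and how it should attempt PUD-sized collapse. A minimal sketch of the current interface (the wrapper is ours, not from the series):

```c
#include <sys/mman.h>

/* MADV_COLLAPSE was added in Linux 6.1; older headers may lack it. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25    /* value from the kernel uapi headers */
#endif

/* Ask the kernel to synchronously back [addr, addr + len) with 2MB THPs.
 * Returns 0 on success, -1 with errno set (e.g. EINVAL, ENOMEM) on failure. */
int collapse_region(void *addr, size_t len)
{
    return madvise(addr, len, MADV_COLLAPSE);
}
```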

Technical Context

The development comes as Linux continues to evolve its memory management. The current THP implementation operates at the Page Middle Directory (PMD) level of the page tables, yielding 2MB pages; this work adds support one level up, at the PUD level, for 1GB pages, a significant architectural change.
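
The arithmetic behind those levels is straightforward on x86-64 with 4KB base pages, where each page-table level decodes 9 address bits:

```c
#include <stdio.h>

int main(void)
{
    unsigned long pte = 1UL << 12;   /* PTE:  4 KB base page   */
    unsigned long pmd = pte << 9;    /* PMD:  2 MB = 512 PTEs  */
    unsigned long pud = pmd << 9;    /* PUD:  1 GB = 512 PMDs  */

    /* One TLB entry for a PUD-mapped page covers what would otherwise
     * take 512 PMD entries or 262,144 PTE entries. */
    printf("PTE %lu KB, PMD %lu MB, PUD %lu GB\n",
           pte >> 10, pmd >> 20, pud >> 30);
    return 0;
}
```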

For system administrators and developers working with memory-intensive applications, this work could eventually provide a more flexible alternative to hugetlbfs while delivering better performance. However, given the upstream feedback, it may be some time before these patches are considered for mainline inclusion.

The patch series represents an interesting exploration of how Linux can better handle large memory pages in modern systems with terabytes of RAM, where TLB pressure becomes increasingly important for performance. As the discussion continues, the Linux community will need to balance the performance benefits against the complexity of implementation and maintenance.
