
Copy Fail: The Linux Kernel Vulnerability That Corrupted Memory Without Race Conditions

Tech Essays Reporter

CVE-2026-31431 is a logic flaw in the Linux kernel's cryptographic subsystem, present since 2017, that lets an unprivileged user deterministically write 4 bytes into the page cache of any readable file, which is enough for complete system compromise.

Some Linux kernel vulnerabilities challenge common assumptions about memory safety and system boundaries. CVE-2026-31431, named 'Copy Fail' by its discoverers at Theori, is one of them: it went undetected for eight years, affects every major Linux distribution shipped since 2017, and works on principles fundamentally different from those of its predecessors.

At its core, Copy Fail is a local privilege escalation vulnerability that exploits a logic flaw in the kernel's authencesn cryptographic template. Through a deterministic sequence of system calls involving AF_ALG sockets and splice(), an unprivileged user can write exactly 4 bytes into the page cache of any readable file. This capability, while seemingly limited, allows for complete system compromise through targeted corruption of critical binaries or configuration files.

The mechanics are best understood by comparison with a more famous predecessor, Dirty COW (CVE-2016-5195). Dirty COW relied on a race condition in the copy-on-write fault handler; Copy Fail involves no race at all. It follows a deterministic path through the kernel's cryptographic subsystem, which lacks the virtual memory subsystem's understanding of page ownership and permissions. This distinction is crucial: it explains why memory safety tools like KASAN failed to detect the vulnerability despite years of testing.

The page cache, a system-wide in-memory cache of file data organized by (inode, offset) pairs, forms the foundation of this vulnerability. When multiple processes read the same file, they share the same physical pages in the cache. Corrupting these pages affects all readers of the file, regardless of their container, cgroup, or user namespace boundaries. This shared nature makes Copy Fail particularly dangerous in containerized environments, where containers often share filesystem layers with the host.
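The sharing described above can be illustrated with a toy model. This is only a sketch: `ToyPageCache`, the 4096-byte page size, and the inode number are assumptions made for illustration and do not correspond to kernel data structures. It shows that a lookup by (inode, page index) hands every caller the same underlying page, so a change made through one reference is visible through all of them.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this model


@dataclass
class ToyPageCache:
    """Toy stand-in for the page cache: one shared page per (inode, page index)."""
    pages: dict = field(default_factory=dict)

    def get_page(self, inode: int, offset: int) -> bytearray:
        key = (inode, offset // PAGE_SIZE)
        # Every caller asking for the same key receives the same object.
        return self.pages.setdefault(key, bytearray(PAGE_SIZE))


cache = ToyPageCache()

# Two hypothetical readers of the same file (same inode, same page).
host_view = cache.get_page(inode=1234, offset=0)
container_view = cache.get_page(inode=1234, offset=100)

# A different file gets its own page.
other_file = cache.get_page(inode=5678, offset=0)

# Simulate an in-memory modification made through one reference.
host_view[0:4] = b"\xde\xad\xbe\xef"

print("same page object:", container_view is host_view)
print("container sees:", bytes(container_view[0:4]).hex())
print("other file sees:", bytes(other_file[0:4]).hex())
```

The model deliberately has no notion of namespaces or cgroups, which mirrors the point in the text: the cache is keyed by file identity, not by who is reading.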

Exploitation chains together several kernel components. The scatterlist data structure, which represents discontiguous buffers in the kernel, plays a critical role. splice() enables zero-copy I/O by passing page references between file descriptors and pipes, so data spliced into a crypto socket gives the crypto subsystem direct access to page cache pages. The authencesn algorithm, designed for IPsec ESP with Extended Sequence Numbers, then performs a byte rearrangement that writes 4 bytes past the ciphertext boundary into what it believes is scratch space but which, under specific conditions, is a page cache page.
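To make the idea of a scatterlist concrete, here is a minimal Python sketch of a discontiguous buffer. The `Segment` type and the `locate` helper are hypothetical names invented for this illustration; the kernel's real `struct scatterlist` is a C structure with different fields. The sketch only shows how one logical byte offset resolves to a position inside whichever backing buffer happens to hold it.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One entry of a toy scatterlist: a window into some backing buffer."""
    buf: bytearray  # backing memory, standing in for a physical page
    offset: int     # where the window starts inside buf
    length: int     # how many bytes the window covers


def locate(sg: List[Segment], pos: int) -> Tuple[int, int]:
    """Map a logical byte position to (segment index, index inside that segment's buf)."""
    for index, seg in enumerate(sg):
        if pos < seg.length:
            return index, seg.offset + pos
        pos -= seg.length
    raise IndexError("position is past the end of the scatterlist")


# Hypothetical layout: 16 bytes of ordinary memory followed by a 32-byte
# window into a different buffer (think: a page handed over by reference).
ordinary = bytearray(16)
borrowed = bytearray(64)
sg = [Segment(ordinary, 0, 16), Segment(borrowed, 8, 32)]

print(locate(sg, 15))  # last byte of the first segment
print(locate(sg, 16))  # first byte of the second segment
```

Code that walks such a list by logical offset has no built-in way to tell which segments are private memory and which are borrowed, which is why a write computed from lengths alone can land in either.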

The conditions required for the vulnerability arise from a combination of individually reasonable design decisions. In 2017, an optimization was added to the AF_ALG AEAD interface to operate in-place by pointing both source and destination scatterlists to the same memory. This avoided an allocation and copy, improving performance. However, combined with splice(), which passes page cache references into the crypto socket, the in-place optimization created a boundary problem. The authencesn algorithm writes to dst[assoclen + cryptlen], which in normal operation points to the authentication tag region of a kernel-allocated buffer. With the in-place optimization and splice(), this same offset instead points to page cache pages from the source file.

What makes authencesn uniquely vulnerable is its specific handling of Extended Sequence Numbers in IPsec. Unlike other AEAD algorithms, authencesn rearranges bytes during decryption, using the destination buffer as scratch space. This rearrangement includes writing 4 bytes at dst[assoclen + cryptlen], which under normal conditions is harmless. However, when the destination scatterlist contains page cache pages from splice(), this write corrupts the page cache of the source file.
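The effect of aliasing source and destination can be shown with a purely in-memory simulation. This is a sketch under loud assumptions: `toy_scratch_write`, the `ESN!` placeholder bytes, and the buffer sizes are invented for illustration, nothing here touches the kernel or any real file, and the real authencesn code does considerably more than one write. The model only captures the offset arithmetic from the text: a 4-byte write at logical offset `assoclen + cryptlen` of the destination.

```python
from typing import List

SCRATCH = b"ESN!"  # placeholder for the 4 rearranged bytes; the value is arbitrary


def toy_scratch_write(dst: List[bytearray], assoclen: int, cryptlen: int) -> None:
    """Write 4 bytes at logical offset assoclen + cryptlen of a segmented buffer."""
    start = assoclen + cryptlen
    for i, value in enumerate(SCRATCH):
        pos = start + i
        for segment in dst:
            if pos < len(segment):
                segment[pos] = value
                break
            pos -= len(segment)
        else:
            raise IndexError("write past the end of the destination")


ASSOCLEN, CRYPTLEN = 8, 16

# In-place model: destination and source are the same segments, and the
# tail segment stands in for a file page that arrived by reference.
file_page = bytearray(b"A" * 16)
src = [bytearray(ASSOCLEN + CRYPTLEN), file_page]
toy_scratch_write(src, ASSOCLEN, CRYPTLEN)  # dst aliases src
print("in-place, file page now:", bytes(file_page))

# Separate-buffer model: the destination is private memory of the same size.
file_page2 = bytearray(b"A" * 16)
src2 = [bytearray(ASSOCLEN + CRYPTLEN), file_page2]
dst2 = [bytearray(ASSOCLEN + CRYPTLEN + 16)]
toy_scratch_write(dst2, ASSOCLEN, CRYPTLEN)
print("separate, file page now:", bytes(file_page2))
```

In the first case the four bytes end up in the buffer standing in for the file page; in the second the same arithmetic lands in private memory and the file page is untouched.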

The exploit itself is simple. A 732-byte Python script can corrupt /usr/bin/su in memory by repeatedly calling the write4 primitive, which overwrites 4 bytes at a time in the target file's page cache. After sufficient iterations, the binary's in-memory image contains attacker-controlled code, which executes when the binary is run. The chmod 4711 "mitigation" that circulated after disclosure, which removes read permission on setuid binaries for everyone but the owner, proved ineffective because the vulnerability affects any readable file, not just setuid binaries. Alternative targets include /etc/passwd, where overwriting a user's UID field can grant root privileges.
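The limits of the chmod 4711 approach are easy to see by decoding the permission bits. The sketch below uses Python's standard `stat` module; the helper names `describe` and `readable_by_others` are made up for this example, and the 0644 mode for /etc/passwd is an assumed typical value, not something read from a live system.

```python
import stat


def describe(mode: int) -> str:
    """Render permission bits the way `ls -l` does for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def readable_by_others(mode: int) -> bool:
    """True if users outside the owner and group may read the file."""
    return bool(mode & stat.S_IROTH)


examples = [
    ("setuid binary, common default", 0o4755),
    ("setuid binary after chmod 4711", 0o4711),
    ("/etc/passwd (assumed 0644)", 0o644),
]

for label, mode in examples:
    print(f"{label:32} {describe(mode)}  others-can-read={readable_by_others(mode)}")
```

Mode 4711 does remove read access to that one binary for other users, but any file that has to stay world-readable remains a candidate target, which is the article's point.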

The patch, implemented in commit a664bf3d603d, is conceptually straightforward: revert the 2017 in-place optimization. By keeping source and destination scatterlists separate, the authencesn scratch write lands on harmless user memory rather than page cache pages. This fix, while simple in principle, gives up the allocation and copy savings that the optimization had provided for eight years.

The implications of Copy Fail extend beyond the immediate technical details. It highlights the inherent tension between performance optimizations and security in complex systems. The kernel's crypto subsystem, designed for maximum flexibility and performance, naturally introduces edge cases that can be exploited when combined with other subsystems. The vulnerability also demonstrates the limitations of memory safety tools, which focus on detecting buffer overflows and use-after-frees rather than ownership violations.

For containerized environments, Copy Fail is a stark reminder of the blurred boundaries between containers and hosts. When containers share filesystem layers with the host, page cache corruption affects both container and host processes simultaneously. This makes traditional container security models insufficient against vulnerabilities that exploit shared kernel resources.

The discovery of Copy Fail after eight years of silent operation raises questions about the effectiveness of current security testing methodologies. The vulnerability required a specific combination of rarely-used features (AF_ALG with authencesn, splice(), and the in-place optimization) to manifest, explaining why it remained undetected. This suggests that kernel security testing must evolve to better explore the interaction between subsystems rather than focusing on individual components in isolation.

Copy Fail is more than a technical curiosity; it is a lesson in systems security. It demonstrates how seemingly innocuous optimizations, when combined with other features, can create vulnerabilities that defy conventional security models. The vulnerability also underscores the importance of maintaining a healthy skepticism toward performance optimizations in security-critical code paths.

For system administrators and security professionals, the response to Copy Fail is clear: patch promptly. For kernel developers, the vulnerability serves as a reminder that security must be considered at every layer of the system, particularly in subsystems as complex as the crypto API. And for the broader security community, Copy Fail represents a fascinating case study in how vulnerabilities can hide in plain sight, waiting for the right combination of conditions to manifest.

The official Copy Fail landing page and Xint Code Research Team's technical writeup provide additional details for those interested in exploring the vulnerability further. The PoC repository offers practical insights into the exploitation technique, while the kernel commit that introduced the vulnerability and the commit that fixed it (a664bf3d603d) document the technical evolution of the issue.
