Copy Fail: From Pod to Host – How a 4‑Byte Page‑Cache Write Breaks Container Isolation

Trends Reporter

Copy Fail (CVE‑2026‑31431) lets an unprivileged container overwrite the cached contents of any file on a Linux node with a deterministic 4‑byte write. By corrupting the shared page cache, attackers can poison shared image layers, compromise co‑located pods, or escape to root on the host. The article explains the underlying kernel bug, demonstrates two attack paths, and outlines detection and mitigation strategies.

Copy Fail: From Pod to Host – A Deep Dive

Published by Xint, May 19, 2026


Why this vulnerability matters

When a container runs on a Kubernetes node, it still relies on the host kernel's page cache, the shared in‑memory copy of file data. The newly disclosed CVE‑2026‑31431, dubbed Copy Fail, gives an attacker a reliable 4‑byte write primitive that directly mutates those cached pages. Unlike classic kernel exploits, which need to gain code execution inside the kernel, Copy Fail only rewrites file data that other processes later read. The result is a powerful, portable privilege‑escalation primitive that crosses container boundaries without touching the underlying disk.


The mechanics – a quick technical sketch

  1. Trigger point – The bug lives in authencesn, the kernel crypto code that handles IPsec ESP extended sequence numbers. This code is reachable from user space via AF_ALG sockets, the user‑space interface to the kernel's cryptography subsystem.
  2. Splice‑based trick – An attacker creates a zero‑length pipe, calls splice(2) into the AF_ALG socket, and convinces the kernel that it is decrypting a normal packet. The kernel, however, ends up operating on a mutable reference to a page‑cache folio instead of a disposable buffer.
  3. Four‑byte overwrite – The malformed ESP sequence shuffles bytes such that the kernel writes exactly four bytes of attacker‑controlled data into the targeted folio.
  4. Shared visibility – Every open file that refers to the same inode shares that inode's i_mapping (its address space), so all of them resolve to the same folio. Any process, even one in a different container, sees the modified bytes on its next read.

The key insight is that containers do not have separate page caches; they only have separate mount namespaces. When two containers share a lower‑layer image file (e.g., a Python module in python:3.12‑slim), they also share the underlying folio. Copy Fail writes directly into that shared folio, bypassing overlayfs copy‑up and leaving the on‑disk inode untouched.
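To see which lower layers, and therefore which cached folios, are shared on a particular node, one option is to count how often each overlayfs lowerdir appears across mounts. This is an illustrative sketch, not a tool from the Xint research. It assumes container root filesystems are overlayfs mounts visible in the mount namespace where the script runs (typical for containerd's overlayfs snapshotter on the host), that each mount lists its layers in a single lowerdir= option in /proc/self/mountinfo, and that layer paths contain no escaped colons. The function name shared_lower_layers is hypothetical.

```python
from collections import Counter


def shared_lower_layers(mountinfo_text: str) -> dict:
    """Return overlayfs lowerdir paths used by two or more overlay mounts.

    Files under these directories (and their page-cache folios) are shared
    by every container root filesystem that stacks the layer.
    """
    usage = Counter()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        # Optional fields end at a lone "-"; after it come the filesystem
        # type, the mount source, and the superblock options.
        sep = fields.index("-")
        if len(fields) < sep + 4 or fields[sep + 1] != "overlay":
            continue
        for opt in fields[sep + 3].split(","):
            if opt.startswith("lowerdir="):
                # Drop empty entries (e.g. from "::" data-only separators) and
                # use a set so a layer counts once per mount.
                layers = {p for p in opt[len("lowerdir="):].split(":") if p}
                usage.update(layers)
    return {path: n for path, n in usage.items() if n >= 2}
```

On a node, passing it the contents of /proc/self/mountinfo lists each shared layer with the number of overlay mounts that stack it; a high count marks a layer whose cached files many workloads read, which is also useful input for the layer‑reuse audit in the mitigation checklist.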



Two practical attack scenarios

1. Cross‑container poisoning

Threat model: An attacker controls a pod (or can create a pod) but has no privileged capabilities on the node.

Steps:

  1. Identify a file that lives in a lower‑layer shared across many workloads – a Python module, a common shared library (libc.so.6), or a binary used by a sidecar.
  2. Use the Copy Fail primitive to write four bytes that either corrupt the file or inject a small payload. By chaining multiple 4‑byte writes, larger patches are possible.
  3. When any other pod on the same node accesses the file, the poisoned bytes are served from the page cache, causing the victim workload to execute attacker‑controlled code.

Why it’s stealthy: The on‑disk image never changes, so image‑registry scanners (Trivy, Clair) and offline hash‑based integrity tools see nothing. Only a runtime EDR that hashes in‑memory pages could spot the modification.
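The stealth comes from a gap between two views of the same file: what the page cache serves and what the disk holds. One way to compare them, offered here as an assumption rather than a technique from the original research, is to read the file once normally (served from the page cache) and once with O_DIRECT (which asks the filesystem to read the backing storage), then diff the results. The sketch assumes the filesystem supports O_DIRECT, that the poisoned folio is not marked dirty (consistent with the on‑disk inode staying untouched), and that nothing legitimately writes the file between the two reads. All function names are hypothetical.

```python
import mmap
import os


def diff_words(cached: bytes, on_disk: bytes, word: int = 4) -> list:
    """Offsets of word-aligned, word-sized chunks whose contents differ."""
    if len(cached) != len(on_disk):
        raise ValueError("views have different lengths")
    return [
        off
        for off in range(0, len(cached), word)
        if cached[off:off + word] != on_disk[off:off + word]
    ]


def read_via_page_cache(path: str) -> bytes:
    # A normal buffered read is served from the page cache if the file is resident.
    with open(path, "rb") as f:
        return f.read()


def read_bypassing_page_cache(path: str, block: int = 1 << 20) -> bytes:
    # O_DIRECT requests reads from the backing device instead of the page cache.
    # It needs an aligned buffer; an anonymous mmap is page-aligned.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        size = os.fstat(fd).st_size
        buf = mmap.mmap(-1, block)
        chunks, offset = [], 0
        while offset < size:
            # Assumes full reads except at EOF; a short read mid-file would
            # leave the next O_DIRECT offset unaligned.
            n = os.preadv(fd, [buf], offset)
            if n == 0:
                break
            chunks.append(buf[:n])
            offset += n
        buf.close()
        return b"".join(chunks)[:size]
    finally:
        os.close(fd)


def suspicious_offsets(path: str) -> list:
    """Word offsets where the cached view of `path` disagrees with the disk."""
    return diff_words(read_via_page_cache(path), read_bypassing_page_cache(path))
```

For example, suspicious_offsets("/usr/lib/x86_64-linux-gnu/libc.so.6") returns an empty list when both views agree. Reporting at 4‑byte granularity is a presentation choice: an unaligned 4‑byte write can show up as two adjacent offsets.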

2. Container‑to‑host escape

Threat model: An unprivileged container wants a root shell on the host.

Chain (inspired by the Dirty Pipe escape):

  1. Force a host‑side runc execution – kubectl exec or a pod restart causes the node's runc binary to be invoked from inside the container via a bind‑mount.
  2. Locate the host runc process from inside the container (its /proc/<pid>/exe points to the host inode).
  3. Poison the runc ELF header using the 4‑byte write primitive. The cached pages now contain a tiny malicious ELF payload.
  4. Wait for the next runc exec – the host will map the poisoned pages and run the attacker’s code as root.

The exploit works on every kernel from the 2017 commit that introduced the in‑place code path (72548b093ee3) up to the 2026 fix (a664bf3d603d). The PoC yields a reverse shell that appears under the host's container‑runtime state directory, confirming that the attacker has escaped the container namespace.

[Figure: compromised Python site‑packages]


Detection – what actually catches the attack?

Detection method | Effectiveness
Image‑registry scanning (Trivy, Clair) | ❌ No change on disk
Agent‑less disk or snapshot scanners | ❌ Files unchanged on storage
Traditional file‑integrity tools (AIDE, Tripwire) | ❌ Hashes match the on‑disk inode
Runtime EDR that hashes resident pages | ✅ Can detect altered folios
Execve monitoring (argv, child processes) | ⚠️ Only sees post‑write behavior
Seccomp profile blocking socket(AF_ALG, …) | ✅ Removes the primitive entirely
gVisor / runsc, Kata Containers, Fargate micro‑VMs | ✅ Separate kernel and page cache
Patched host kernel (a664bf3d603d) | ✅ Root‑cause fix
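The seccomp row only helps if the profile is actually applied to the pod. A quick check from inside a container is to try to create an AF_ALG socket, the entry point the article names; creating the socket does not reach the vulnerable decryption path. This is a minimal sketch: it assumes the deny rule returns an errno (for example EPERM) rather than killing the process, and af_alg_reachable is a hypothetical helper name.

```python
import socket

# AF_ALG is address family 38 on Linux; Python exposes it as socket.AF_ALG there.
AF_ALG = getattr(socket, "AF_ALG", 38)


def af_alg_reachable() -> bool:
    """True if this process can create an AF_ALG socket, i.e. nothing blocks it."""
    try:
        s = socket.socket(AF_ALG, socket.SOCK_SEQPACKET, 0)
    except OSError:
        # EPERM/EACCES from a seccomp or LSM policy, or EAFNOSUPPORT when the
        # kernel (or OS) lacks AF_ALG: either way the entry point is unavailable.
        return False
    s.close()
    return True


if __name__ == "__main__":
    print("AF_ALG reachable:", af_alg_reachable())
```

Run with python3 inside a pod (for example via kubectl exec), a True result means nothing between that pod and the kernel is blocking AF_ALG.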

Mitigation checklist for operators

  1. Apply the kernel patch – Pull the latest node image or run an in‑place update that includes commit a664bf3d603d.
  2. Block AF_ALG – Add socket(AF_ALG, …) to the pod seccomp deny list. Most workloads do not need kernel‑level crypto sockets.
  3. Consider stronger isolation – For multi‑tenant clusters, run untrusted workloads in gVisor, Kata, or Fargate‑style micro‑VMs where the page cache is not shared.
  4. Deploy runtime‑aware EDR – Tools that can hash in‑memory pages of long‑running processes (e.g., python3 after import) are currently the only practical detection method.
  5. Audit layer reuse – Limit the number of high‑value base images shared across tenants, or enforce node‑affinity policies that prevent a malicious pod from landing on the same node as a high‑value workload.
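Item 2 can be expressed as a Localhost seccomp profile. The sketch below generates a minimal Docker/OCI‑format JSON profile whose only rule makes socket() fail with EPERM when its first argument is AF_ALG (38 on Linux). It is an illustration, not a vetted policy: because it allows every other syscall, in practice you would merge this rule into a copy of your runtime's default profile instead of replacing RuntimeDefault with it, and on 32‑bit x86 you would also need to consider socketcall(2), which this rule does not cover. The helper name and file name are hypothetical.

```python
import json

AF_ALG = 38  # Linux address-family number for AF_ALG (<linux/socket.h>)


def af_alg_deny_profile() -> dict:
    """Minimal seccomp profile: allow everything except socket(AF_ALG, ...),
    which fails with EPERM instead of creating a socket."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": 1,  # EPERM
                # Match only when argument 0 (the address family) is AF_ALG.
                "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(af_alg_deny_profile(), indent=2))
```

Saved on each node under the kubelet's seccomp directory (by default /var/lib/kubelet/seccomp), for example as profiles/deny-af-alg.json, the profile can be referenced from a pod's securityContext.seccompProfile with type: Localhost and localhostProfile: profiles/deny-af-alg.json.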



Closing thoughts

Copy Fail shows that shared kernel resources can be as dangerous as shared memory. The page cache, a performance optimisation, becomes an invisible attack surface when a deterministic write primitive is available. The vulnerability does not require a full kernel exploit, does not leave disk artefacts, and works across namespace boundaries. Operators should treat the page cache as part of the trusted computing base and apply the mitigations above while the ecosystem moves toward stronger isolation primitives.


Stay tuned for more vulnerability research from the Xint team.
