The foundational assumption that GPU memory is securely isolated between processes has been shattered by the discovery of LeftoverLocals (CVE-2023-4969), a critical vulnerability that exposes sensitive data across process boundaries. Researchers from Trail of Bits demonstrated how residual data lingering in GPU local memory (a high-speed on-chip scratchpad) can be extracted by malicious actors co-located on the same hardware, potentially stealing AI model weights, inference outputs, or proprietary training data.

How the Attack Breaches GPU Sandboxes

LeftoverLocals stems from a simple oversight: many GPUs and their drivers do not clear local memory between kernel executions. An attacker sharing the same GPU can abuse this by:
1. Waiting for a victim process (e.g., an ML inference job) to run a kernel on the GPU.
2. Launching a malicious kernel on the same device immediately afterward.
3. Reading its uninitialized local memory, capturing fragments of the victim's data.

# Simplified proof-of-concept pseudocode for the attacker's "listener" kernel
malicious_kernel():
    # Request a local-memory buffer; the driver hands it over without zeroing it
    local_data = allocate_local_memory()
    # Whatever the previous kernel left in this region is still readable
    stolen_data = read_local_memory(local_data)
    exfiltrate(stolen_data)
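
In practice, the listener is just an ordinary GPU compute kernel. The OpenCL C sketch below is an illustrative reconstruction rather than Trail of Bits' actual proof of concept: the kernel name, the dump buffer layout, and the host code that allocates the buffers and launches the kernel are all assumed.

// Hypothetical "listener" kernel (OpenCL C). The __local buffer is never written
// here, so any values copied out are residue left by whichever kernel last used
// this compute unit.
__kernel void listener(__local uint *lm, __global uint *dump, uint n) {
    for (uint i = get_local_id(0); i < n; i += get_local_size(0)) {
        dump[get_group_id(0) * n + i] = lm[i];  // copy leftover local memory to host-visible storage
    }
}

On hardware with the flaw, the dump buffer comes back containing data written by earlier kernels; on patched or unaffected devices it should contain no recognizable victim data.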

Widespread Impact Across the AI Stack

Testing confirmed vulnerabilities in GPUs from:
- Apple: M2 series (including M2 Ultra)
- AMD: Radeon RX 7900 XT, Radeon Pro W6800
- Qualcomm: Adreno 740

Crucially, frameworks like PyTorch and TensorFlow run directly on these affected GPUs, so higher-level ML workloads inherit the exposure. As Trail of Bits notes:

"This vulnerability breaks the primary isolation guarantee for GPU workloads... An attacker can register a GPU process and read the leftover data from the previously executed kernel."

Multi-tenant cloud environments (AWS, Azure, GCP) are particularly exposed, as are edge and mobile devices where untrusted apps share a GPU. Shared research clusters and consumer AI applications are also at risk.

The Supply Chain Blind Spot

LeftoverLocals highlights a systemic failure: hardware security assumptions permeating unchecked into critical software infrastructure. Developers using PyTorch or TensorFlow implicitly trusted underlying GPU isolation—a trust now proven misplaced. Mitigations require vendor-specific patches and firmware updates, leaving systems exposed until fully patched.

Beyond Patching: A Call for Hardware-Aware Security

While vendors work on fixes, this flaw forces a paradigm shift:
- MLOps teams must audit isolation controls in GPU-dependent pipelines (a simple canary test is sketched after this list).
- Cloud architects should isolate affected instance types or enforce strict single-tenant scheduling on vulnerable GPUs.
- Framework developers need deeper hardware collaboration to validate memory sanitization.
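
On the first point, a team can sanity-check its own hardware before trusting vendor patches with a rough canary test along the lines Trail of Bits used: one kernel fills local memory with a known pattern, and a second kernel launched afterward (such as the listener sketched earlier) dumps whatever it finds. The OpenCL C writer below is a sketch under those assumptions; the CANARY constant, the kernel name, and the host program that runs the kernels back to back and inspects the dump are all illustrative.

// Hypothetical "canary writer" kernel (OpenCL C). If a later kernel's dump of
// uninitialized local memory contains this pattern, the GPU/driver pair is not
// clearing local memory between launches.
#define CANARY 0x41414141u

__kernel void writer(__local uint *lm, uint n) {
    for (uint i = get_local_id(0); i < n; i += get_local_size(0))
        lm[i] = CANARY;               // fill local memory with a recognizable pattern
    barrier(CLK_LOCAL_MEM_FENCE);     // ensure every work-item's writes complete before the kernel exits
}

Running the writer and the listener from separate processes brings the test closer to the real cross-process threat model; seeing the canary survive either way indicates that local memory is not being sanitized.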

The discovery underscores that AI's rapid adoption has outpaced scrutiny of its foundational layers. As GPUs become the new CPUs for computational workloads, their security model must evolve beyond performance-centric design—or risk leaving the door open to silent, large-scale data exfiltration.

Source: Trail of Bits Research & Hacker News Discussion (https://news.ycombinator.com/item?id=45124138)