Linux Memory Permissions: How /proc/self/mem Bypasses Virtual Memory Protections

Tech Essays Reporter

This article examines the fascinating 'punch through' semantics of Linux's /proc/*/mem interface, which allows writes to memory marked as unwritable. By exploring the implementation details, we discover how the kernel sidesteps hardware memory protection mechanisms through clever virtual memory manipulation, revealing the nuanced relationship between operating systems and hardware.

The relationship between an operating system and the hardware it runs on has always been a delicate balance of cooperation and control. At first glance, memory protection appears to be a fundamental boundary that even the kernel must respect. Yet Linux implements a fascinating quirk in its /proc/*/mem interface that challenges this assumption. This interface allows writes to memory marked as unwritable, a behavior that seems to violate one of the most fundamental principles of memory protection.

The Paradox of /proc/self/mem

The /proc filesystem provides a window into the kernel's internal workings, with each process having its own directory containing information about that process. Among these files is /proc/*/mem, which exposes the process's virtual address space as a file: the file offset is the virtual address being read or written. What makes this interface particularly interesting is its "punch through" semantics: writes performed through this file succeed even when the destination memory is marked as unwritable.

This behavior isn't a bug but an intentional feature actively used by sophisticated projects like the Julia JIT compiler and the rr debugger. These tools leverage this capability to implement advanced features that would otherwise be impossible or significantly more complex. The Julia compiler, for example, can modify compiled code in memory, while rr can implement its debugging capabilities by patching executable code on the fly.

Demonstrating the Behavior

The article provides a compelling demonstration of this behavior through a simple C program that maps a read-only page and then attempts to write to it using /proc/self/mem. The code first allocates memory with PROT_READ permissions, creating a page that should be unwritable. It then uses a memwrite function that opens /proc/self/mem, seeks to the address of the read-only page, and attempts to write data to it.
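
The same idea can be sketched in a few lines of Python. This is not the original C program, only a minimal rendition of it, and it assumes a Linux system where the process may write to its own /proc/self/mem (sandboxes and hardened kernel configurations may refuse the write). The helper names map_readonly_page and demo are illustrative, and ctypes is used only to call mmap() directly and to read the page back.

```python
import ctypes
import mmap
import os

# dlopen(NULL): resolves libc symbols such as mmap from the running process.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]

MAP_FAILED = ctypes.c_void_p(-1).value


def map_readonly_page():
    """Map one anonymous page with PROT_READ only and return its address."""
    addr = libc.mmap(None, mmap.PAGESIZE, mmap.PROT_READ,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return addr


def memwrite(addr, data):
    """Open /proc/self/mem, seek to the virtual address, and write there."""
    fd = os.open("/proc/self/mem", os.O_RDWR)
    try:
        os.lseek(fd, addr, os.SEEK_SET)
        return os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    addr = map_readonly_page()            # page stays mapped; fine for a demo
    before = ctypes.string_at(addr, 5)    # anonymous pages start zero-filled
    memwrite(addr, b"hello")              # a direct store here would segfault
    after = ctypes.string_at(addr, 5)
    return before, after
```

Calling demo() returns the first five bytes of the page before and after the write. On a kernel with the default behavior, the second value should be b"hello" even though the page was never mapped writable.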

The program then attempts an even more audacious operation: modifying the actual code of a libc function (getchar) by writing a breakpoint instruction (0xcc, the x86 int3 opcode) over its first byte. When the modified getchar is subsequently called, the program receives a SIGTRAP, confirming that the write succeeded and that the executable code was actually modified.

This demonstration raises profound questions about the nature of memory protection in a system. If the kernel can bypass its own memory protection mechanisms, what does that mean for the security model of the operating system? And to what extent can hardware actually constrain the kernel's access to memory?

The Hardware Perspective: Memory Protection Mechanisms

To understand how this is possible, we must examine the hardware mechanisms designed to protect memory. On x86-64 processors, two key controls exist:

  1. Write Protect (CR0.WP): When set, this bit inhibits supervisor-level procedures (the kernel) from writing to read-only pages. When clear, it allows the kernel to write to read-only pages regardless of user/supervisor bit settings.

  2. Supervisor Mode Access Prevention (SMAP) (CR4.SMAP): When enabled, this feature makes kernel reads and writes of userspace memory fault unless the kernel explicitly lifts the restriction for the duration of the access (by setting EFLAGS.AC). It is designed to hinder security exploits that populate userspace with malicious data to be read by the kernel.

Interestingly, CR0.WP is typically enabled at boot and remains set for the system's lifetime. When the kernel attempts to write to a read-only page with this bit set, a page fault is triggered. However, as the article points out, this is more of a tool to facilitate Copy-on-Write than a meaningful security constraint on the kernel itself.
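
How these bits interact can be summarized as a small truth function. The sketch below is a deliberately simplified model of the x86-64 rules for data writes, written in Python purely for illustration. It ignores protection keys, instruction fetches, and implicit supervisor accesses, and the function and parameter names are made up here rather than taken from any kernel or hardware interface.

```python
def write_allowed(supervisor, pte_writable, pte_user,
                  cr0_wp=True, cr4_smap=True, eflags_ac=False):
    """Simplified model: would the MMU permit this data write?

    supervisor   -- True if the access comes from kernel mode
    pte_writable -- the page's R/W bit
    pte_user     -- the page's U/S bit (True means a user page)
    """
    if not supervisor:
        # User-mode code may only write user pages that are marked writable.
        return pte_user and pte_writable
    if pte_user and cr4_smap and not eflags_ac:
        # SMAP: kernel access to a user page faults unless EFLAGS.AC is set.
        return False
    # CR0.WP decides whether the kernel must honor a read-only mapping.
    return pte_writable or not cr0_wp
```

For example, write_allowed(True, False, True, cr4_smap=False) is False: with CR0.WP set, a kernel store to a read-only user page faults just as a user store would. Passing cr0_wp=False as well flips the result to True. Neither case describes what /proc/*/mem does, since that path never issues a store through the read-only mapping at all.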

The Implementation: Sidestepping the MMU

The true revelation comes from examining the implementation of /proc/*/mem in the Linux kernel. The write operation is ultimately handled by mem_rw(), which uses access_remote_vm() for the actual writes. This function performs a clever three-step process that completely bypasses the Memory Management Unit (MMU) restrictions:

  1. Virtual to Physical Translation: access_remote_vm() calls get_user_pages_remote() to translate the destination virtual address to its corresponding physical frame. This function walks the page tables manually in software, effectively replicating what the MMU does in hardware.

  2. Mapping into Kernel Space: Once the physical frame is identified, kmap() maps it into the kernel's virtual address space with writable permissions. On 64-bit x86 systems, this is straightforward since all physical memory is mapped via the linear mapping region of the kernel's address space.

  3. Performing the Write: Finally, copy_to_user_page() executes the actual write using a simple memcpy operation. Since the destination is now mapped in the kernel's address space with writable permissions, the write proceeds without issue.

The critical element that enables the "punch through" semantics is the FOLL_FORCE flag passed to get_user_pages_remote(). This flag causes the access validation logic within get_user_pages() to ignore write permissions and allow the lookup to proceed. As the article notes, this is the sole source of the punch-through behavior.
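
The three steps and the role of FOLL_FORCE can be mimicked with a toy model. The Python sketch below is not kernel code: the ToyMachine class, its dictionary page table, and the numeric flag values are all invented for illustration, writes are assumed to fit within one page, and copy-on-write handling is left out. It only shows why a permission check that lives in the page-table entry stops mattering once the lookup is forced and the frame is reached through a different, writable alias.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096
FOLL_WRITE = 0x01   # illustrative values, not the kernel's definitions
FOLL_FORCE = 0x02


@dataclass
class Pte:
    frame: int       # index of the physical frame
    writable: bool   # permission attached to this virtual mapping


class ToyMachine:
    def __init__(self, nframes):
        self.phys = [bytearray(PAGE_SIZE) for _ in range(nframes)]
        self.page_table = {}   # virtual page number -> Pte

    def user_write(self, vaddr, data):
        """An ordinary store: the modelled MMU enforces the PTE permission."""
        pte = self.page_table[vaddr // PAGE_SIZE]
        if not pte.writable:
            raise PermissionError("page fault: write to read-only page")
        off = vaddr % PAGE_SIZE
        self.phys[pte.frame][off:off + len(data)] = data

    def get_user_page(self, vaddr, flags):
        """Step 1: software page-table walk, standing in for get_user_pages_remote()."""
        pte = self.page_table[vaddr // PAGE_SIZE]
        wants_write = bool(flags & FOLL_WRITE)
        forced = bool(flags & FOLL_FORCE)
        if wants_write and not pte.writable and not forced:
            raise PermissionError("lookup refused: page is not writable")
        return pte.frame

    def access_remote_vm(self, vaddr, data, flags):
        frame = self.get_user_page(vaddr, flags)       # step 1: translate
        kernel_alias = memoryview(self.phys[frame])    # step 2: writable alias (kmap)
        off = vaddr % PAGE_SIZE
        kernel_alias[off:off + len(data)] = data       # step 3: plain copy
        return len(data)
```

With a read-only entry installed, user_write() raises, and so does access_remote_vm() with FOLL_WRITE alone. Adding FOLL_FORCE lets the lookup succeed, and the copy then lands in the same physical frame that the read-only mapping points to.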

Implications for System Security

This implementation reveals something fundamental about memory protection in computer systems: memory permissions are associated with virtual addresses, not physical frames. The same physical memory can be mapped with different permissions in different address spaces, and the kernel, being in complete control of the virtual memory subsystem, can remap physical frames with whatever permissions it desires.
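
The point that permissions belong to mappings rather than to the underlying memory can be observed from userspace without any special privileges. The following Python sketch maps the same temporary file twice, once read-only and once read-write, and shows that a write through the second mapping is visible through the first. The function name is illustrative, and the example assumes a Unix-like system where mmap supports MAP_SHARED file mappings.

```python
import mmap
import tempfile


def dual_mapping_demo():
    """Map one file page twice with different permissions."""
    with tempfile.TemporaryFile() as f:
        f.truncate(mmap.PAGESIZE)   # give the file one page of zeros
        ro = mmap.mmap(f.fileno(), mmap.PAGESIZE,
                       flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
        rw = mmap.mmap(f.fileno(), mmap.PAGESIZE,
                       flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
        try:
            rw[:5] = b"hello"       # store through the writable mapping
            return bytes(ro[:5])    # read back through the read-only one
        finally:
            ro.close()
            rw.close()
```

Both mappings refer to the same page of the file, so dual_mapping_demo() returns the bytes written through the writable view. The kernel does the equivalent internally when it reaches a frame through its own writable mapping instead of the process's read-only one.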

This has significant implications for system security. While hardware mechanisms like CR0.WP and SMAP can impose constraints on the kernel, these constraints are ultimately superficial. The kernel can always sidestep them by manipulating the virtual memory subsystem directly. This doesn't represent a vulnerability per se—rather, it reflects the fundamental design principle that the kernel must have ultimate control over the system's memory.

Broader Perspective: Kernel-Hardware Relationship

The behavior of /proc/*/mem offers a window into the nuanced relationship between operating systems and hardware. While hardware can impose constraints, these constraints are always at the discretion of the kernel. The kernel can choose to honor these constraints or bypass them as needed.

This design philosophy makes sense when we consider that the kernel configures these protection mechanisms for its own purposes, such as isolating processes from one another and catching its own stray accesses. They are tools that the kernel applies where appropriate, not absolute barriers that the kernel cannot overcome.

The article wisely notes that this discussion focuses on basic system operation without considering more complex scenarios like virtualization or Intel SGX. In these contexts, the relationship between kernel and hardware becomes even more intricate, with additional layers of protection and constraints.

Conclusion

The /proc/*/mem interface serves as a perfect illustration of the delicate balance between hardware constraints and software flexibility. While memory protection mechanisms exist at the hardware level, they are ultimately tools that the kernel can choose to honor or bypass as needed. The implementation of /proc/*/mem demonstrates that the kernel's control over virtual memory is absolute: it can translate virtual addresses to physical frames, remap those frames with arbitrary permissions, and operate on them as it sees fit.

This behavior isn't a flaw in the system design but rather a reflection of the fundamental principle that the kernel must have ultimate control over system resources. The punch-through semantics of /proc/*/mem enable powerful features in sophisticated software while maintaining the kernel's ability to manage memory as needed.

As we continue to develop more complex systems with nested virtualization and specialized hardware security features, understanding these fundamental relationships between software and hardware will become increasingly important. The /proc/*/mem interface serves as an excellent starting point for this exploration, revealing the elegant dance between operating systems and the hardware they run on.

For those interested in exploring this topic further, the article provides several excellent references including discussions on the Linux Kernel Mailing List and analyses of related vulnerabilities like DirtyCow. These resources offer additional perspectives on memory protection mechanisms and their implications for system security.

Linux Kernel Documentation provides comprehensive information about memory management and the proc filesystem.

The Intel® 64 and IA-32 Architectures Software Developer's Manuals offer detailed information about the hardware mechanisms discussed in this article.
