The Linux System Call Advantage: Why Stability Matters in Kernel Interfaces

For a long time, many developers sensed that something set Linux apart from other operating systems, though they couldn't quite pinpoint what it was. Only when they learned about Linux's system call interface did the pieces fall into place.

The Problem: System Calls in Other Operating Systems

Programs typically interface with the operating system kernel through libraries provided by the OS, most commonly libc. Through these libraries, applications gain access to system functions like read, write, and countless others. POSIX-compliant systems provide standardized interfaces, while Windows offers the Win32 API with its many DLLs and functions.

These operating systems generally consist of tightly coupled kernel and user space components, developed and distributed as a single unit. The user space libraries are the officially supported means of interacting with the system. Interfacing with the kernel directly is discouraged: applications are expected to use the provided system libraries, which forces them to depend on and link against those libraries.

While it's technically possible to interface with the kernel directly in many systems, the kernel interface is typically unstable and subject to change without notice. Software that bypasses the standard libraries and talks to the kernel directly risks breaking with system updates or kernel revisions.

"Then there's the whole 'reinventing libc' insanity, even on macOS (where no such ABI stability was guaranteed, but they did it anyway, and that ended up with a macOS update breaking all Go apps). On Windows they can't get away with that, so they use Cgo instead."

— marcan_42, Hacker News, Sept 4, 2021

This instability creates significant challenges for cross-platform development. The Go programming language, for example, has encountered these issues firsthand on macOS.

"As I understand it, Go currently has its own syscall wrappers for Darwin. This is explicitly against what Apple recommends, precisely because they're not willing to commit to a particular syscall ABI. This leads to issues like #16570, and although we've been lucky in that things have generally been backward-compatible so far, there's no guarantee that it'll continue to happen. It doesn't seem inconceivable to me that we'd at some point end up having to specify 'build for macOS 10.13+' vs. 'build for 10.12 and below', for example."

— copumpkin, golang/go GitHub issue #17490, Oct 17, 2016

Some operating systems take this restriction even further. OpenBSD has implemented system call origin verification, a security mechanism that only allows system calls originating from the system's libc. In this environment, not only is the kernel ABI unstable, but normal programs are explicitly prohibited from interfacing with the kernel directly.

"The eventual goal would be to disallow system calls from anywhere but the region mapped for libc, but the Go language port for OpenBSD currently makes system calls directly. Switching Go to use the libc wrappers (as is already done on Solaris and macOS) is something that De Raadt would like to see. It may allow dropping the main program text segment from the list of valid regions in the future."

"It is an ABI break for the operating system, but that is no big deal for OpenBSD. De Raadt said: we here at OpenBSD are the kings of ABI-instability. He suggested that relying on the OpenBSD ABI is fraught with peril: 'Program to the API rather than the ABI. When we see benefits, we change the ABI more often than the API. I have altered the ABI. Pray I do not alter it further.'"

— Jake Edge, OpenBSD system-call-origin verification, LWN, December 11, 2019

Linux's Unique Approach: Stability at the Kernel Level

One of the most distinctive features of the Linux kernel is its stable kernel-userspace interface. Unlike virtually every other kernel and operating system, Linux guarantees stability at the binary interface level.

"This interface matches much of the POSIX interface and is based on it and other Unix based interfaces. It will only be added to over time, and not have things removed from it."

— torvalds/linux, Documentation/ABI/stable/syscalls, 2006-06-21

This stability is rooted in Linux's identity as a kernel rather than a complete operating system. As an independent component, it must provide a stable interface to user space software if anything is to be built upon it.

While many people debate whether Linux constitutes a complete operating system, there's no question that Linux functions as a platform that can be safely built upon directly. There's no inherent need to depend on additional components—not even libc.

How Linux System Calls Work

Processor instruction set architectures include special instructions designed for calling the kernel. These instructions cause the processor to switch to kernel mode and execute code at a predefined location within the kernel.

When making a system call, at least one parameter must be provided: the system call number (often referred to as NR). Linux uses this number as an index into a table of function pointers to identify the specific function being called. Any additional arguments are passed directly to this function.
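The table lookup described above can be modelled in ordinary user space C. This is only a toy sketch: the handler names, the table contents, and the numbers are hypothetical and do not correspond to any real kernel table. The one detail borrowed from Linux is that an unknown number is reported as a negated ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

/* Every handler takes six register-sized arguments, like a system call. */
typedef long (*toy_handler)(long, long, long, long, long, long);

/* Hypothetical handler for number 0: adds its first two arguments. */
static long toy_add(long a, long b, long c, long d, long e, long f)
{
    (void)c; (void)d; (void)e; (void)f;
    return a + b;
}

/* Hypothetical handler for number 1: multiplies its first two arguments. */
static long toy_multiply(long a, long b, long c, long d, long e, long f)
{
    (void)c; (void)d; (void)e; (void)f;
    return a * b;
}

/* The number is nothing more than an index into this table. */
static const toy_handler toy_table[] = {
    toy_add,      /* number 0 */
    toy_multiply, /* number 1 */
};

long toy_dispatch(long number,
                  long _1, long _2, long _3,
                  long _4, long _5, long _6)
{
    size_t count = sizeof toy_table / sizeof toy_table[0];

    /* Numbers outside the table are rejected with a negated ENOSYS. */
    if (number < 0 || (size_t)number >= count)
        return -ENOSYS;

    return toy_table[number](_1, _2, _3, _4, _5, _6);
}
```

Calling toy_dispatch(0, 2, 3, 0, 0, 0, 0) selects toy_add and yields 5, while an out-of-range number such as 99 yields -ENOSYS.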

These parameters are passed to the kernel via registers, and the kernel returns a result value in a register as well. The specific registers used for parameters and the return value define the Linux system call calling convention.

This calling convention is stable, allowing user space programs to use it without fear of breakage. It's defined at the instruction set level, making it programming language agnostic. Any user space program written in any language can make use of this interface. Typically, programs call libc functions that implement this calling convention, but that's not a requirement. A compiler could directly emit code following this convention, or a JIT compiler could generate the appropriate code at runtime.
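As a point of comparison, libc itself exposes a generic entry point, syscall(), declared in unistd.h, with the system call numbers available as SYS_* constants from sys/syscall.h. The sketch below assumes a Linux system with glibc or musl; the helper names raw_getpid and raw_write are made up for illustration. Note that this wrapper still applies libc's error handling: on failure it returns -1 and sets errno.

```c
#define _GNU_SOURCE
#include <sys/syscall.h> /* SYS_getpid, SYS_write */
#include <unistd.h>      /* syscall() */

/* Hypothetical helper: asks the kernel for the process ID by number. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: every argument is handed over as a
   register-sized value, so the pointer is passed as a long. */
long raw_write(int fd, const void *buffer, size_t length)
{
    return syscall(SYS_write, (long)fd, (long)buffer, (long)length);
}
```

Here raw_write(1, "hello\n", 6) would write six bytes to standard output and return 6 on success.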

The journalists at LWN have written detailed articles about the implementation of Linux system calls that are definitely worth reading.

The calling convention is documented in the Linux man pages:
- syscall.2
- syscalls.2

Implementing a System Call Function

To make a system call, parameters must be placed in the appropriate registers, the system call instruction must be executed, and the return value must be collected from the return register. System calls support a maximum of six arguments.

Since the registers and system call instruction vary by architecture, separate functions are needed for each architecture. Despite this, it's straightforward to write a C function that can make any system call.

Here's an example of a system call function for the x86_64 architecture:

long
linux_system_call_x86_64(long number,
                         long _1, long _2, long _3,
                         long _4, long _5, long _6)
{
    /* Pin each value to the register the kernel expects it in. */
    register long rax __asm__("rax") = number; /* system call number */
    register long rdi __asm__("rdi") = _1;
    register long rsi __asm__("rsi") = _2;
    register long rdx __asm__("rdx") = _3;
    register long r10 __asm__("r10") = _4;
    register long r8  __asm__("r8")  = _5;
    register long r9  __asm__("r9")  = _6;

    __asm__ volatile
    ("syscall"
        /* read-write operands: rax receives the return value */
        : "+r" (rax),
          "+r" (r8), "+r" (r9), "+r" (r10)
        /* input-only operands */
        : "r" (rdi), "r" (rsi), "r" (rdx)
        /* syscall saves the return address in rcx and the flags in r11 */
        : "rcx", "r11", "cc", "memory");

    return rax;
}

All parameters and the return value are of type long, which in this context essentially means "register." All values passed to the kernel must fit in registers, and typically long is register-sized. This means all arguments must either be simple values or pointers to more complex structures.

The function ensures all arguments are placed in the appropriate registers by assigning them to local variables declared with GCC's explicit register variable syntax, which tells the compiler which register to use for each one. The register keyword has little effect in ordinary declarations, but GCC documents it as a required part of this syntax.

The x86_64 architecture contains the aptly named syscall instruction, which switches to kernel mode and enters the kernel entry point. Other architectures use different instructions; for example, aarch64 uses svc #0 instead.
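To make the per-architecture difference concrete, here is a sketch that selects the instruction sequence at compile time. The aarch64 branch assumes the convention documented in the syscall man page: the number goes in x8, the arguments in x0 through x5, and the result comes back in x0. The x86_64 branch restates the same convention used in the function above in a more compact form so the sketch compiles on either architecture. The names linux_system_call and current_pid are hypothetical.

```c
#include <sys/syscall.h> /* SYS_getpid */

#if defined(__aarch64__)

long
linux_system_call(long number,
                  long _1, long _2, long _3,
                  long _4, long _5, long _6)
{
    register long x8 __asm__("x8") = number; /* system call number */
    register long x0 __asm__("x0") = _1;     /* also the return value */
    register long x1 __asm__("x1") = _2;
    register long x2 __asm__("x2") = _3;
    register long x3 __asm__("x3") = _4;
    register long x4 __asm__("x4") = _5;
    register long x5 __asm__("x5") = _6;

    __asm__ volatile
    ("svc #0"
        : "+r" (x0)
        : "r" (x8), "r" (x1), "r" (x2), "r" (x3), "r" (x4), "r" (x5)
        : "cc", "memory");

    return x0;
}

#elif defined(__x86_64__)

long
linux_system_call(long number,
                  long _1, long _2, long _3,
                  long _4, long _5, long _6)
{
    long result;
    /* rax, rdi, rsi and rdx have constraint letters of their own;
       r10, r8 and r9 do not, so they are pinned explicitly. */
    register long r10 __asm__("r10") = _4;
    register long r8  __asm__("r8")  = _5;
    register long r9  __asm__("r9")  = _6;

    __asm__ volatile
    ("syscall"
        : "=a" (result)
        : "a" (number), "D" (_1), "S" (_2), "d" (_3),
          "r" (r10), "r" (r8), "r" (r9)
        : "rcx", "r11", "cc", "memory");

    return result;
}

#else
#error "no system call sequence written for this architecture"
#endif

/* Hypothetical usage: the same call site works on both architectures. */
long current_pid(void)
{
    return linux_system_call(SYS_getpid, 0, 0, 0, 0, 0, 0);
}
```

Only the body of linux_system_call changes between architectures; callers such as current_pid stay the same.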

The extended inline assembly construct tells the compiler which registers the instruction reads and writes, and that it clobbers rcx, r11, the condition codes, and memory. Seven values go in: the system call number and the six parameters. One value comes out: the return value, which is placed in rax, overwriting the system call number, so rax is declared as a read-write operand. The code also declares r8, r9, and r10 as read-write, so the compiler does not assume they still hold the arguments afterwards.

After the system call has been made, all that remains is to return the result. This may be a valid value or a negated errno constant, which Linux reports as a value in the range -4095 to -1. Various libcs normalize these error values: they return -1 and store the positive error code in a global or thread-local errno variable. When using Linux system calls directly, this normalization step isn't necessary.
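A caller that skips libc still has to tell results and errors apart. The sketch below shows one way to do that, assuming the usual Linux convention that raw results from -4095 to -1 are negated errno values. The helper names are hypothetical. The last function imitates the normalization a libc performs, purely for comparison.

```c
#include <errno.h>

/* Nonzero when a raw result lies in the error range [-4095, -1].
   The unsigned comparison keeps large valid values, such as
   addresses, from being mistaken for errors. */
int linux_result_is_error(long result)
{
    return (unsigned long)result >= (unsigned long)-4095L;
}

/* The errno value carried by a failed result, or 0 on success. */
int linux_result_errno(long result)
{
    return linux_result_is_error(result) ? (int)-result : 0;
}

/* What a libc wrapper typically does with the same raw result:
   store the error code in errno and return -1. */
long libc_style_result(long result)
{
    if (linux_result_is_error(result)) {
        errno = (int)-result;
        return -1;
    }
    return result;
}
```

With these helpers, a raw result of -ENOENT is recognized as an error whose code is ENOENT, while 0 or any positive byte count passes through unchanged.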

The Impact of Stability

Linux's commitment to a stable kernel-userspace interface has profound implications for the entire ecosystem. This stability enables:

  1. Long-term software support: Applications and libraries can be maintained for decades without requiring constant updates to accommodate kernel changes.

  2. Alternative implementations: Projects like musl libc, Bionic (used in Android), and various embedded system libraries can exist alongside glibc without fear of incompatibility.

  3. Language innovation: Programming languages like Go, Rust, and others can implement their own system call interfaces without relying on libc, enabling more efficient or specialized implementations.

  4. Security research: Researchers can develop security tools that interact directly with the kernel without worrying about API changes breaking their work.

  5. Containerization and virtualization: Technologies like Docker, Kubernetes, and various virtual machine monitors benefit from a stable interface to the underlying kernel.

This stability doesn't mean Linux is stagnant. New system calls are added regularly, but existing ones remain compatible. This backward compatibility, combined with the ability to bypass libc entirely, has helped make Linux a popular platform for everything from embedded systems to the world's largest supercomputers.
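Because the interface only grows, a program can try a newer system call and fall back to an older mechanism when the running kernel does not know it, which the kernel signals with ENOSYS. The sketch below illustrates that pattern with getrandom, using libc's syscall() wrapper for brevity. The function name random_bytes is hypothetical, and the fallback to /dev/urandom is one possible choice rather than a prescribed one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fills buffer with random bytes. Returns the number of bytes
   written, or -1 on failure. */
ssize_t random_bytes(void *buffer, size_t length)
{
#ifdef SYS_getrandom
    /* Try the newer system call first. */
    long result = syscall(SYS_getrandom, (long)buffer, (long)length, 0L);

    /* Any outcome other than "unknown system call" is final. */
    if (result >= 0 || errno != ENOSYS)
        return (ssize_t)result;
#endif

    /* Older kernel: use the device file that predates getrandom. */
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t count = read(fd, buffer, length);
    close(fd);
    return count;
}
```

The same binary then runs on kernels with and without the newer call, since neither path is ever removed.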

Conclusion

Linux's approach to system calls represents a fundamental design choice that has shaped the entire Linux ecosystem. By providing a stable binary interface at the kernel level, Linux has created a platform where innovation can flourish without the constant fear of breakage. This stability has enabled the development of a vast ecosystem of software, from small embedded applications to massive cloud infrastructure.

While other operating systems prioritize tight integration between kernel and user space, Linux's separation of concerns has proven to be a powerful advantage. Developers can choose their preferred abstraction layer—whether it's a full-featured libc, a minimal implementation, or direct kernel access—without sacrificing compatibility or stability.

As the computing landscape continues to evolve, with new architectures, security challenges, and application domains emerging, Linux's stable system call interface will remain a cornerstone of its success. It's a testament to the power of thoughtful design and long-term thinking in software development.

This article is based on content from Matheus Moreira's blog.