The Fragile Foundation: Why seccomp Falls Short in System Security

A critical examination of seccomp's fundamental flaws in syscall filtering and the emerging alternatives that promise more robust security boundaries.

The Linux kernel's seccomp (secure computing) mode was introduced with the noble intention of allowing processes to restrict the system calls they can make, thereby creating security boundaries. However, a closer examination reveals that this mechanism, despite its widespread adoption in containers and security-sensitive applications, suffers from inherent flaws that make it unsafe for reliable security enforcement. This analysis explores the fundamental problems with seccomp and examines alternative approaches that offer more promising paths forward.

The Promise and Pitfalls of seccomp

Seccomp, in its filter mode (often called seccomp-bpf), allows processes to attach filters to system calls, either through a blacklist approach (blocking specific dangerous syscalls) or a whitelist approach (only allowing specific syscalls). The concept appears straightforward: if a process shouldn't perform file operations, block file-related syscalls; if it shouldn't use the network, block network syscalls. Yet this apparent simplicity masks deep complexities that undermine seccomp's reliability as a security mechanism.

The Moving Target of System Calls

The most fundamental issue with seccomp is the instability of system call interfaces. What appears as a simple function call in user space often translates to different system calls depending on libc implementation, kernel version, architecture, and even runtime conditions. Consider the humble open() function: with some libc versions it maps directly to the open syscall, while current glibc implements it with openat, and architectures such as aarch64 have no open syscall at all. Similarly, select() might manifest as pselect6, for example on architectures that lack a native select syscall.

This dynamism creates a fragile security model. A carefully constructed seccomp filter that works perfectly on the developer's machine might fail completely when deployed to production, especially when container images are distributed across different environments. The Linux manpage's recommendation to use allow-list approaches because they are "more robust and simple" becomes almost comical when considering the 300+ syscalls in a modern system, with more being added regularly. Maintaining an accurate whitelist across different environments is practically impossible.
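The open()/openat split can be observed directly. The following is a minimal sketch, assuming Linux on x86-64 and kernel UAPI headers; `probe_open_filter` is a hypothetical name. It forks a child, installs a filter that denies only the legacy `open` syscall with EPERM (after the architecture check real filters need, since syscall numbers differ per architecture), and then calls open(). On current glibc the call is expected to succeed because glibc issues openat; a libc that still uses the legacy syscall would see EPERM instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical probe (x86-64 only). Returns 1 if open() failed with EPERM
 * under a filter that denies only the legacy `open` syscall, 0 if open()
 * still succeeded (libc did not use that syscall), -1 if the probe failed.
 */
int probe_open_filter(void)
{
#if defined(__x86_64__)
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            /* Kill anything not using the x86-64 syscall ABI. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
            /* Deny the legacy `open` syscall; allow everything else. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };
        /* NO_NEW_PRIVS lets an unprivileged process install the filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);
        int fd = open("/dev/null", O_RDONLY);
        if (fd >= 0)
            _exit(0); /* not blocked: libc used another syscall, e.g. openat */
        _exit(errno == EPERM ? 1 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
#else
    return -1; /* this sketch only covers x86-64 */
#endif
}
```

A filter meant to block file opens by denying `open` alone would therefore enforce nothing on such a system, which is exactly the deployment fragility described above.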

The Hidden Dependencies of Normal Operations

Beyond the syscall mapping problem lies a more subtle issue: the surprising syscall dependencies of seemingly simple operations. The author provides a compelling example: calling printf() can trigger the newfstatat syscall, because glibc's stdio inspects the output file descriptor with fstat (implemented via newfstatat on recent glibc versions) to choose its buffering before the first write. Nothing about formatted output suggests a stat call. This creates a situation where security policies can break normal program flow in unpredictable ways.

Consider a security-sensitive application that drops privileges early in its execution and applies a restrictive seccomp filter. If that application later encounters an error and attempts to log it using printf(), the operation might fail if newfstatat was blocked by the filter; if the filter's action is to kill, the process simply dies with SIGSYS. The result is a silent failure: no error logs and no indication of what went wrong, creating a debugging nightmare that compounds the original security problem.

These hidden dependencies mean that syscall filtering cannot be applied in isolation; it must consider the entire execution path and order of operations. This makes seccomp filters incredibly complex and fragile, requiring deep knowledge not just of intended functionality, but of all possible execution paths and their syscall footprints.
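The printf() scenario above can be reproduced with a small probe. This sketch assumes Linux on x86-64 with glibc; `probe_printf_needs_stat` is a hypothetical name. A child installs a filter that kills the process on fstat, newfstatat, and statx, then writes an error message to a freshly opened stdio stream (a stream already written to before the filter was installed would have its buffer allocated and skip the stat call). On glibc the child is expected to die with SIGSYS; other libc implementations may issue different calls, which is itself the problem.

```c
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Two BPF instructions: kill the process if the loaded syscall number is `nr`. */
#define KILL_IF_NR(nr)                               \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS)

/*
 * Hypothetical probe (x86-64 only). Returns 1 if the first fprintf() on a new
 * stream was killed for using a stat-family syscall, 0 if it completed, and
 * -1 if the probe could not run.
 */
int probe_printf_needs_stat(void)
{
#if defined(__x86_64__)
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {
        close(pipefd[0]);
        FILE *out = fdopen(pipefd[1], "w"); /* no buffer allocated yet */
        if (out == NULL)
            _exit(2);
        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            KILL_IF_NR(__NR_fstat),
            KILL_IF_NR(__NR_newfstatat),
#ifdef __NR_statx
            KILL_IF_NR(__NR_statx),
#endif
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);
        /* A late error path logging a message after the sandbox is in place. */
        fprintf(out, "error: request failed\n");
        fflush(out);
        _exit(0);
    }
    close(pipefd[1]); /* keep the read end open until the child has exited */
    int status;
    pid_t w = waitpid(pid, &status, 0);
    close(pipefd[0]);
    if (w < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSYS)
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
#else
    return -1; /* this sketch only covers x86-64 */
#endif
}
```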

The Impossibility of Meaningful Grouping

Perhaps the most damning critique of seccomp is its inability to express meaningful security policies at an appropriate level of abstraction. Security administrators rarely think in terms of individual system calls; they think in terms of capabilities: "this process should only be able to communicate via existing file descriptors," or "this process should not be able to access the filesystem after initialization."

Implementing even these seemingly simple policies with seccomp requires enumerating dozens of individual syscalls. For basic network communication, one might need to allow pselect6, select, poll, ppoll, write, pwrite64, writev, pwritev, read, pread64, preadv, close, sendfile, sendto, sendmsg, sendmmsg, recvfrom, recvmsg, and recvmmsg, and even this list assumes a static environment. In reality, the exact syscalls needed can change dynamically based on libc implementation, kernel features, and even runtime decisions.

This granularity mismatch makes seccomp filters either overly permissive (allowing broad groups of syscalls to avoid breakage) or overly restrictive (missing edge cases and breaking legitimate functionality), with little middle ground where true security can be achieved.
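To see why such lists scale poorly, consider what an allow-list filter actually compiles to: one BPF comparison per permitted syscall number, where the numbers are only meaningful for a single architecture. The builder below is a minimal sketch; `build_allowlist` and its parameters are hypothetical, not part of any library. It emits an architecture check, a load of the syscall number, one compare per entry, a deny action, and a final allow.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: write an allow-list seccomp program into `out`.
 * Layout: [arch check (3)] [load nr] [one JEQ per allowed nr] [deny] [allow].
 * Returns the instruction count, or 0 if `cap` is too small or `n` exceeds
 * what an 8-bit BPF jump offset can skip.
 */
size_t build_allowlist(uint32_t audit_arch, const uint32_t *nrs, size_t n,
                       uint32_t deny_action, struct sock_filter *out, size_t cap)
{
    size_t len = n + 6;
    if (n > 255 || len > cap)
        return 0;
    size_t i = 0;
    /* Syscall numbers differ per architecture, so reject foreign ABIs first. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, audit_arch, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    for (size_t k = 0; k < n; k++) {
        /* On a match, skip the remaining compares and the deny to reach allow. */
        uint8_t jt = (uint8_t)(n - k);
        out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nrs[k], jt, 0);
    }
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, deny_action);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return i;
}
```

In practice `nrs` would be filled from `<sys/syscall.h>` constants such as `__NR_read`, `__NR_write`, and `__NR_close`, and the program installed with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...)` after setting `PR_SET_NO_NEW_PRIVS`. Each of the roughly twenty syscalls listed above becomes another compare, and a list built from x86-64 numbers is meaningless on aarch64.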

Alternative Approaches: Capabilities Over Syscalls

The fundamental flaw in seccomp is its low-level approach to security. Rather than thinking about what processes should be able to do, it forces administrators to think about which specific syscalls should be allowed or blocked. This is akin to security by obscurity—relying on the complexity of syscall interfaces rather than establishing clear security boundaries.

OpenBSD's pledge() and unveil() mechanisms offer a more promising paradigm. These operate at a higher level of abstraction, defining capabilities rather than syscalls. For example, a process might call pledge("stdio rpath", NULL), indicating it only needs standard I/O and read-only filesystem access, and use unveil() to limit which paths are visible at all. The system then ensures the process cannot exceed these capabilities, regardless of the specific syscalls involved.

This approach has several advantages:

Stability: Capabilities change far less frequently than syscall implementations
Simplicity: Policies are expressed in terms of functionality, not implementation details
Comprehensiveness: Capabilities naturally group related syscalls and operations
Clarity: Security boundaries are meaningful to both administrators and developers

The author mentions arping, which uses pledge("stdio", "") to restrict itself to the basic standard I/O promise set. After this pledge, the process can still print output to the user, use the descriptors it already holds, and exit with appropriate status codes, but cannot open files, create new sockets, or make system modifications: precisely the security boundary intended.
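For comparison, a pledge()-style policy fits in a few lines. This sketch assumes OpenBSD, where pledge(2) and unveil(2) are declared in `<unistd.h>`; `sandbox_readonly` is a hypothetical wrapper that fails with ENOSYS on other systems. The promises are passed as one space-separated string, and the restriction to specific paths comes from unveil() rather than from the `rpath` promise itself.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: limit the process to stdio plus read-only access
 * beneath `dir`. OpenBSD only; elsewhere it fails with ENOSYS.
 */
int sandbox_readonly(const char *dir)
{
#if defined(__OpenBSD__)
    /* Make only `dir` visible (read-only), then lock the unveil list. */
    if (unveil(dir, "r") == -1 || unveil(NULL, NULL) == -1)
        return -1;
    /* Promises are one space-separated string; NULL leaves execpromises unchanged. */
    return pledge("stdio rpath", NULL);
#else
    (void)dir;
    errno = ENOSYS;
    return -1;
#endif
}
```

After a successful call, operations outside the `stdio` and `rpath` promise sets terminate the process, and paths outside `dir` cannot be accessed.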

Linux's Evolving Security Landscape

For Linux users, the situation remains challenging. While OpenBSD has embraced capability-based security, Linux has pursued multiple approaches with varying degrees of success:

SELinux: Offers mandatory access control but with extreme complexity and a steep learning curve
AppArmor: Provides simpler mandatory access control but still operates at a relatively low level
seccomp-bpf: An extension to seccomp using BPF for more flexible filtering, but still suffers from the fundamental issues of syscall granularity
Landlock: The most promising recent development, aiming to provide sandboxing at a higher level of abstraction

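Landlock represents a potential path forward for Linux, offering filesystem-based sandboxing that unprivileged processes can apply to themselves and that is gradually expanding to other resources: it arrived in Linux 5.13 with filesystem rules only, and later kernel versions added restrictions on TCP bind and connect. However, as the author notes, Linux has a history of multiple generations of security mechanisms getting it wrong, and Landlock is still maturing.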

For now, the author suggests unshare() as a practical approach for dropping access to the outside world (for example, by moving into a new network namespace with no route out), while acknowledging its limitations compared to more comprehensive solutions.
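One way to apply that suggestion is to move into a fresh network namespace. The following sketch assumes Linux with unprivileged user namespaces permitted (many distributions and container runtimes restrict them); `probe_unshare_net` is a hypothetical name, and unshare is invoked through syscall(2) so the file builds without `_GNU_SOURCE`. A new network namespace contains only a loopback interface that starts out down, so a UDP connect(), which only performs a route lookup, is expected to fail with ENETUNREACH.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/sched.h>

/*
 * Hypothetical probe. Returns 1 if, after unshare(CLONE_NEWUSER |
 * CLONE_NEWNET), a route lookup to `ipv4` fails with ENETUNREACH; 0 if a
 * route still exists; -1 if unshare was refused or the probe failed.
 */
int probe_unshare_net(const char *ipv4)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* CLONE_NEWUSER lets an unprivileged process create the network namespace. */
        if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNET) != 0)
            _exit(2);
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0)
            _exit(2);
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(53) };
        if (inet_pton(AF_INET, ipv4, &dst.sin_addr) != 1)
            _exit(2);
        /* connect() on a UDP socket sends no packets; it only selects a route. */
        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) == 0)
            _exit(0);
        _exit(errno == ENETUNREACH ? 1 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

This removes network reachability only; filesystem access is unaffected, which illustrates why unshare() is a partial measure.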

Practical Recommendations

Given the current state of Linux security mechanisms, organizations must make pragmatic choices:

Use seccomp with extreme caution: Recognize its limitations and avoid relying on it as the primary security boundary. Use it as one layer in a defense-in-depth strategy.
Prefer higher-level policy mechanisms: When possible, leverage mandatory access control systems like SELinux or AppArmor, which describe access in terms of files, network operations, and other resources rather than individual syscalls.
Minimize privileged surfaces: Design applications with minimal privileged operations, even before applying syscall filters.
Monitor filter effectiveness: Implement runtime monitoring to detect when seccomp filters interfere with legitimate functionality.
Track Landlock development: Keep an eye on Landlock's evolution as a potential future solution for Linux.

Conclusion

Seccomp, despite its widespread adoption, suffers from fundamental flaws that make it unreliable as a primary security mechanism. The instability of syscall interfaces, the hidden dependencies of normal operations, and the inability to express meaningful security policies at appropriate levels of abstraction combine to create a security model that is both complex and fragile.

The future of secure sandboxing likely lies in capability-based approaches like OpenBSD's pledge() and unveil(), or higher-level mechanisms like Landlock. Until these mature on Linux, administrators must use seccomp as just one component of a broader security strategy, recognizing its limitations while working to minimize the attack surface of their applications through careful design and minimal privilege principles.

The path forward requires shifting our focus from syscall filtering to capability boundaries—defining what processes should be able to do, rather than which specific implementation details they should be allowed to invoke. Only by elevating the level of abstraction in our security models can we build systems that are both secure and maintainable.

#Seccomp #Linux #Sandboxing #capabilities #Landlock