For decades, setuid binaries have been both essential functionality and a persistent security liability on Linux systems. These executables (programs like passwd, sudo, and su that temporarily elevate user privileges) have repeatedly served as attack vectors for privilege escalation exploits. Now a shift is underway that uses the kernel's no_new_privs feature to neutralize this threat on the host.

The Dual Problem Space

Two seemingly unrelated issues converge here:
1. UID/GID Drift: Image-based distributions (e.g., container hosts) need everything under /usr to be owned by root:root, which is complicated by scattered setuid binaries owned by other users.
2. Setuid Exploit Surface: Vulnerabilities in privileged binaries let attackers hijack elevated rights, a risk amplified by complex legacy tools like sudo.

Disabling setuid elegantly resolves both. By deprecating these binaries, distributions eliminate ownership exceptions and shrink the attack surface.
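The ownership half of the problem is easy to observe on a running system. As a rough sketch (the function name list_non_root_owned is made up for this illustration and is not part of any package mentioned here), the following lists everything below a directory that is not owned by root:root:

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): list entries below a directory
# (default: /usr) whose owner is not UID 0 or whose group is not GID 0.
list_non_root_owned() {
    find "${1:-/usr}" -xdev \( ! -user 0 -o ! -group 0 \) -print
}
```

Running list_non_root_owned with no argument inspects /usr. On a system that has fully dropped its ownership exceptions, one would expect the output to be empty.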

How no_new_privs Works

When enabled (in systemd, via NoNewPrivileges=yes), this kernel flag is inherited across fork(), clone(), and execve(), cannot be cleared once set, and prevents execve() from granting new privileges:
- Ignores setuid/setgid bits
- Disregards file capabilities
- Blocks LSM profile relaxation post-exec

Critically, it does not restrict privilege changes that do not involve execve(): an appropriately privileged process can still call setuid(). This enables controlled delegation without ambient risk.
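To make this concrete, here is a minimal shell sketch for opting a single service into the flag and checking the result. The drop-in file name 10-no-new-privs.conf, both function names, and example.service are placeholders chosen for this illustration. NoNewPrivileges= is the systemd service setting, and NoNewPrivs is the field the kernel exposes in /proc/<pid>/status.

```shell
#!/bin/sh
# Write a drop-in that sets NoNewPrivileges=yes for one unit.
# $1 = unit name, $2 = systemd config directory (default: /etc/systemd/system)
write_nnp_dropin() {
    dropin_dir="${2:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nNoNewPrivileges=yes\n' > "$dropin_dir/10-no-new-privs.conf"
}

# Print the NoNewPrivs value (0 or 1) from a /proc/<pid>/status-style file.
# $1 = path to the status file (default: the calling process)
nnp_from_status() {
    awk '$1 == "NoNewPrivs:" { print $2 }' "${1:-/proc/self/status}"
}
```

After write_nnp_dropin example.service, the unit still needs systemctl daemon-reload and a restart before the setting takes effect. At that point, nnp_from_status "/proc/$(systemctl show -p MainPID --value example.service)/status" should report 1. For a quick experiment without systemd, setpriv --no-new-privs grep NoNewPrivs /proc/self/status (setpriv is part of util-linux) sets the flag for a single command.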

Architectural Overhaul: Replacing Setuid Binaries

Instead of monolithic privileged binaries, functionality shifts to tightly scoped IPC services. Systemd-managed, socket-activated services now handle tasks like password changes or privilege delegation:

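# Example systemd unit hardening for pwaccessd (shadow data service)
[Service]
# Allow only UNIX domain sockets; no network address families
RestrictAddressFamilies=AF_UNIX
# Deny attempts to set the setuid/setgid bits on files or directories
RestrictSUIDSGID=true
# Mount the file system hierarchy read-only, except /dev, /proc, and /sys
ProtectSystem=strict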

Key replacements:
- Authentication: pam_unix_ng (replacing pam_unix) + pwaccessd service
- Account Management: account-utils tools (chage, passwd) communicating via varlink
- Command Execution: run0 (systemd's sudo/su alternative) with transient units

Bridging Compatibility Gaps

Legacy scripts that depend on sudo or su are handled by wrapper scripts that translate their invocations into run0 calls. For example:

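#!/bin/bash
# run0-sudo wrapper sketch (handles only "-i"; other sudo options are not translated)
if [[ "${1:-}" == "-i" ]]; then
    # sudo -i asks for a root shell; run0 without a command starts one
    exec run0
else
    exec run0 "$@"
fi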

Polkit rules simulate classic sudo policies, though limitations remain around command-line argument validation (see upstream issues).
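To illustrate what translating sudo invocations can involve, here is a slightly fuller sketch than a one-option wrapper. It is not the wrapper shipped by any distribution: the function name sudo_to_run0 is hypothetical, it only prints the run0 command line instead of executing it, and it covers just -u USER and -i, rejecting everything else instead of guessing.

```shell
#!/bin/sh
# Hypothetical dry-run translator: prints the run0 command line for a small
# subset of sudo syntax (-u USER, -i, --). Anything else is rejected rather
# than guessed at.
sudo_to_run0() {
    target_user=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -u)
                [ $# -ge 2 ] || { echo "sudo -u needs a user" >&2; return 2; }
                target_user="$2"
                shift 2
                ;;
            -i) shift ;;            # dropped; run0 without a command starts a shell
            --) shift; break ;;
            -*) echo "unsupported sudo option: $1" >&2; return 2 ;;
            *)  break ;;            # first non-option word starts the command
        esac
    done
    if [ -n "$target_user" ]; then
        set -- "--user=$target_user" "$@"
    fi
    set -- run0 "$@"
    printf '%s\n' "$*"
}
```

For instance, sudo_to_run0 -u alice id prints run0 --user=alice id. Refusing unknown options is a deliberate choice in this sketch: silently dropping a flag such as -k or -E would change the meaning of the script being wrapped.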

Current State and Open Challenges

openSUSE Tumbleweed/MicroOS offer working implementations via the disable-setuid package. However, critical hurdles persist:

  1. Containers: Because the flag is inherited by every child process, no_new_privs (NNP) also breaks setuid binaries inside containers. A BPF LSM-based solution has been proposed for toggling NNP dynamically.
  2. SELinux/Policy Gaps: Policies for new services (pwaccessd, newidmapd) require refinement.
  3. Edge Binaries: Exceptions like newgrp and gpasswd need alternatives.

The Binary Inventory

Thorsten Kukuk's analysis catalogs setuid/file-capability binaries in openSUSE:

Binary                 Package   Status
sudo                   sudo      ❌ (use run0)
polkit-agent-helper-1  polkit    ✅ (patched)
unix_chkpwd            pam       ❌ (use pam_unix_ng)
ping                   iputils   ❌ (capabilities)

Full inventory available here
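A comparable inventory can be gathered on any system with standard tools. The sketch below is not the method used for the analysis above; the function name list_setid_files is invented for this example, and it simply asks find for regular files carrying the setuid or setgid bit:

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): list regular files below a
# directory (default: /usr) that carry the setuid or setgid bit.
list_setid_files() {
    find "${1:-/usr}" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print
}
```

Calling list_setid_files without arguments scans /usr. File capabilities are stored separately from the mode bits, so they need a second pass, for example with getcap -r /usr from the libcap tools.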

Path Forward

While containerized setuid remains unresolved, the elimination of host-level setuid binaries marks significant progress. By leveraging kernel primitives like no_new_privs and rearchitecting privilege delegation through minimal services, Linux distributions can finally mitigate a half-century-old attack vector—without sacrificing functionality.

Source: Thorsten Kukuk, Enabling no_new_privs/NoNewPrivs, disabling setuid on Linux