Nucleus and the Quiet Bet That Containers Got Too Heavy for AI Agents

A new Rust runtime called Nucleus throws out Docker images, registries, and persistent state in exchange for kernel-level isolation and a 12ms cold start. It's an opinionated answer to a question more teams are starting to ask out loud: do we actually need all of Docker to run an ephemeral agent?

A pattern has been forming at the edges of the container world for a while now, and a project named Nucleus makes it explicit. The pitch is blunt: it is not a Docker replacement, it does not pull images, it does not have a registry, and it does not keep persistent state by default. What it does instead is wrap Linux kernel primitives (namespaces, cgroups, seccomp, Landlock, and pivot_root) around two workloads that have been straining the traditional model: short-lived AI agent sandboxes and long-running, declaratively built NixOS services.

The headline number people will fixate on is cold start. Nucleus claims 12ms versus roughly 500ms for Docker, a gap of about 40x. That gap is real, but it is also somewhat unfair, because the two tools are not doing the same work. Docker's startup time includes resolving and mounting layered image filesystems, wiring up its daemon and networking model, and managing a writable union mount. Nucleus skips all of that by either backing the container filesystem with tmpfs pre-populated with context files or mounting a pre-built Nix store closure read-only. You are not comparing two runtimes so much as two philosophies about what a container even is.
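To put the two figures in proportion, here is a back-of-the-envelope sketch in Python. The 12ms and 500ms values are the ones quoted above; the two-second task duration and the function name are assumptions chosen for illustration, not measurements.

```python
def startup_overhead(start_ms: float, task_ms: float) -> float:
    """Fraction of total wall time spent on cold start for one throwaway sandbox."""
    return start_ms / (start_ms + task_ms)


NUCLEUS_START_MS = 12.0   # claimed by the project
DOCKER_START_MS = 500.0   # rough figure quoted for Docker
TASK_MS = 2_000.0         # assumed: a two-second agent tool call

speedup = DOCKER_START_MS / NUCLEUS_START_MS
print(f"cold-start ratio: {speedup:.1f}x")
print(f"Docker overhead:  {startup_overhead(DOCKER_START_MS, TASK_MS):.1%}")
print(f"Nucleus overhead: {startup_overhead(NUCLEUS_START_MS, TASK_MS):.1%}")
```

Under that assumed task length, startup is a fifth of the wall time in one case and well under one percent in the other. For a service that runs for days, the same gap would round to nothing, which is why the number matters mainly for the ephemeral case.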

The agent angle is the tell

The most interesting thing about Nucleus is who it says it is for. The default mode is "agent mode," described as ephemeral, fast-startup sandboxes for AI agent workloads. This is a category that barely existed two years ago and is now driving real infrastructure decisions. When you are spinning up a sandbox to let a coding agent run untrusted tool calls, build a test binary, and throw the whole environment away seconds later, the economics invert. Image layer caching and registry distribution, the features Docker spent a decade perfecting, become dead weight. What you want is the fastest possible path to an isolated process and the strongest possible guarantee it cannot touch anything it shouldn't.

Nucleus leans hard into that second part. By default it drops all capabilities, applies a small seccomp allowlist, and can use up to eight namespaces. It integrates gVisor as a first-class runtime rather than an optional add-on, giving you an application kernel boundary for genuinely untrusted code. There is even a workflow for running a coding agent against a stable /workspace with --workspace-exec, mounting provider CLI configs like ~/.aws read-only into a private home directory, and passing a pinned toolchain rootfs so the agent never depends on mutable host binaries.

That last detail is worth sitting with. The project assumes you do not trust the thing you are running, which is a reasonable assumption for an autonomous agent executing model-generated commands.

A sharp line between the agent toy and the production tool

Here is where Nucleus does something more honest than most projects in this space. It openly admits agent mode is not hardened. The documentation states plainly that in default agent mode, seccomp and Landlock failures are warn-and-continue, chroot fallback is available, bridge DNS defaults to public resolvers like 8.8.8.8, and cgroup creation failures are non-fatal. In other words, the fast default is best-effort.

For anything serious, the project pushes you toward two stricter tiers. Strict agent mode makes isolation fail closed: no degraded security, no chroot fallback, mandatory seccomp enforcement, mandatory cgroup limits, and required user namespace mapping, all without forcing you to build a full production rootfs. Production mode goes further still, demanding a Nix-built reproducible root filesystem, rootfs attestation, explicit memory limits, deny-by-default egress, health checks, and a mini-init that reaps zombies and forwards signals. The service-mode comparison table reads like a checklist of every container security mistake teams have made over the years, with each row showing how the strict tiers close the hole.
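The difference between warn-and-continue and fail-closed is easy to state and easy to get wrong, so here is a minimal sketch of the idea in Python. It is not Nucleus code: the `Mode` values, `apply_step`, and `IsolationError` are hypothetical names, and the only behavior taken from the description above is that the default mode tolerates a failed isolation step while the stricter tiers refuse to proceed.

```python
import enum
import logging

log = logging.getLogger("sandbox-sketch")


class Mode(enum.Enum):
    # Hypothetical labels for the three tiers described in the text.
    AGENT = "agent"
    STRICT_AGENT = "strict-agent"
    PRODUCTION = "production"


class IsolationError(RuntimeError):
    """Raised when a required isolation step cannot be applied."""


def apply_step(name, step, mode):
    """Run one isolation step. Returns True if applied, False if skipped.

    In AGENT mode a failure is logged and tolerated (fail open).
    In the stricter modes the same failure aborts startup (fail closed).
    """
    try:
        step()
        return True
    except OSError as exc:
        if mode is Mode.AGENT:
            log.warning("%s failed (%s); continuing with degraded isolation", name, exc)
            return False
        raise IsolationError(f"{name} failed in {mode.value} mode") from exc


def broken_seccomp():
    # Stand-in for a kernel call that fails, e.g. on an unsupported host.
    raise OSError("seccomp not available")


print(apply_step("seccomp", broken_seccomp, Mode.AGENT))  # tolerated, returns False
```

Calling the same function with `Mode.STRICT_AGENT` raises `IsolationError` instead of returning, which is the whole point of the stricter tiers: a sandbox that could not be fully built never runs.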

This tiering is a more useful framing than the usual "secure by default" marketing. Nucleus is saying: fast and loose for throwaway agents, locked down and auditable for production, and it draws the boundary in the CLI itself rather than in a best-practices doc nobody reads.

The Nix bet, and where it narrows the audience

Production Nucleus is deeply, almost uncompromisingly, tied to Nix and NixOS. The model is that Nix builds the root filesystem, a NixOS module declares the service, and Nucleus mounts a pinned, reproducible closure at runtime. There is a nucleus.lib.mkRootfs helper for minimal service closures, rootfs attestation via a .nucleus-rootfs-sha256 manifest, and a full NixOS module that generates nucleus-<name>.service systemd units with journald logging and sd_notify readiness.

For teams already living in NixOS, this is a coherent and genuinely appealing story. Your service topology, security policy, and root filesystem are all declarative and pinned, and runtime inputs become auditable and repeatable. Nucleus also draws a clean line between Nix, which defines what the service is, and external policy files, which define what the process may do at the kernel level. Those policies are expressed as seccomp JSON, capability TOML, and Landlock TOML, each SHA-256 pinned. As a result, a security engineer can review and tighten a syscall allowlist without triggering an application rebuild.
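Hash pinning itself is a simple mechanism. The sketch below shows the general technique in Python, assuming the pin is a lowercase hex SHA-256 digest recorded next to the service definition; `check_pin` is a hypothetical helper, and how Nucleus actually stores and checks its pins is not something this example claims to reproduce.

```python
import hashlib
import hmac


def check_pin(data: bytes, expected_sha256: str) -> bytes:
    """Return the policy bytes only if they match the pinned SHA-256 digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"policy digest mismatch: got {actual}")
    return data


# Illustrative policy content; the JSON shape here is made up for the example.
policy = b'{"allow": ["read", "write", "exit_group"]}'
pin = hashlib.sha256(policy).hexdigest()  # recorded at review time

check_pin(policy, pin)  # passes silently

try:
    check_pin(policy + b" ", pin)  # any edit, even whitespace, changes the digest
except ValueError as err:
    print("rejected:", err)
```

The useful property is that the reviewed policy and the enforced policy cannot drift apart silently: tightening an allowlist means producing a new file and a new pin, and both changes show up in review.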

The counter-perspective writes itself: this is a narrow door. Nix has a famously steep learning curve, and a runtime whose production path assumes flakes, store closures, and NixOS modules is not going to displace Docker in shops that have never touched any of it. Nucleus does not pretend otherwise. It explicitly does not support macOS, Windows, BSDs, or 32-bit Linux, and requires a 6.x kernel. This is infrastructure for people who have already made several upstream commitments, not a general-purpose tool.

Formal verification as a credibility signal

One claim stands out from the usual feature list. Nucleus says its state machines are formally verified using TLA+ and the Apalache model checker, with model-based property tests generated from those specifications and a composed system-level model checking cross-subsystem ordering and authorization. There is a formal/tla/ directory and a tla-connect mapping between TLA+ states and Rust.

Formal methods showing up in a container runtime is uncommon, and skepticism is healthy here. Model checking proves properties about the model, not necessarily about the thousands of lines of unsafe-adjacent Rust that actually call into the kernel. A verified state machine for container lifecycle does not prove the seccomp filter is correct or that a mount flag was applied. Still, the presence of this work signals a team that treats isolation correctness as something to reason about rather than assert, which is more than most projects in this category offer.
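To make the "properties about the model" caveat concrete, here is a toy explicit-state checker in Python. Everything in it is an assumption for illustration: the lifecycle phases, the invariant, and the function names are invented, and Apalache works symbolically on TLA+ specifications rather than by brute-force enumeration like this.

```python
from collections import deque


def next_states(state):
    """Hypothetical lifecycle model. A state is (phase, isolation_applied)."""
    phase, isolated = state
    if phase == "created":
        return [("isolating", False)]
    if phase == "isolating":
        return [("running", True), ("failed", False)]
    if phase == "running":
        return [("stopped", isolated)]
    return []  # "failed" and "stopped" are terminal


def invariant(state):
    """Safety property: nothing runs unless isolation was applied first."""
    phase, isolated = state
    return phase != "running" or isolated


def find_violation(initial, step, holds):
    """Breadth-first search over reachable states; returns a bad state or None."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not holds(state):
            return state
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


print(find_violation(("created", False), next_states, invariant))  # None
```

The check passes because the model says isolation is applied before running. Whether the real code path sets the right mount flag or loads the right filter is a separate question that no amount of exploring this state graph can answer.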

The PostgreSQL numbers and the honesty of noise

The benchmark section includes a detail that builds rather than erodes trust. Running PostgreSQL 18 under pgbench, Nucleus posts numbers at or slightly above bare metal: 105,965 TPS for the Nucleus worker against 100,222 for the bare-metal worker on read-only, and 1,757 versus 1,490 TPS on the mixed TPC-B workload. Rather than crowing about beating bare metal, the documentation tells you to treat occasional wins as benchmark noise rather than a guaranteed speedup, and notes the test uses host bind-mounted pgdata and host networking specifically to isolate the steady-state cost of the isolation layer.
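For scale, the relative differences implied by those four figures can be worked out in a couple of lines of Python. The TPS values are the ones quoted above; the helper name is just for this example.

```python
def relative_gain(candidate: float, baseline: float) -> float:
    """Relative difference of candidate over baseline, as a fraction."""
    return (candidate - baseline) / baseline


read_only = relative_gain(105_965, 100_222)  # Nucleus vs bare metal, read-only
tpcb = relative_gain(1_757, 1_490)           # Nucleus vs bare metal, mixed TPC-B

print(f"read-only: {read_only:+.1%}")  # +5.7%
print(f"TPC-B:     {tpcb:+.1%}")       # +17.9%
```

An isolation layer has no obvious way to make a database faster than the host it runs on, so gaps of this size in the runtime's favor are most plausibly run-to-run variance, which is consistent with the documentation's own advice.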

The takeaway is the one that matters: namespace and cgroup isolation, applied carefully, costs effectively nothing at steady state. That has been broadly true for a while, but seeing it measured with this much methodological caution is refreshing in a space full of cherry-picked graphs.

Where this fits

Nucleus is not going to show up in a tutorial telling beginners to docker run their first web app. It deliberately drops the image-and-distribution half of Docker, the half most people think of as the entire point. What it offers instead is a single binary that does hardened, auditable, single-host isolation with optional Compose-equivalent orchestration via a TOML dependency DAG, plus deep Nix reproducibility.
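A dependency DAG of that kind boils down to a topological sort. Here is a minimal sketch using Python's standard library, where the dictionary stands in for a parsed TOML file; the service names and the `depends_on` key are assumptions made up for the example, not Nucleus's actual schema.

```python
from graphlib import CycleError, TopologicalSorter

# Stand-in for a parsed TOML document with one table per service.
SERVICES = {
    "db": {"depends_on": []},
    "api": {"depends_on": ["db"]},
    "worker": {"depends_on": ["db", "api"]},
}


def start_order(services: dict) -> list[str]:
    """Return an order in which every service starts after its dependencies."""
    graph = {name: set(cfg.get("depends_on", [])) for name, cfg in services.items()}
    return list(TopologicalSorter(graph).static_order())


print(start_order(SERVICES))  # ['db', 'api', 'worker']

# A cycle has no valid start order, and graphlib reports it as an error.
try:
    start_order({"a": {"depends_on": ["b"]}, "b": {"depends_on": ["a"]}})
except CycleError:
    print("cycle detected")
```

Rejecting cycles up front is what makes the declarative form safer than a hand-ordered startup script: an impossible topology fails at load time instead of hanging at boot.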

The broader signal is what makes it worth watching. The rise of autonomous agents is creating demand for a runtime shape that the container ecosystem never optimized for: disposable, untrusted, millisecond-fast, and aggressively sandboxed. Firecracker microVMs, gVisor, and now Nucleus are all circling the same need from different angles. Whether the answer ends up being a stripped-down runtime like this one, a microVM, or something Docker itself grows into, the consensus that every isolated workload needs a layered image and a daemon is quietly coming up for review. Nucleus is one team's bet on which parts you can throw away, and it has the unusual decency to tell you exactly where its fast path stops being safe. You can read the full design and the service-mode breakdown in the project README.
