Stripe's autonomous coding agents, called minions, now handle over 1,300 pull requests weekly through a sophisticated blend of cloud dev environments, custom agent harnesses, and deterministic workflows.
Minions' weekly output has grown from 1,000 to over 1,300 pull requests, all without human-written code. This deep dive explores how Stripe built these one-shot, end-to-end coding agents to operate within its existing developer infrastructure.
The Foundation: Devboxes as Agent Homes
The key to minions' success lies in Stripe's standardized developer environment called devboxes. These are AWS EC2 instances that contain source code and run services under development. Unlike traditional long-lived developer machines, devboxes follow the "cattle, not pets" philosophy—they're standardized, easy to replace, and engineers often run multiple simultaneously.
What makes devboxes perfect for autonomous agents is their isolation and predictability. Each devbox provides:
- Clean working directories that prevent agents from interfering with each other
- Full isolation from privileged or sensitive machines
- The same power as a developer's shell but with appropriate constraints
- Standardization that enables parallel execution
The "hot and ready" standard ensures devboxes are available within 10 seconds through proactive provisioning. This includes cloning repositories, warming caches, and starting code generation services—creating an environment that's immediately ready for coding tasks.
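The proactive provisioning idea can be sketched as a warm pool: keep a target number of prepared environments on hand so that a claim never waits for setup. This is a minimal illustration, not Stripe's implementation. `WarmPool`, `provision`, and the `devbox-N` names are hypothetical, and replenishment runs inline here only to keep the example short.

```python
from collections import deque
from itertools import count
from typing import Callable, Deque


class WarmPool:
    """Hypothetical pool that keeps `target_ready` environments prepared."""

    def __init__(self, target_ready: int, provision: Callable[[], str]) -> None:
        self._target = target_ready
        self._provision = provision  # assumed to clone repos, warm caches, etc.
        self._ready: Deque[str] = deque()
        self.replenish()

    def replenish(self) -> None:
        # Top the pool back up to its target size.
        while len(self._ready) < self._target:
            self._ready.append(self._provision())

    def claim(self) -> str:
        # Hand out a prepared box if one exists; fall back to a cold start.
        box = self._ready.popleft() if self._ready else self._provision()
        # A real system would replenish in the background, not inline.
        self.replenish()
        return box

    def ready_count(self) -> int:
        return len(self._ready)


_ids = count(1)
pool = WarmPool(target_ready=2, provision=lambda: f"devbox-{next(_ids)}")
first = pool.claim()
```

Because the pool is topped up after each claim, the next request also finds a prepared box, which is what keeps the perceived startup time short.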
Custom Agent Harness: From Goose to Minions
While devboxes were built for human engineers, the agent harness was custom-built for minions. Stripe forked Block's goose—one of the first widely used coding agents—and adapted it to work within Stripe's LLM infrastructure. The focus shifted from human-supervised tools to fully unattended operation.
The absence of human supervision allowed for unique optimizations. Without interruptibility or confirmation prompts, minions can operate with full permissions within their quarantined devbox environment. Safety confirmations become unnecessary because the blast radius of any mistake is limited to a single devbox.
Blueprints: The Orchestration Breakthrough
Stripe's most fundamental innovation is the "blueprint"—a hybrid orchestration primitive that combines workflow determinism with agent flexibility. Unlike traditional workflows (fixed graphs of steps) or simple agent loops, blueprints allow certain nodes to run deterministic code while others invoke agent loops.
In the minion blueprint:
- Agent nodes like "Implement task" or "Fix CI failures" have wide latitude for decision-making
- Deterministic nodes like "Run configured linters" or "Push changes" execute code without LLM involvement
- The overall structure resembles a state machine mixing both approaches
This design saves tokens and CI costs by handling predictable tasks deterministically while giving agents flexibility where needed. The compounding effect of "putting LLMs into contained boxes" creates system-wide reliability improvements.
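One way to picture a blueprint is as a small state machine whose nodes are tagged as either deterministic or agent-driven. The sketch below is an illustration under assumptions, not Stripe's code: `Node`, `run_blueprint`, the transition labels, and the routing of lint failures to an agent node are all hypothetical, and the agent functions are stubs where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "deterministic" or "agent"
    run: Callable[[dict], str]  # returns an outcome label such as "ok" or "fail"
    transitions: Dict[str, Optional[str]] = field(default_factory=dict)


def run_blueprint(nodes: Dict[str, Node], start: str, state: dict,
                  max_steps: int = 20) -> List[Tuple[str, str, str]]:
    """Walk the graph from `start`, recording (node, kind, outcome) per step."""
    trace = []
    current: Optional[str] = start
    while current is not None and len(trace) < max_steps:
        node = nodes[current]
        outcome = node.run(state)
        trace.append((node.name, node.kind, outcome))
        current = node.transitions.get(outcome)  # None ends the run
    return trace


def implement_task(state: dict) -> str:
    state["diff"] = "patch with lint error"  # stub for an agent loop
    return "ok"


def run_linters(state: dict) -> str:
    # Plain code, no LLM call.
    return "fail" if "lint error" in state["diff"] else "ok"


def fix_lint(state: dict) -> str:
    state["diff"] = "clean patch"  # stub for an agent loop
    return "ok"


def push_changes(state: dict) -> str:
    state["pushed"] = True
    return "ok"


def build_demo_blueprint() -> Dict[str, Node]:
    return {
        "implement": Node("implement", "agent", implement_task, {"ok": "lint"}),
        "lint": Node("lint", "deterministic", run_linters,
                     {"ok": "push", "fail": "fix_lint"}),
        "fix_lint": Node("fix_lint", "agent", fix_lint, {"ok": "lint"}),
        "push": Node("push", "deterministic", push_changes, {"ok": None}),
    }


demo_state: dict = {}
demo_trace = run_blueprint(build_demo_blueprint(), "implement", demo_state)
```

In this toy run the linter and push steps never touch a model, so only two of the five steps would spend tokens. The `max_steps` cap is a guard against a fix/lint cycle that never converges.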
Context Engineering at Scale
Large codebases present unique challenges for autonomous agents. Stripe addresses this through:
Rule Files: Rather than global rules that would overwhelm context windows, minions use scoped rule files that attach automatically as agents traverse directory structures. Stripe standardized on Cursor's rule format and modified their harness to read these alongside homegrown formats.
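Directory-scoped rules can be sketched as a lookup over a file's ancestor directories, so an agent only sees rules relevant to where it is working. The function below is a simplified illustration: `rules_for_path` and the example directories are hypothetical, and it does not model Cursor's actual rule format or Stripe's harness.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def rules_for_path(file_path: str, rules_by_dir: Dict[str, str]) -> List[str]:
    """Return rules attached to ancestors of `file_path`, broadest first."""
    path = PurePosixPath(file_path)
    # `parents` lists the nearest directory first; reverse it so that
    # repository-wide rules come before more specific ones.
    ancestors = reversed(path.parents)
    return [rules_by_dir[str(d)] for d in ancestors if str(d) in rules_by_dir]


# Hypothetical rule text keyed by directory ("." is the repository root).
EXAMPLE_RULES = {
    ".": "Follow the repository style guide.",
    "payments": "Never log card numbers.",
    "payments/api": "Every endpoint needs an idempotency test.",
    "billing": "Use the shared invoice helpers.",
}

payments_rules = rules_for_path("payments/api/charge.py", EXAMPLE_RULES)
billing_rules = rules_for_path("billing/invoice.py", EXAMPLE_RULES)
```

A file under `payments/api` picks up three rules while a file under `billing` picks up two, so the context cost scales with the depth of the path rather than with the total number of rules in the codebase.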
MCP Integration: The Model Context Protocol became the industry standard for networked tool calls. Stripe built Toolshed—a centralized internal MCP server with nearly 500 tools for internal systems and SaaS platforms. Different agents request only relevant subsets of tools, with minions receiving an intentionally small default set.
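The idea of exposing only a small subset of a large tool catalog can be sketched without any MCP library. Everything here is hypothetical: `ToolCatalog`, `select`, the tool names, and the default set are invented for illustration and are not Toolshed's interface or part of the MCP specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


class ToolCatalog:
    """Hypothetical in-memory stand-in for a central tool server."""

    def __init__(self, tools: Iterable[Tool]) -> None:
        self._tools: Dict[str, Tool] = {t.name: t for t in tools}

    def select(self, default: Iterable[str], extra: Iterable[str] = ()) -> List[Tool]:
        # De-duplicate while keeping the requested order.
        names = list(dict.fromkeys([*default, *extra]))
        missing = [n for n in names if n not in self._tools]
        if missing:
            raise KeyError(f"unknown tools: {missing}")
        return [self._tools[n] for n in names]


CATALOG = ToolCatalog([
    Tool("search_code", "Search the codebase"),
    Tool("read_docs", "Look up internal documentation"),
    Tool("ticket_lookup", "Fetch ticket details"),
    Tool("deploy_status", "Check deployment state"),
])

# Assumed small default set for a minion; a task may request a few extras.
MINION_DEFAULT = ("search_code", "read_docs")
selected = CATALOG.select(MINION_DEFAULT, extra=["ticket_lookup", "search_code"])
```

Only the selected tool descriptions would be placed in the agent's context, which keeps a catalog of hundreds of tools from crowding out the task itself.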
Security controls prevent minions from performing destructive actions, but devbox isolation is the first line of defense: devboxes run in QA environments with no access to real user data or production services.
Automated Iteration and Feedback
While designed for one-shot success, minions incorporate automated feedback loops. Stripe's three million tests provide validation, but the team "shifts feedback left" by running linters and other checks locally before pushing to CI.
Key optimizations include:
- Pre-push hooks that fix common lint issues in under a second
- Deterministic lint nodes within the blueprint that loop locally before pushing
- One iteration against full CI after the first push
- A second chance to fix failing tests locally before human review
This balanced approach avoids the diminishing returns of unlimited CI iterations while maintaining quality standards.
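The bounded loop described above can be sketched as follows. This is one reading of the steps, with assumptions stated plainly: `iterate_with_ci_cap`, the cap of two CI runs, the limit on local lint passes, and the `FakeRepo` stand-in are all hypothetical.

```python
from typing import Callable, List


def iterate_with_ci_cap(
    lint_errors: Callable[[], List[str]],
    fix_lint: Callable[[List[str]], None],
    run_ci: Callable[[], List[str]],
    fix_tests: Callable[[List[str]], None],
    max_ci_runs: int = 2,
    max_lint_passes: int = 5,
) -> dict:
    """Lint locally until clean, then spend at most `max_ci_runs` CI runs."""
    ci_runs = 0
    failures: List[str] = []
    while ci_runs < max_ci_runs:
        # Cheap local loop first, so CI is not spent on lint noise.
        for _ in range(max_lint_passes):
            errors = lint_errors()
            if not errors:
                break
            fix_lint(errors)
        failures = run_ci()  # stands in for "push and wait for full CI"
        ci_runs += 1
        if not failures:
            break
        if ci_runs < max_ci_runs:
            fix_tests(failures)  # one more local attempt before the last run
    # Whatever the result, the change now goes to a human reviewer.
    return {"ci_runs": ci_runs, "green": not failures}


class FakeRepo:
    """Toy state: one lint error and one failing test, each fixable once."""

    def __init__(self) -> None:
        self.lint = ["unused import"]
        self.failing = ["test_refund"]

    def lint_errors(self) -> List[str]:
        return list(self.lint)

    def fix_lint(self, errors: List[str]) -> None:
        self.lint = []

    def run_ci(self) -> List[str]:
        return list(self.failing)

    def fix_tests(self, failures: List[str]) -> None:
        self.failing = []


repo = FakeRepo()
result = iterate_with_ci_cap(repo.lint_errors, repo.fix_lint, repo.run_ci, repo.fix_tests)
```

The hard cap is the design point: once the CI budget is spent the loop stops and reports the outcome instead of retrying indefinitely.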
The Human Connection
Stripe's investment in human developer productivity created the foundation for successful AI agents. The same properties that make devboxes effective for engineers—parallelism, predictability, isolation—proved equally valuable for minions. This alignment between human and agent needs demonstrates how infrastructure built for one purpose can unlock unexpected capabilities.
Minions have already transformed software engineering at Stripe, and the team continues to improve them by blending industry standards with internal tooling. The result is a system in which autonomous agents reliably handle complex coding tasks at scale.
For engineers interested in working with or on minions, Stripe is actively hiring to continue this work at the intersection of AI and developer productivity.
