From Monolith to Swarm: Building Effective Multi-Agent Systems at Shopify

Rust Reporter
6 min read

Shopify's Staff Engineer Paulo Arruda shares insights from developing a sophisticated multi-agent orchestration system that transformed how the company approaches AI tasks, reducing processing times from hours to minutes.


In the rapidly evolving landscape of artificial intelligence, organizations are constantly searching for effective ways to leverage AI capabilities without succumbing to complexity. Paulo Arruda, Staff Engineer at Shopify, recently shared his journey building multi-agent systems from scratch at QCon AI 2025, revealing how the e-commerce giant transformed its AI approach from simple chat tools to a sophisticated swarm of specialized agents.

The Genesis: From Chat Tools to Specialized Agents

Shopify's AI journey began in 2023 with the adoption of GPT-3.5, quickly expanding to include contracts with all major AI providers and internal tools such as LibreChat, VS Code with Copilot, and Cursor. By 2024, however, a significant portion of engineers remained skeptical of these tools or underutilized them, whether from bad early experiences or a simple lack of time.

"We had a significant portion of the engineers that were not using AI day-to-day," Arruda explained. "Folks are busy, they can't try, or they tried it once with GPT-3.5, and they had a bad experience, so skepticism and all those things came into play."

The turning point came with Tobi Lütke's company-wide email in April 2025, which energized AI experimentation across Shopify's approximately 6,000 employees. This cultural shift created fertile ground for innovation in AI orchestration.

The Problem: AI Slop and Testing Challenges

Arruda's initial focus was addressing a growing concern: AI-generated code in pull requests. As developers began using AI more extensively, the risk of "AI slop"—subpar code generated by AI systems—slipping through review processes increased.

"You have PRs with 1,500 lines, and then at the beginning, you're going to have the folks who will make a lot of effort to review it," Arruda noted. "Over time, if AI works well enough times, things start slipping through the cracks."

His approach evolved through several iterations:

  1. Initial Experiment: Building dependency graphs between files using GPT summaries, which proved too costly for Shopify's massive Rails monolith with hundreds of PRs daily.
  2. Claude Code Adoption: The arrival of Claude Code's research preview in February 2025 changed everything, demonstrating that agentic search outperformed traditional indexing approaches.
  3. The Breakthrough: During a hackday in May 2025, Arruda discovered that multiple Claude Code instances working together could solve problems that single instances couldn't.

The Emergence of Swarm Architecture

This discovery led to the creation of what would become Swarm (later SwarmSDK), a multi-agent orchestration framework that transformed how Shopify approached complex AI tasks.

"I noticed this pattern, that what they were moving from is this idea that they had one LLM on LibreChat with massive prompts," Arruda explained. "Then, you have too many unrelated tokens, too many instructions, the LLM gets lost, and the result was very poor."

The key insight was treating agents as lean, narrow-focused tools rather than generalists:

  1. Tree Structure: Agents organized in hierarchical, tree-like formations where each agent specializes in a specific domain.
  2. YAML Configuration: Simple configuration files allowing non-developers to set up specialized agents.
  3. Multi-Provider Support: Integration with multiple AI providers (Claude, Gemini, o3-pro) for improved results.
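As a concrete illustration, a hierarchical swarm along these lines might be declared in a single YAML file. This is a hypothetical sketch: the field names and structure below are invented for illustration and do not reflect SwarmSDK's actual configuration schema.

```yaml
# Hypothetical swarm configuration: a lead agent delegating to
# narrow specialists, each with its own provider and focused prompt.
# Field names are illustrative only, not SwarmSDK's real schema.
swarm:
  name: theme_review
  lead: coordinator
  agents:
    coordinator:
      provider: claude
      prompt: "Route each compliance criterion to its specialist and merge the findings."
      delegates_to: [accessibility, performance]
    accessibility:
      provider: gemini
      prompt: "Review only accessibility criteria. Report each violation with file and line."
    performance:
      provider: claude
      prompt: "Review only performance criteria. Flag unbounded loops and N+1 queries."
```

The design point is that each leaf agent sees only the instructions and context relevant to its one criterion, avoiding the "massive prompt" failure mode Arruda described.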

Measurable Success: From Hours to Minutes

The implementation of Swarm architecture yielded dramatic improvements across various Shopify teams:

  1. Theme Reviews: Reduced from 22 hours to 7-20 minutes by breaking down compliance criteria into separate agents.
  2. Candidate Role Assessment: Streamlined internal hiring processes to under an hour.
  3. Q2 Research: Enabled 15 specialized agents to efficiently research internal documentation across business functions.
  4. Vendor Evaluation: Created systems to validate vendor claims against extensive documentation.

"Once we broke down each one of those review criteria into separate agents using Claude Swarm, then we were able to reduce that time between 7 and 20 minutes," Arruda shared. "That was a big time-saving."

Challenges and Lessons Learned

Despite these successes, Arruda encountered several challenges that shaped the evolution of Swarm:

  1. Accessibility: The initial implementation was too developer-centric, requiring YAML configuration and command-line knowledge.
  2. Duplication: Multiple teams building similar systems in isolation, leading to fragmented AI adoption.
  3. Workflow Needs: Users required deterministic workflows alongside agentic capabilities.
  4. Multi-Provider Complexity: Integrating multiple AI providers proved technically challenging.

From these challenges emerged key principles:

  1. Start with your own pain: Build solutions to problems you personally experience.
  2. Agents as experts: Treat agents as lean, narrow-focused tools rather than generalists.
  3. Empower everyone: Build tools that enable AI enthusiasts across the organization, rather than creating centralized "AI SWAT teams."

The Future: Context Engineering and llm-fuse

Looking ahead, Arruda predicts 2026 will be the year of making agents useful at scale through context engineering. The critical challenge he identifies is addressing "context bloat" in systems like Model Context Protocol (MCP).

"The way MCP is used today is that it just adds a bunch of tools," Arruda explained. "You add a bunch of MCP servers to your client, and then it loads a bunch of tools, and those tools come with their own descriptions and their parameters. The description of those parameters, and all of those a lot of times are not very relevant to the task you're doing."

His proposed solution is "llm-fuse," an adapter layer that exposes data sources through a filesystem abstraction:

  1. Tool Abstraction: Standardizing tools like Read, Grep, Glob, Search, Write, Edit, and Delete.
  2. Data Translation: Creating adapters that translate these tool calls to various data sources.
  3. Memory Management: Implementing a "Defrag" tool to optimize and organize agent memories over time.

"If you control those tools, Read, Grep, Glob, Search, and search there is like a vector search plus a keyword search with some nice ranking," Arruda elaborated. "If you control those tools, Write, Edit, Delete, and I'll talk about Defrag soon, you can create an adapter layer between those tools and the storage where the information is."

This approach, still in prototype stage, aims to maximize precision and recall by ensuring every token in an agent's context window is relevant to the task at hand.
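The adapter idea can be sketched as a small interface: the agent is given a fixed vocabulary of filesystem-style tools, and each backing store implements them. The sketch below is a hypothetical illustration of that pattern; the class and method names (`DataSourceAdapter`, `InMemoryAdapter`) are invented here and are not the llm-fuse prototype itself.

```python
# Hypothetical sketch of the llm-fuse adapter idea: a fixed tool
# vocabulary (read, glob, grep, ...) backed by pluggable data sources.
# Names are illustrative, not taken from the real prototype.
from abc import ABC, abstractmethod
import fnmatch
import re


class DataSourceAdapter(ABC):
    """Translates filesystem-style tool calls to one backing store."""

    @abstractmethod
    def read(self, path: str) -> str: ...

    @abstractmethod
    def glob(self, pattern: str) -> list[str]: ...

    def grep(self, pattern: str, path_glob: str = "*") -> list[tuple[str, str]]:
        # Naive default built on read/glob; a real adapter could override
        # this with native search (e.g. vector + keyword with ranking).
        rx = re.compile(pattern)
        hits = []
        for path in self.glob(path_glob):
            for line in self.read(path).splitlines():
                if rx.search(line):
                    hits.append((path, line))
        return hits


class InMemoryAdapter(DataSourceAdapter):
    """Toy adapter over a dict, standing in for a wiki or doc store."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def read(self, path: str) -> str:
        return self.docs[path]

    def glob(self, pattern: str) -> list[str]:
        return [p for p in self.docs if fnmatch.fnmatch(p, pattern)]


docs = InMemoryAdapter({
    "handbook/hiring.md": "Panels have three interviewers.\nDebrief within 48 hours.",
    "handbook/reviews.md": "Reviews run twice a year.",
})
print(docs.grep("Debrief", "handbook/*.md"))
# [('handbook/hiring.md', 'Debrief within 48 hours.')]
```

Because the agent only ever sees `read`/`glob`/`grep` style calls, swapping the in-memory store for a wiki, a database, or a vector index changes the adapter, not the agent's context.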

Conclusion: The Path Forward

Arruda's journey at Shopify demonstrates that effective multi-agent systems require more than just technical implementation—they demand cultural shifts, user-centered design, and continuous iteration. By moving from monolithic AI approaches to specialized agent swarms, Shopify has achieved significant efficiency gains while maintaining human oversight.

As organizations continue to explore AI orchestration, the lessons from Shopify's experience provide valuable guidance: start small, focus on specific problems, empower domain experts, and design systems that evolve with user needs. The future of AI in enterprise lies not in replacing human intelligence, but in creating systems that augment and extend our capabilities through thoughtful, specialized collaboration.

For organizations looking to implement similar solutions, Arruda's SwarmSDK (GitHub repository) offers a starting point, though the principles he outlined can be applied to various multi-agent architectures.

The presentation underscores a critical insight: as we move into 2026, the most successful AI implementations will be those that solve specific problems exceptionally well, rather than attempting to create general-purpose systems that do everything adequately.
