Theme Systems at Scale: How To Build Highly Customizable Software

Serverless Reporter

Guilherme Carreiro of Shopify details the architectural decisions behind their theme system, explaining how they balance extreme customizability with platform stability. The presentation covers the use of Liquid as a safe DSL, performance optimization via native extensions, and the critical role of JSON schemas in bridging the gap between developers and non-technical users.

Shopify's theme system is a masterclass in building platforms that serve both technical and non-technical users at massive scale. During a recent presentation at QCon London, Guilherme Carreiro, a Staff Developer at Shopify, unpacked the architecture that allows millions of merchants to create unique storefronts while maintaining platform stability and performance, even during peak events like Black Friday Cyber Monday (BFCM), when the system handles nearly 60 million requests per minute.

The core challenge is one of balance: enabling developers to write any kind of template for a storefront while ensuring that non-technical merchants can easily customize their stores. This requires a system with clear boundaries, performance guarantees, and intuitive tools for collaboration.

The Foundational Layer: Liquid as a Safe DSL

The first building block is Liquid, Shopify's own domain-specific language (DSL) for templating. Liquid was created two decades ago to solve a specific problem: allowing third-party developers to write templates for Shopify stores without giving them unfettered access to the backend.

As Carreiro explains, the initial approach of using Ruby's ERB (Embedded Ruby) templates was risky. ERB lets any Ruby code execute within a template, which invites performance problems such as N+1 queries and, more critically, threatens the security and stability of the platform. "When you build a platform where folks upload the templates to create these stores, you're exposing your platform to have this very slow experience," Carreiro notes.

Liquid was designed with two key principles: safety and simplicity.

  1. Safety through Boundaries: Liquid works as an "allow list" of operations. It supports conditions, iteration, and data transformations via filters (e.g., {{ product.price | money }}), but it cannot execute database queries, modify state, or consume unbounded system resources. This creates a secure sandbox.
  2. Simplicity: Liquid is designed to be as simple as HTML, making it accessible to designers and less experienced frontend developers.
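
The allow-list idea can be illustrated with a toy evaluator for expressions of the form "name | filter". This is a minimal Python sketch rather than Liquid's implementation: ALLOWED_FILTERS, render_expression, and the formatting done by the money filter here are all hypothetical. The point is only that a template can reach nothing that was not put on the list.

```python
from typing import Callable, Dict

# Hypothetical allow list: the only operations a template can ever invoke.
ALLOWED_FILTERS: Dict[str, Callable[[object], object]] = {
    "upcase": lambda value: str(value).upper(),
    "money": lambda cents: f"${int(cents) / 100:.2f}",
}


def render_expression(expression: str, context: Dict[str, object]) -> str:
    """Evaluate 'name | filter | filter' against a read-only context."""
    parts = [part.strip() for part in expression.split("|")]
    variable, filter_names = parts[0], parts[1:]

    # Dotted lookup walks plain dicts only: no attribute access, no method calls.
    value: object = context
    for key in variable.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""  # in this sketch, unknown names render as empty output
        value = value[key]

    for name in filter_names:
        if name not in ALLOWED_FILTERS:
            raise ValueError(f"unknown filter: {name}")
        value = ALLOWED_FILTERS[name](value)
    return str(value)
```

With a context of {"product": {"price": 1999}}, the expression "product.price | money" renders as "$19.99", while any filter name outside the list is rejected before it can run.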

To efficiently load data into templates without performance degradation, Shopify uses the Drop pattern. A Drop is a Ruby object that acts as a low-level cache and a controlled interface. Instead of exposing an entire product object, Shopify exposes a ProductDrop with only the necessary methods. This prevents developers from accidentally triggering expensive operations and provides a memoization layer for performance.
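
A rough Python sketch of the Drop idea follows. Shopify's drops are Ruby objects, so this shows the shape of the pattern only, not the actual code. The EXPOSED tuple, the lookup method, and the load_variants callback (standing in for a database query) are assumptions made for illustration.

```python
from functools import cached_property
from typing import Callable, Dict, List


class ProductDrop:
    """Exposes a fixed set of fields and memoizes the expensive one."""

    # Only these names are visible to templates in this sketch.
    EXPOSED = ("title", "variants")

    def __init__(self, record: Dict[str, object], load_variants: Callable[[], List[str]]):
        self._record = record
        self._load_variants = load_variants  # hypothetical stand-in for a database query

    @property
    def title(self) -> str:
        return str(self._record["title"])

    @cached_property
    def variants(self) -> List[str]:
        # Runs at most once per drop, however often the template reads it.
        return self._load_variants()

    def lookup(self, name: str) -> object:
        """The single entry point a template engine would use in this sketch."""
        if name not in self.EXPOSED:
            return None  # anything off the list is invisible, including private fields
        return getattr(self, name)
```

A template that reads the variants in a loop triggers the loader once, and a field such as an internal cost that happens to be present in the underlying record never becomes reachable.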

Liquid also enforces strict resource limits to prevent abuse. For example, the render_length_limit restricts the size of rendered output, and the assign_score_limit caps how much data a template can accumulate through variable assignments. These limits ensure that no single theme can monopolize server resources.
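
As an illustration of how such budgets might be enforced, here is a small Python sketch. Only the two limit names come from the talk. The ResourceLimits class, its methods, and the scoring rule (strings cost their byte size, everything else costs 1) are assumptions chosen to keep the example short.

```python
class ResourceLimitError(Exception):
    """Raised when a template exceeds one of its budgets."""


class ResourceLimits:
    def __init__(self, render_length_limit: int, assign_score_limit: int):
        self.render_length_limit = render_length_limit
        self.assign_score_limit = assign_score_limit
        self.render_length = 0
        self.assign_score = 0

    def record_output(self, chunk: str) -> None:
        """Charge rendered output against the render length budget."""
        self.render_length += len(chunk.encode("utf-8"))
        if self.render_length > self.render_length_limit:
            raise ResourceLimitError("render length limit exceeded")

    def record_assign(self, value: object) -> None:
        """Charge an assignment against the assign budget."""
        # Assumed scoring rule: strings cost their byte size, other values cost 1.
        cost = len(value.encode("utf-8")) if isinstance(value, str) else 1
        self.assign_score += cost
        if self.assign_score > self.assign_score_limit:
            raise ResourceLimitError("assign score limit exceeded")
```

A renderer would call record_output for every chunk it emits and record_assign for every assignment, so a runaway template fails fast instead of exhausting memory.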

Finally, Shopify maintains strict backward compatibility. Carreiro emphasizes the importance of being strict about what is valid Liquid syntax. He contrasts two parsing modes: lax and strict. While a lax parser might be more forgiving of minor syntax errors, it effectively signs a contract with developers that certain invalid code is acceptable, making it impossible to evolve the language later. Shopify's strict parsing ensures that only valid code is rendered, preserving the ability to improve and extend Liquid over time.
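
The difference between the two modes can be seen in a toy parser for output markup such as "product.title | upcase". This Python sketch is not Liquid's parser. The grammar, the parse_output function, and the TemplateSyntaxError class are hypothetical, and the lax branch is deliberately forgiving to show how invalid input ends up being accepted.

```python
import re
from typing import List, Tuple

IDENT = r"[A-Za-z_][\w.]*"
STRICT_PATTERN = re.compile(rf"\s*({IDENT})((?:\s*\|\s*{IDENT})*)\s*")


class TemplateSyntaxError(ValueError):
    """Raised by the strict parser for markup outside the grammar."""


def parse_output(markup: str, mode: str = "strict") -> Tuple[str, List[str]]:
    """Return (variable, filters) for markup such as 'product.title | upcase'."""
    if mode == "strict":
        match = STRICT_PATTERN.fullmatch(markup)
        if match is None:
            raise TemplateSyntaxError(f"invalid markup: {markup!r}")
        return match.group(1), re.findall(IDENT, match.group(2))

    # Lax: keep whatever looks recognizable and silently drop the rest.
    first = re.search(IDENT, markup)
    variable = first.group(0) if first else ""
    filters = re.findall(rf"\|\s*({IDENT})", markup)
    return variable, filters
```

Given the malformed markup "product.title upcase | | downcase", the lax branch returns ("product.title", ["downcase"]) and quietly discards "upcase", while the strict branch raises. Once themes in the wild depend on the lax result, changing the grammar would change their output.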

Bridging the Gap: Schemas and State Management

While Liquid provides the engine for rendering, it doesn't solve the problem of customization for non-technical users. This is where schemas come in. Schemas are JSON structures defined within Liquid files that describe the configurable properties of a section or block.

For example, a product section might have a schema property called image_position with allowed values of left or right. This schema is the shared language between the theme developer and Shopify's editor. The developer writes the Liquid logic to use this property, and the merchant uses the editor's UI to select a value, which is then persisted as JSON data.
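
To make the contract concrete, the sketch below pairs a schema resembling the image_position example with a small Python function that resolves the merchant's saved value against it. The JSON approximates the shape of a section schema but is written for illustration, and resolve_settings is a hypothetical helper, not part of Shopify's editor.

```python
import json

# Illustrative schema: one setting with two allowed values and a default.
SECTION_SCHEMA = json.loads("""
{
  "name": "Product",
  "settings": [
    {
      "type": "select",
      "id": "image_position",
      "label": "Image position",
      "options": [
        {"value": "left", "label": "Left"},
        {"value": "right", "label": "Right"}
      ],
      "default": "left"
    }
  ]
}
""")


def resolve_settings(schema: dict, saved: dict) -> dict:
    """Apply saved values over schema defaults, rejecting values the schema does not allow."""
    resolved = {}
    for setting in schema["settings"]:
        allowed = {option["value"] for option in setting.get("options", [])}
        value = saved.get(setting["id"], setting.get("default"))
        if allowed and value not in allowed:
            raise ValueError(f"{setting['id']}: {value!r} is not one of {sorted(allowed)}")
        resolved[setting["id"]] = value
    return resolved
```

The editor only needs the schema to draw a dropdown with two choices, and the template only needs the resolved value, so neither side has to know how the other works.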

This schema-driven approach is central to the entire request lifecycle. When a buyer visits a storefront, the system loads the theme's file system and composes the page state from multiple JSON sources:

  1. Global Settings: A settings_schema.json file defines global theme settings (e.g., color palette, typography).
  2. Page-Specific State: For a product page, a product.json file defines the sequence of sections to render and their individual settings.
  3. Section and Block State: Each section and block can have its own settings, declared by the schema embedded in its Liquid file.
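
The three sources above can be combined roughly as follows. This Python sketch assumes a page template that holds a "sections" map and an "order" list, and it assumes that values saved for the page take precedence over a section's schema defaults. The compose_page_state function and its argument names are hypothetical.

```python
def compose_page_state(global_settings: dict, template: dict, section_defaults: dict) -> list:
    """Return the page's sections in render order, each with resolved settings."""
    page = []
    for section_id in template["order"]:
        entry = template["sections"][section_id]
        settings = {
            **section_defaults.get(entry["type"], {}),  # defaults from the section's schema
            **entry.get("settings", {}),  # values saved for this page win
        }
        page.append(
            {
                "id": section_id,
                "type": entry["type"],
                "settings": settings,
                "theme": global_settings,  # global settings are shared by every section
            }
        )
    return page
```

Each entry in the result carries everything a renderer needs for one section, which keeps the Liquid templates free of any knowledge about where the values were stored.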

This architecture elegantly separates concerns. The Liquid templates define the structure and logic, while the JSON files manage the state and configuration. It allows merchants to collaborate with developers by manipulating the state through a user-friendly editor, without touching the underlying code.

Performance at Scale: Runtimes and Native Extensions

Rendering millions of unique, complex storefronts requires a highly optimized runtime. Shopify runs on Kubernetes on Google Cloud Platform (GCP) with a multi-tenant architecture. The rendering component is isolated from the core application that handles state persistence (like saving theme edits) to ensure that high traffic to storefronts doesn't impact the primary databases.

A key performance optimization technique highlighted by Carreiro is the use of native extensions. Shopify's backend is primarily a Ruby application, but performance-critical operations such as parsing and rendering Liquid templates are offloaded to code written in Rust (or C/C++).

Native extensions offer several advantages:

  • Garbage Collection Control: By managing memory manually in native code, they can avoid the pauses associated with Ruby's garbage collector, which is crucial for predictable latency.
  • Raw Speed: Native code often executes faster than high-level language code.
  • Reusability: They can leverage existing, highly-optimized libraries (e.g., a JSON parser) instead of rewriting them in Ruby.

However, Carreiro warns that native extensions come with complexity and trade-offs. The Foreign Function Interface (FFI) calls between Ruby and the native code have overhead. Marshalling data back and forth (e.g., converting a Ruby integer to a native integer) is not free. He illustrates this with a performance comparison: partially extracting a function to a native extension can sometimes be slower than keeping it in pure Ruby if the function requires many small FFI calls. The sweet spot is to batch operations and minimize the number of cross-boundary calls.
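
The batching advice can be illustrated without any real native code. In the Python sketch below, NativeBoundary is a hypothetical stand-in that only counts how often the language boundary would be crossed. It measures no actual time, and the escape functions are placeholders for whatever work lives on the native side.

```python
from typing import Callable, List


class NativeBoundary:
    """Stand-in for a language boundary that counts how often it is crossed."""

    def __init__(self) -> None:
        self.crossings = 0

    def call(self, function: Callable, *args):
        self.crossings += 1  # each crossing would pay the marshalling cost once
        return function(*args)


def escape_one(text: str) -> str:
    """Placeholder for a small native function."""
    return text.replace("<", "&lt;")


def escape_many(texts: List[str]) -> List[str]:
    """Placeholder for the same work done natively over a whole batch."""
    return [escape_one(text) for text in texts]


def render_chatty(boundary: NativeBoundary, chunks: List[str]) -> List[str]:
    # One crossing per chunk: the overhead grows with the input.
    return [boundary.call(escape_one, chunk) for chunk in chunks]


def render_batched(boundary: NativeBoundary, chunks: List[str]) -> List[str]:
    # One crossing in total: the overhead stays constant.
    return boundary.call(escape_many, chunks)
```

Both functions return the same output, but for three chunks the chatty version crosses the boundary three times and the batched version once. When the per-call overhead rivals the work done inside each call, that difference decides whether the extension pays off.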

The Developer Experience: Tooling is Culture

A robust ecosystem requires robust tools. Without them, developers will struggle to write efficient, correct code. Shopify invests heavily in tooling to guide developers toward best practices.

  • Linters: The Shopify Theme Check linter identifies common issues such as parser-blocking scripts or references to unavailable objects, providing immediate feedback in the editor.
  • Profiling: The shopify theme profile CLI command generates a report showing how each part of a template renders, helping developers identify performance bottlenecks.
  • Language Server: The Liquid Language Server Protocol (LSP) implementation provides context-aware autocompletion, hover information, and diagnostics. This required building a custom, error-tolerant parser using OhmJS that understands both Liquid and HTML, allowing it to provide useful feedback even on incomplete or broken code.
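
To give a feel for what a single lint rule involves, here is a deliberately naive Python sketch that flags external script tags lacking async or defer. It is a regex heuristic written for illustration, not how Theme Check works, and find_parser_blocking_scripts is a hypothetical name. Its blind spots (for instance, markup where a Liquid expression inside the tag contains a ">") are exactly the kind of case that motivates a real, error-tolerant parser.

```python
import re
from typing import List

SCRIPT_TAG = re.compile(r"<script\b([^>]*)>", re.IGNORECASE)
QUOTED_VALUE = re.compile(r"\"[^\"]*\"|'[^']*'")


def find_parser_blocking_scripts(source: str) -> List[int]:
    """Return 1-based line numbers of external scripts without async or defer."""
    offenses = []
    for match in SCRIPT_TAG.finditer(source):
        # Blank out quoted values so a file named "defer.js" is not mistaken for the attribute.
        attributes = QUOTED_VALUE.sub('""', match.group(1)).lower()
        is_external = re.search(r"\bsrc\s*=", attributes) is not None
        is_deferred = re.search(r"\b(async|defer)\b", attributes) is not None
        if is_external and not is_deferred:
            offenses.append(source.count("\n", 0, match.start()) + 1)
    return offenses
```

Inline scripts and scripts that already carry async or defer pass, and everything else is reported by line number so an editor integration could underline it.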

Carreiro notes that while the backend rendering parser is optimized for speed and doesn't need to understand HTML structure, the developer tooling parser must be more sophisticated to offer a rich editing experience. This highlights a critical lesson: when you create a DSL, you must also create the tools to support it.

Applying These Principles to Your Domain

The Shopify theme system offers a blueprint for any platform seeking to offer deep customizability. The key architectural components are:

  1. A Safe DSL (Liquid): Create a restricted language that allows expression without compromising platform stability or performance.
  2. Schemas as a Bridge (JSON): Use declarative schemas to define the configurable aspects of your components, creating a shared contract between developers and end-users.
  3. Optimized Runtimes (Native Extensions): Identify performance-critical paths and consider offloading them to lower-level languages, but be mindful of the overhead of crossing language boundaries.
  4. Comprehensive Tooling (Linters, LSPs): Invest in tools that make it easy for developers to do the right thing, providing fast feedback and reducing the learning curve.

The most transferable concept is the use of schemas. As Carreiro concludes, "The schemas is where we establish bridges with non-technical folks." By exposing the most granular elements of your application through a schema-driven interface, you can invite users to participate in the customization process, making your software more adaptable and valuable to a broader audience.
