QCon London 2026: Shielding the Core: Architecting Resilience with Multi-Layer Defenses

SeatGeek's Anderson Parra reveals how to architect systems that survive traffic stampedes through multi-layer defenses, combining caching, rate limiting, and resource isolation to prevent cascading failures.

At QCon London 2026, Anderson Parra, Staff Software Engineer at SeatGeek, delivered a compelling presentation on "Shielding the Core: Architecting Resilience with Multi-Layer Defenses." His talk addressed a critical challenge facing modern distributed systems: surviving massive traffic spikes that can overwhelm even well-designed infrastructure.

Parra began by framing the problem through SeatGeek's operational context, describing what he called a "traffic stampede": the problem is not the volume of traffic itself but its timing, when demand arrives faster than systems can adapt. He illustrated this with several failure signals, including the Noisy Neighbor Problem in multi-tenant systems and the Scaling Gap, the dangerous period when scaling lags behind demand.

The Three Pillars of Core Shielding

Parra's defense strategy rests on three fundamental principles:

Absorb the Burst: Handle sudden traffic spikes before they reach core systems
Control the Flow: Apply fairness, rate limits, and admission control
Protect the Core: Keep critical services stable during demand spikes

This approach manifests through a three-layer defense system that SeatGeek has implemented.

Edge Shield: The First Line of Defense

The Edge Shield acts as the initial barrier, with three key responsibilities:

Cache: Serves requests without hitting origin servers
Queue: Absorbs sudden traffic bursts
Filter: Detects bots and invalid traffic

The cache serves as a resilience mechanism against a loop that appears during traffic spikes: as failures increase, fewer responses are served from cache; as cache hits decrease, more traffic reaches the origin; and the added origin load can cascade into further failures. Parra emphasized that combining cache with rate limiting fundamentally changes system behavior: services remain stable, caches warm up more safely, and origin load decreases significantly.
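As a rough illustration of how a cache and a limiter can work together, the following Python sketch places an origin budget behind a cache-aside lookup. It is not SeatGeek's implementation: the class name `ShieldedCache`, the `fetch_origin` callback, the fixed-window budget, and the choice to serve stale entries once the budget is exhausted are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ShieldedCache:
    """Cache-aside lookup whose misses must fit within an origin budget."""

    fetch_origin: Callable[[str], str]  # hypothetical call to the origin service
    ttl: float                          # seconds an entry stays fresh
    max_origin_calls: int               # origin calls allowed per window
    window: float = 1.0                 # budget window length, in seconds
    _store: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    _window_start: float = 0.0
    _calls: int = 0

    def _origin_allowed(self, now: float) -> bool:
        # Fixed-window counter: reset the budget when the window rolls over.
        if now - self._window_start >= self.window:
            self._window_start, self._calls = now, 0
        if self._calls < self.max_origin_calls:
            self._calls += 1
            return True
        return False

    def get(self, key: str, now: float) -> Tuple[str, Optional[str]]:
        entry = self._store.get(key)
        if entry and now < entry[1]:
            return "hit", entry[0]            # fresh entry, origin untouched
        if self._origin_allowed(now):
            value = self.fetch_origin(key)
            self._store[key] = (value, now + self.ttl)
            return "miss", value              # refreshed within the budget
        if entry:
            return "stale", entry[0]          # assumption: stale beats failing
        return "shed", None                   # nothing cached, budget exhausted
```

Because misses are bounded by `max_origin_calls`, a cold or expiring cache refills gradually instead of forwarding the whole spike to the origin, which is one way to read the "caches warm up more safely" observation.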

SeatGeek also implements a Virtual Waiting Room that absorbs traffic and controls flow during extreme spikes.
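The talk summary does not describe how the Virtual Waiting Room is built. As a minimal sketch of the general idea, assuming a fixed cap on concurrently active visitors and first-in, first-out admission, a waiting room can be modeled like this (the `WaitingRoom` class and its method names are hypothetical):

```python
from collections import deque
from typing import Deque, List, Set


class WaitingRoom:
    """Admits visitors up to a fixed capacity; everyone else waits in FIFO order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # max visitors allowed past the room
        self.active: Set[str] = set()     # visitors currently admitted
        self.queue: Deque[str] = deque()  # visitors waiting, oldest first

    def arrive(self, visitor: str) -> str:
        # Admit directly only if there is room AND nobody is already waiting,
        # so late arrivals cannot jump ahead of the queue.
        if len(self.active) < self.capacity and not self.queue:
            self.active.add(visitor)
            return "admitted"
        self.queue.append(visitor)
        return "waiting"

    def leave(self, visitor: str) -> List[str]:
        # A departure frees a slot; fill free slots from the head of the queue.
        self.active.discard(visitor)
        admitted: List[str] = []
        while self.queue and len(self.active) < self.capacity:
            nxt = self.queue.popleft()
            self.active.add(nxt)
            admitted.append(nxt)
        return admitted
```

The downstream system only ever sees `capacity` active visitors, regardless of how many arrive at once; the burst is held in the queue instead.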

Gateway Shield: Managing Legitimate Access

The Gateway Shield focuses on controlling legitimate access patterns:

Rate Limit: Controls request rates to prevent overload
Fair Access: Protects legitimate users through differentiated policies
Validation: Rejects invalid traffic

The rate limiting mechanism includes a Rate Limit Gate that protects the platform from overload. Under normal traffic, client requests pass through the gate; during high spikes, excess requests receive HTTP 429 (Too Many Requests) responses. Traffic sources include humans (legitimate ticket buyers) and automated agents (sophisticated bots and distributed automation).
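The summary does not say which algorithm sits behind the Rate Limit Gate. A token bucket is one common choice, and the sketch below uses it purely as an assumption to show how a gate can pass normal traffic and answer 429 during a spike. The `TokenBucket` class and `gate` function are illustrative names, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so normal traffic passes
        self.updated = 0.0          # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def gate(bucket: TokenBucket, now: float) -> int:
    """Return the HTTP status the gate would answer with."""
    return 200 if bucket.allow(now) else 429
```

With `TokenBucket(rate=1.0, burst=2)`, two requests at the same instant pass and a third is rejected with 429; one second later a single token has been refilled and one more request passes.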

SeatGeek's Fair Access Policy implements rate limits by user/account and by API key, with IP-based limits as a fallback mechanism.
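One way to picture such a policy is as a choice of which identity a request is counted against. The sketch below assumes a precedence of user, then API key, then IP, and uses made-up per-window limits; the source states only that IP is the fallback, so the ordering between user and API key, the `LIMITS` values, and all names here are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Request:
    ip: str
    user_id: Optional[str] = None
    api_key: Optional[str] = None


# Hypothetical requests allowed per window for each identity scope.
LIMITS = {"user": 60, "api_key": 600, "ip": 20}


def limit_key(req: Request) -> Tuple[str, str]:
    """Pick the most specific identity available; IP is only the fallback."""
    if req.user_id:
        return ("user", req.user_id)
    if req.api_key:
        return ("api_key", req.api_key)
    return ("ip", req.ip)


def admit(req: Request, counts: Counter) -> bool:
    """Count the request against its key and check the limit for one window."""
    key = limit_key(req)
    counts[key] += 1
    return counts[key] <= LIMITS[key[0]]
```

Under this scheme, a signed-in user behind a busy shared IP address is counted against their own budget, so anonymous traffic from the same address exhausting the IP limit does not block them.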

Platform Shield: Resource Isolation and Prioritization

The Platform Shield operates at the infrastructure level with three core responsibilities:

Resource Isolation: Applies CPU limits and scheduling priorities to prevent noisy neighbors
Prioritization: Protects critical paths
Observability Signals: Utilizes queue metrics, CPU saturation, and scaling signals

Parra illustrated the importance of isolation with a compelling scenario involving three services (A, B, C). Without isolation, when Service A experiences a CPU spike, Service B suffers increased latency, and Service C eventually collapses. With proper CPU limits on Service A, all three services maintain stability.
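The scenario can be reproduced with a deliberately simplified model. The sketch below assumes that contended CPU is shared in proportion to demand, which is a toy approximation and not how any particular scheduler or orchestrator behaves; the `allocate` function, the 4-core node, and the demand figures are all invented for the example.

```python
from typing import Dict, Optional


def allocate(
    capacity: float,
    demand: Dict[str, float],
    limits: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Split `capacity` cores among services, honoring optional per-service caps."""
    limits = limits or {}
    # A CPU limit caps how much a service may ask for, whatever its demand.
    capped = {name: min(d, limits.get(name, d)) for name, d in demand.items()}
    total = sum(capped.values())
    if total <= capacity:
        return capped  # everyone gets what they (are allowed to) ask for
    # Toy contention model: scale every service down proportionally.
    scale = capacity / total
    return {name: d * scale for name, d in capped.items()}


# Service A spikes to 10 cores of demand on a 4-core node; B and C need 1 each.
demand = {"A": 10.0, "B": 1.0, "C": 1.0}
shared = allocate(4.0, demand)                        # B and C are starved
isolated = allocate(4.0, demand, limits={"A": 2.0})   # B and C get their full core
```

Without the limit, B and C each receive about a third of the core they need; with A capped at 2 cores, total demand fits the node and only A is throttled.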

The Signal Flow: Early Detection and Response

A critical insight from Parra's presentation was the importance of signal mapping. The flow works as follows:

Spike in traffic → Increase in queue size (signal) → Horizontal Pod Autoscaler (HPA) reaction → Increase in capacity (more pods) → Decrease in queue size

These signals originate from all three defense layers. Parra emphasized that resilient systems depend on early signals: every system needs signals that let it drain queues faster and prevent cascading failures.
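The queue-driven loop described above can be sketched with the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentMetric / targetMetric). Everything else below is an assumption for illustration: the per-pod drain rate, the target queue length per pod, the one-decision-per-tick timing, and the omission of the HPA's tolerance and stabilization behavior.

```python
import math
from typing import List, Tuple


def desired_replicas(
    current: int,
    metric: float,
    target: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """HPA-style rule: ceil(current * metric / target), clamped to the bounds."""
    raw = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, raw))


def simulate(
    arrivals: List[int],
    per_pod_rate: int,
    target_queue_per_pod: float,
) -> List[Tuple[int, int]]:
    """Return (queue size, pods) per tick for a queue drained by autoscaled pods."""
    pods, queue = 1, 0
    history: List[Tuple[int, int]] = []
    for incoming in arrivals:
        # Pods drain the queue this tick with the capacity they already have.
        queue = max(0, queue + incoming - pods * per_pod_rate)
        history.append((queue, pods))
        # The signal (average queue per pod) drives capacity for the next tick.
        pods = desired_replicas(pods, queue / pods, target_queue_per_pod)
    return history


history = simulate([10, 100, 100, 10, 10, 10], per_pod_rate=10, target_queue_per_pod=20.0)
```

In this run the queue keeps growing for one tick after the spike begins, because new capacity only takes effect on the following tick. That lag is a small-scale picture of the Scaling Gap, and it shows why a signal that moves early matters.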

Four Core Principles

Parra distilled his approach into four guiding principles:

Composition: Resilience is layered, not singular
Protect the Core: Preserve critical paths at all costs
Observe Pressure: Signals reveal stress before failure
Controlled Failure: Fail gracefully when necessary
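To make "Protect the Core" and "Controlled Failure" concrete, the sketch below sheds lower-priority requests first as a pressure signal rises. The priority classes, the 0.80 and 0.95 thresholds, the use of a single normalized pressure value, and the 503 response are all assumptions chosen for the example, not details from the talk.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0    # hypothetical: the paths that must stay up
    NORMAL = 1      # hypothetical: ordinary interactive traffic
    BACKGROUND = 2  # hypothetical: deferrable work


def lowest_admitted(pressure: float) -> Priority:
    """Map a 0..1 pressure signal to the least important class still served."""
    if pressure >= 0.95:
        return Priority.CRITICAL
    if pressure >= 0.80:
        return Priority.NORMAL
    return Priority.BACKGROUND


def handle(priority: Priority, pressure: float) -> int:
    """Serve the request (200) or shed it deliberately (503)."""
    return 200 if priority <= lowest_admitted(pressure) else 503
```

Shedding background work at moderate pressure and everything but the critical path at high pressure is a deliberate, bounded failure, as opposed to letting every request slow down until the system collapses.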

The best signals appear before failure, giving systems time to react. Parra's concluding statement captured the essence of his message: "Internet stampedes are inevitable; system collapse, however, is not."

This multi-layer defense approach represents a mature evolution in system architecture, moving beyond simple scaling to intelligent, signal-driven resilience that can withstand the unpredictable nature of modern internet traffic patterns.
