Turbopuffer engineers replaced their sharded job queue with a distributed system built on a single JSON file in object storage, achieving 10x lower tail latency through stateless brokers and atomic updates.

Distributed systems often succumb to complexity through layered abstractions, but occasionally we rediscover elegance in foundational primitives. At Turbopuffer, we recently rearchitected our indexing job queue – replacing a sharded system vulnerable to head-of-line blocking with a solution leveraging object storage's intrinsic properties. The result? A FIFO queue with at-least-once guarantees, 10x lower tail latency, and architectural simplicity worthy of examination.
The Object Storage Philosophy
Why build atop services like Google Cloud Storage? Object storage provides predictable behavior, near-infinite scalability, and operational simplicity. As Dan Harrison notes, "We know how it behaves, and as long as we design within those boundaries, we know it will perform." This philosophy guided our four-phase evolution.
Phase 1: Atomic Simplicity
The minimal viable queue is a single queue.json file. Clients (pushers/workers) implement compare-and-set (CAS) semantics:
- Read current state
- Modify it locally (append new jobs, or flip a pending job ○ to in-progress ◐)
- Write back only if object revision unchanged
This provides strong consistency through object storage's atomic, conditional writes. As Turbopuffer's metrics show, it handles roughly one request per second, GCS's rate limit for writes to a single object, but fails beyond that scale.
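A minimal sketch of that loop using the Google Cloud Storage Python client; the bucket name and the `jobs`/`status` layout inside queue.json are illustrative assumptions, not Turbopuffer's actual schema:

```python
import json
from google.cloud import storage
from google.api_core.exceptions import PreconditionFailed

BUCKET = "example-queue-bucket"   # illustrative name
QUEUE_OBJECT = "queue.json"

def push_job(job: dict) -> None:
    """Append a job via compare-and-set; retry if another client wrote first."""
    blob = storage.Client().bucket(BUCKET).blob(QUEUE_OBJECT)
    while True:
        blob.reload()                                  # fetch the current generation
        state = json.loads(blob.download_as_bytes())   # assumed shape: {"jobs": [...]}
        state["jobs"].append({**job, "status": "pending"})   # pending = ○
        try:
            # The write only succeeds if nobody else has written since our read.
            blob.upload_from_string(
                json.dumps(state),
                if_generation_match=blob.generation,
            )
            return
        except PreconditionFailed:
            continue                                   # lost the race; re-read and retry
```

Workers claim work the same way: read the file, flip a pending job to in-progress, and write it back under the same generation check.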
Phase 2: Group Commit for Throughput
Object storage writes exhibit high latency (~200ms). Our solution: group commit. Instead of per-request writes:
- Buffer incoming requests in memory
- Flush them in a single batched write
- Acknowledge clients only after the write completes
This shifts the bottleneck from write latency to network bandwidth (~10GB/s). However, client contention remains – N clients still fight to update one file.
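A sketch of that batching loop, assuming a thread-safe in-memory buffer and the hypothetical helpers `read_state`, `apply_op`, and `cas_write` (the Phase 1 read/CAS logic factored out; retries omitted):

```python
import queue
import threading

pending = queue.Queue()   # (operation, done_event) pairs buffered in memory

def submit(op: dict) -> None:
    """Called once per request; blocks until the op is durably in queue.json."""
    done = threading.Event()
    pending.put((op, done))
    done.wait()

def flush_loop(blob) -> None:
    """Drain the buffer and apply every buffered op in one CAS write."""
    while True:
        batch = [pending.get()]                  # block until at least one op arrives
        while not pending.empty():
            batch.append(pending.get_nowait())   # sweep up everything else queued
        state, generation = read_state(blob)     # read + remember revision (Phase 1)
        for op, _ in batch:
            apply_op(state, op)                  # append job, mark ○→◐, etc.
        cas_write(blob, state, generation)       # one ~200ms write amortized over the batch
        for _, done in batch:
            done.set()                           # acknowledge only after the write lands
```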
Phase 3: The Stateless Broker
Enter the broker – a stateless process mediating all access to queue.json. Clients interact solely with the broker, which:
- Aggregates requests in memory buffers
- Executes batched CAS writes
- Blocks clients until writes confirm
The broker becomes the sole writer, eliminating contention. A single instance handles thousands of connections since it only coordinates – object storage bears the I/O load.
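Wrapped in a broker, the same group-commit loop sits behind a small network surface; a sketch using Python's standard HTTP server, with the port and endpoint purely illustrative:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class BrokerHandler(BaseHTTPRequestHandler):
    """Clients never touch queue.json; they POST operations to the broker."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        op = json.loads(body)
        done = threading.Event()
        pending.put((op, done))   # same in-memory buffer as the group-commit loop
        done.wait()               # hold the connection open until the CAS write lands
        self.send_response(200)
        self.end_headers()

def run_broker(queue_blob, port: int = 8080) -> None:
    # The flush loop runs in the background as the sole writer to queue.json.
    threading.Thread(target=flush_loop, args=(queue_blob,), daemon=True).start()
    ThreadingHTTPServer(("0.0.0.0", port), BrokerHandler).serve_forever()
```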
Phase 4: High Availability Mechanics
Two failure modes required mitigation:
Broker Failure
- Clients start new broker after timeout
- Broker address stored in queue.json
- CAS prevents dual brokers: the loser detects a revision mismatch
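The takeover itself is just the Phase 1 CAS pattern applied to a broker field; a sketch, with the field name assumed:

```python
def try_become_broker(blob, my_address: str) -> bool:
    """Claim brokership by CAS-ing our address into queue.json.
    If two clients race, exactly one upload succeeds; the loser sees a
    revision mismatch and connects to the winner instead."""
    blob.reload()
    state = json.loads(blob.download_as_bytes())
    state["broker"] = {"address": my_address}   # assumed field name
    try:
        blob.upload_from_string(
            json.dumps(state),
            if_generation_match=blob.generation,
        )
        return True        # we are the sole writer now
    except PreconditionFailed:
        return False       # someone else won; re-read queue.json for their address
```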
Worker Failure
- Workers send heartbeats (◐→◐(♥))
- Broker writes heartbeats to queue.json
- Timed-out jobs (○) get reassigned
These patterns create a self-healing system without persistent coordinator state.
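One way the broker might enforce the heartbeat lease on each flush; the 30-second timeout and field names are assumptions for illustration:

```python
import time

LEASE_SECONDS = 30   # assumed heartbeat timeout, not Turbopuffer's actual value

def record_heartbeat(state: dict, job_id: str) -> None:
    """Heartbeats arrive as ordinary broker operations and ride the next CAS write."""
    for job in state["jobs"]:
        if job["id"] == job_id:
            job["heartbeat"] = time.time()

def reap_stale_jobs(state: dict) -> None:
    """Run before each batched write: in-progress jobs whose worker stopped
    heartbeating revert to pending (◐ → ○) so another worker can claim them."""
    now = time.time()
    for job in state["jobs"]:
        if job["status"] == "in_progress" and now - job.get("heartbeat", 0) > LEASE_SECONDS:
            job["status"] = "pending"   # at-least-once: the job may end up running twice
```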
Implications and Tradeoffs
This architecture exemplifies infrastructure minimalism:
- ✅ Pros: Predictable scaling, no dedicated queue services, leverages cloud durability
- ⚠️ Limits: single-broker throughput (~5 writes/sec), queue state must fit in the broker's memory
For Turbopuffer's workload (sub-1GB queues), it delivers 200ms p99 latency across 100B+ vectors. The approach echoes our WAL design – batching writes to object storage creates efficiency.
Counterpoint: When Not to Use
This isn't a universal solution. Workloads requiring:
- Sub-millisecond latency
- Millions of writes/second
- Job states exceeding memory limits
...would need alternative approaches like sharded brokers or dedicated queue systems. Yet for mid-scale, durable workflows, it demonstrates how object storage's constraints foster innovation.
As cloud primitives mature, we're reminded that distributed systems needn't always mean distributed complexity. Sometimes a single JSON file – with careful engineering – is enough.
Explore Turbopuffer's vector database or dive into their systems blog.
