Turbopuffer engineers replaced their sharded job queue with a distributed system built on a single JSON file in object storage, achieving 10x lower tail latency through stateless brokers and atomic updates.

Distributed systems often succumb to complexity through layered abstractions, but occasionally we rediscover elegance in foundational primitives. At Turbopuffer, we recently rearchitected our indexing job queue – replacing a sharded system vulnerable to head-of-line blocking with a solution leveraging object storage's intrinsic properties. The result? A FIFO queue with at-least-once guarantees, 10x lower tail latency, and architectural simplicity worthy of examination.
The Object Storage Philosophy
Why build atop services like Google Cloud Storage? Object storage provides predictable behavior, near-infinite scalability, and operational simplicity. As Dan Harrison notes, "We know how it behaves, and as long as we design within those boundaries, we know it will perform." This philosophy guided our four-phase evolution.
Phase 1: Atomic Simplicity
The minimal viable queue is a single queue.json file. Clients (pushers/workers) implement compare-and-set (CAS) semantics:
- Read current state
- Modify it locally (append new jobs, or flip a pending job ○ to in-progress ◐)
- Write back only if object revision unchanged
This provides strong consistency through object storage's atomic, conditional writes. As Turbopuffer's metrics show, it handles roughly one request per second, GCS's rate limit for writes to a single object, but fails beyond that scale.
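A minimal sketch of that loop using the Google Cloud Storage Python client; the bucket name and the `jobs`/`status` layout inside queue.json are illustrative assumptions, not Turbopuffer's actual schema:

```python
import json
from google.cloud import storage
from google.api_core.exceptions import PreconditionFailed

BUCKET = "example-queue-bucket"   # illustrative name
QUEUE_OBJECT = "queue.json"

def push_job(job: dict) -> None:
    """Append a job via compare-and-set; retry if another client wrote first."""
    blob = storage.Client().bucket(BUCKET).blob(QUEUE_OBJECT)
    while True:
        blob.reload()                                  # fetch the current generation
        state = json.loads(blob.download_as_bytes())   # assumed shape: {"jobs": [...]}
        state["jobs"].append({**job, "status": "pending"})   # pending = ○
        try:
            # The write only succeeds if nobody else has written since our read.
            blob.upload_from_string(
                json.dumps(state),
                if_generation_match=blob.generation,
            )
            return
        except PreconditionFailed:
            continue                                   # lost the race; re-read and retry
```

Workers claim work the same way: read the file, flip a pending job to in-progress, and write it back under the same generation check.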
Phase 2: Group Commit for Throughput
Object storage writes exhibit high latency (~200ms). Our solution: group commit. Instead of per-request writes:
- Buffer incoming requests in memory
- Flush them in a single batched write
- Acknowledge clients only after the write completes
This shifts the bottleneck from write latency to network bandwidth (~10GB/s). However, client contention remains – N clients still fight to update one file.
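A sketch of that batching loop, assuming a thread-safe in-memory buffer and the hypothetical helpers `read_state`, `apply_op`, and `cas_write` (the Phase 1 read/CAS logic factored out; retries omitted):

```python
import queue
import threading

pending = queue.Queue()   # (operation, done_event) pairs buffered in memory

def submit(op: dict) -> None:
    """Called once per request; blocks until the op is durably in queue.json."""
    done = threading.Event()
    pending.put((op, done))
    done.wait()

def flush_loop(blob) -> None:
    """Drain the buffer and apply every buffered op in one CAS write."""
    while True:
        batch = [pending.get()]                  # block until at least one op arrives
        while not pending.empty():
            batch.append(pending.get_nowait())   # sweep up everything else queued
        state, generation = read_state(blob)     # read + remember revision (Phase 1)
        for op, _ in batch:
            apply_op(state, op)                  # append job, mark ○→◐, etc.
        cas_write(blob, state, generation)       # one ~200ms write amortized over the batch
        for _, done in batch:
            done.set()                           # acknowledge only after the write lands
```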
Phase 3: The Stateless Broker
Enter the broker – a stateless process mediating all access to queue.json. Clients interact solely with the broker, which:
- Aggregates requests in memory buffers
- Executes batched CAS writes
- Blocks clients until writes confirm
The broker becomes the sole writer, eliminating contention. A single instance handles thousands of connections since it only coordinates – object storage bears the I/O load.
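Wrapped in a broker, the same group-commit loop sits behind a small network surface; a sketch using Python's standard HTTP server, with the port and endpoint purely illustrative:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class BrokerHandler(BaseHTTPRequestHandler):
    """Clients never touch queue.json; they POST operations to the broker."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        op = json.loads(body)
        done = threading.Event()
        pending.put((op, done))   # same in-memory buffer as the group-commit loop
        done.wait()               # hold the connection open until the CAS write lands
        self.send_response(200)
        self.end_headers()

def run_broker(queue_blob, port: int = 8080) -> None:
    # The flush loop runs in the background as the sole writer to queue.json.
    threading.Thread(target=flush_loop, args=(queue_blob,), daemon=True).start()
    ThreadingHTTPServer(("0.0.0.0", port), BrokerHandler).serve_forever()
```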
Phase 4: High Availability Mechanics
Two failure modes required mitigation:
Broker Failure
- Clients start new broker after timeout
- Broker address stored in queue.json
- CAS prevents dual brokers: the loser detects a revision mismatch
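The takeover itself is just the Phase 1 CAS pattern applied to a broker field; a sketch, with the field name assumed:

```python
def try_become_broker(blob, my_address: str) -> bool:
    """Claim brokership by CAS-ing our address into queue.json.
    If two clients race, exactly one upload succeeds; the loser sees a
    revision mismatch and connects to the winner instead."""
    blob.reload()
    state = json.loads(blob.download_as_bytes())
    state["broker"] = {"address": my_address}   # assumed field name
    try:
        blob.upload_from_string(
            json.dumps(state),
            if_generation_match=blob.generation,
        )
        return True        # we are the sole writer now
    except PreconditionFailed:
        return False       # someone else won; re-read queue.json for their address
```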
Worker Failure
- Workers send heartbeats (◐→◐(♥))
- Broker writes heartbeats to queue.json
- Timed-out jobs (○) get reassigned
These patterns create a self-healing system without persistent coordinator state.
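One way the broker might enforce the heartbeat lease on each flush; the 30-second timeout and field names are assumptions for illustration:

```python
import time

LEASE_SECONDS = 30   # assumed heartbeat timeout, not Turbopuffer's actual value

def record_heartbeat(state: dict, job_id: str) -> None:
    """Heartbeats arrive as ordinary broker operations and ride the next CAS write."""
    for job in state["jobs"]:
        if job["id"] == job_id:
            job["heartbeat"] = time.time()

def reap_stale_jobs(state: dict) -> None:
    """Run before each batched write: in-progress jobs whose worker stopped
    heartbeating revert to pending (◐ → ○) so another worker can claim them."""
    now = time.time()
    for job in state["jobs"]:
        if job["status"] == "in_progress" and now - job.get("heartbeat", 0) > LEASE_SECONDS:
            job["status"] = "pending"   # at-least-once: the job may end up running twice
```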
Implications and Tradeoffs
This architecture exemplifies infrastructure minimalism:
- ✅ Pros: Predictable scaling, no dedicated queue services, leverages cloud durability
- ⚠️ Limits: single-broker throughput (~5 writes/sec), queue state must fit in the broker's memory
For Turbopuffer's workload (sub-1GB queues), it delivers 200ms p99 latency across 100B+ vectors. The approach echoes our WAL design – batching writes to object storage creates efficiency.
Counterpoint: When Not to Use
This isn't a universal solution. Workloads requiring:
- Sub-millisecond latency
- Millions of writes/second
- Job states exceeding memory limits
...would need alternative approaches like sharded brokers or dedicated queue systems. Yet for mid-scale, durable workflows, it demonstrates how object storage's constraints foster innovation.
As cloud primitives mature, we're reminded that distributed systems needn't always mean distributed complexity. Sometimes a single JSON file – with careful engineering – is enough.
Explore Turbopuffer's vector database or dive into their systems blog.
