Designing a Playback Resume System at Scale (It’s Not Just a Timestamp)
#Infrastructure

Backend Reporter

Building a playback resume system for millions of users requires navigating distributed systems trade-offs between consistency, latency, and cost—far beyond simple timestamp storage.

At first glance, tracking video playback positions seems trivial—just store a user ID, video ID, and timestamp. But when millions of users press play simultaneously, switch devices mid-stream, or trigger writes every few seconds, the problem rapidly escalates. What appears to be a simple key-value store quickly becomes a distributed systems challenge involving caching strategies, conflict resolution, and deliberate CAP theorem trade-offs.

Defining the Core Problem

We're designing a system that allows users to resume videos from their last position across devices—not rebuilding the entire streaming pipeline. Key constraints:

  • Must handle millions of concurrent writes
  • Resume latency <150ms
  • Acceptable eventual consistency (1–2 seconds)
  • Independent per-profile watch history

Crucially, we prioritize availability over strong consistency during network partitions. If replicas fall slightly out of sync, a user resuming 1 second earlier is preferable to system downtime. This trade-off anchors our architecture.

Data Modeling Nuances

Instead of a basic user_id, we use a composite key: (account_id, profile_id, video_id). This reflects real-world usage where household accounts have multiple profiles—each with independent progress tracking. We store:

  • position (in seconds)
  • updated_at (server-generated timestamp)
  • device_id

The updated_at field enables conflict resolution via last-write-wins logic. While clock synchronization introduces complexity, server-side timestamps mitigate drift risks.
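
A minimal sketch of the stored record, assuming Python types (the field layout mirrors the list above; names and types are illustrative):

```python
from dataclasses import dataclass

@dataclass
class PlaybackProgress:
    # Composite key: one row per (account, profile, video)
    account_id: str
    profile_id: str
    video_id: str
    # Payload
    position: int     # seconds into the video
    updated_at: int   # server-generated epoch millis; drives last-write-wins
    device_id: str    # which device produced this checkpoint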

Scaling the Firehose

Assume 10M daily users, roughly 3M of whom stream on a given day. With 30-minute sessions and a position update every 10 seconds, each session generates 180 writes, so we’d face:

  • 3M sessions × 180 writes ≈ 540 million writes/day
  • 540M ÷ 86,400 seconds ≈ 6,250 writes/second on average, with far higher prime-time peaks

Smart checkpointing reduces this load:

  • Update only on >15-second position changes
  • Trigger writes on pause or app backgrounding
  • Apply periodic 60-second fallback checkpoints

This optimization cuts writes by 3–5x, reducing database pressure and costs.
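
A minimal client-side sketch of this gating logic, assuming Python (the CheckpointGate name and exact thresholds are illustrative):

```python
import time

CHECKPOINT_DELTA_S = 15   # persist only if position moved more than this
FALLBACK_INTERVAL_S = 60  # periodic safety checkpoint

class CheckpointGate:
    """Client-side filter deciding which position updates reach the backend."""

    def __init__(self):
        self.last_sent_pos = None
        self.last_sent_at = 0.0

    def should_send(self, position, event="tick"):
        now = time.monotonic()
        # Always flush on lifecycle events: pause, app backgrounding
        if event in ("pause", "background"):
            return self._mark(position, now)
        # Persist once playback has advanced past the delta threshold
        if self.last_sent_pos is None or abs(position - self.last_sent_pos) > CHECKPOINT_DELTA_S:
            return self._mark(position, now)
        # Fallback checkpoint so a crash loses at most ~60s of progress
        if now - self.last_sent_at > FALLBACK_INTERVAL_S:
            return self._mark(position, now)
        return False

    def _mark(self, position, now):
        self.last_sent_pos, self.last_sent_at = position, now
        return True
```

Lifecycle-triggered flushes matter most here: pause and backgrounding capture exactly the positions users expect to resume from, even when periodic writes are skipped.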

Hybrid Architecture: Caching as a Force Multiplier

A database-only approach (e.g., DynamoDB/Cassandra) works for MVPs but fails at scale due to read latency and cost. Our solution layers Redis over the database:

Diagram: High-level design of the playback resume system

Write Flow

  1. Client sends POST /playback/update with position data
  2. Service performs conditional DB write: UPDATE IF new.updated_at > existing.updated_at
  3. Update Redis cache
  4. Emit analytics event (optional)

Conditional writes ensure idempotency and prevent stale overwrites. We prioritize durability: writes hit the database before Redis, so if Redis crashes, the DB remains the source of truth.
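
As a sketch of step 2 against DynamoDB (Python/boto3), where the table name, key layout, and function name are assumptions rather than the article's actual schema:

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("playback_progress")  # hypothetical table

def save_position(account_id, profile_id, video_id, position, device_id):
    now_ms = int(time.time() * 1000)  # server-generated timestamp
    try:
        table.put_item(
            Item={
                "pk": f"{account_id}#{profile_id}",  # composite partition key
                "sk": video_id,
                "position": position,
                "updated_at": now_ms,
                "device_id": device_id,
            },
            # Last-write-wins: reject the write if a newer record already exists
            ConditionExpression="attribute_not_exists(updated_at) OR updated_at < :ts",
            ExpressionAttributeValues={":ts": now_ms},
        )
        return True  # DB write is durable; now safe to update Redis
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # stale write: a newer position already landed
        raise
```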

Read Flow

  1. On GET /playback/resume, check Redis first
  2. On cache miss: fetch from DB → repopulate Redis

99% of reads should hit cache, keeping latency under 150ms. Brief stale reads during replication fall within our consistency tolerance.
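
A cache-aside sketch of that read path (Python/redis-py; the key format, TTL, and the fetch_from_db helper are assumptions):

```python
import json
import redis

r = redis.Redis(decode_responses=True)
CACHE_TTL_S = 24 * 3600  # keep hot entries cached for a day (illustrative)

def get_resume_position(account_id, profile_id, video_id):
    key = f"resume:{account_id}:{profile_id}:{video_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the 99% path
    # Cache miss: fall back to the source of truth, then repopulate
    record = fetch_from_db(account_id, profile_id, video_id)  # hypothetical DB helper
    if record is not None:
        r.setex(key, CACHE_TTL_S, json.dumps(record))
    return record
```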

Failure Handling and Conflict Resolution

Fallbacks

  • Read timeouts: Bypass Redis and query DB directly
  • Write failures: Use exponential backoff with bounded retries, dropping non-critical checkpoints rather than blocking playback (sketched below)
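
A bounded-retry sketch (Python; TransientWriteError and the retry budget are illustrative assumptions):

```python
import random
import time

class TransientWriteError(Exception):
    """Hypothetical stand-in for a retryable DB/cache failure."""

MAX_RETRIES = 3  # bounded: a later checkpoint supersedes a dropped one

def write_with_backoff(write_fn, *args):
    for attempt in range(MAX_RETRIES):
        try:
            return write_fn(*args)
        except TransientWriteError:
            # 0.1s, 0.2s, 0.4s base delays, with jitter to avoid retry storms
            time.sleep(0.1 * (2 ** attempt) * (1 + random.random()))
    return None  # drop the checkpoint; playback is never blocked
```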

Multi-Device Conflicts

When a TV and phone update simultaneously:

  • Compare updated_at timestamps
  • Latest position wins

We accept minor inconsistencies (e.g., 5-second jumps) because availability trumps perfect synchronization. UX guardrails smooth edge cases (see the sketch after this list):

  • Ignore regressions under 10 seconds
  • Cap backward jumps beyond 5 minutes
  • Prompt users: "Resume from 11:11?"
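
One way to encode those guardrails client-side (Python; thresholds mirror the list above, and the function names are illustrative):

```python
MIN_REGRESSION_S = 10      # ignore backward jumps smaller than this
MAX_BACK_JUMP_S = 5 * 60   # beyond this, prompt instead of silently jumping

def reconcile(local_pos, remote_pos):
    """Return (resume_position, optional_prompt) for the player."""
    delta = local_pos - remote_pos
    if 0 < delta < MIN_REGRESSION_S:
        return local_pos, None  # remote is slightly behind: keep local
    if delta > MAX_BACK_JUMP_S:
        # Large backward jump: let the user decide
        return local_pos, f"Resume from {_fmt(remote_pos)}?"
    return remote_pos, None     # otherwise trust the latest server position

def _fmt(seconds):
    return f"{int(seconds // 60)}:{int(seconds % 60):02d}"
```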

Production Hardening

Storage Lifecycle

  • Set TTLs (e.g., 180 days) on inactive entries to prevent unbounded growth (see the sketch below)
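
With DynamoDB, for instance, this can ride on native item expiry (Python/boto3; the table and attribute names are assumptions):

```python
import time
import boto3

client = boto3.client("dynamodb")

# One-time table configuration: DynamoDB deletes items once the
# epoch value in `expires_at` passes (no write capacity consumed).
client.update_time_to_live(
    TableName="playback_progress",  # hypothetical table name
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Each checkpoint write then refreshes the expiry ~180 days out:
expires_at = int(time.time()) + 180 * 24 * 3600
```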

Hot Partition Prevention

Partition keys (account_id#profile_id, video_id) distribute load evenly. Avoid video_id-only sharding—trending content would overload single partitions.

Capacity Planning

Autoscale databases to handle write bursts. Monitor for throttling during peak hours (e.g., prime-time streaming).
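
For the DynamoDB variant, a target-tracking autoscaling sketch (Python/boto3; capacities, table name, and policy name are illustrative):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's write capacity as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/playback_progress",  # hypothetical table name
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    MinCapacity=1000,
    MaxCapacity=20000,
)

# Track ~70% utilization so prime-time bursts scale out before throttling
autoscaling.put_scaling_policy(
    PolicyName="playback-writes-target-tracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/playback_progress",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
)
```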

Conclusion

This system exemplifies distributed design trade-offs:

  • CAP: Availability over strong consistency
  • Latency: Caching enables sub-150ms reads
  • Cost: Checkpoint optimization reduces write amplification
  • UX: Guardrails mask tolerable inconsistencies

What looks like simple state persistence is actually a carefully balanced symphony of databases, caches, and conflict resolution—proving that at scale, no timestamp is just a timestamp.
