A deep dive into the distributed system architecture behind TikTok, examining how it handles billions of users, millions of daily uploads, and sub-200ms feed latency while balancing consistency, availability, and performance trade-offs.
Designing TikTok at Scale: Distributed Systems and Trade-offs in a Short-Video Platform

When we think about TikTok, we see a simple app that lets users create and watch short videos. Beneath this deceptively simple interface lies one of the most complex distributed systems ever built. Designing a platform that serves 1+ billion monthly active users with 34 million videos uploaded daily, while maintaining a 167ms feed latency (P99) and 26 Tbps peak egress bandwidth, requires architectural decisions that trade off between competing priorities.
This deep dive examines the distributed systems architecture that powers TikTok, focusing on the trade-offs engineers made to achieve scale while maintaining performance. We'll explore the ingestion pipeline, recommendation system, database strategy, and the unique architectural patterns that make TikTok work at this unprecedented scale.
The Scale Challenge
First, let's understand the operational TikTok must support:
- Monthly active users: 1B+
- Videos uploaded per day: 34 million (400/second peak)
- Target feed latency (P99): ~167ms
- Peak egress bandwidth: ~26 Tbps
- High availability: 99.99% uptime
These numbers aren't just impressive—they represent fundamental constraints that shape every architectural decision. At this scale, problems that are trivial at small magnitudes become existential threats. A 1% error rate means 10 million failed requests per day. A 100ms latency increase translates to 100 years of wasted user time daily.
Requirements: What Must the System Do?
Before drawing any boxes, we must define what the system must accomplish—and what can be imperfect on day one.
Functional Requirements
- Upload and transcode short videos: Accept video uploads, process them into multiple formats/resolutions, and store them efficiently.
- Serve personalized "For You" feed: Deliver a highly personalized stream of videos to each user with sub-200ms latency.
- Social interactions: Support likes, comments, shares, and follows with real-time propagation.
- Search: Enable searching for videos and creators across the platform.
- Live streaming: Support real-time video streaming with low latency.
Non-Functional Requirements
- High availability: 99.99% uptime translates to just over 52 minutes of downtime per year.
- Feed latency: P99 latency under 200ms for feed requests.
- Horizontal scalability: The system must scale by adding more machines, not by making individual machines more powerful.
- Global CDN video delivery: Videos must load quickly regardless of user location.
- Consistency model: Strong consistency where needed (user profiles), eventual consistency for others (like counts).
High-Level Architecture

The TikTok architecture splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph. These domains interact through well-defined interfaces and asynchronous communication patterns.
The architecture follows these principles:
- Edge-first design: Push computation and storage as close to users as possible
- Asynchronous communication: Use message buses for non-critical paths to decouple services
- Multi-tier caching: Cache at multiple levels (edge, CDN, application, database)
- Hybrid consistency: Apply appropriate consistency models based on business requirements
- Independent scalability: Each domain scales based on its specific workload
Ingestion Pipeline
The ingestion pipeline handles video uploads, transcoding, and metadata storage. This domain faces massive write throughput and requires efficient storage of large binary objects.
Key components:
- Upload API: Accepts chunked multi-part uploads (5MB chunks) to handle unreliable mobile connections
- Deduplication: SHA-256 hashing before writing to avoid storing duplicate videos
- Transcode workers: GPU-powered processing to create multiple resolutions (360p, 720p, 1080p) and formats (H.264, HEVC)
- Thumbnail extraction: Generates thumbnails and still frames for ML features and previews
- Metadata storage: MySQL/Vitess stores video metadata, creator info, and processing status
Serving Domain
The serving domain handles read requests for videos, profiles, and feeds. It must balance cache hit rates with fresh content delivery.
Key components:
- API Gateway: Handles authentication, rate limiting, and routing
- Feed Service: Merges personalized content, social graph data, and recommendations
- Object Storage: Stores original and transcoded video files
- CDN/Edge: Delivers video content from edge locations closest to users
Recommendation Engine
The recommendation engine is TikTok's secret sauce. It processes user behavior and video content to create the highly personalized "For You" feed.
Key components:
- Two-tower neural network: Separate towers for user state and video features
- Feature store: Redis + Cassandra for storing user embeddings and video metadata
- Real-time signals: Processes watch time, completion rates, and engagement metrics
- Ranking models: Apply contextual factors (time of day, location, device) to recommendations
Social Graph
The social graph manages relationships between users, content interactions, and activity feeds.
Key components:
- Graph database: Stores follower/following relationships
- Timeline services: Manage user feeds and activity streams
- Notification service: Delivers real-time updates for interactions
Key Components in Detail
CDN + Edge PoPs
The CDN is TikTok's most critical performance component. Approximately 70% of video traffic is served directly from edge nodes in 150+ cities, completely bypassing the origin servers. This design choice reduces load on origin systems and dramatically improves user-perceived performance.
The implementation uses:
- Anycast routing: Routes users to the nearest Point of Presence (PoP)
- Manifest invalidation: Video manifest files are invalidated within seconds when content goes viral
- Edge caching: Popular videos are cached at edge locations with appropriate TTLs
- Protocol optimization: HTTP/2 and QUIC for efficient transport
Upload Pipeline
The upload pipeline demonstrates how TikTok handles unreliable mobile connections while ensuring data integrity:
- Chunked upload: Videos are split into 5MB chunks that can be uploaded independently
- Resume capability: Failed uploads can resume from the last successful chunk
- Deduplication: SHA-256 hashing prevents storing duplicate videos
- Asynchronous processing: Transcoding happens after upload completion via worker queues
- Multi-format output: Creates 360p, 720p, 1080p, and HEVC variants for different network conditions
This pipeline handles 400+ uploads per second at peak, with each upload potentially generating multiple transcoded versions.
Recommendation Engine
The recommendation engine uses a sophisticated two-tower neural network architecture:
- Tower 1: Encodes user state including watch history, device type, time of day, location, and explicit preferences
- Tower 2: Encodes video features including visual embeddings, audio analysis, caption text, and metadata
- Scoring: Dot product between user and video embeddings produces relevance scores
- Ranking: Real-time signals (trending content, friend activity) refine the initial scoring
The model runs online for top-k retrieval, then a secondary ranker applies business rules before the final feed is assembled. This hybrid approach balances machine learning accuracy with real-time relevance.
Feed Assembly Strategy
TikTok's feed assembly differs significantly from platforms like Twitter or Instagram by using a hybrid approach:
Celebrity/high-follow accounts: Fan-out on write (posts pushed to follower inboxes eagerly)
- Read path is O(1) — just read the pre-computed inbox
- Fast feed assembly at serving time
- Massive write amplification when a celebrity posts
Regular accounts: Fan-out on read (merged at request time)
- No write amplification on post
- Storage-efficient
- Slower feed assembly if following thousands of accounts
The feed service merges both approaches, injects ML-recommended videos, and applies diversity rules to avoid content repetition. The final feed is cached in Redis with a 300s TTL to balance freshness with performance.
Kafka Message Bus
All write events (upload complete, like, follow, watch-complete) are published to Kafka topics, creating an event-driven architecture:
- Decoupling: Services communicate asynchronously, reducing dependencies
- Scalability: Consumers can be scaled independently based on workload
- Durability: Events are persisted, allowing for replay and recovery
- Ordering: Topics are partitioned by user_id to ensure ordered processing per user
Key topics include:
- Video upload events
- User interaction events (likes, comments, shares)
- Social graph updates
- Watch completion events
Database Strategy
TikTok's database strategy reflects the different consistency and performance requirements of various data types:
MySQL / Vitess
- Use cases: User profiles, video metadata, social graph relationships
- Why: ACID compliance for critical data, sharded by user_id for horizontal scaling
- Challenges: Handling write-heavy workloads at scale
- Solution: Vitess provides sharding and connection pooling for MySQL clusters
Redis Cluster
- Use cases: Counters (likes, views), session tokens, feed cache
- Why: Sub-millisecond reads for high-traffic data
- Challenges: Memory limitations for large datasets
- Solution: Cluster deployment with data partitioning
Cassandra
- Use cases: Watch history, timelines, notification logs
- Why: Wide-row reads, high write throughput, eventual consistency
- Challenges: Complex query patterns
- Solution: Careful data modeling to optimize for access patterns
This multi-database approach allows TikTok to optimize for specific access patterns while maintaining appropriate consistency guarantees for each data type.
Key Design Trade-offs
Fan-out on Write vs Read
The classic dilemma in social feed systems becomes more complex at TikTok's scale:
Fan-out on write advantages:
- O(1) read path for follower feeds
- Fast feed assembly at serving time
- Predictable performance during traffic spikes
Fan-out on write disadvantages:
- Massive write amplification for popular accounts
- Storage overhead for inboxes
- Complex recovery when systems fail
Fan-out on read advantages:
- No write amplification on post
- Storage efficiency
- Simpler recovery from failures
Fan-out on read disadvantages:
- Slower feed assembly for users with many follows
- Complex query patterns at serving time
- Potential for inconsistent feeds during high load
TikTok's hybrid approach balances these trade-offs by applying fan-out on write only to accounts with millions of followers, while using fan-out on read for regular users.
Eventual vs Strong Consistency
Different data types require different consistency models:
Eventual consistency (acceptable):
- Like/view counts can lag by seconds without user impact
- Recommendation freshness
- Trending topics
Strong consistency (required):
- User authentication tokens
- Billing and payment events
- Profile updates
TikTok segments these into separate storage tiers with appropriate consistency guarantees, accepting complexity for throughput on hot paths.
Push vs Pull for Notifications
Notification delivery presents another trade-off between immediacy and resource utilization:
WebSocket push (real-time):
- Immediate delivery for likes and comments
- Higher resource usage for persistent connections
- Complex connection management at scale
Pull-based batch (delayed):
- Resource-efficient for less critical notifications
- Simpler implementation
- Acceptable delay for weekly summaries and suggested follows
Back-of-Envelope Estimates
Understanding the scale requires basic calculations that inform architectural decisions:
Storage Requirements
- 34M uploads/day × 8 MB × 3 resolutions = ~816 TB/day of new video
- With 3x replication over 5 years = ~4.4 EB total raw storage
- This necessitates object storage with tiered pricing (hot/warm/cold storage)
Feed Read Traffic
- 500M DAU × 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
- With 95% Redis cache hit rate → recommendation backend sees ~5,750 rps
- Requires massive caching layers and pre-computation
Bandwidth Requirements
- 500M users × 45 min × 2 Mbps (720p) / 86,400 = ~26 Tbps peak egress
- This exceeds the capacity of most ISPs, requiring direct peering agreements
- Explains why TikTok operates its own backbone in many regions
What Makes TikTok's Architecture Special?
Several architectural patterns distinguish TikTok from other large-scale social platforms:
1. Aggressive Edge Caching
Most platforms use CDNs as a performance optimization. For TikTok, the CDN is the entire delivery strategy:
- 70% of traffic served directly from edge nodes
- Video manifests invalidated within seconds of going viral
- Custom edge computing for transcoding and format adaptation
- This reduces load on origin systems and improves performance
2. Real-time ML Feedback Loops
Unlike platforms that optimize primarily for social graph traversal, TikTok inverts the paradigm:
- Algorithm is the product, not just a feature
- Video trajectory decided in first 30 minutes based on completion rate
- New creators can go viral without existing followers
- Real-time signal processing enables rapid content adaptation
3. Microservice Isolation
The architecture maintains strict boundaries between domains:
- Upload, serving, recommendation, and social graph are independently deployable
- Each domain scales based on specific workload characteristics
- Failure in one domain doesn't cascade to others
- Enables gradual rollout of changes and reduced blast radius
4. Hybrid Feed Computation
By combining pre-computation (for popular accounts) with real-time merging (for regular users), TikTok achieves both performance and efficiency:
- Celebrity feeds are pre-computed and stored
- Regular user feeds assembled at request time
- ML recommendations injected into both streams
- This hybrid approach optimizes for different user patterns
Lessons for System Designers
Designing systems at TikTok's scale teaches several valuable lessons:
Edge is everything: At this scale, pushing computation and storage to the edge isn't optional—it's mandatory for performance.
Trade-offs are inevitable: Every design decision involves balancing competing priorities. The key is making explicit, informed trade-offs rather than pretending they don't exist.
Asynchronous communication is essential: Synchronous coupling creates bottlenecks at scale. Event-driven architectures enable independent scaling and fault isolation.
Consistency isn't binary: Different data types require different consistency models. Segmenting storage by consistency needs optimizes both performance and correctness.
Caching is a first-class concern: At this scale, caching isn't an optimization—it's a fundamental requirement. Design for cache hits from the beginning.
Measure everything: Back-of-the-envelope calculations catch architectural problems before they become operational crises. Understand your numbers.
Conclusion
Designing TikTok at scale represents one of the most challenging distributed systems problems in the industry. The architecture demonstrates that success comes not from any single technology choice, but from a holistic approach that balances competing requirements through explicit trade-offs.
The most significant insight is that TikTok's architecture prioritizes the recommendation pipeline as the core product, rather than treating it as a feature layered on top of social connections. This fundamental shift in priorities drives every architectural decision, from edge caching to database strategy.
As engineers designing systems at scale, we can learn from TikTok's approach: start with clear requirements, make explicit trade-offs, and design for the failure modes that will inevitably occur at scale. The systems we build today will face challenges we can't yet imagine—preparing for those unknowns is what separates systems that merely work from those that endure.
For more insights into large-scale system design, explore resources on distributed systems patterns and real-world case studies from companies operating at similar scales. Understanding these architectures helps us make better decisions in our own projects, regardless of current size.

Comments
Please log in or register to join the discussion