How to Design YouTube: CDNs, Transcoding, and the Hot Video Problem
#Infrastructure

Backend Reporter
6 min read

Building a video streaming platform at YouTube's scale requires solving unique challenges around massive file uploads, global content delivery, and the economics of serving both viral hits and long-tail content.

If you read my previous post about designing a News Feed system, you might be wondering: what makes a video streaming platform any different? While a news feed handles text and small image payloads, streaming 4K video globally is an entirely different beast. With that in mind, let's break the architecture down step by step.

System Requirements

Functional Requirements:

  • Users can upload and post videos
  • Users can watch/stream videos smoothly
  • Users can like and comment on videos
  • Users can subscribe to other creators
  • Videos must be available in multiple qualities (240p, 480p, 720p, 1080p, 4K) depending on the user's internet speed

Non-Functional Requirements:

  • Low Latency: Video playback should start in < 2 seconds
  • Highly Scalable: Must support up to a billion users
  • High Availability: The system must remain accessible (favoring availability over strict consistency)
  • Eventual Consistency: It is perfectly fine if a user's subscriber count takes a few seconds to update globally

Core Entities

  • User
  • Video
  • Like
  • Comment

Core API Endpoints

Keeping it RESTful, our core endpoints would look something like this:

  • POST /v1/videos (Request upload URL and submit metadata)
  • GET /v1/videos/{video_id} (Fetch video stream and metadata)
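
To make the contract concrete, here is a minimal sketch of what handlers for these two endpoints might validate and return. All field names (`title`, `uploader_id`, `upload_url`, etc.) are illustrative assumptions, not a fixed API spec:

```python
# Sketch of the request/response shapes for the two core endpoints.
# Field names and IDs are placeholders for illustration only.

def post_video(metadata: dict) -> dict:
    """POST /v1/videos: validate metadata, return a video ID and upload URL."""
    required = {"title", "uploader_id"}
    missing = required - metadata.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {
        "video_id": "vid_123",                        # generated server-side
        "upload_url": "https://s3.example/presigned",  # placeholder pre-signed URL
    }

def get_video(video_id: str, catalog: dict) -> dict:
    """GET /v1/videos/{video_id}: return stream URL plus metadata."""
    meta = catalog.get(video_id)
    if meta is None:
        raise KeyError(video_id)
    return {"video_id": video_id, **meta}
```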

High-Level Architecture

The diagram above illustrates the high-level architecture of our system:

  • Load Balancer / API Gateway: Distributes incoming traffic evenly across our stateless backend servers to prevent any single point of failure
  • Blob Storage (Amazon S3): We cannot store massive 10GB+ video files in a traditional SQL or Key-Value database. Instead, the actual video files are stored in object storage like S3
  • DynamoDB (Metadata Store): Since we only need to store the metadata of the video (Title, Uploader ID, S3 URL, Likes) and don't require strict ACID properties, a highly scalable Key-Value database like DynamoDB is the perfect fit
  • Transcoding Pipeline (Chunkers): To stream video seamlessly, we don't just send one massive file. We pass the uploaded video to a background service that chunks it into 3-second segments and transcodes it into different resolutions (1080p, 720p, 240p)

Step-by-Step Data Flow

To really understand this architecture, let's walk through the exact lifecycle of the two most important actions in our system: uploading a video and watching a video.

1. The Write Path (Uploading a Video)

When a creator uploads a new video, here is exactly what happens behind the scenes:

  1. Request Permission: The client app sends a request to our API Gateway to upload a video
  2. Pre-signed URL Issued: The API Server responds with a secure, temporary Pre-signed S3 URL
  3. Direct Upload: The client bypasses our servers and uploads the massive video file directly into our "Raw Videos" S3 bucket
  4. Event Triggered: Once S3 finishes receiving the file, it emits an event notification (via SQS, SNS, or EventBridge, which can in turn feed a broker like Kafka) that enqueues a transcoding job
  5. Transcoding Pipeline: Our background Transcoding Workers pick up the event, pull the raw video from S3, and convert it into various resolutions (1080p, 720p, etc.) and chunks
  6. Final Storage & DB Update: The workers save the processed chunks into a "Transcoded Videos" S3 bucket and update DynamoDB with the final metadata (URLs, formats available, uploader ID)
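
Step 5 (the transcoding fan-out) can be sketched as pure planning logic: split the video's duration into 3-second segments and emit one work item per (resolution, segment) pair. In production each item would shell out to a tool like ffmpeg (e.g., its HLS muxer with `-hls_time 3`); here the S3 key layout is a hypothetical example:

```python
# Minimal sketch of the transcoding fan-out: 3-second segments,
# one rendition per target resolution. Actual encoding is omitted.

SEGMENT_SECONDS = 3
RESOLUTIONS = ["1080p", "720p", "480p", "240p"]

def plan_transcode(video_id: str, duration_s: int) -> list[dict]:
    """Return one work item per (resolution, segment) pair."""
    num_segments = -(-duration_s // SEGMENT_SECONDS)  # ceiling division
    jobs = []
    for res in RESOLUTIONS:
        for i in range(num_segments):
            jobs.append({
                "video_id": video_id,
                "resolution": res,
                "segment": i,
                # Hypothetical key layout in the "Transcoded Videos" bucket:
                "s3_key": f"{video_id}/{res}/segment_{i:05d}.ts",
            })
    return jobs
```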

2. The Read Path (Streaming a Video)

When a user clicks on a thumbnail to watch a video, speed is everything:

  1. Fetch Metadata: The client requests the video details from the API Server
  2. Cache Check: The server checks Redis. If the video is popular, the metadata (title, S3/CDN URLs) is instantly returned. If it's a cache miss, it fetches it from DynamoDB, updates Redis, and returns it to the client
  3. Stream Request: The client's video player uses the returned URL to request the actual video chunks from the closest CDN edge server
  4. Video Delivery: If the CDN has the chunks (Cache Hit), the video plays instantly. If not (Cache Miss), the CDN fetches the chunks from our Transcoded S3 bucket, caches them locally for the next user, and streams them to the client
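
The cache check in step 2 is a classic read-through pattern. Here is a minimal sketch with Redis and DynamoDB stubbed as plain dicts so the control flow is the focus; real clients (redis-py, boto3) would slot into the same shape:

```python
# Read-through cache sketch: check Redis first, fall back to DynamoDB
# on a miss, and populate the cache for the next reader.

def get_metadata(video_id: str, cache: dict, db: dict, stats: dict) -> dict:
    meta = cache.get(video_id)
    if meta is not None:                          # cache hit: serve instantly
        stats["hits"] = stats.get("hits", 0) + 1
        return meta
    stats["misses"] = stats.get("misses", 0) + 1
    meta = db[video_id]                           # cache miss: hit DynamoDB
    cache[video_id] = meta                        # warm Redis for the next user
    return meta
```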

The Hard Parts: Trade-Offs & Bottlenecks

1. The Upload Bottleneck (Bypassing the API)

Many of you might be wondering: why upload the video directly to S3 instead of sending it through our API Gateway? If millions of users push 10GB video files through our backend API servers, the network I/O would saturate and take the fleet down. Instead, we use Pre-signed URLs: the client asks our API for permission, the API returns a secure, temporary S3 URL, and the client uploads the heavy video bytes directly to S3, bypassing our servers entirely.

2. Achieving < 2s Latency (The Power of the CDN)

You might instantly think a Redis cache is the perfect way to decrease video load times, but caching multi-gigabyte 4K files in RAM isn't practical. To achieve smooth streaming globally, we use a CDN (Content Delivery Network). The transcoded video chunks are copied to edge servers all around the world, so if a user in India watches a video uploaded in the US, the CDN serves it from a server right down the street, slashing round-trip latency. Coupled with Adaptive Bitrate Streaming (ABR), the video player automatically switches between quality chunks (e.g., dropping from 1080p to 480p) when the user's internet speed drops, minimizing rebuffering.
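
The ABR decision itself is simple in principle: pick the highest rendition whose bitrate fits within a safety fraction of the measured throughput. A minimal sketch, with an illustrative bitrate ladder (not YouTube's actual encode settings):

```python
# ABR rendition selection sketch: highest quality that fits the budget.
# Ladder bitrates (kbps) are assumed example values, sorted high to low.

LADDER = [("4K", 16000), ("1080p", 5000), ("720p", 2500),
          ("480p", 1200), ("240p", 400)]

def choose_rendition(throughput_kbps: float, safety: float = 0.8) -> str:
    budget = throughput_kbps * safety  # headroom so the buffer never drains
    for name, bitrate in LADDER:
        if bitrate <= budget:
            return name
    return LADDER[-1][0]               # floor: always serve the lowest quality
```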

3. Scaling to a Billion Users

Scaling this system horizontally is remarkably straightforward:

  • Amazon S3 provides virtually infinite storage capacity
  • Our backend API servers are stateless, meaning we can simply spin up more instances behind the Load Balancer as traffic increases
  • DynamoDB partitions data automatically, though we could implement consistent hashing if we needed to scale a custom database cluster
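
The consistent hashing mentioned in the last bullet maps each key to the first node clockwise on a hash ring, so adding or removing a node only remaps a small slice of keys instead of reshuffling everything. A minimal sketch (node names are placeholders):

```python
# Minimal consistent-hash ring. Virtual nodes smooth out the key balance
# across physical nodes; md5 is used here only as a stable hash.

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                       # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # First point on the ring at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]
```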

4. The "Hot Video" Problem

Imagine a massive creator uploads a video and a million users try to access it within seconds. Our CDN will easily absorb the load of serving the actual video file. But what about DynamoDB? A million simultaneous reads for the video's metadata (Title, View Count, Likes) all land on the same partition key, causing hot-partition throttling. To solve this, we introduce Redis. We cache the metadata of highly popular videos in Redis with multiple read replicas, shielding our main database from the viral traffic spike.

5. The "Long-Tail" Problem: CDN Cost vs. Performance

We established that pushing videos to a CDN provides a low-latency experience. But CDNs are incredibly expensive. YouTube has billions of videos, yet roughly 80% of daily traffic comes from only 20% of them (the viral hits and new releases). The remaining 80% are "long-tail" videos: perhaps a tutorial uploaded 5 years ago that gets 2 views a month.

The Trade-Off: Should we cache every single video in our CDN? No. Pushing dead, unwatched videos to expensive edge servers worldwide would bankrupt the company. Instead, we use an intelligent eviction policy. We aggressively cache the "hot" 20% of videos in the CDN. For the "long-tail" videos, we accept a slightly higher latency and stream them directly from our S3 storage, saving millions of dollars in infrastructure costs.
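
The routing decision above can be sketched as a single branch: serve from the CDN only when a video clears a popularity threshold, and fall back to the S3 origin for the long tail. The threshold value and URLs below are made-up placeholders; in practice the cutoff would be tuned against CDN egress pricing:

```python
# Hot/cold routing sketch: CDN for popular videos, S3 origin for the tail.
# CDN_THRESHOLD and both hostnames are illustrative assumptions.

CDN_THRESHOLD = 10_000  # daily views

def stream_url(video_id: str, daily_views: int) -> str:
    if daily_views >= CDN_THRESHOLD:
        return f"https://cdn.example.com/{video_id}/master.m3u8"
    return f"https://transcoded-videos.s3.example.com/{video_id}/master.m3u8"
```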

Conclusion

Designing a video streaming platform is a masterclass in decoupling and asynchronous processing. By keeping heavy media out of our API servers, utilizing a background event-driven transcoding pipeline, and intelligently routing traffic between CDNs for hot videos and S3 for long-tail content, we can build a resilient system that keeps a billion users watching with barely a moment of buffering.

If you were building this, what message queue would you choose for the transcoding pipeline? RabbitMQ, Kafka, or AWS SQS? Let me know your thoughts down in the comments!
