Etsy Completes Massive 1000-Shard MySQL Migration to Vitess After Five-Year Effort

Infrastructure Reporter

Etsy has migrated its 425 TB, 1,000-shard MySQL infrastructure to Vitess, replacing its proprietary sharding logic with Vitess vindexes and eliminating the index database that had been a single point of failure.

The Etsy engineering team has completed a five-year migration of its massive MySQL sharding infrastructure to Vitess, moving approximately 425 terabytes of data across 1,000 shards and serving 1.7 million requests per second through the open-source database clustering system.

The migration represents one of the largest production deployments of Vitess to date. Over the course of the transition, Etsy engineers submitted approximately 2,500 pull requests and migrated 6,000 queries. The effort culminated in replacing Etsy's proprietary shard management system with Vitess's vindexes, which define how application data maps to database shards and are used to route queries across them.

The Challenge of Legacy Sharding

Since around 2010, Etsy has operated a sharded MySQL architecture to store most of its production data. The company's original approach used proprietary sharding logic where each database table had a corresponding model in an internal object-relational mapping (ORM) layer. For sharded tables, a unique ID field called the shardifier ID determined which shard stored each record.

"While most models use shop_id or user_id as a sharding key, overall more than 30 different IDs were used," explained Ella Yarmo-Gray, senior software engineer at Etsy. "Record-to-shard mappings were stored in a single (unsharded) 'index' database."

This architecture provided scalability and limited the impact of any single shard outage to a small portion of traffic, but it also created significant operational challenges. Scaling operations were slow and manual, the index database became a single point of failure, and developers had to manage sharding complexity themselves.
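To make the role of the index database concrete, here is a minimal Python sketch of the legacy lookup-based routing, under stated assumptions: the `IndexDatabase` class, the `route` function, and the `shard_NNNN` naming are hypothetical, and an in-memory dict stands in for what was really an unsharded MySQL database. It shows why every sharded query depended on one central lookup, and why placements assigned at random cannot be recomputed from the ID alone.

```python
import random
from typing import Dict, Tuple


class IndexDatabase:
    """Hypothetical stand-in for the unsharded 'index' database.

    Maps (shardifier ID type, ID value) -> shard number. Illustrative only.
    """

    def __init__(self, num_shards: int, seed: int = 0) -> None:
        self.num_shards = num_shards
        self._mappings: Dict[Tuple[str, int], int] = {}
        self._rng = random.Random(seed)

    def assign(self, id_type: str, id_value: int) -> int:
        """Place a new shardifier ID on a randomly chosen shard and record it."""
        key = (id_type, id_value)
        if key in self._mappings:
            raise ValueError(f"{key} is already mapped")
        shard = self._rng.randrange(self.num_shards)
        self._mappings[key] = shard
        return shard

    def lookup(self, id_type: str, id_value: int) -> int:
        """Every sharded read or write needs this lookup before it can run."""
        return self._mappings[(id_type, id_value)]


def route(index_db: IndexDatabase, id_type: str, id_value: int) -> str:
    # If the index database is unavailable, no sharded query can be routed,
    # which is what makes it a single point of failure.
    return f"shard_{index_db.lookup(id_type, id_value):04d}"
```

For example, `index_db.assign("shop_id", 42)` and `index_db.assign("user_id", 42)` record two independent placements, so shop 42 and user 42 may live on different shards even though their numeric IDs match.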

Why Vitess?

Etsy chose Vitess to address these challenges while maintaining MySQL compatibility. The open-source system, originally developed by YouTube and now a Cloud Native Computing Foundation project, provides horizontal scaling capabilities for MySQL databases.

The migration strategy began by introducing Vitess as a layer between the ORM and the database. Queries were routed through Vitess, but the ORM still specified the target shard for each one. This approach allowed Etsy to test Vitess's capabilities without immediately moving data.
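The two-phase idea can be sketched in Python, assuming a simplified proxy: in the first phase the caller names the shard and the layer only forwards, and later the layer computes the shard itself from a sharding key, the job vindexes take over in Vitess. The `RoutingLayer` class and its parameters are hypothetical and are not Vitess's API; Vitess does support explicitly shard-targeted queries, but through its own mechanisms.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RoutingLayer:
    """Hypothetical proxy sitting between an ORM and MySQL shards."""

    # Optional function that computes a shard from a sharding key
    # (the role a vindex plays once routing moves into the proxy).
    shard_for_key: Optional[Callable[[int], str]] = None
    log: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, sql: str, *, target_shard: Optional[str] = None,
                sharding_key: Optional[int] = None) -> str:
        if target_shard is not None:
            # Phase 1: the ORM already decided; the layer just forwards.
            shard = target_shard
        elif self.shard_for_key is not None and sharding_key is not None:
            # Later phase: the layer decides using its own mapping.
            shard = self.shard_for_key(sharding_key)
        else:
            raise ValueError("cannot determine a target shard")
        self.log.append((shard, sql))  # record where the query was sent
        return shard
```

Because phase 1 never changes where data lives, any problem it surfaces is in the new query path rather than in data placement, which is the low-risk property the paragraph above describes.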

Custom Vindexes for Complex Sharding

The most significant technical challenge was that Etsy's shard mappings were random rather than algorithmic, which made Vitess's predefined vindex types a poor fit. "Since the ORM's shard mappings are random and not algorithmic, using one of these out-of-the-box would require re-sharding all of our data – a process that would be manual and likely take years," Yarmo-Gray noted.
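A small simulation illustrates why an algorithmic mapping would force a reshard. The sketch below assumes records were originally placed on uniformly random shards, and uses an MD5-based function as a generic stand-in for a hash-style vindex (it is not the function any real Vitess vindex uses). It counts how many records would sit on the "wrong" shard under the new mapping; with N shards, roughly a fraction 1 - 1/N of them would have to move.

```python
import hashlib
import random


def hash_shard(id_value: int, num_shards: int) -> int:
    """Stand-in for an algorithmic vindex: the shard depends only on the ID."""
    digest = hashlib.md5(str(id_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_needing_move(num_ids: int, num_shards: int, seed: int = 0) -> float:
    """Share of randomly placed records whose hash-based shard differs."""
    rng = random.Random(seed)
    # Simulated legacy placement: each ID was put on a random shard.
    existing = {i: rng.randrange(num_shards) for i in range(num_ids)}
    moved = sum(1 for i, shard in existing.items()
                if hash_shard(i, num_shards) != shard)
    return moved / num_ids
```

With 1,000 shards, nearly every record would need to move, which is consistent with the multi-year manual reshard Yarmo-Gray describes.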

Instead of using Vitess's built-in vindexes, Etsy engineers wrote custom vindexes that ported their existing shard logic into Vitess. This approach enabled testing how vindexes worked in Etsy's environment without the complexity and risk of moving data immediately.
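Conceptually, a custom vindex of this kind answers "which shard holds this ID?" by consulting the existing mappings rather than computing a new placement. In Vitess, custom vindexes are Go types plugged into its vindex framework; the Python below is only a hypothetical model of the idea. `LegacyMappingVindex` and its methods are invented names, loosely echoing the map/verify roles of a vindex, and the model is simplified to return shard names directly, whereas real vindexes produce keyspace IDs or other destinations.

```python
from typing import Dict, List, Optional


class LegacyMappingVindex:
    """Hypothetical vindex that reuses existing record-to-shard mappings.

    No data moves: each ID resolves to whatever shard the legacy ORM
    already placed it on. Unknown IDs resolve to None ("no match").
    """

    def __init__(self, id_type: str, mappings: Dict[int, str]) -> None:
        self.id_type = id_type
        self._mappings = dict(mappings)

    def is_unique(self) -> bool:
        # Each ID lives on exactly one shard.
        return True

    def map(self, ids: List[int]) -> List[Optional[str]]:
        """Return the shard for each ID, in input order."""
        return [self._mappings.get(i) for i in ids]

    def verify(self, ids: List[int], shards: List[str]) -> List[bool]:
        """Check whether each (ID, shard) pair agrees with the legacy mapping."""
        return [self._mappings.get(i) == s for i, s in zip(ids, shards)]
```

Usage: `LegacyMappingVindex("shop_id", {42: "shard_0317"}).map([42, 99])` returns `["shard_0317", None]`. In this model, a separate instance per shardifier type would let each of the more than 30 ID types keep its own mappings.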

The custom vindexes effectively translated Etsy's proprietary sharding logic into Vitess's framework, allowing the company to maintain its existing data distribution while gaining Vitess's operational benefits.

Migration Strategy and Execution

The migration involved redesigning parts of Etsy's data model to make it easier to shard, selecting appropriate shard keys, and gradually moving production traffic to the new environment while verifying data consistency.
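Gradual cut-overs with consistency checks are commonly built from deterministic percentage bucketing plus result comparison. The Python sketch below shows that generic technique and is not a description of Etsy's tooling: `in_rollout`, `read_with_rollout`, and the choice to fall back to the legacy result on any mismatch are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, List, Tuple


def in_rollout(request_key: str, percent: int) -> bool:
    """Deterministically place a key in one of 100 buckets; the first
    `percent` buckets are routed to the new path."""
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def read_with_rollout(request_key: str, percent: int,
                      legacy_read: Callable[[str], object],
                      new_read: Callable[[str], object],
                      mismatches: List[Tuple[str, object, object]]) -> object:
    """Serve rolled-out keys from the new path, checked against the legacy path."""
    legacy_result = legacy_read(request_key)
    if not in_rollout(request_key, percent):
        return legacy_result
    new_result = new_read(request_key)
    if new_result != legacy_result:
        # Record the disagreement and serve the trusted legacy answer.
        mismatches.append((request_key, legacy_result, new_result))
        return legacy_result
    return new_result
```

Hashing the key, rather than choosing at random per request, keeps a given key on the same path as the percentage is raised. Reading both paths doubles read load, a cost such a scheme would accept only during the verification window.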

"Five years, approximately 2,500 pull requests and 6,000 queries later, we have successfully migrated Etsy's shard management to Vitess vindexes!" Yarmo-Gray concluded. "Despite the work we put in to streamline the migration process, it was still a challenge to replace the database infrastructure for a codebase of Etsy's scale and age."

Documentation and Knowledge Sharing

Throughout the migration, the Etsy engineering team has published a series of detailed articles documenting their experience. The "Sharding Payments with Vitess" series covers various aspects of the migration, including:

  • Challenges of migrating data models
  • Cut-over efforts for high-traffic systems
  • Evaluation of cutover risks
  • Lessons learned from the multi-year process

These articles provide valuable insights for other organizations considering similar migrations, particularly those with complex, legacy sharding architectures.

Impact and Benefits

The migration to Vitess has eliminated the single-point-of-failure index database and hidden sharding complexity from developers. With shard routing logic now living in Vitess's vindexes, Etsy can use Vitess to reshard data and to shard previously unsharded tables, rather than relying on the slow, manual processes of its legacy system.
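Etsy's custom vindexes preserve its existing placements, so what follows is a generic illustration of Vitess's range-based sharding, not a description of Etsy's resharding plans. In Vitess, vindexes map rows to keyspace IDs, and each shard owns a contiguous range of keyspace IDs, named with hex bounds such as `-80` or `80-`. The minimal Python sketch below shows why resharding then amounts to splitting a range: only rows inside the split range change shards. The function names are hypothetical, and keyspace IDs are shortened to two bytes for readability.

```python
from typing import List, Optional, Tuple


def parse_range(name: str) -> Tuple[bytes, Optional[bytes]]:
    """Parse a range-style shard name such as '-80', '40-80' or '80-'.

    An empty start means 'from the beginning'; an empty end means 'to the end'.
    """
    start_hex, end_hex = name.split("-")
    start = bytes.fromhex(start_hex)
    end = bytes.fromhex(end_hex) if end_hex else None
    return start, end


def find_shard(keyspace_id: bytes, shards: List[str]) -> str:
    """Return the shard whose half-open range [start, end) holds the ID."""
    for name in shards:
        start, end = parse_range(name)
        # Keyspace IDs compare as byte strings, so b"\x12\x34" < b"\x80".
        if keyspace_id >= start and (end is None or keyspace_id < end):
            return name
    raise LookupError("no shard covers this keyspace ID")
```

Splitting `-80` into `-40` and `40-80` moves an ID like `0x5a00` to `40-80`, while IDs in `80-` such as `0xc0ff` stay exactly where they were.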

The successful completion of this migration demonstrates Vitess's maturity for handling large-scale, production MySQL workloads and provides a blueprint for other organizations facing similar challenges with legacy sharded database architectures.

For organizations considering similar migrations, Etsy's experience highlights the importance of careful planning, incremental migration strategies, and the value of custom solutions when dealing with complex legacy systems. The five-year timeline, while substantial, reflects the careful approach needed when migrating critical infrastructure for a large-scale e-commerce platform serving millions of users daily.
