Dropbox and GitHub Collaborate to Cut Monorepo Size from 87GB to 20GB
#DevOps

DevOps Reporter

Dropbox engineers partnered with GitHub to uncover inefficiencies in Git’s delta compression that caused their backend monorepo to swell to 87 GB. By refining repacking strategies and adjusting delta window and depth settings, they reduced the repository to 20 GB, slashing clone times from over an hour to under 15 minutes and boosting CI performance.

Dropbox’s backend monorepo serves as a central integration point for services and shared libraries. As the codebase grew, teams reported clone operations that exceeded one hour and CI pipelines that slowed due to repeated fetch and rebuild cycles. Initial investigations ruled out large binaries or accidental commits as the primary cause. Instead, the growth rate outpaced what typical development activity would generate, pointing to how Git stored the data rather than what was stored.

Git reduces storage by writing objects into packfiles that use delta compression: it finds similar objects, stores one in full, and stores the others as differences against it. At Dropbox’s scale, the default heuristics for choosing which objects to compare produced suboptimal packfiles. The delta window (how many neighboring candidate objects Git compares each object against when searching for a delta base) and the delta depth (the maximum length a chain of deltas may reach) were not tuned for the repository’s particular access patterns, leading to redundant data and inflated packfile sizes.
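To make the two settings concrete, the following is a minimal sketch of how one might check which window and depth a repository would use when repacking. The function name effective_pack_settings is hypothetical, and the fallback values (10 and 50) are Git's documented defaults for pack.window and pack.depth, not figures reported by Dropbox.

```shell
#!/bin/sh
# Print the delta window and depth configured for a repository.
# Assumption: falls back to Git's documented defaults (pack.window=10,
# pack.depth=50) when the keys are unset. Hypothetical helper name.
effective_pack_settings() {
    repo=${1:-.}
    # `git config --get` exits non-zero when the key is unset,
    # so the `|| echo` branch supplies the default.
    window=$(git -C "$repo" config --get pack.window || echo 10)
    depth=$(git -C "$repo" config --get pack.depth || echo 50)
    echo "window=$window depth=$depth"
}
```

Calling effective_pack_settings /path/to/repo prints a single line such as "window=10 depth=50" for a repository with no overrides.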

To address this, Dropbox engineers treated the version‑control system as production infrastructure. They ran detailed analyses of object storage patterns, identified cases where delta chains were unnecessarily long, and experimented with alternative repacking configurations. Adjusting the delta window to a larger value allowed Git to find better matches across a broader set of objects, while limiting delta depth prevented excessively long chains, which slow reads because Git must apply every delta in a chain to reconstruct an object.
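An experiment of this kind can be sketched as a small helper that repacks a scratch copy of a repository with a chosen window and depth and reports the resulting pack size. This is an illustration only: the function name repack_and_measure is hypothetical, and the article does not state which values Dropbox and GitHub settled on.

```shell
#!/bin/sh
# Repack a repository with the given window/depth and print the pack size
# in KiB. Run this on a throwaway clone, never on a shared checkout.
# Hypothetical helper; window and depth values are chosen by the caller.
repack_and_measure() {
    repo=$1
    window=$2
    depth=$3
    # -a: put all objects into a single pack
    # -d: delete the packs made redundant by the new one
    # -f: recompute deltas instead of reusing existing ones
    git -C "$repo" repack -a -d -f -q --window="$window" --depth="$depth"
    # `git count-objects -v` reports size-pack in KiB.
    git -C "$repo" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

Running the helper several times with different arguments, for example repack_and_measure ./scratch-clone 10 50 followed by repack_and_measure ./scratch-clone 250 50, gives a rough size comparison between configurations. Larger windows make repacking slower, so timing each run is worth recording alongside the size.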

Because clone and fetch operations rely on server‑side packing managed by GitHub, the Dropbox team worked closely with GitHub engineers to apply these tuning parameters on the hosted side. Changes were first validated in mirrored environments that replicated production traffic, ensuring that the new packfile layout did not break existing workflows or introduce latency.

Following the rollout, the monorepo size dropped from 87 GB to approximately 20 GB, a reduction of about 77 percent. Clone times fell from over an hour to under 15 minutes, and CI pipelines ran faster because less data needed to be transferred and processed during each build. The reduced footprint also lowered the risk of hitting repository‑size limits and shortened onboarding time for new engineers.
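The 77 percent figure follows directly from the two sizes quoted above. As a quick check, using integer shell arithmetic (variable names are illustrative):

```shell
#!/bin/sh
# Sizes in GB as reported in the article.
before_gb=87
after_gb=20
# Integer percentage saved: (before - after) * 100 / before
reduction_pct=$(( (before_gb - after_gb) * 100 / before_gb ))
echo "reduction: ${reduction_pct}%"   # prints "reduction: 77%"
```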

Dropbox Collaborates with GitHub to Reduce Monorepo Size from 87GB to 20GB - InfoQ

The effort highlighted several lessons for teams managing large monorepos:

  • Treat Git as critical infrastructure; storage behavior directly affects developer velocity.
  • Regularly inspect packfile statistics (e.g., using git count-objects and git verify-pack) to detect abnormal growth.
  • Experiment with git repack --window=<n> --depth=<m> values in a controlled setting before applying them broadly.
  • Engage with your Git hosting provider early when tuning server‑side parameters, as they control the packing process for clone and fetch.
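The inspection step in the list above can be sketched as a small report that combines both commands. The function name pack_report is hypothetical; it relies only on git count-objects -vH for human-readable totals and git verify-pack -s for the delta chain length histogram of each pack.

```shell
#!/bin/sh
# Print object/pack totals and a delta chain length histogram per packfile.
# Hypothetical helper for spotting abnormal growth or very long chains.
pack_report() {
    (
        cd "${1:-.}" || exit 1
        # Totals: loose objects, packed objects, number of packs, pack size.
        git count-objects -vH
        # Locate the pack directory for this repository (works for
        # both normal and bare repositories).
        pack_dir=$(git rev-parse --git-path objects/pack)
        for idx in "$pack_dir"/pack-*.idx; do
            [ -e "$idx" ] || continue   # no packs yet
            echo "== $idx"
            # -s: show only the histogram of delta chain lengths.
            git verify-pack -s "$idx"
        done
    )
}
```

Tracking this output over time (for example from a scheduled job) makes it easier to notice when size-pack grows faster than commit activity would suggest, which is the symptom described earlier in the article.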

Leela Kumili, Lead Engineer at Starbucks, covers topics such as platform engineering, distributed systems, and developer productivity. She focuses on building scalable, cloud‑native platforms and improving workflows through LLM‑based tools.
