Overview

In machine learning, versioning the code alone (for example, with Git) is not enough. To trace a specific model file back to its origin, you also need to record exactly which dataset and which hyperparameters were used to produce it.
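
As a minimal sketch of what such a record could look like, the Python snippet below hashes a dataset file and a model file and stores both digests next to the hyperparameters in a JSON manifest. The function names (file_sha256, write_manifest), the manifest fields, and the file paths are illustrative assumptions and do not come from any particular tool.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(model_path, dataset_path, hyperparams, manifest_path):
    """Record which dataset and hyperparameters produced a model file.

    The field names below are hypothetical; adapt them to your own project.
    """
    manifest = {
        "model_file": Path(model_path).name,
        "model_sha256": file_sha256(model_path),
        "dataset_file": Path(dataset_path).name,
        "dataset_sha256": file_sha256(dataset_path),
        "hyperparameters": hyperparams,
    }
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest
```

Committing a small manifest like this to Git next to the training code is one way to tie a code revision to its data and hyperparameters without storing the large files in the repository itself.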

Why it Matters

  • Reproducibility: Being able to recreate a model's results exactly.
  • Rollback: Quickly switching back to a previous version if a new model performs poorly in production.
  • Auditing: Tracking the history of a model for compliance and debugging.
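
The rollback and auditing points above can be illustrated with a toy in-memory registry. This is only a sketch under stated assumptions: the ModelRegistry class and its methods are hypothetical, and real tools persist this state and handle concurrency, which the example does not attempt.

```python
class ModelRegistry:
    """Toy registry: tracks model versions and which one is in production."""

    def __init__(self):
        self._versions = {}    # version name -> metadata dict
        self._promoted = []    # stack of versions promoted to production
        self._log = []         # append-only list of (action, version) events

    def register(self, version, metadata):
        """Store metadata (e.g. dataset hash, hyperparameters) for a version."""
        if version in self._versions:
            raise ValueError(f"version {version!r} is already registered")
        self._versions[version] = dict(metadata)

    def promote(self, version):
        """Make a registered version the current production model."""
        if version not in self._versions:
            raise KeyError(f"unknown version {version!r}")
        self._promoted.append(version)
        self._log.append(("promote", version))

    def rollback(self):
        """Switch production back to the previously promoted version."""
        if len(self._promoted) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self._promoted.pop()
        previous = self._promoted[-1]
        self._log.append(("rollback", previous))
        return previous

    @property
    def production(self):
        """Name of the current production version, or None if there is none."""
        return self._promoted[-1] if self._promoted else None

    def audit_log(self):
        """Return a copy of the event history, oldest first."""
        return list(self._log)


registry = ModelRegistry()
registry.register("v1", {"learning_rate": 0.01})
registry.register("v2", {"learning_rate": 0.001})
registry.promote("v1")
registry.promote("v2")
registry.rollback()  # production is "v1" again; the log keeps all three events
```

Keeping the event log append-only, instead of deleting the failed promotion, is a deliberate choice here: the rollback itself stays visible to anyone auditing the model's history.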

Tools

  • DVC (Data Version Control)
  • MLflow
  • Weights & Biases

Related Terms