Overview

The holdout method is the standard way to prepare data for machine learning. By 'holding out' some data from the training process, you can get a realistic sense of how the model generalizes.

Common Splits

  • 80/20: 80% training, 20% testing.
  • 70/15/15: 70% training, 15% validation, 15% testing.

Limitations

If the dataset is small, the holdout method can be 'unlucky'—the specific data points in the test set might not be representative of the whole. In these cases, Cross-Validation is preferred.

Related Terms