Overview

In many real-world problems, data has a large number of features; in such high-dimensional spaces, samples become sparse and distance measures lose meaning (the "curse of dimensionality"), which can lead to overfitting and slow training. Dimensionality reduction simplifies the data while retaining its essential characteristics.

Two Main Approaches

  1. Feature Selection: Choosing a subset of the original features (e.g., removing irrelevant or redundant columns).
  2. Feature Extraction: Creating new, fewer features that are combinations of the original ones (e.g., PCA, Autoencoders).
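The contrast between the two approaches can be sketched with scikit-learn (a common choice, assumed to be installed; the toy data below is synthetic and purely illustrative). Selection keeps 3 of the original 10 columns, while extraction builds 3 new columns as linear combinations of all 10:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data: 100 samples, 10 features, with labels driven by the first two columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Feature selection: keep the 3 original columns most associated with y.
X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)

# Feature extraction: project onto 3 new axes (combinations of all 10 columns).
X_extracted = PCA(n_components=3).fit_transform(X)

print(X_selected.shape)   # (100, 3)
print(X_extracted.shape)  # (100, 3)
```

Both results have the same shape, but the selected columns remain directly interpretable as original features, whereas the PCA components trade interpretability for capturing the directions of greatest variance.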

Benefits

  • Improved model performance and generalization.
  • Reduced storage and computational requirements.
  • Easier data visualization and interpretation.

Related Terms