Overview

Under-sampling addresses imbalanced datasets by removing instances from the majority class until the classes are more balanced.

Methods

  • Random Under-sampling: Randomly deleting instances from the majority class.
  • Near-Miss Algorithm: Keeping only the majority class instances that lie closest to minority class instances and discarding the rest.
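
As a rough illustration of the two methods above, here is a minimal NumPy sketch. It assumes a binary problem with at least one minority instance, a 1:1 target ratio, and Euclidean distance. The function names, the `k` parameter, and the "mean distance to the k nearest minority instances" rule are illustrative assumptions (one common Near-Miss variant), not a reference implementation.

```python
import numpy as np


def random_under_sample(X, y, majority_label, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = np.random.default_rng(seed)
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    # Assumed target: keep as many majority rows as there are minority rows.
    n_keep = min(len(majority_idx), len(minority_idx))
    kept = rng.choice(majority_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([kept, minority_idx]))
    return X[keep], y[keep]


def near_miss_under_sample(X, y, majority_label, k=3):
    """Keep the majority rows with the smallest mean distance to their
    k nearest minority rows (an assumed Near-Miss-style selection rule)."""
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    k = min(k, len(minority_idx))
    # Pairwise Euclidean distances, shape (n_majority, n_minority).
    diffs = X[majority_idx][:, None, :] - X[minority_idx][None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # For each majority row, average the k smallest distances.
    mean_k = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    n_keep = min(len(majority_idx), len(minority_idx))
    kept = majority_idx[np.argsort(mean_k)[:n_keep]]
    keep = np.sort(np.concatenate([kept, minority_idx]))
    return X[keep], y[keep]
```

Both functions return the reduced feature matrix and label vector with the original row order preserved. The pairwise distance matrix in the second function grows with the product of the two class sizes, so this sketch is only suitable for small datasets.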

Risk

Under-sampling discards majority class instances, so any useful information they carry is lost, which can result in a less accurate model.
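
To get a feel for how much data is at stake, the short sketch below computes the share of majority class instances removed when balancing to a 1:1 ratio. The function name and the 1:1 target are assumptions made for illustration, and the class counts are invented.

```python
def discarded_fraction(n_majority, n_minority):
    """Fraction of majority rows removed when under-sampling to a 1:1 ratio.

    Assumes n_majority > 0; returns 0.0 if the classes are already balanced.
    """
    n_removed = max(n_majority - n_minority, 0)
    return n_removed / n_majority


# Hypothetical counts: 990 majority rows and 10 minority rows.
print(f"{discarded_fraction(990, 10):.1%} of majority rows would be dropped")
```

With these made-up counts, 980 of the 990 majority rows are discarded, which is roughly 99% of that class.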

Related Terms