Overview
Under-sampling addresses imbalanced datasets by removing instances from the majority class until the classes are more balanced.
Methods
- Random Under-sampling: Randomly deleting instances from the majority class.
- Near-Miss Algorithm: Keeping only the majority class instances that lie closest to minority class instances, for example those with the smallest average distance to their nearest minority neighbors.
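The two methods above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the `n_keep` and `k` parameters, and the toy data are all assumptions made for the example, and the Near-Miss function implements only one common variant (often called NearMiss-1), which ranks majority instances by their mean distance to the k nearest minority instances.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, majority_label, n_keep, seed=0):
    """Keep n_keep randomly chosen majority rows plus every other row."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    other_idx = [i for i, label in enumerate(y) if label != majority_label]
    kept = rng.sample(majority_idx, n_keep)  # sampling without replacement
    idx = sorted(kept + other_idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def near_miss_1(X, y, majority_label, n_keep, k=3):
    """Keep the n_keep majority rows with the smallest mean Euclidean
    distance to their k nearest minority rows (NearMiss-1 style)."""
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    k = min(k, len(minority_idx))

    def score(i):
        dists = sorted(math.dist(X[i], X[j]) for j in minority_idx)
        return sum(dists[:k]) / k

    kept = sorted(majority_idx, key=score)[:n_keep]
    idx = sorted(kept + minority_idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Hypothetical toy data: 8 majority points (label 0) and 2 minority
# points (label 1), all on the x-axis.
X = [(float(i), 0.0) for i in range(8)] + [(8.0, 0.0), (9.0, 0.0)]
y = [0] * 8 + [1] * 2

X_rus, y_rus = random_undersample(X, y, majority_label=0, n_keep=2)
X_nm, y_nm = near_miss_1(X, y, majority_label=0, n_keep=2, k=2)

print(Counter(y_rus))  # class counts after random under-sampling
print(X_nm)            # points kept by the Near-Miss sketch
```

On this toy data, both methods return two instances per class. Random under-sampling keeps an arbitrary pair of majority points, while the Near-Miss sketch keeps the majority points at x = 6 and x = 7, the ones nearest the minority points.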
Risk
Under-sampling discards majority class instances, so any information they carry is lost. If the removed instances were informative, the resulting model may be less accurate.
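To get a feel for how much data is at stake, the short sketch below computes the share of majority class instances that would be removed to reach a given class ratio. The helper name `discarded_fraction`, the `target_ratio` parameter, and the class counts are hypothetical, chosen only for illustration.

```python
def discarded_fraction(n_majority, n_minority, target_ratio=1.0):
    """Fraction of majority rows removed so that
    majority : minority is at most target_ratio."""
    n_keep = min(n_majority, int(n_minority * target_ratio))
    return (n_majority - n_keep) / n_majority


# Hypothetical counts: 9,900 majority and 100 minority instances.
frac = discarded_fraction(9900, 100)
print(f"{frac:.1%} of majority instances discarded")
```

With these assumed counts, balancing to 1:1 keeps only 100 of the 9,900 majority instances, which is why the risk grows as the imbalance becomes more severe.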