Overview
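Over-sampling addresses imbalanced datasets, where one class (e.g., 'Fraud') is much rarer than the other (e.g., 'Legitimate'), by increasing the number of minority-class instances in the training data. Models trained on imbalanced data often ignore the minority class, because predicting the majority class almost every time already yields high accuracy.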
Methods
- Random Over-sampling: Duplicates existing minority-class instances, chosen at random (typically with replacement).
- SMOTE (Synthetic Minority Over-sampling Technique): Creates new, synthetic minority-class instances by interpolating between an existing instance and one of its nearest minority-class neighbours.
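
The two methods above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names `random_oversample` and `smote_like`, the toy points, and the choice of `k=2` are all assumptions made for the example, and the SMOTE variant is simplified (brute-force neighbour search, numeric features only).

```python
import random


def random_oversample(minority, n_new, rng):
    """Return n_new copies of minority points, drawn with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """Simplified SMOTE sketch: each synthetic point lies on the line
    segment between a minority point and one of its k nearest
    minority-class neighbours."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force search for the k nearest *other* minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]  # toy data

duplicates = random_oversample(minority, 6, rng)
synthetic = smote_like(minority, 6, k=2, rng=rng)
```

Every point in `duplicates` is an exact copy of an original, whereas the points in `synthetic` are new but always fall between two existing minority points.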
Risk
Random over-sampling can lead to overfitting because the model sees exact copies of the same minority instances multiple times and may memorize them instead of learning a pattern that generalizes.
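
A small sketch of why this happens, assuming three hypothetical minority points and a majority class of 12 instances (both made up for illustration): balancing the classes by duplication increases the row count but adds no new distinct information.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]  # hypothetical 'Fraud' points
n_majority = 12  # hypothetical size of the 'Legitimate' class

# Randomly duplicate minority rows until both classes have the same size.
extra = [rng.choice(minority) for _ in range(n_majority - len(minority))]
oversampled = minority + extra

counts = Counter(oversampled)  # how often each original point now appears
distinct_before = len(set(minority))
distinct_after = len(counts)

print(len(oversampled), "rows,", distinct_after, "distinct points")
```

The over-sampled set has as many rows as the majority class, yet still contains only the original distinct points, each repeated several times.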