Overview
Real-world datasets often have missing values due to data-entry errors, non-responses, or system failures. Imputation replaces those missing values with estimates, which lets analysts keep the entire dataset rather than discarding rows with missing information.
Common Techniques
- Mean/Median/Mode Imputation: Replacing missing values with the mean, median, or most common value (mode) of that column.
- K-Nearest Neighbors (KNN) Imputation: Filling a missing value from the k most similar observations, for example by averaging their values for that feature.
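- Multiple Imputation: Creating several plausible completed datasets, analyzing each one, and combining the results so that the uncertainty about the missing values is carried into the final estimates.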
- Predictive Modeling: Training a model, such as a regression, on the rows where the value is observed and using the other features to predict the missing value.
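
The first two techniques in the list can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `impute_simple` and `impute_knn`, the use of `None` to mark a missing value, and the distance convention (root mean squared difference over the features both rows have observed) are all assumptions made for the example.

```python
import math
import statistics


def impute_simple(column, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    summarize = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = summarize(observed)
    return [fill if v is None else v for v in column]


def impute_knn(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumed convention: distance is the root mean squared difference over
    the features that both rows have observed.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features, so the rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other comparable rows where feature j is observed.
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d < math.inf:
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap in place if no donor exists
                completed[i][j] = statistics.mean(nearest)
    return completed
```

For example, `impute_simple([20, None, 30, 40, None])` fills both gaps with 30, and `impute_knn([[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]], k=2)` fills the missing entry with 15.0, the average of the two rows closest to it on the first feature. The distant fourth row does not influence the result, which is the main difference from filling with the column mean.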
Risk
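Imputation can introduce bias if the data is not missing at random, that is, if whether a value is missing depends on the value itself. In that case the observed values are not representative of the missing ones, so estimates built from them are skewed.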
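
A small worked example may make this risk concrete. The data and the missingness rule below are invented for illustration: the values 1 through 10 are assumed to be the truth, and every value above 7 is assumed to go unrecorded, so missingness depends on the value itself.

```python
import statistics

# Hypothetical ground truth, normally unknown to the analyst.
true_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Assumed missingness rule: values above 7 are never recorded.
observed = [v if v <= 7 else None for v in true_values]

# Mean imputation using only the values that were recorded.
seen = [v for v in observed if v is not None]
fill = statistics.mean(seen)
imputed = [fill if v is None else v for v in observed]

true_mean = statistics.mean(true_values)
imputed_mean = statistics.mean(imputed)

print(true_mean)     # 5.5
print(imputed_mean)  # 4
```

Because only the low values were observed, the fill value is 4, and the imputed dataset's mean stays at 4 instead of the true 5.5. Imputing from the observed values alone cannot recover what the missingness rule removed.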