Overview
Data wrangling, sometimes called data munging, is the process of cleaning, structuring, and enriching raw data into a format suitable for analysis, so that decisions can be made faster and on more reliable data. It is a core task for data scientists and analysts and often consumes a significant portion of their time.
The Wrangling Process
- Discovering: Understanding what is in the data.
- Structuring: Organizing the data for easier use.
- Cleaning: Correcting or removing errors, duplicates, and inconsistencies.
- Enriching: Adding data from other sources to provide more context.
- Validating: Ensuring the data meets quality standards.
- Publishing: Making the wrangled data available for analysis.
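The six steps above can be sketched end to end in Python with pandas. This is a minimal illustration, not a prescribed workflow: the `raw` and `regions` tables, their column names, the `wrangle` helper, and the two validation rules are all hypothetical and chosen only to make each step visible.

```python
import pandas as pd

# Hypothetical raw input: inconsistent column names, a duplicate row,
# a missing date, and a non-numeric placeholder in a numeric column.
raw = pd.DataFrame({
    "Customer ID": [1, 2, 2, 3],
    "Signup Date": ["2024-01-05", "2024-02-10", "2024-02-10", None],
    "Spend": ["100.5", "n/a", "n/a", "42"],
})

# Hypothetical second source used for enrichment.
regions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
})


def wrangle(raw: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    # Structuring: normalize column names to snake_case.
    df = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

    # Cleaning: drop exact duplicate rows and coerce types;
    # values that cannot be parsed become NaT / NaN.
    df = df.drop_duplicates()
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    df["spend"] = pd.to_numeric(df["spend"], errors="coerce")

    # Enriching: add context from another source with a left join.
    df = df.merge(regions, on="customer_id", how="left")

    # Validating: assumed quality rules for this example only.
    if not df["customer_id"].is_unique:
        raise ValueError("customer_id must be unique after cleaning")
    if (df["spend"].dropna() < 0).any():
        raise ValueError("spend must not be negative")

    return df


# Discovering: inspect types and missing values before changing anything.
column_types = raw.dtypes
missing_per_column = raw.isna().sum()

clean = wrangle(raw, regions)

# Publishing: serialize the result; with no path, to_csv returns a string.
csv_text = clean.to_csv(index=False)
```

In practice the publishing step would write to a file, database, or shared store rather than a string, and the validation rules would come from the data's actual quality requirements.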
Tools
- Python (pandas, NumPy)
- R (tidyverse)
- SQL
- Specialized tools like Trifacta or Alteryx