Overview

Data wrangling, sometimes called data munging, is the process of cleaning, structuring, and enriching raw data into a format suited to analysis, so that decisions can be made faster and on more reliable data. It is a core task for data scientists and analysts and often consumes a significant portion of their time.

The Wrangling Process

  1. Discovering: Understanding what is in the data.
  2. Structuring: Organizing the data into a consistent layout, such as rows and typed columns, for easier use.
  3. Cleaning: Removing errors and inconsistencies.
  4. Enriching: Adding data from other sources to provide more context.
  5. Validating: Ensuring the data meets quality standards.
  6. Publishing: Making the wrangled data available for analysis.
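
The six steps can be illustrated with a minimal sketch. It uses only the Python standard library so that it runs without extra installs; in practice a library such as Pandas would usually do this work. The sample data, the lookup table COUNTRY_BY_CITY, and the function names wrangle and publish are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical raw data: stray whitespace, inconsistent casing,
# one missing amount, and one duplicated row.
RAW_CSV = """name,city,amount
 Alice ,london,10.5
bob,PARIS,
 Alice ,london,10.5
Carol,Berlin,7
"""

# Hypothetical lookup table used for the enriching step.
COUNTRY_BY_CITY = {"London": "UK", "Paris": "France", "Berlin": "Germany"}


def wrangle(raw_text):
    """Return (discovered column names, cleaned and enriched records)."""
    # 1. Discovering: parse the text and see which columns are present.
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    columns = list(rows[0].keys()) if rows else []

    records = []
    seen = set()
    for row in rows:
        # 2. Structuring: bring every field into a consistent shape.
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        amount_text = row["amount"].strip()

        # 3. Cleaning: skip rows with a missing amount and exact duplicates.
        if not amount_text:
            continue
        key = (name, city, float(amount_text))
        if key in seen:
            continue
        seen.add(key)

        # 4. Enriching: add a country from the separate lookup table.
        records.append({
            "name": name,
            "city": city,
            "amount": key[2],
            "country": COUNTRY_BY_CITY.get(city, "Unknown"),
        })

    # 5. Validating: enforce simple quality rules before handing data on.
    for record in records:
        if not record["name"] or record["amount"] < 0:
            raise ValueError(f"invalid record: {record}")

    return columns, records


def publish(records):
    # 6. Publishing: serialize the wrangled records as CSV text.
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["name", "city", "amount", "country"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    columns, records = wrangle(RAW_CSV)
    print(columns)
    print(publish(records))
```

Dropping rows with a missing amount is only one possible cleaning policy; depending on the analysis, filling in a default or flagging the row may be more appropriate.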

Tools

  • Python (Pandas, NumPy)
  • R (tidyverse)
  • SQL
  • Specialized tools such as Trifacta or Alteryx

Related Terms