Overview

Most machine learning models require numerical input. One-hot encoding converts categorical data (e.g., 'Color': Red, Blue, Green) into a numerical format without implying an artificial order among the categories.

How it Works

Each unique category becomes a new column. For each row, a '1' is placed in the column matching that row's category, and '0's are placed in all the other columns.

Example

With the columns ordered Red, Blue, Green: 'Red' becomes [1, 0, 0], 'Blue' becomes [0, 1, 0], and 'Green' becomes [0, 0, 1].
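
The mapping above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name one_hot_encode and its parameters are hypothetical, and the column order is assumed to be supplied by the caller.

```python
def one_hot_encode(values, categories):
    """Return one 0/1 row per value, with columns in the order of `categories`."""
    # Map each category to its column position, e.g. {"Red": 0, "Blue": 1, "Green": 2}.
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # a value missing from `categories` raises KeyError
        rows.append(row)
    return rows


colors = ["Red", "Blue", "Green", "Blue"]
encoded = one_hot_encode(colors, categories=["Red", "Blue", "Green"])
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, so no category is numerically "larger" than another, which is the property that avoids an implied order.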

Challenge

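The Curse of Dimensionality: If a categorical feature has many unique values (high cardinality), one-hot encoding creates one column per value, which can produce a very large number of sparse columns in which each row holds a single '1'.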
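
To give a rough sense of the scale, the sketch below counts the cells in a dense one-hot table for a made-up feature with 1,000 distinct values and compares that with the number of cells that actually hold a '1'. The data and the helper name column_positions are hypothetical; storing only the position of each '1' is one common way to keep such data compact, shown here as an illustration rather than a recommendation.

```python
def column_positions(values):
    """Assign each distinct value a column and return the column of the 1 for each row."""
    column_of = {}
    positions = []
    for value in values:
        if value not in column_of:
            column_of[value] = len(column_of)  # next unused column
        positions.append(column_of[value])
    return positions, column_of


# Hypothetical high-cardinality feature: 1,000 rows, each with a distinct value.
user_ids = [f"user_{i}" for i in range(1000)]
positions, column_of = column_positions(user_ids)

dense_cells = len(user_ids) * len(column_of)  # rows x columns in the dense table
nonzero_cells = len(positions)                # exactly one 1 per row

print(f"columns: {len(column_of)}")
print(f"dense cells: {dense_cells}, cells holding a 1: {nonzero_cells}")
```

The dense table grows with rows times unique values, while the number of informative cells grows only with the number of rows.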

Related Terms