Overview

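K-Means is one of the most widely used clustering algorithms. It partitions data into K clusters so that points within the same cluster are as similar as possible, which in practice means minimizing the squared distance between each point and the center of its cluster. Points in different clusters then end up as different as that grouping allows.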
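The goal can be written as a single quantity to minimize, usually called the within-cluster sum of squares. The symbols here are notation chosen for illustration and do not come from the text above: C_k is the set of points in cluster k, mu_k is the mean (centroid) of that cluster, and x is one data point.

```latex
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
```

A smaller J means points sit closer to their own centroids.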

The Process

  1. Choose the number of clusters (K).
  2. Randomly initialize K 'centroids' (center points).
  3. Assignment: Assign each data point to the nearest centroid.
  4. Update: Calculate the new mean of all points in each cluster and move the centroid to that mean.
  5. Repeat: Run steps 3 and 4 again until the centroids stop moving or a maximum number of iterations is reached.
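The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the `max_iters` and `seed` parameters, the choice to draw the starting centroids from the data points, and the handling of empty clusters are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster a list of numeric tuples into k groups.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the starting centroids
    # (one common way to initialize randomly; an assumption here).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []

    for _ in range(max_iters):
        # Step 3 (assignment): label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]

        # Step 4 (update): move each centroid to the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 5: stop once no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels


# Usage: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this small dataset the first three points should share one label and the last three the other. Which group receives label 0 depends on the random starting centroids.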

Limitations

  • You must specify K in advance.
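  • Sensitive to the initial placement of centroids: different starting points can converge to different final clusterings.
  • Struggles with clusters of varying sizes or non-spherical shapes, because each point is assigned purely by distance to the nearest centroid.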
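The sensitivity to initial placement can be seen on a hand-built example. The sketch below uses four points at the corners of a wide rectangle and compares two clusterings, each of which is stable: repeating the assignment and update steps changes nothing. The helper names `inertia` and `is_stable` and the sample data are illustrative assumptions, not part of any library.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        squared_distance(p, centroids[label]) for p, label in zip(points, labels)
    )


def is_stable(points, centroids, labels):
    """True if every point is already assigned to its nearest centroid."""
    for p, label in zip(points, labels):
        nearest = min(
            range(len(centroids)), key=lambda j: squared_distance(p, centroids[j])
        )
        if nearest != label:
            return False
    return True


# Four corners of a rectangle that is 4 units wide and 1 unit tall.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Clustering A: left pair vs right pair (centroids are the pair means).
good_centroids = [(0.0, 0.5), (4.0, 0.5)]
good_labels = [0, 0, 1, 1]

# Clustering B: bottom pair vs top pair, which an unlucky start can reach.
poor_centroids = [(2.0, 0.0), (2.0, 1.0)]
poor_labels = [0, 1, 0, 1]

good_score = inertia(points, good_centroids, good_labels)  # 1.0
poor_score = inertia(points, poor_centroids, poor_labels)  # 16.0
both_stable = is_stable(points, good_centroids, good_labels) and is_stable(
    points, poor_centroids, poor_labels
)  # True
```

Both clusterings are valid stopping points for the algorithm, yet the second is sixteen times worse by this measure. A common workaround is to run K-Means several times with different starting centroids and keep the run with the lowest score.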

Related Terms