K-Means Clustering

Overview

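K-Means is one of the most widely used clustering algorithms. It partitions data into K clusters so that points within the same cluster are as similar as possible, while points in different clusters are as different as possible. Similarity is measured by distance to the cluster's centroid, typically Euclidean distance.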

The Process

1. Choose the number of clusters (K).
2. Randomly initialize K centroids (center points).
3. Assignment: assign each data point to the nearest centroid.
4. Update: compute the mean of all points in each cluster and move the centroid to that mean.
5. Repeat steps 3 and 4 until the centroids stop moving.
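
The loop above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name kmeans, the max_iters and seed parameters, the choice to initialize from randomly sampled data points, and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Initialization: use k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Repeat until no centroid moves.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

With two well-separated groups like these, the three points near (1, 1) should end up sharing one label and the three points near (8.5, 8.5) the other. Which group receives label 0 depends on the random initialization.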

Limitations

You must specify K in advance.
The result is sensitive to the initial placement of centroids; different starting points can converge to different clusterings.
It struggles with clusters of varying sizes or non-spherical shapes.
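
To make the sensitivity to initialization concrete, the sketch below scores two centroid placements for the same data using the sum of squared distances from each point to its nearest centroid (often called inertia or within-cluster sum of squares). The function name inertia and the toy data are assumptions for illustration. Both placements are stable, in the sense that the assignment and update steps would leave them unchanged, yet one fits the data far better than the other.

```python
import math


def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


# Three pairs of points spaced along a line (toy data for illustration).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (20.0, 0.0), (21.0, 0.0)]

# One centroid per pair: the clustering one would hope to find.
good = [(0.5, 0.0), (10.5, 0.0), (20.5, 0.0)]
# Two centroids split the first pair and one covers the other four points.
# The update step leaves this unchanged, so the algorithm can get stuck here.
poor = [(0.0, 0.0), (1.0, 0.0), (15.5, 0.0)]

# Keep whichever placement has the lower inertia.
best = min([good, poor], key=lambda c: inertia(points, c))
```

A common mitigation along these lines is to run the algorithm from several random starts and keep the result with the lowest score. The same score can also be compared across different values of K, although it always decreases as K grows, so it cannot be minimized directly to choose K.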
