Overview
Unlike PCA, which is linear and focuses on preserving large-scale structure (variance), t-SNE is non-linear and focuses on preserving local structure (keeping similar points close together).
How it Works
It converts similarities between data points to joint probabilities and tries to minimize the divergence between these probabilities in the high-dimensional and low-dimensional spaces.
Use Case
t-SNE is the gold standard for visualizing complex clusters in data, such as word embeddings or the internal representations of neural networks.
Limitations
- Computationally expensive for very large datasets.
- The resulting plot can be sensitive to hyperparameters (like 'perplexity').
- Distances between clusters in the 2D plot may not be meaningful.