Overview

Knowledge distillation transfers the capabilities of a large 'teacher' model (such as GPT-4) into a much smaller, more efficient 'student' model that can run on a phone or laptop.

How it Works

The student model is trained not only on the final labels but also on the 'soft targets' (the full probability distribution over classes) produced by the teacher. These soft targets carry more information than a hard label: for example, if an image of a dog receives a small but non-zero probability for 'wolf', the student learns that those classes are related.
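
A minimal sketch of such a training objective, assuming a PyTorch setup: the loss blends a soft-target term (KL divergence between temperature-softened teacher and student distributions) with ordinary cross-entropy on the hard labels. The function name and the values of the temperature T and weight alpha are illustrative choices, not part of any particular library.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative distillation objective: soft-target KL term + hard-label cross-entropy."""
    # Soften both distributions with temperature T; higher T spreads probability
    # mass over more classes, exposing the teacher's inter-class relationships.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between the softened distributions (log_target=True because
    # the teacher term above is also in log space).
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean", log_target=True)
    soft_loss = soft_loss * (T * T)  # rescale gradients as in Hinton et al., 2015

    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Typical use inside a training step (teacher outputs are detached so only
# the student receives gradients):
#   loss = distillation_loss(student(x), teacher(x).detach(), y)
```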

Applications

  • Creating fast, lightweight versions of BERT (e.g., DistilBERT).
  • Deploying high-quality AI to edge devices.

Related Terms