Overview

The quality and diversity of training data are among the most important factors in an AI model's performance. 'Garbage in, garbage out' is a long-standing rule of thumb in AI development: a model trained on flawed data will produce flawed outputs.

Types

  • Labeled Data: Examples paired with the correct output, used in supervised learning (e.g., images with tags).
  • Unlabeled Data: Examples without annotations, used in unsupervised and self-supervised learning (e.g., raw text from the internet).
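
As a rough illustration of the distinction, the sketch below represents labeled examples as (input, label) pairs and shows one way self-supervised learning can derive training targets from unlabeled text, here by predicting the next word. The function name `make_next_word_pairs` and the sample data are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

# Labeled data: each input comes with a human-provided label.
labeled_data: List[Tuple[str, str]] = [
    ("photo_001.jpg", "cat"),
    ("photo_002.jpg", "dog"),
]

# Unlabeled data: raw text with no annotations.
unlabeled_text = "the cat sat on the mat"


def make_next_word_pairs(text: str) -> List[Tuple[List[str], str]]:
    """Derive (context, target) pairs from raw text.

    The target for each example is simply the next word, so no
    human annotation is needed.
    """
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


pairs = make_next_word_pairs(unlabeled_text)
```

Each pair uses the words seen so far as the input and the following word as the target, which is why raw text can serve as training data without manual tagging.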

Challenges

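  • Bias: If the training data is biased, for example by under-representing certain groups, the model tends to reproduce that bias.
  • Copyright: Training on web-scraped data raises legal questions about the use of copyrighted material.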
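
One simple, partial check for the bias challenge above is to look at how often each label occurs, since a heavily skewed distribution is one common source of biased models. The following is a minimal sketch; the function name `label_shares` and the sample labels are hypothetical, and a balanced label count does not by itself show that a dataset is unbiased.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # An empty dataset has no label distribution to report.
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical labels: three "cat" examples for every "dog" example.
sample_labels = ["cat", "cat", "cat", "dog"]
shares = label_shares(sample_labels)
```

A large gap between shares, as in this sample, suggests that the dataset may need rebalancing or closer review before training.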

Related Terms