Training Data

Overview

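The quality and diversity of training data are among the most important factors in an AI model's performance. 'Garbage in, garbage out' is a fundamental rule in AI development: a model trained on noisy, unrepresentative, or mislabeled data tends to reproduce those flaws.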

Types

Labeled Data: Used in supervised learning (e.g., images with tags).
Unlabeled Data: Used in unsupervised and self-supervised learning (e.g., raw text from the internet).
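To show the difference concretely, here is a minimal Python sketch. Labeled data is stored as (input, label) pairs. Unlabeled text gets its training signal from the data itself through next-word prediction, which is one common self-supervised objective. The file names, tags, sentence, and the helper `next_word_pairs` are hypothetical. Splitting on whitespace is a simplification, since real language models operate on subword tokens.

```python
from typing import List, Tuple

# Labeled data: each input is paired with a human-provided label.
# (Hypothetical file names and tags, for illustration only.)
labeled: List[Tuple[str, str]] = [
    ("cat_001.jpg", "cat"),
    ("dog_042.jpg", "dog"),
]

# Unlabeled data: raw inputs with no labels attached.
unlabeled: List[str] = ["the cat sat on the mat"]


def next_word_pairs(text: str, context_size: int = 2) -> List[Tuple[Tuple[str, ...], str]]:
    """Derive (context, target) pairs from raw text (hypothetical helper).

    The "label" at each position is simply the word that follows the
    context window, so no human annotation is required. Whitespace
    splitting stands in for a real tokenizer.
    """
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs


for text in unlabeled:
    for context, target in next_word_pairs(text):
        print(context, "->", target)
```

For the sample sentence, this yields four pairs, starting with `('the', 'cat') -> sat`. The training targets come from the text itself rather than from an annotator.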

Challenges

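Bias: If the training data is biased (for example, if some groups are underrepresented or labeled inconsistently), the model tends to learn and reproduce that bias.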
Copyright: Training on web-scraped data raises legal questions about whether copyrighted material may be collected and used without permission.
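A simple first check for the bias problem is to count how often each group appears in a labeled dataset and how often each group receives a positive label. The sketch below assumes data stored as hypothetical (group, label) pairs, with label 1 meaning a positive outcome. The dataset and the helper `audit_groups` are illustrative only. A check like this can surface imbalances, but it cannot detect every form of bias.

```python
from typing import Dict, List, Tuple


def audit_groups(examples: List[Tuple[str, int]]) -> Dict[str, Tuple[int, float]]:
    """Return {group: (example_count, positive_label_rate)} (hypothetical helper).

    Each example is a (group, label) pair with label 0 or 1.
    """
    counts: Dict[str, int] = {}
    positives: Dict[str, int] = {}
    for group, label in examples:
        counts[group] = counts.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + label
    return {g: (counts[g], positives[g] / counts[g]) for g in counts}


# Hypothetical toy dataset: (group, label), where 1 = "approved".
data = [("A", 1), ("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 1)]

for group, (n, rate) in sorted(audit_groups(data).items()):
    print(f"group {group}: {n} examples, positive rate {rate:.2f}")
```

The output reveals two imbalances. Group A has twice as many examples as group B, and a higher positive rate (0.75 versus 0.50). A model trained on this data could pick up both patterns.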
