Data Catalog

Overview

A data catalog (e.g., AWS Glue Data Catalog, Google Cloud Data Catalog) acts as a search engine for an organization's data. It stores metadata about databases, tables, and files rather than the data itself, so data scientists and analysts can find the datasets they need.

Key Features

Automated Discovery: Crawlers can automatically scan data sources to identify schemas and update the catalog.
Metadata Storage: Stores information about table structures, partitions, and data types.
Search and Discovery: Provides a searchable interface for finding data assets.
Data Lineage: Tracks where data came from and how it has been transformed.
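
The features above can be illustrated with a small in-memory sketch. This is a toy model, not the API of AWS Glue or Google Cloud Data Catalog; the names ToyCatalog, TableEntry, crawl, search, and lineage are hypothetical and chosen only for illustration. It assumes a "crawler" receives a few sample rows per table and infers column types from them.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Metadata for one table; the data itself is never stored in the catalog."""
    name: str
    columns: dict[str, str]                              # column name -> data type
    partitions: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)    # lineage: source tables


class ToyCatalog:
    """Hypothetical in-memory catalog (illustrative only)."""

    def __init__(self):
        self.entries: dict[str, TableEntry] = {}

    def crawl(self, name, sample_rows, partitions=(), upstream=()):
        """Automated discovery: infer a schema from sample rows and register it."""
        columns = {}
        for row in sample_rows:
            for col, value in row.items():
                columns.setdefault(col, type(value).__name__)
        self.entries[name] = TableEntry(name, columns, list(partitions), list(upstream))

    def search(self, keyword):
        """Search: names of tables whose name or column names contain the keyword."""
        kw = keyword.lower()
        return sorted(
            e.name for e in self.entries.values()
            if kw in e.name.lower() or any(kw in c.lower() for c in e.columns)
        )

    def lineage(self, name):
        """Lineage: all upstream tables, direct and transitive."""
        seen = []
        stack = list(self.entries[name].upstream)
        while stack:
            table = stack.pop()
            if table not in seen:
                seen.append(table)
                if table in self.entries:
                    stack.extend(self.entries[table].upstream)
        return sorted(seen)


catalog = ToyCatalog()
catalog.crawl("raw_orders", [{"order_id": 1, "amount": 9.99, "region": "eu"}],
              partitions=["region"])
catalog.crawl("clean_orders", [{"order_id": 1, "amount": 9.99}],
              upstream=["raw_orders"])
catalog.crawl("daily_revenue", [{"day": "2024-01-01", "revenue": 120.5}],
              upstream=["clean_orders"])

print(catalog.search("amount"))          # tables exposing an "amount" column
print(catalog.lineage("daily_revenue"))  # everything daily_revenue derives from
```

A production catalog would persist entries, sample data sources directly, and track lineage at the job or column level; the sketch only shows how the four features relate to one shared metadata record.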

Importance

A data catalog is essential for data governance and for enabling self-service analytics in large organizations, where datasets are too numerous for any one team to track by hand.
