Overview

A data catalog (e.g., AWS Glue Data Catalog, Google Cloud Data Catalog) acts as a 'search engine' for an organization's data. It stores metadata about databases, tables, and files rather than the data itself, so data scientists and analysts can find the datasets they need.

Key Features

  • Automated Discovery: Crawlers can automatically scan data sources to identify schemas and update the catalog.
  • Metadata Storage: Stores information about table structures, partitions, and data types.
  • Search and Discovery: Provides a searchable interface for finding data assets.
  • Data Lineage: Tracks where data came from and how it has been transformed.
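
The features above can be illustrated with a small in-memory sketch in Python. It is a toy model, not the API of AWS Glue or Google Cloud Data Catalog; the names ToyCatalog, TableEntry, crawl, search, and lineage are hypothetical and chosen only for illustration. It assumes a "crawler" can infer a schema from a few sample rows and that lineage is recorded as a list of direct upstream tables.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Metadata for one table; the catalog never stores the data itself."""
    name: str
    columns: dict[str, str]                             # column name -> data type
    partitions: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)   # direct source tables


class ToyCatalog:
    """Hypothetical in-memory catalog; real services persist and index this."""

    def __init__(self) -> None:
        self._entries: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        # Metadata storage: keep one entry per table name.
        self._entries[entry.name] = entry

    def crawl(self, name, sample_rows, upstream=()) -> None:
        # Automated discovery (simplified): infer column types from sample rows.
        columns: dict[str, str] = {}
        for row in sample_rows:
            for col, value in row.items():
                columns.setdefault(col, type(value).__name__)
        self.register(TableEntry(name, columns, upstream=list(upstream)))

    def search(self, keyword: str) -> list[str]:
        # Search and discovery: match the keyword against table and column names.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower() or any(kw in c.lower() for c in e.columns)
        )

    def lineage(self, name: str) -> list[str]:
        # Data lineage: walk upstream links breadth-first, nearest source first.
        seen: list[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


catalog = ToyCatalog()
catalog.crawl("raw_orders", [{"order_id": 1, "customer_id": 7, "amount": 19.99}])
catalog.crawl(
    "daily_revenue",
    [{"order_date": "2024-01-01", "revenue": 120.5}],
    upstream=["raw_orders"],
)
catalog.crawl(
    "revenue_report",
    [{"month": "2024-01", "revenue": 3500.0}],
    upstream=["daily_revenue"],
)

print(catalog.search("revenue"))
print(catalog.lineage("revenue_report"))
```

Searching for "customer" finds raw_orders through its customer_id column, which shows why indexing column-level metadata matters for discovery. A production catalog would add persistence, access control, and richer type inference than this sketch attempts.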

Importance

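A data catalog is essential for data governance and for enabling self-service analytics in large organizations, where users need to locate and understand datasets without relying on the teams that produced them.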

Related Terms