Overview

Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It provides high-throughput access to application data and is designed to be highly fault-tolerant.

Architecture

  • NameNode: Manages the file system namespace and controls access.
  • DataNodes: Store the actual data blocks.
  • Replication: Data is replicated across multiple nodes for reliability.

Related Terms