Overview
Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It provides high-throughput access to application data and is designed to be highly fault-tolerant.
Architecture
- NameNode: Manages the file system namespace and controls access.
- DataNodes: Store the actual data blocks.
- Replication: Data is replicated across multiple nodes for reliability.