A production MongoDB design needs two separate choices: sharding to spread load and replica sets to survive node loss without losing the write path.

Problem
Teams add MongoDB sharding after a single replica set reaches a limit: write pressure climbs, hot collections crowd memory, or one machine carries too much data. Sharding solves that scale problem by splitting a collection across shards. Replication solves a different problem by keeping copies of each shard online after a node fails.
You need both concepts in the same diagram because production clusters use both at once. A shard usually runs as a replica set. That means a write enters the cluster through a router, lands on the shard that owns the document range, and reaches the primary member of that shard's replica set. Secondary members copy the operation from the oplog and give operators read capacity, backups, and failover options.
MongoDB documents the two layers in separate guides: sharding for data distribution and replica sets for redundancy. A real architecture diagram should show the border between those layers, because teams make different trade-offs at each layer.
Solution approach
Start the diagram with three client-facing components: application services, MongoDB drivers, and mongos routers. Application code should not choose a shard. The driver talks to mongos, and mongos consults config servers to find the shard that owns the target data.
Config servers store cluster metadata. They track databases, collections, shard key ranges, and chunk ownership. Operators run config servers as a replica set because mongos needs that metadata during normal request routing. If the config server set loses quorum, the cluster loses access to metadata changes. Existing reads and writes can continue in limited cases, but operators lose the ability to rebalance or change sharded metadata.
Each shard should appear as a replica set in the diagram. A three-node shard gives you one primary and two secondaries. The primary accepts writes. Secondaries replicate the oplog and can serve reads if the application chooses a read preference that permits it. MongoDB's read preference setting gives API teams a direct way to trade freshness for read distribution.

Shard key selection drives the shape of the whole system. A good shard key spreads writes across shards and supports the common query filters. A bad shard key sends a large share of writes to one shard, or forces mongos to scatter a query across shards because the query lacks the shard key. Teams should model the top API calls before they choose the key. The write endpoint, the list endpoint, and the lookup endpoint matter most.
A practical shard key review asks two questions. First, does the key avoid a single write hotspot? Second, can the main queries include the key? A customer-facing SaaS system might use tenantId as part of the shard key because many requests already carry that value. A time-series workload might need a compound key that includes a high-cardinality field before a time field, because a pure timestamp key can concentrate writes on the newest range.
Consistency model
MongoDB gives application teams consistency controls through read concern and write concern. These controls belong in the API contract, not in an operations footnote.
For writes, w: majority tells MongoDB to acknowledge the write after a majority of voting replica set members commit it. That costs latency, but it protects the application from a primary failure that could roll back an acknowledged write. For lower-risk events, such as analytics counters or retryable background jobs, an API may accept a weaker write concern and use idempotency to recover.
For reads, readConcern: majority asks MongoDB to return data that has reached a majority of replica set members. Teams use it when users must not see data that MongoDB could roll back after failover. Linearizable reads provide a stronger guarantee for single-document reads from the primary, but the application pays for coordination. Many product APIs use majority writes with local reads from the primary because that mix gives acceptable latency and avoids stale secondary reads.
Cross-shard transactions deserve a separate callout in the diagram. MongoDB supports distributed transactions, but teams should treat them as a cost center. A transaction that touches two shards needs coordination across both replica sets. That increases latency and failure surface. Good data modeling keeps the common write path inside one shard when the product allows it.
API patterns
Drivers should send retryable writes for idempotent operations, and services should expose idempotency keys for client-facing commands that might reach MongoDB twice. Network timeouts create ambiguity: the client may lose the response after MongoDB commits the write. An idempotency key lets the service return the prior result instead of creating a duplicate order, payment, or workflow item.
APIs should include the shard key in routes or request context. A route such as GET /tenants/{tenantId}/orders/{orderId} gives the service the tenantId before it queries MongoDB. The query can target one shard. A route that accepts only orderId may force a scatter-gather query unless orderId participates in the shard key.
Pagination needs the same discipline. Offset pagination performs poorly across large sharded collections because mongos may ask shards to scan and discard rows. Cursor pagination based on an indexed key keeps the query bounded. For tenant-scoped feeds, a compound index on tenantId and createdAt can support stable page tokens and shard targeting.
Background workers should respect ownership boundaries. A batch job that scans all tenants can flood each shard and compete with user traffic. Operators can split that job by shard key range, cap concurrency per shard, and schedule heavy jobs away from peak API traffic. The diagram should show batch clients as separate traffic sources so reviewers can see that read load.
Trade-offs
Sharding gives teams data and write scale, but it adds routing, balancing, and shard key risk. Replica sets give teams failover and read options, but they do not increase write capacity inside one shard. A cluster with three shards can absorb more writes than one replica set if the shard key spreads writes. A cluster with one hot shard still acts like one overloaded replica set.
Replication lag changes API behavior. A secondary read can return stale data after the primary accepts a write. That may work for dashboards, search previews, or exports. It can break checkout flows, access control checks, and user settings screens. Product teams should mark each endpoint with a read preference and consistency requirement before they send traffic to secondaries.
Balancer activity adds another operational trade-off. MongoDB moves chunks between shards to keep data distribution balanced. Those migrations consume network, disk, and CPU. Operators should monitor chunk distribution, jumbo chunks, replication lag, and slow queries together. A shard that looks full may suffer from one large chunk that the balancer cannot split without a better shard key.
A production diagram should show failure paths. If one secondary fails, the shard keeps serving writes. If the primary fails and the replica set can elect a new primary, clients see a short error window and drivers retry eligible operations. If a shard loses a majority of voting members, that shard cannot accept writes. Other shards may continue serving targeted operations, but any query that needs the failed shard will fail.
Architecture checklist
Use at least two mongos routers behind application services so one router process does not become a choke point. Run config servers as a replica set. Run each shard as a replica set with nodes spread across failure domains. Choose shard keys from real query shapes and write patterns. Put read concern, write concern, retry behavior, and read preference in service-level API standards.
Teams should test failover during load tests. Kill a primary, watch driver retries, measure write latency, and inspect user-visible errors. Repeat the test during chunk migration. A diagram that survives those exercises gives engineers a map they can use during incidents.
MongoDB sharding and replication solve different production problems. Teams get the best result when they draw both layers, assign each API path a consistency model, and design shard keys from the traffic they plan to run.

Comments
Please log in or register to join the discussion