The Architectural Decisions That Actually Mattered: Building a Production-Ready Multi-Service Backend

An in-depth look at the pragmatic architectural decisions that enabled a high-functioning multi-service backend running on minimal infrastructure, with analysis of trade-offs, capacity limits, and scaling paths.

Most system design articles focus on exciting new technologies or complex patterns but rarely discuss the mundane choices that actually determine production success. This article examines the architectural decisions that mattered most when building a platform that handles social media giveaways, a gift card marketplace, and telecom gift vending, all running on a single PostgreSQL database and a single Redis instance for just €27 per month.

1. Three Services, Not Ten—And Not One

The backend is implemented as a NestJS monorepo with three independently deployable applications:

Giveaway API (port 5000): Handles events, participants, host wallet, auth, and admin functionality
Giftcard API (port 5002): Manages cards, merchants, escrow, redemptions, and merchant wallet
Job Processor (port 5001/5003): Handles background work and real-time WebSocket delivery, with no REST API surface

This is not a microservice architecture but a modular monolith with deployment boundaries. The key insight is treating "microservices" as a tool rather than a goal. The author had two bounded domains with genuinely different data ownership and access patterns, justifying the split. Further division would have added coordination overhead without operational benefits at this scale.

The monorepo approach provides organizational clarity:

Shared build pipeline, migration runner, gRPC contract types, and test suite
Changes to shared infrastructure (JWT guards, Redis config, payment client) deploy everywhere in one merge
Avoids the dependency management nightmare of separate repositories

2. gRPC Between Services, Not HTTP

The Giveaway and Giftcard APIs communicate constantly, with each calling the other for various operations. Rather than using the standard REST approach, the author chose gRPC for three concrete reasons:

Typed contracts enforced at compile time: Request and response shapes live in .proto files and compile into TypeScript interfaces that both caller and handler must satisfy. Field renaming in one service causes immediate compilation failures in the other, preventing runtime errors.
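A minimal sketch of what the generated contract buys. The interfaces below stand in for code generated from a .proto file. The names `GiftcardService`, `reserveCard`, `ReserveCardRequest`, and `ReserveCardResponse` are hypothetical, as is the reservation ID format. The handler `implements` the interface and the caller is typed against it, so renaming a field on either side breaks the build instead of failing at runtime.

```typescript
// Hypothetical shapes standing in for code generated from a .proto file.
interface ReserveCardRequest {
  eventId: string;
  cardId: string;
}

interface ReserveCardResponse {
  reservationId: string;
  status: "RESERVED";
}

interface GiftcardService {
  reserveCard(req: ReserveCardRequest): Promise<ReserveCardResponse>;
}

// The handler must satisfy the contract: if `cardId` is renamed in the
// generated interface, this class stops compiling.
class GiftcardHandler implements GiftcardService {
  async reserveCard(req: ReserveCardRequest): Promise<ReserveCardResponse> {
    return { reservationId: `res-${req.eventId}-${req.cardId}`, status: "RESERVED" };
  }
}

// The caller is typed against the same interface, so passing a misspelled
// field such as { eventId, card_id } is rejected by the compiler.
async function reserveForEvent(client: GiftcardService, eventId: string, cardId: string): Promise<string> {
  const res = await client.reserveCard({ eventId, cardId });
  return res.reservationId;
}
```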

Bidirectional calls without auth overhead: Internal Docker network communication uses gRPC over TCP with no auth headers, CORS, or URL routing—just typed method calls.

Persistent calls through Bull: Every gRPC call is persisted through Bull, not just retried in memory. A custom Proxy intercepts calls before they reach the wire, enqueuing GRPC_CALL Bull jobs that serialize service name, method name, and request payload into Redis.
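The interception idea can be sketched with a JavaScript `Proxy`. To keep the sketch runnable on its own, an in-memory queue stands in for Bull and Redis. Only the `GRPC_CALL` job name comes from the description above. The `GrpcCallJob` shape, `InMemoryJobQueue`, and `createQueuedClient` are hypothetical. In the real system the queue would be a Bull queue, and the registered worker would forward each job to the actual gRPC client.

```typescript
// Shape of the job persisted for every outgoing call (hypothetical field names).
interface GrpcCallJob {
  name: "GRPC_CALL";
  service: string;
  method: string;
  payload: string; // JSON-serialized request, as it would be stored in Redis
}

type JobHandler = (job: GrpcCallJob) => Promise<unknown>;

// In-memory stand-in for a Bull queue: it records the serialized job first,
// then resolves the caller's promise with the worker's result.
class InMemoryJobQueue {
  readonly stored: string[] = [];
  private handler?: JobHandler;

  process(handler: JobHandler): void {
    this.handler = handler;
  }

  async add(job: GrpcCallJob): Promise<unknown> {
    this.stored.push(JSON.stringify(job)); // persisted before execution
    if (!this.handler) throw new Error("no worker registered");
    return this.handler(job);
  }
}

// Wraps a typed service interface so that every method call is enqueued as a
// GRPC_CALL job instead of being sent over the wire directly.
function createQueuedClient<T extends object>(service: string, queue: InMemoryJobQueue): T {
  return new Proxy({} as T, {
    get(_target, method) {
      return (request: unknown) =>
        queue.add({
          name: "GRPC_CALL",
          service,
          method: String(method),
          payload: JSON.stringify(request),
        });
    },
  });
}
```

Callers keep their typed interface, for example `createQueuedClient<WalletService>("WalletService", queue)`. This is why the transport change does not leak into business code.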

From the caller's side, the call still reads as an ordinary awaited request and response. Because the transport runs through Redis, however, calls survive API process restarts. Bull's retry policy handles failures at the execution layer, and a circuit breaker prevents a struggling downstream service from filling the queue with failing jobs.
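A circuit breaker can be sketched as a small state machine wrapped around job execution. The source does not describe how its breaker is configured. The `CircuitBreaker` class, its `failureThreshold` and `cooldownMs` defaults, and the injectable clock are hypothetical choices. While the circuit is open, calls fail fast instead of adding more doomed work downstream.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitOpenError extends Error {
  constructor(service: string) {
    super(`circuit open for ${service}`);
    this.name = "CircuitOpenError";
  }
}

// Hypothetical breaker: opens after `failureThreshold` consecutive failures
// and allows trial calls again once `cooldownMs` has elapsed.
class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly service: string,
    private readonly failureThreshold = 5,
    private readonly cooldownMs = 30_000,
    private readonly now: () => number = Date.now,
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new CircuitOpenError(this.service); // fail fast during cooldown
      }
      // Cooldown elapsed: the next result decides whether to close or reopen.
      this.state = "HALF_OPEN";
    }
    try {
      const result = await fn();
      this.state = "CLOSED";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```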

3. Redis Is Doing Four Different Jobs Simultaneously

The single Redis instance handles four distinct, production-critical responsibilities concurrently:

Bull queue backend: Six job queues (email, social verification, notifications, analytics, event processing, WebSocket) backed by Redis lists and sorted sets with AOF persistence.

Socket.IO pub/sub adapter: The WebSocket gateway runs separately from the API. When the Job Processor needs to emit events, it publishes to Redis channels that all gateway instances subscribe to, enabling horizontal scaling without code changes.

TTL-based application cache: A CacheService wraps Redis with a getOrSet(key, factory, ttl) pattern for frequently accessed data like merchant wallet balances and service fee rates.
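A minimal sketch of the getOrSet(key, factory, ttl) pattern. A `Map` with expiry timestamps stands in for Redis so the example runs on its own. In production the read and write would be Redis `GET` and `SET key value EX ttl`. The entry layout and the `invalidate` helper are hypothetical.

```typescript
interface Entry {
  value: string; // JSON-serialized, as it would be stored in Redis
  expiresAt: number; // epoch milliseconds
}

class CacheService {
  private readonly store = new Map<string, Entry>();

  constructor(private readonly now: () => number = Date.now) {}

  // Returns the cached value if it exists and has not expired; otherwise calls
  // the factory, caches its result for ttlSeconds, and returns it.
  async getOrSet<T>(key: string, factory: () => Promise<T>, ttlSeconds: number): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return JSON.parse(hit.value) as T;
    }
    const value = await factory();
    this.store.set(key, { value: JSON.stringify(value), expiresAt: this.now() + ttlSeconds * 1000 });
    return value;
  }

  // Hypothetical helper for dropping a key after the underlying data changes.
  invalidate(key: string): void {
    this.store.delete(key);
  }
}
```

Two concurrent misses on the same key will both call the factory. That is usually acceptable for cheap reads such as fee rates.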

Atomic concurrency control: Real-time winner selection uses a Lua script executed atomically in Redis to prevent over-awarding during high concurrent load, without application-level locking.
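The atomic check-and-award step can be sketched as a short Lua script. The sketch assumes winners are kept in a Redis set and the prize limit is passed as an argument. The key layout, the script, and `awardInMemory` are hypothetical. Redis runs a script without interleaving other commands, so the count check and the insert cannot race. The pure TypeScript mirror lets the rule be unit-tested without Redis.

```typescript
// KEYS[1]: set of winner IDs for one event (hypothetical key layout)
// ARGV[1]: participant ID, ARGV[2]: maximum number of winners
// Returns 1 if the participant was awarded, 0 if the limit was reached
// or the participant had already won.
const AWARD_WINNER_LUA = `
local awarded = redis.call('SCARD', KEYS[1])
if awarded >= tonumber(ARGV[2]) then
  return 0
end
return redis.call('SADD', KEYS[1], ARGV[1])
`;

// With ioredis, the script could be invoked roughly as:
//   await redis.eval(AWARD_WINNER_LUA, 1, `event:${eventId}:winners`, participantId, maxWinners);

// Pure TypeScript mirror of the script's rule, for unit tests only.
function awardInMemory(winners: Set<string>, participantId: string, maxWinners: number): 0 | 1 {
  if (winners.size >= maxWinners) return 0;
  if (winners.has(participantId)) return 0;
  winners.add(participantId);
  return 1;
}
```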

4. The Escrow State Machine Is Where Financial Correctness Lives

Gift card prizes follow an irreversible state machine: RESERVED → SETTLED (redeemed) or RELEASED (cancelled/unassigned/expired). Financial correctness is maintained through several design decisions:
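The transition rules fit in a lookup table, so any attempt to leave a terminal state fails loudly. The state names come from the description above. The `canTransition` and `assertTransition` helpers are hypothetical.

```typescript
type EscrowState = "RESERVED" | "SETTLED" | "RELEASED";

// Allowed transitions: only RESERVED can move, and only to a terminal state.
const TRANSITIONS: Record<EscrowState, readonly EscrowState[]> = {
  RESERVED: ["SETTLED", "RELEASED"],
  SETTLED: [], // terminal: prize redeemed
  RELEASED: [], // terminal: cancelled, unassigned, or expired
};

function canTransition(from: EscrowState, to: EscrowState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Hypothetical guard to call before persisting any status change.
function assertTransition(from: EscrowState, to: EscrowState): void {
  if (!canTransition(from, to)) {
    throw new Error(`illegal escrow transition ${from} -> ${to}`);
  }
}
```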

Idempotency keys: Every financial operation carries an idempotency key to prevent duplicate processing.
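A minimal sketch of idempotent execution. An in-memory map stands in for a durable store keyed by idempotency key. `IdempotencyStore`, `runOnce`, and the key format in the usage are hypothetical. A repeated call with the same key receives the first result instead of running the operation again. A failed operation records nothing, so it can be retried.

```typescript
class IdempotencyStore {
  private readonly results = new Map<string, unknown>();
  private readonly inFlight = new Map<string, Promise<unknown>>();

  // Runs `operation` at most once per key while it succeeds; concurrent or
  // later calls with the same key get the in-flight or stored result.
  async runOnce<T>(key: string, operation: () => Promise<T>): Promise<T> {
    if (this.results.has(key)) return this.results.get(key) as T;
    const pending = this.inFlight.get(key);
    if (pending) return pending as Promise<T>;

    const promise = operation();
    this.inFlight.set(key, promise);
    try {
      const result = await promise;
      this.results.set(key, result); // only successful results are recorded
      return result;
    } finally {
      this.inFlight.delete(key);
    }
  }
}
```

Because this map lives in process memory, it only deduplicates within a single process. A durable store shared by all instances is what makes retries safe across restarts.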

Settlement calculation: The settlement calculation strips out payment provider fees before applying platform service rates, avoiding double-charging.
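The ordering can be sketched as a worked calculation in integer cents, which avoids floating-point drift. The function name and the input values are made up for illustration; they are not the platform's actual fees or rates. The service rate applies to the amount left after the provider's fee, so no platform fee is levied on money the provider already kept.

```typescript
interface Settlement {
  netCents: number; // gross minus payment provider fee
  platformFeeCents: number; // service rate applied to the net amount
  merchantCents: number; // what the merchant receives
}

// rateBps: platform service rate in basis points (100 bps = 1%).
function calculateSettlement(grossCents: number, providerFeeCents: number, rateBps: number): Settlement {
  if (!Number.isInteger(grossCents) || !Number.isInteger(providerFeeCents) || providerFeeCents > grossCents) {
    throw new Error("invalid settlement inputs");
  }
  const netCents = grossCents - providerFeeCents; // strip provider fee first
  const platformFeeCents = Math.round((netCents * rateBps) / 10_000);
  return { netCents, platformFeeCents, merchantCents: netCents - platformFeeCents };
}
```

Take calculateSettlement(10_000, 320, 500): €100 gross, a €3.20 provider fee, and a 5% rate. It yields netCents 9680, platformFeeCents 484, and merchantCents 9196, and 320 + 484 + 9196 adds back to 10,000. Applying the same 5% to the gross would have taken 500 cents, an extra 16 cents charged on the provider's fee.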

Transfer ordering: Both payment transfers (to merchant and platform) occur before any database write. Failure leaves the database untouched, allowing safe retries.
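The ordering can be sketched as follows. `PaymentClient`, `EscrowRepository`, and the idempotency key format are hypothetical stand-ins. The sketch assumes the payment provider deduplicates transfers by idempotency key. It only shows that both transfers complete before the single database write, so a failed transfer leaves nothing to roll back locally.

```typescript
// Hypothetical interfaces standing in for the payment client and repository.
interface PaymentClient {
  transfer(to: "merchant" | "platform", amountCents: number, idempotencyKey: string): Promise<void>;
}

interface EscrowRepository {
  markSettled(reservationId: string): Promise<void>;
}

async function settleReservation(
  payments: PaymentClient,
  repo: EscrowRepository,
  reservationId: string,
  merchantCents: number,
  platformFeeCents: number,
): Promise<void> {
  // 1. External money movement first. Stable keys let a retry skip a transfer
  //    that already succeeded (assuming the provider honours the key).
  await payments.transfer("merchant", merchantCents, `${reservationId}:merchant`);
  await payments.transfer("platform", platformFeeCents, `${reservationId}:platform`);
  // 2. Only then record the outcome. If either transfer threw, this line never
  //    runs, the reservation is still RESERVED, and the call can be retried.
  await repo.markSettled(reservationId);
}
```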

Compensation pattern: Finalization uses inline saga compensation—wallet deduction failures trigger immediate escrow refunds before error propagation.
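Inline compensation can be sketched as a try/catch around the second step that undoes the first step before rethrowing. `EscrowLedger`, `WalletLedger`, and their `hold`, `refund`, and `deduct` methods are hypothetical. The source states only that a failed wallet deduction triggers an immediate escrow refund before the error propagates.

```typescript
// Hypothetical ledger interfaces.
interface EscrowLedger {
  hold(reservationId: string, amountCents: number): Promise<void>;
  refund(reservationId: string): Promise<void>;
}

interface WalletLedger {
  deduct(walletId: string, amountCents: number): Promise<void>;
}

async function finalizeWithCompensation(
  escrow: EscrowLedger,
  wallet: WalletLedger,
  walletId: string,
  reservationId: string,
  amountCents: number,
): Promise<void> {
  await escrow.hold(reservationId, amountCents);
  try {
    await wallet.deduct(walletId, amountCents);
  } catch (err) {
    // Compensate first, then surface the original failure to the caller.
    await escrow.refund(reservationId);
    throw err;
  }
}
```

If the refund itself fails, this sketch surfaces the refund error in place of the original one. That case would need to be recorded separately for reconciliation.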

5. The Job Processor Runs in Two Separate Modes

The job processor has two production modes, plus a combined mode for development, selected by the JOB_MODE environment variable:

worker: Handles email sending, event draws, social API verification, analytics
gateway: Socket.IO WebSocket server only
all: Both modes for development
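Mode selection can be as small as a function that maps JOB_MODE to the components a process should start. Only the three mode values come from the description above. The `ProcessPlan` shape, the function names, and the choice to reject unknown values are hypothetical.

```typescript
type JobMode = "worker" | "gateway" | "all";

interface ProcessPlan {
  startQueueWorkers: boolean; // email, draws, social verification, analytics
  startSocketGateway: boolean; // Socket.IO server
}

// Hypothetical: reject unknown values instead of silently starting the wrong components.
function parseJobMode(raw: string | undefined): JobMode {
  if (raw === "worker" || raw === "gateway" || raw === "all") return raw;
  throw new Error(`JOB_MODE must be worker, gateway, or all (got ${String(raw)})`);
}

function planFor(mode: JobMode): ProcessPlan {
  return {
    startQueueWorkers: mode === "worker" || mode === "all",
    startSocketGateway: mode === "gateway" || mode === "all",
  };
}

// At bootstrap this would be used roughly as: planFor(parseJobMode(process.env.JOB_MODE))
```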

These scale independently:

Workers scale with queue backlog growth
Gateways scale with concurrent WebSocket connections
Worker crashes slow job processing, but queued jobs remain safe in Redis
Gateway crashes cause brief client disconnections with automatic reconnection

The API processes never touch Socket.IO directly, keeping HTTP event loops free from connection overhead. Scaling is handled through simple Docker Compose commands.

6. Observability: Infrastructure Errors Go to Sentry, Not to Users

The system makes a clear distinction between two error categories:

User-facing errors: Validation failures, not-found, unauthorized—returned as structured HTTP responses with clear messages.

Infrastructure errors: Payment failures, gRPC circuit open, job exhaustion, Redis drops—captured to Sentry without propagating to users.

Every infrastructure boundary has explicit Sentry capture, with correlation IDs propagated through Bull jobs to trace failures back to originating user actions. Sentry is disabled in development and staging to avoid noise.
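The split can be sketched as a single classification step, with correlation IDs carried in job data. `UserFacingError`, `JobData`, `toHttpError`, `runJob`, and the generic 500 message are hypothetical. The `Reporter` callback stands in for a Sentry capture call, so the sketch runs without the Sentry SDK. This version reports every failed job attempt; reporting only once retries are exhausted would be an equally valid choice.

```typescript
// Errors the user can act on: returned as structured HTTP responses.
class UserFacingError extends Error {
  constructor(readonly status: 400 | 401 | 404, message: string) {
    super(message);
    this.name = "UserFacingError";
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

interface JobData<T> {
  correlationId: string; // copied from the originating HTTP request
  payload: T;
}

interface HttpErrorBody {
  status: number;
  message: string;
  correlationId: string;
}

// Stand-in for the error tracker (e.g. a Sentry capture call).
type Reporter = (err: unknown, tags: { correlationId: string }) => void;

// User errors pass through; anything else is reported and hidden behind a generic 500.
function toHttpError(err: unknown, correlationId: string, report: Reporter): HttpErrorBody {
  if (err instanceof UserFacingError) {
    return { status: err.status, message: err.message, correlationId };
  }
  report(err, { correlationId });
  return { status: 500, message: "Something went wrong. Please try again.", correlationId };
}

// Worker-side wrapper: a failure is reported with the job's correlation ID,
// then rethrown so the queue's retry policy still applies.
async function runJob<T>(job: JobData<T>, handler: (payload: T) => Promise<void>, report: Reporter): Promise<void> {
  try {
    await handler(job.payload);
  } catch (err) {
    report(err, { correlationId: job.correlationId });
    throw err;
  }
}
```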

7. The Honest Capacity Ceiling

With a single application VPS (8 vCPU, 16 GB RAM) and a separate PostgreSQL VPS, the system handles:

300-500 req/s for concurrent browsing/dashboard reads
~100-200 simultaneous event registrations
~50-100 concurrent auth operations
5,000-10,000 concurrent WebSocket connections
1,000-3,000 active users at a comfortable load
15,000-30,000 total registered users

The first scaling bottleneck is TypeORM's default connection pool (10 per app). Increasing to 25 per app roughly triples concurrent write capacity at no cost. WebSocket capacity scales linearly with additional gateway instances.
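Every process opens its own pool, so larger pools should be checked against PostgreSQL's max_connections (default 100, with superuser_reserved_connections defaulting to 3) before scaling out. The `poolBudget` helper below is a hypothetical sketch. With TypeORM's Postgres driver, the per-process size can be set through `extra: { max: 25 }`, which is passed to the underlying node-postgres pool.

```typescript
interface PoolBudget {
  totalConnections: number;
  available: number; // connections usable by non-superuser application roles
  fits: boolean;
}

// processes: number of app containers, each assumed to hold its own pool.
// maxConnections / reserved: PostgreSQL max_connections and
// superuser_reserved_connections (server defaults 100 and 3).
function poolBudget(processes: number, poolSize: number, maxConnections = 100, reserved = 3): PoolBudget {
  const totalConnections = processes * poolSize;
  const available = maxConnections - reserved;
  return { totalConnections, available, fits: totalConnections <= available };
}
```

For example, three processes at 25 connections need 75, which fits under the defaults. Five processes at 25 need 125, which does not, so adding worker replicas means either raising max_connections or shrinking each pool.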

8. The Scaling Roadmap

The architecture is designed for independent scaling actions requiring no code changes:

Stage 1: Single VPS with tuned connection pools, handling 15,000-30,000 users
Stage 2: Add more workers (--scale worker-proc=3) for higher queue throughput
Stage 3: Move Redis to a dedicated VPS when adding a second app VPS
Stage 4: Add a second app VPS with Traefik load balancing
Stage 5: Adopt Kubernetes only when managing 3+ VPS nodes becomes operationally expensive

The current Docker Compose setup maps nearly 1:1 to Kubernetes Deployments when needed, with stateless containers, externalized config, and health checks already defined.

The full architecture document covers additional topics including database schema separation, complete gRPC contracts, presigned URL file uploads, countdown timer implementation, detailed financial flows, and scaling migration procedures.

Built with NestJS 11, TypeScript, PostgreSQL 15, Redis 7, gRPC, Bull, Socket.IO, Traefik, and Docker on Hetzner Cloud.
