SAGA Pattern in Go

Exploring how to coordinate distributed transactions in Go using the Saga pattern, a practical approach to maintaining business consistency across microservices without relying on distributed transactions.

Coordinating Distributed Transactions Without Distributed Transactions

Modern distributed systems are built from independently deployable services. That flexibility comes with a cost: the moment a business operation spans multiple services, a single database transaction is no longer enough.

Imagine an e-commerce checkout flow:

    Order Service
        ↓
    Payment Service
        ↓
    Inventory Service
        ↓
    Shipping Service

What happens if:

- the order is created successfully
- inventory is reserved
- payment fails

You now have partially completed work spread across multiple services. In a monolith, you would simply roll back the transaction. In a distributed system, there is no single transaction to roll back.

This is where the Saga Pattern comes in. Instead of relying on distributed transactions, a Saga coordinates a series of local transactions and compensating actions to maintain business consistency.

In this article, we'll explore how production Go systems implement Saga workflows, the trade-offs involved, and practical patterns you can use in real microservices.

The Problem With Distributed Transactions

In a monolithic application, business operations are often protected by a single database transaction.

    BEGIN;
    INSERT INTO orders (...);
    UPDATE inventory SET quantity = quantity - 1;
    INSERT INTO payments (...);
    COMMIT;

Either everything succeeds or everything rolls back. Life is good.

Microservices change the rules. Each service owns its own database:

    Order Service     -> orders_db
    Payment Service   -> payments_db
    Inventory Service -> inventory_db
    Shipping Service  -> shipping_db

No single transaction spans all of them. Some teams attempt:

- Two-Phase Commit (2PC)
- XA Transactions
- Distributed Locks

In theory they provide consistency. In practice they introduce:

- operational complexity
- reduced availability
- tight coupling
- performance bottlenecks

Most modern systems choose a different path: accept eventual consistency and design for recovery.

What Is a Saga?

A Saga is a sequence of local transactions. Each step:

- performs some business action
- commits locally
- triggers the next step

If a later step fails, the previously completed steps execute compensating actions. Think of it as a distributed rollback mechanism.

Traditional Transaction

    BEGIN
    Step A
    Step B
    Step C
    COMMIT

    Failure: ROLLBACK

Saga Transaction

    Step A ✓
    Step B ✓
    Step C ✗

    Compensate B
    Compensate A

Instead of undoing database state through a transaction log, we undo business actions through explicit compensation.

A Real Production Example

Consider an online marketplace. Checkout workflow:

- Create Order
- Reserve Inventory
- Charge Payment
- Create Shipment

Everything looks simple until a dependency fails.

Scenario:

    Order Created      ✓
    Inventory Reserved ✓
    Payment Failed     ✗

Inventory is now locked. Customers cannot buy those products. The warehouse reports incorrect stock.

This is a real production issue many teams encounter. The solution is compensation.

Defining Saga Steps in Go

Let's start with a generic Saga implementation.

    import (
        "context"
        "fmt"
        "log"
    )

    // Step is one local transaction plus the action that undoes it.
    type Step struct {
        Name       string
        Execute    func(context.Context) error
        Compensate func(context.Context) error
    }

Each step knows:

- how to execute
- how to undo itself

Now define the Saga.

    type Saga struct {
        steps []Step
    }

Executing a Saga

    func (s *Saga) Execute(ctx context.Context) error {
        var completed []Step

        for _, step := range s.steps {
            if err := step.Execute(ctx); err != nil {
                // Undo everything that already committed, then report the failure.
                s.rollback(ctx, completed)
                return fmt.Errorf(
                    "saga failed at step %s: %w",
                    step.Name,
                    err,
                )
            }
            completed = append(completed, step)
        }

        return nil
    }

If a step fails:

- rollback starts immediately
- previously completed steps are compensated

Implementing Compensation

    func (s *Saga) rollback(
        ctx context.Context,
        completed []Step,
    ) {
        // Walk the completed steps from last to first.
        for i := len(completed) - 1; i >= 0; i-- {
            step := completed[i]
            if err := step.Compensate(ctx); err != nil {
                log.Printf(
                    "compensation failed for %s: %v",
                    step.Name,
                    err,
                )
            }
        }
    }

Compensation happens in reverse order, just like a stack unwind. The failed step itself is not compensated, because it never completed.
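Two details of the rollback above are worth a second look: compensation failures are only logged, and compensation reuses a context that may already be cancelled by the time rollback starts. The following is a minimal sketch of one way to handle both, not a definitive implementation. The `Run` function, the `SagaError` type, and the `compensateTimeout` parameter are hypothetical names introduced here for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Step mirrors the Step type from the article.
type Step struct {
	Name       string
	Execute    func(context.Context) error
	Compensate func(context.Context) error
}

// SagaError (hypothetical) reports which step failed and which
// compensations could not be applied.
type SagaError struct {
	FailedStep          string
	Cause               error
	FailedCompensations []string
}

func (e *SagaError) Error() string {
	return fmt.Sprintf("saga failed at step %s: %v (uncompensated: %v)",
		e.FailedStep, e.Cause, e.FailedCompensations)
}

func (e *SagaError) Unwrap() error { return e.Cause }

// Run executes steps in order. On failure it compensates the completed steps
// in reverse order, using a fresh context with its own timeout so that
// compensation still runs if the caller's context was cancelled.
func Run(ctx context.Context, steps []Step, compensateTimeout time.Duration) error {
	var completed []Step
	for _, step := range steps {
		if err := step.Execute(ctx); err != nil {
			sagaErr := &SagaError{FailedStep: step.Name, Cause: err}
			cctx, cancel := context.WithTimeout(context.Background(), compensateTimeout)
			defer cancel()
			for i := len(completed) - 1; i >= 0; i-- {
				if cerr := completed[i].Compensate(cctx); cerr != nil {
					sagaErr.FailedCompensations = append(
						sagaErr.FailedCompensations, completed[i].Name)
				}
			}
			return sagaErr
		}
		completed = append(completed, step)
	}
	return nil
}

// demo runs a three-step saga whose last step fails and whose second
// compensation also fails. It returns the order of actions and the names
// of the steps left uncompensated.
func demo() (trace []string, uncompensated []string) {
	record := func(s string) func(context.Context) error {
		return func(context.Context) error {
			trace = append(trace, s)
			return nil
		}
	}
	steps := []Step{
		{Name: "A", Execute: record("do A"), Compensate: record("undo A")},
		{Name: "B", Execute: record("do B"), Compensate: func(context.Context) error {
			trace = append(trace, "undo B")
			return errors.New("inventory service down")
		}},
		{Name: "C", Execute: func(context.Context) error {
			return errors.New("payment failed")
		}, Compensate: record("undo C")},
	}
	err := Run(context.Background(), steps, time.Second)
	var sagaErr *SagaError
	if errors.As(err, &sagaErr) {
		uncompensated = sagaErr.FailedCompensations
	}
	return trace, uncompensated
}

func main() {
	trace, uncompensated := demo()
	fmt.Println(trace, uncompensated)
}
```

Returning the list of failed compensations lets the caller decide what to do next (retry, alert, or hand off to a manual recovery workflow) instead of relying on someone reading the logs.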

Production Checkout Workflow

Let's model an order process. The step functions below only log what they would do; chargePayment always fails so that we can watch compensation run.

Step 1: Create Order

    func createOrder(
        ctx context.Context,
        orderID string,
    ) error {
        log.Printf("order created: %s", orderID)
        return nil
    }

Compensation:

    func cancelOrder(
        ctx context.Context,
        orderID string,
    ) error {
        log.Printf("order cancelled: %s", orderID)
        return nil
    }

Step 2: Reserve Inventory

    func reserveInventory(
        ctx context.Context,
        productID string,
    ) error {
        log.Printf(
            "inventory reserved: %s",
            productID,
        )
        return nil
    }

Compensation:

    func releaseInventory(
        ctx context.Context,
        productID string,
    ) error {
        log.Printf("inventory released: %s", productID)
        return nil
    }

Step 3: Charge Payment

    // Uses errors.New from the standard "errors" package.
    func chargePayment(
        ctx context.Context,
        orderID string,
    ) error {
        return errors.New(
            "payment provider unavailable",
        )
    }

Compensation:

    func refundPayment(
        ctx context.Context,
        orderID string,
    ) error {
        log.Printf("payment refunded: %s", orderID)
        return nil
    }

Running the Saga

    saga := Saga{
        steps: []Step{
            {
                Name: "Create Order",
                Execute: func(ctx context.Context) error {
                    return createOrder(ctx, "order-123")
                },
                Compensate: func(ctx context.Context) error {
                    return cancelOrder(ctx, "order-123")
                },
            },
            {
                Name: "Reserve Inventory",
                Execute: func(ctx context.Context) error {
                    return reserveInventory(ctx, "product-1")
                },
                Compensate: func(ctx context.Context) error {
                    return releaseInventory(ctx, "product-1")
                },
            },
            {
                Name: "Charge Payment",
                Execute: func(ctx context.Context) error {
                    return chargePayment(ctx, "order-123")
                },
                Compensate: func(ctx context.Context) error {
                    return refundPayment(ctx, "order-123")
                },
            },
        },
    }

    err := saga.Execute(context.Background())
    if err != nil {
        log.Printf("checkout failed: %v", err)
    }

Output (log timestamps omitted):

    order created: order-123
    inventory reserved: product-1
    inventory released: product-1
    order cancelled: order-123
    checkout failed: saga failed at step Charge Payment: payment provider unavailable

Note that refundPayment never runs: the charge did not complete, so there is nothing to refund. Only the two completed steps are compensated, in reverse order.

Business consistency restored.

Compensation Is Not Rollback

This is one of the biggest misconceptions. Many engineers assume:

    Compensation == Rollback

It isn't. Consider payment processing. You cannot magically undo:

- Bank Transfer
- Credit Card Charge
- Email Sent
- SMS Delivered

You can only perform another business action. Examples:

    Charge Card
        ↓
    Refund Card

    Create Shipment
        ↓
    Cancel Shipment

These are not the same thing as a rollback. A refund does not erase the charge; it is a second, visible action that offsets the first.

Compensation is business logic.
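To make the distinction concrete, here is a small sketch of a payment ledger in which a refund is recorded as a new entry instead of deleting the charge. The `Ledger` and `Entry` types are hypothetical and exist only for this illustration; a real payment service would call its provider's refund operation.

```go
package main

import "fmt"

// Entry is one immutable record in a hypothetical payment ledger.
type Entry struct {
	OrderID string
	Kind    string // "charge" or "refund"
	Cents   int64
}

// Ledger is append-only: history is never rewritten, only extended.
type Ledger struct {
	entries []Entry
}

// Charge records money collected for an order.
func (l *Ledger) Charge(orderID string, cents int64) {
	l.entries = append(l.entries, Entry{OrderID: orderID, Kind: "charge", Cents: cents})
}

// Refund compensates a charge by appending an offsetting entry.
// The original charge stays in the history.
func (l *Ledger) Refund(orderID string, cents int64) {
	l.entries = append(l.entries, Entry{OrderID: orderID, Kind: "refund", Cents: cents})
}

// Balance returns the net amount collected for an order.
func (l *Ledger) Balance(orderID string) int64 {
	var total int64
	for _, e := range l.entries {
		if e.OrderID != orderID {
			continue
		}
		if e.Kind == "charge" {
			total += e.Cents
		} else {
			total -= e.Cents
		}
	}
	return total
}

func main() {
	var l Ledger
	l.Charge("order-123", 4999)
	l.Refund("order-123", 4999)
	// The net balance is zero, but both actions remain on record.
	fmt.Println(l.Balance("order-123"), len(l.entries))
}
```

After compensation the net balance is zero, yet the ledger still holds two entries. That is the observable difference from a database rollback, which would leave no trace of the charge.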

Choreography vs Orchestration

Two common Saga styles exist.

Choreography

Services communicate through events.

    OrderCreated
        ↓
    InventoryReserved
        ↓
    PaymentProcessed
        ↓
    ShipmentCreated

Each service reacts independently.

Advantages:

- loosely coupled
- scalable
- no central coordinator

Disadvantages:

- difficult debugging
- event explosion
- hidden dependencies

Large systems often struggle with visibility.

Orchestration

A dedicated coordinator controls the flow.

    Saga Orchestrator
        ↓
    Inventory
        ↓
    Payment
        ↓
    Shipping

Advantages:

- easier monitoring
- centralized workflow
- simpler debugging

Disadvantages:

- additional component
- orchestration logic grows over time

Many enterprise systems prefer orchestration because operational visibility matters.
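The Saga type shown earlier is an orchestrator: one piece of code owns the whole flow. For contrast, here is a rough sketch of the choreographed style using a synchronous in-memory event bus. The `Bus` type and the failure events `PaymentFailed`, `InventoryReleased`, and `OrderCancelled` are assumptions made for this example; a real system would use a message broker and its own event names.

```go
package main

import "fmt"

// Event is a minimal domain event.
type Event struct {
	Name    string
	OrderID string
}

// Bus is a synchronous in-memory stand-in for a message broker.
type Bus struct {
	handlers map[string][]func(Event)
	Trace    []string // names of published events, in publish order
}

func NewBus() *Bus {
	return &Bus{handlers: make(map[string][]func(Event))}
}

// Subscribe registers a handler for events with the given name.
func (b *Bus) Subscribe(name string, h func(Event)) {
	b.handlers[name] = append(b.handlers[name], h)
}

// Publish records the event and delivers it to every subscriber.
func (b *Bus) Publish(e Event) {
	b.Trace = append(b.Trace, e.Name)
	for _, h := range b.handlers[e.Name] {
		h(e)
	}
}

// wire registers each service's reactions. No service calls another
// directly; the flow only exists as the sum of these subscriptions.
func wire(b *Bus, paymentWorks bool) {
	// Inventory service: reserve on new orders, release when payment fails.
	b.Subscribe("OrderCreated", func(e Event) {
		b.Publish(Event{Name: "InventoryReserved", OrderID: e.OrderID})
	})
	b.Subscribe("PaymentFailed", func(e Event) {
		b.Publish(Event{Name: "InventoryReleased", OrderID: e.OrderID})
	})
	// Payment service: charge once inventory is reserved.
	b.Subscribe("InventoryReserved", func(e Event) {
		if paymentWorks {
			b.Publish(Event{Name: "PaymentProcessed", OrderID: e.OrderID})
			return
		}
		b.Publish(Event{Name: "PaymentFailed", OrderID: e.OrderID})
	})
	// Order service: cancel the order once inventory has been released.
	b.Subscribe("InventoryReleased", func(e Event) {
		b.Publish(Event{Name: "OrderCancelled", OrderID: e.OrderID})
	})
}

// runCheckout publishes OrderCreated and returns the resulting event trace.
func runCheckout(paymentWorks bool) []string {
	b := NewBus()
	wire(b, paymentWorks)
	b.Publish(Event{Name: "OrderCreated", OrderID: "order-123"})
	return b.Trace
}

func main() {
	fmt.Println(runCheckout(true))
	fmt.Println(runCheckout(false))
}
```

Notice that the compensation path is not written down anywhere as a sequence. It emerges from four independent subscriptions, which is the "hidden dependencies" drawback in miniature.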

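Handling Retries Properly

Distributed systems fail. Compensation can fail too. Consider:

    Payment Failed
        ↓
    Release Inventory
        ↓
    Inventory Service Down

Now rollback itself has failed. Production systems usually implement:

- retries
- dead-letter queues
- manual recovery workflows

Example:

    // retry calls fn up to attempts times (attempts must be at least 1),
    // waiting a little longer after each failure.
    func retry(
        ctx context.Context,
        attempts int,
        fn func() error,
    ) error {
        var lastErr error
        for i := 0; i < attempts; i++ {
            if lastErr = fn(); lastErr == nil {
                return nil
            }
            if i == attempts-1 {
                break // no point waiting after the final attempt
            }
            // Linear backoff that stops early if the context is cancelled.
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(time.Duration(i+1) * time.Second):
            }
        }
        return fmt.Errorf(
            "retry attempts exhausted: %w",
            lastErr,
        )
    }

Never assume compensation always succeeds.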
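Retries cover transient failures, but something still has to happen when they run out. The sketch below shows one possible shape for that: retry a compensation a few times, then park it in a dead-letter queue for a manual recovery workflow. `DeadLetterQueue` and `compensateOrPark` are hypothetical names, and the in-memory slice stands in for a real queue or database table.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// DeadLetter records a compensation that exhausted its retries.
type DeadLetter struct {
	Step string
	Err  error
}

// DeadLetterQueue is an in-memory stand-in for a durable queue or table.
type DeadLetterQueue struct {
	items []DeadLetter
}

// compensateOrPark retries fn with linear backoff. If every attempt fails,
// or the context ends while waiting, the work is parked in dlq so that a
// human or a recovery job can pick it up later.
func compensateOrPark(
	ctx context.Context,
	step string,
	attempts int,
	delay time.Duration,
	fn func(context.Context) error,
	dlq *DeadLetterQueue,
) error {
	var lastErr error
retry:
	for i := 0; i < attempts; i++ {
		if lastErr = fn(ctx); lastErr == nil {
			return nil
		}
		if i == attempts-1 {
			break // no wait after the final attempt
		}
		select {
		case <-ctx.Done():
			lastErr = ctx.Err()
			break retry
		case <-time.After(time.Duration(i+1) * delay):
		}
	}
	if lastErr == nil {
		lastErr = errors.New("no attempts made")
	}
	dlq.items = append(dlq.items, DeadLetter{Step: step, Err: lastErr})
	return fmt.Errorf("compensation %q parked after retries: %w", step, lastErr)
}

// demo runs a compensation that always fails and reports how many times
// it was called and how many items ended up in the dead-letter queue.
func demo() (calls int, parked int) {
	dlq := &DeadLetterQueue{}
	release := func(context.Context) error {
		calls++
		return errors.New("inventory service down")
	}
	_ = compensateOrPark(context.Background(), "Release Inventory",
		3, time.Millisecond, release, dlq)
	return calls, len(dlq.items)
}

func main() {
	calls, parked := demo()
	fmt.Printf("calls=%d parked=%d\n", calls, parked)
}
```

The important property is that a failed compensation is never silently dropped: it either succeeds or leaves a durable record that someone is responsible for.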

Saga + Outbox Pattern

This is where things become interesting. Most production systems combine:

    Saga + Outbox Pattern

Why? Because a Saga introduces events:

- OrderCreated
- InventoryReserved
- PaymentCompleted

Those events must be delivered reliably. The Outbox Pattern provides:

- no event loss
- atomic persistence
- safe retries

This combination is extremely common in modern microservices.
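The core idea of the outbox is that the business change and the event describing it are saved together, and a separate relay publishes the saved events afterwards. Here is a deliberately simplified in-memory sketch of that idea. The `Store` type, its methods, and the `OrderCreated:<id>` event format are all assumptions for illustration; in a real service the two writes would be rows inserted in the same database transaction.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// OutboxMessage is an event waiting to be published.
type OutboxMessage struct {
	ID        int
	Event     string
	Published bool
}

// Store stands in for one service's database. The mutex plays the role
// of a local transaction: both writes happen or neither is visible.
type Store struct {
	mu     sync.Mutex
	orders map[string]string
	outbox []OutboxMessage
}

func NewStore() *Store {
	return &Store{orders: make(map[string]string)}
}

// CreateOrder saves the order and its event in one critical section.
func (s *Store) CreateOrder(orderID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.orders[orderID] = "created"
	s.outbox = append(s.outbox, OutboxMessage{
		ID:    len(s.outbox) + 1,
		Event: "OrderCreated:" + orderID,
	})
}

// Relay publishes pending messages in order and marks each one only after
// publish succeeds. It stops at the first failure so the next poll retries
// from the same message. Delivery is therefore at-least-once, which is why
// consumers need to be idempotent. Holding the lock while publishing is a
// simplification that a real relay would avoid.
func (s *Store) Relay(publish func(OutboxMessage) error) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	sent := 0
	for i := range s.outbox {
		if s.outbox[i].Published {
			continue
		}
		if err := publish(s.outbox[i]); err != nil {
			break
		}
		s.outbox[i].Published = true
		sent++
	}
	return sent
}

// demo polls three times: once while the broker is down, once after it
// recovers, and once when nothing is pending.
func demo() (first, second, third int) {
	s := NewStore()
	s.CreateOrder("order-1")
	s.CreateOrder("order-2")

	brokerDown := func(OutboxMessage) error { return errors.New("broker unavailable") }
	brokerUp := func(OutboxMessage) error { return nil }

	first = s.Relay(brokerDown)
	second = s.Relay(brokerUp)
	third = s.Relay(brokerUp)
	return first, second, third
}

func main() {
	fmt.Println(demo())
}
```

Because the events are persisted before any attempt to publish them, a broker outage delays delivery but does not lose anything.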

Idempotency Is Mandatory

Compensation may execute twice. Retries may happen. Network failures may duplicate requests. Your operations must tolerate duplication.

Bad:

    inventory -= 10

Good:

    if reservationAlreadyReleased {
        return nil
    }

Idempotency is not optional. It is foundational to Saga reliability.
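One common way to get this behaviour is to key every operation by a unique ID and remember which IDs have already been applied. The sketch below does that for inventory reservations. The `Inventory` type and the idea of a caller-supplied reservation ID are assumptions for this example, and it does not handle a release that arrives before its reservation.

```go
package main

import (
	"fmt"
	"sync"
)

// reservation tracks what was reserved and whether it was given back.
type reservation struct {
	productID string
	qty       int
	released  bool
}

// Inventory applies each reservation ID at most once in each direction.
type Inventory struct {
	mu           sync.Mutex
	stock        map[string]int
	reservations map[string]*reservation
}

func NewInventory(stock map[string]int) *Inventory {
	return &Inventory{stock: stock, reservations: make(map[string]*reservation)}
}

// Reserve decrements stock once per reservation ID. A repeated call with
// the same ID is treated as a duplicate delivery and changes nothing.
func (inv *Inventory) Reserve(resID, productID string, qty int) error {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if _, seen := inv.reservations[resID]; seen {
		return nil
	}
	if inv.stock[productID] < qty {
		return fmt.Errorf("insufficient stock for %s", productID)
	}
	inv.stock[productID] -= qty
	inv.reservations[resID] = &reservation{productID: productID, qty: qty}
	return nil
}

// Release restores stock once per reservation ID. Unknown or already
// released reservations are ignored, so retries are safe.
func (inv *Inventory) Release(resID string) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	r, ok := inv.reservations[resID]
	if !ok || r.released {
		return
	}
	inv.stock[r.productID] += r.qty
	r.released = true
}

// Stock returns the currently available quantity for a product.
func (inv *Inventory) Stock(productID string) int {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return inv.stock[productID]
}

func main() {
	inv := NewInventory(map[string]int{"product-1": 10})
	_ = inv.Reserve("res-1", "product-1", 3)
	_ = inv.Reserve("res-1", "product-1", 3) // duplicate delivery
	fmt.Println(inv.Stock("product-1"))
	inv.Release("res-1")
	inv.Release("res-1") // retried compensation
	fmt.Println(inv.Stock("product-1"))
}
```

With a plain `inventory -= 10` and `inventory += 10`, the duplicated calls in `main` would leave stock at 4 and then 10 only by coincidence of pairing; with the reservation ID check, each effect is applied exactly once regardless of how many times the request arrives.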

Observability Matters

Track:

- saga started
- saga completed
- saga compensated
- compensation failures
- execution duration
- retry count

Useful metrics:

- saga_execution_total
- saga_compensation_total
- saga_failure_total
- saga_duration_seconds

If you cannot observe Saga behavior, you will eventually debug failures through database queries at 3 AM.
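As a rough illustration of where these counters would be updated, here is a sketch that wraps a saga run and records the metrics listed above using only the standard library. `SagaMetrics` and `Observe` are hypothetical names. A production service would more likely export these through its metrics library, and would record duration as a histogram instead of a running sum.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// SagaMetrics holds in-process counters named after the metrics above.
type SagaMetrics struct {
	executionTotal    int64 // saga_execution_total
	compensationTotal int64 // saga_compensation_total
	failureTotal      int64 // saga_failure_total
	durationNanos     int64 // summed run time, a stand-in for saga_duration_seconds
}

// Observe runs one saga and updates the counters. The run function reports
// whether compensation was triggered and whether the saga failed.
func (m *SagaMetrics) Observe(run func() (compensated bool, err error)) error {
	start := time.Now()
	compensated, err := run()
	atomic.AddInt64(&m.executionTotal, 1)
	atomic.AddInt64(&m.durationNanos, int64(time.Since(start)))
	if compensated {
		atomic.AddInt64(&m.compensationTotal, 1)
	}
	if err != nil {
		atomic.AddInt64(&m.failureTotal, 1)
	}
	return err
}

// Snapshot returns the current counter values.
func (m *SagaMetrics) Snapshot() (executions, compensations, failures int64) {
	return atomic.LoadInt64(&m.executionTotal),
		atomic.LoadInt64(&m.compensationTotal),
		atomic.LoadInt64(&m.failureTotal)
}

// demo records one successful saga and one that failed and was compensated.
func demo() (executions, compensations, failures int64) {
	var m SagaMetrics
	_ = m.Observe(func() (bool, error) { return false, nil })
	_ = m.Observe(func() (bool, error) {
		return true, errors.New("payment provider unavailable")
	})
	return m.Snapshot()
}

func main() {
	fmt.Println(demo())
}
```

Wrapping the run in one place keeps the counting consistent: every saga is counted exactly once, whether it succeeds, fails, or compensates.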

A Production Incident

A payment provider began timing out during a Black Friday campaign.

- Order creation succeeded.
- Inventory reservations succeeded.
- Payment confirmations never arrived.

Without compensation:

    50,000 products locked

Customers could not purchase inventory that physically existed. The warehouse team believed stock was depleted.

After implementing Saga compensation:

    Payment Timeout
        ↓
    Inventory Released
        ↓
    Order Cancelled

The system recovered automatically. No manual intervention required.

This is exactly the type of failure Saga patterns are designed to handle.

Key Takeaways

- Distributed transactions rarely scale well in microservices.
- Saga patterns embrace eventual consistency rather than fighting it.
- Compensation is business logic, not database rollback.
- Retries and idempotency are mandatory.
- Most production systems combine Saga and Outbox patterns.
- Observability is critical for debugging distributed workflows.

Microservices make distributed failures inevitable. Saga patterns don't eliminate those failures. They make them survivable. And in production systems, survivability is often more important than perfection.

Happy Coding 🚀


