Saga Pattern in Go: Step-by-Step Implementation Guide for Distributed Transactions

Distributed systems break one of the most comforting guarantees in software: the database transaction. In a monolith, you wrap a sequence of operations in BEGIN / COMMIT and either everything succeeds or nothing does. In a microservices architecture, that sequence spans multiple services, each with its own database. A global transaction lock across all of them — two-phase commit — works in theory but is fragile, slow, and a single point of failure in practice.

The Saga pattern is the industry-standard answer. Instead of one atomic transaction, a saga is a sequence of local transactions, each of which publishes an event or message that triggers the next step. If any step fails, the saga runs compensating transactions in reverse to undo the work already done.

This post walks through both styles of saga — choreography and orchestration — with complete, idiomatic Go implementations.


The Running Example: An E-Commerce Order

Throughout this post we’ll model placing an order, which involves three services:

  1. Order Service — creates the order record
  2. Payment Service — charges the customer
  3. Inventory Service — reserves the stock

All three must succeed. If payment fails after the order is created, we need to cancel the order. If inventory is unavailable after payment, we need to refund the payment and cancel the order. That reversal logic is the heart of the saga.


Global Monolith DB Commit Local Order Payment Flow

Core Concepts Before the Code

Local Transactions

Each step in a saga performs a local transaction — an atomic write to a single service’s database. The saga does not hold a lock across services; it sequences independent commits.

Compensating Transactions

Every step that can succeed must have a corresponding compensating transaction that semantically undoes it. “Semantically” is the key word: compensation is a new forward transaction (a refund, a cancellation), not a rollback. The original transaction already committed.

Idempotency

Because compensations may be retried on failure, every operation — forward and compensating — must be idempotent. Running CancelOrder(orderID) twice must produce the same result as running it once.


Approach 1: Choreography

In the choreography style, there is no central coordinator. Each service listens for events, does its work, and emits new events. Services react to each other like dancers following music — no one is giving instructions.

Order Event Processing Flow

The Event Types

// saga/events.go
package saga

type EventType string

const (
    EventOrderCreated      EventType = "order.created"
    EventPaymentProcessed  EventType = "payment.processed"
    EventInventoryReserved EventType = "inventory.reserved"
    EventOrderCompleted    EventType = "order.completed"

    // Failure events
    EventPaymentFailed     EventType = "payment.failed"
    EventInventoryFailed   EventType = "inventory.failed"

    // Compensation events
    EventOrderCancelled    EventType = "order.cancelled"
    EventPaymentRefunded   EventType = "payment.refunded"
)

type Event struct {
    ID        string
    Type      EventType
    SagaID    string
    Payload   map[string]any
    Timestamp time.Time
}

The Event Bus

A simple in-process bus is enough to illustrate the pattern. In production this would be Kafka, RabbitMQ, or NATS.

// saga/bus.go
package saga

import "sync"

type Handler func(Event)

type EventBus struct {
    mu       sync.RWMutex
    handlers map[EventType][]Handler
}

func NewEventBus() *EventBus {
    return &EventBus{handlers: make(map[EventType][]Handler)}
}

func (b *EventBus) Subscribe(eventType EventType, h Handler) {
    b.mu.Lock()
    defer b.mu.Unlock()
    b.handlers[eventType] = append(b.handlers[eventType], h)
}

func (b *EventBus) Publish(e Event) {
    b.mu.RLock()
    handlers := b.handlers[e.Type]
    b.mu.RUnlock()

    for _, h := range handlers {
        go h(e) // fire-and-forget; production code would use durable queues
    }
}

The Services

Each service handles the events relevant to it and emits the next event in the chain.

// services/order_service.go
package services

type OrderService struct {
    bus    *saga.EventBus
    orders map[string]string // orderID → status
    mu     sync.Mutex
}

func NewOrderService(bus *saga.EventBus) *OrderService {
    svc := &OrderService{bus: bus, orders: make(map[string]string)}

    // Listen for compensation events
    bus.Subscribe(saga.EventPaymentFailed, svc.onPaymentFailed)
    bus.Subscribe(saga.EventInventoryFailed, svc.onInventoryFailed)

    return svc
}

func (s *OrderService) CreateOrder(sagaID, orderID, customerID string, amount float64) {
    s.mu.Lock()
    s.orders[orderID] = "PENDING"
    s.mu.Unlock()

    log.Printf("[Order] Created order %s for customer %s", orderID, customerID)

    s.bus.Publish(saga.Event{
        ID:     uuid.New().String(),
        Type:   saga.EventOrderCreated,
        SagaID: sagaID,
        Payload: map[string]any{
            "order_id":    orderID,
            "customer_id": customerID,
            "amount":      amount,
        },
    })
}

// Compensating transaction: cancel the order
func (s *OrderService) onPaymentFailed(e saga.Event) {
    s.cancelOrder(e.Payload["order_id"].(string), e.SagaID)
}

func (s *OrderService) onInventoryFailed(e saga.Event) {
    s.cancelOrder(e.Payload["order_id"].(string), e.SagaID)
}

func (s *OrderService) cancelOrder(orderID, sagaID string) {
    s.mu.Lock()
    if s.orders[orderID] == "CANCELLED" { // idempotency check
        s.mu.Unlock()
        return
    }
    s.orders[orderID] = "CANCELLED"
    s.mu.Unlock()

    log.Printf("[Order] Cancelled order %s", orderID)
    s.bus.Publish(saga.Event{
        ID:      uuid.New().String(),
        Type:    saga.EventOrderCancelled,
        SagaID:  sagaID,
        Payload: map[string]any{"order_id": orderID},
    })
}
// services/payment_service.go
package services

type PaymentService struct {
    bus      *saga.EventBus
    payments map[string]string
    mu       sync.Mutex
}

func NewPaymentService(bus *saga.EventBus) *PaymentService {
    svc := &PaymentService{bus: bus, payments: make(map[string]string)}
    bus.Subscribe(saga.EventOrderCreated, svc.onOrderCreated)
    bus.Subscribe(saga.EventInventoryFailed, svc.onInventoryFailed)
    return svc
}

func (s *PaymentService) onOrderCreated(e saga.Event) {
    orderID := e.Payload["order_id"].(string)
    amount := e.Payload["amount"].(float64)

    // Simulate a payment failure for demonstration
    if amount > 10000 {
        log.Printf("[Payment] Payment failed for order %s (amount too high)", orderID)
        s.bus.Publish(saga.Event{
            ID:      uuid.New().String(),
            Type:    saga.EventPaymentFailed,
            SagaID:  e.SagaID,
            Payload: map[string]any{"order_id": orderID, "reason": "insufficient funds"},
        })
        return
    }

    s.mu.Lock()
    s.payments[orderID] = "CHARGED"
    s.mu.Unlock()

    log.Printf("[Payment] Charged %.2f for order %s", amount, orderID)
    s.bus.Publish(saga.Event{
        ID:      uuid.New().String(),
        Type:    saga.EventPaymentProcessed,
        SagaID:  e.SagaID,
        Payload: map[string]any{"order_id": orderID, "amount": amount},
    })
}

// Compensating transaction: refund
func (s *PaymentService) onInventoryFailed(e saga.Event) {
    orderID := e.Payload["order_id"].(string)

    s.mu.Lock()
    if s.payments[orderID] != "CHARGED" {
        s.mu.Unlock()
        return
    }
    s.payments[orderID] = "REFUNDED"
    s.mu.Unlock()

    log.Printf("[Payment] Refunded payment for order %s", orderID)
    s.bus.Publish(saga.Event{
        ID:      uuid.New().String(),
        Type:    saga.EventPaymentRefunded,
        SagaID:  e.SagaID,
        Payload: map[string]any{"order_id": orderID},
    })
}
// services/inventory_service.go
package services

type InventoryService struct {
    bus   *saga.EventBus
    stock map[string]int
    mu    sync.Mutex
}

func NewInventoryService(bus *saga.EventBus, stock map[string]int) *InventoryService {
    svc := &InventoryService{bus: bus, stock: stock}
    bus.Subscribe(saga.EventPaymentProcessed, svc.onPaymentProcessed)
    return svc
}

func (s *InventoryService) onPaymentProcessed(e saga.Event) {
    orderID := e.Payload["order_id"].(string)
    productID := "PROD-001" // simplified

    s.mu.Lock()
    defer s.mu.Unlock()

    if s.stock[productID] <= 0 {
        log.Printf("[Inventory] Out of stock for order %s", orderID)
        s.bus.Publish(saga.Event{
            ID:      uuid.New().String(),
            Type:    saga.EventInventoryFailed,
            SagaID:  e.SagaID,
            Payload: map[string]any{"order_id": orderID, "reason": "out of stock"},
        })
        return
    }

    s.stock[productID]--
    log.Printf("[Inventory] Reserved stock for order %s", orderID)
    s.bus.Publish(saga.Event{
        ID:     uuid.New().String(),
        Type:   saga.EventInventoryReserved,
        SagaID: e.SagaID,
        Payload: map[string]any{"order_id": orderID},
    })
}

Wiring It Up

func main() {
    bus := saga.NewEventBus()

    orderSvc := services.NewOrderService(bus)
    services.NewPaymentService(bus)
    services.NewInventoryService(bus, map[string]int{"PROD-001": 5})

    // Happy path
    sagaID := uuid.New().String()
    orderSvc.CreateOrder(sagaID, "ORD-001", "CUST-42", 99.99)

    // Will trigger compensation (amount > 10000)
    sagaID2 := uuid.New().String()
    orderSvc.CreateOrder(sagaID2, "ORD-002", "CUST-43", 99999.00)

    time.Sleep(500 * time.Millisecond) // wait for async handlers
}

Choreography: Trade-offs

Pros: Simple to add new services (just subscribe to existing events). No single point of failure. Easy to scale individual services.

Cons: The overall flow is implicit — it’s spread across every service. Debugging a failed saga means tracing events across multiple logs. Cyclic dependencies between services can sneak in.


Approach 2: Orchestration

In the orchestration style, a dedicated Saga Orchestrator tells each participant what to do and decides what happens next based on the result. The flow is explicit and centralized.

Order Payment Saga

The Step Interface

// saga/orchestrator.go
package saga

import "context"

type StepResult struct {
    Success bool
    Data    map[string]any
    Err     error
}

type Step interface {
    Name() string
    Execute(ctx context.Context, data map[string]any) StepResult
    Compensate(ctx context.Context, data map[string]any) StepResult
}

The Orchestrator

type SagaState string

const (
    StateRunning     SagaState = "RUNNING"
    StateCompleted   SagaState = "COMPLETED"
    StateCompensating SagaState = "COMPENSATING"
    StateFailed      SagaState = "FAILED"
)

type Orchestrator struct {
    id             string
    steps          []Step
    executedSteps  []int       // indices of successfully executed steps
    data           map[string]any
    state          SagaState
    mu             sync.Mutex
}

func NewOrchestrator(id string, steps []Step, initialData map[string]any) *Orchestrator {
    return &Orchestrator{
        id:    id,
        steps: steps,
        data:  initialData,
        state: StateRunning,
    }
}

func (o *Orchestrator) Execute(ctx context.Context) error {
    log.Printf("[Saga %s] Starting with %d steps", o.id, len(o.steps))

    for i, step := range o.steps {
        log.Printf("[Saga %s] Executing step %d: %s", o.id, i+1, step.Name())

        result := step.Execute(ctx, o.data)
        if result.Err != nil {
            log.Printf("[Saga %s] Step %s failed: %v", o.id, step.Name(), result.Err)
            return o.compensate(ctx)
        }
        if !result.Success {
            log.Printf("[Saga %s] Step %s returned failure", o.id, step.Name())
            return o.compensate(ctx)
        }

        // Merge step output into shared data for subsequent steps
        for k, v := range result.Data {
            o.data[k] = v
        }

        o.mu.Lock()
        o.executedSteps = append(o.executedSteps, i)
        o.mu.Unlock()
    }

    o.mu.Lock()
    o.state = StateCompleted
    o.mu.Unlock()

    log.Printf("[Saga %s] Completed successfully", o.id)
    return nil
}

func (o *Orchestrator) compensate(ctx context.Context) error {
    o.mu.Lock()
    o.state = StateCompensating
    executed := make([]int, len(o.executedSteps))
    copy(executed, o.executedSteps)
    o.mu.Unlock()

    log.Printf("[Saga %s] Starting compensation for %d executed steps", o.id, len(executed))

    // Compensate in reverse order
    var compensationErrors []error
    for i := len(executed) - 1; i >= 0; i-- {
        stepIdx := executed[i]
        step := o.steps[stepIdx]

        log.Printf("[Saga %s] Compensating step: %s", o.id, step.Name())
        result := step.Compensate(ctx, o.data)
        if result.Err != nil {
            log.Printf("[Saga %s] Compensation failed for %s: %v", o.id, step.Name(), result.Err)
            compensationErrors = append(compensationErrors, result.Err)
            // Continue compensating remaining steps — never stop mid-compensation
        }
    }

    o.mu.Lock()
    o.state = StateFailed
    o.mu.Unlock()

    if len(compensationErrors) > 0 {
        return fmt.Errorf("saga %s failed with %d compensation errors", o.id, len(compensationErrors))
    }
    return fmt.Errorf("saga %s failed and was compensated", o.id)
}

Implementing the Steps

// steps/create_order_step.go
package steps

type CreateOrderStep struct {
    repo OrderRepository
}

func (s *CreateOrderStep) Name() string { return "CreateOrder" }

func (s *CreateOrderStep) Execute(ctx context.Context, data map[string]any) saga.StepResult {
    orderID := uuid.New().String()
    err := s.repo.Create(ctx, orderID, data["customer_id"].(string))
    if err != nil {
        return saga.StepResult{Err: err}
    }
    log.Printf("[CreateOrder] Created order %s", orderID)
    return saga.StepResult{
        Success: true,
        Data:    map[string]any{"order_id": orderID},
    }
}

func (s *CreateOrderStep) Compensate(ctx context.Context, data map[string]any) saga.StepResult {
    orderID, ok := data["order_id"].(string)
    if !ok {
        return saga.StepResult{Success: true} // nothing to undo
    }
    err := s.repo.Cancel(ctx, orderID)
    if err != nil {
        return saga.StepResult{Err: fmt.Errorf("cancel order: %w", err)}
    }
    log.Printf("[CreateOrder] Cancelled order %s", orderID)
    return saga.StepResult{Success: true}
}
// steps/process_payment_step.go
package steps

type ProcessPaymentStep struct {
    gateway PaymentGateway
}

func (s *ProcessPaymentStep) Name() string { return "ProcessPayment" }

func (s *ProcessPaymentStep) Execute(ctx context.Context, data map[string]any) saga.StepResult {
    amount := data["amount"].(float64)
    customerID := data["customer_id"].(string)

    txID, err := s.gateway.Charge(ctx, customerID, amount)
    if err != nil {
        return saga.StepResult{
            Success: false,
            Err:     fmt.Errorf("payment gateway: %w", err),
        }
    }
    log.Printf("[ProcessPayment] Charged %.2f, transaction %s", amount, txID)
    return saga.StepResult{
        Success: true,
        Data:    map[string]any{"transaction_id": txID},
    }
}

func (s *ProcessPaymentStep) Compensate(ctx context.Context, data map[string]any) saga.StepResult {
    txID, ok := data["transaction_id"].(string)
    if !ok {
        return saga.StepResult{Success: true}
    }
    err := s.gateway.Refund(ctx, txID)
    if err != nil {
        return saga.StepResult{Err: fmt.Errorf("refund transaction %s: %w", txID, err)}
    }
    log.Printf("[ProcessPayment] Refunded transaction %s", txID)
    return saga.StepResult{Success: true}
}
// steps/reserve_inventory_step.go
package steps

type ReserveInventoryStep struct {
    inventory InventoryService
}

func (s *ReserveInventoryStep) Name() string { return "ReserveInventory" }

func (s *ReserveInventoryStep) Execute(ctx context.Context, data map[string]any) saga.StepResult {
    orderID := data["order_id"].(string)
    productID := data["product_id"].(string)

    reservationID, err := s.inventory.Reserve(ctx, productID, orderID)
    if err != nil {
        return saga.StepResult{
            Success: false,
            Err:     fmt.Errorf("reserve inventory: %w", err),
        }
    }
    log.Printf("[ReserveInventory] Reserved %s, reservation %s", productID, reservationID)
    return saga.StepResult{
        Success: true,
        Data:    map[string]any{"reservation_id": reservationID},
    }
}

func (s *ReserveInventoryStep) Compensate(ctx context.Context, data map[string]any) saga.StepResult {
    reservationID, ok := data["reservation_id"].(string)
    if !ok {
        return saga.StepResult{Success: true}
    }
    err := s.inventory.Release(ctx, reservationID)
    if err != nil {
        return saga.StepResult{Err: fmt.Errorf("release reservation %s: %w", reservationID, err)}
    }
    log.Printf("[ReserveInventory] Released reservation %s", reservationID)
    return saga.StepResult{Success: true}
}

Running the Orchestrated Saga

func main() {
    ctx := context.Background()

    orderRepo := NewOrderRepository()
    paymentGateway := NewPaymentGateway()
    inventoryService := NewInventoryService()

    steps := []saga.Step{
        &steps.CreateOrderStep{Repo: orderRepo},
        &steps.ProcessPaymentStep{Gateway: paymentGateway},
        &steps.ReserveInventoryStep{Inventory: inventoryService},
    }

    initialData := map[string]any{
        "customer_id": "CUST-42",
        "product_id":  "PROD-001",
        "amount":      149.99,
    }

    orchestrator := saga.NewOrchestrator(uuid.New().String(), steps, initialData)
    if err := orchestrator.Execute(ctx); err != nil {
        log.Printf("Saga failed: %v", err)
        return
    }

    log.Println("Order placed successfully")
}

Orchestration: Trade-offs

Pros: The entire flow is visible in one place. Failure paths are explicit. Debugging is straightforward — follow the orchestrator’s log. Easier to add retry logic, timeouts, and circuit breakers at the orchestration layer.

Cons: The orchestrator is a central component that must be highly available. It can become a bottleneck if not designed carefully. Services become more tightly coupled to the orchestrator’s interface.


Durability: Surviving Crashes

Both examples above run in memory. A real saga must survive process crashes. The standard technique is the transactional outbox pattern:

  1. When a service performs its local transaction, it also writes the outgoing event (or command) to an outbox table in the same database transaction.
  2. A separate relay process (or CDC connector like Debezium) reads uncommitted outbox rows and publishes them to the message broker.
  3. Once published and acknowledged, the relay marks the row as sent.

This guarantees that a step’s side-effect and its resulting event are atomic with respect to the service’s own database — even if the process crashes between the commit and the publish.

-- outbox table schema
CREATE TABLE saga_outbox (
    id          UUID PRIMARY KEY,
    saga_id     TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    sent_at     TIMESTAMPTZ
);
// Inside a service's local transaction
tx.Exec(`
    INSERT INTO saga_outbox (id, saga_id, event_type, payload)
    VALUES ($1, $2, $3, $4)
`, eventID, sagaID, eventType, payloadJSON)
// Commit both the business write and the outbox row atomically
tx.Commit()

Idempotency Keys

Because messages can be redelivered (network hiccup, consumer restart), every handler must be idempotent. A robust approach is to persist a record of which saga events have already been processed:

func (s *PaymentService) handleOrderCreated(ctx context.Context, e saga.Event) error {
    // Check if we already processed this event
    exists, err := s.db.EventProcessed(ctx, e.ID)
    if err != nil {
        return err
    }
    if exists {
        log.Printf("[Payment] Event %s already processed, skipping", e.ID)
        return nil
    }

    // Do the work and record the event in one transaction
    return s.db.WithTx(ctx, func(tx *sql.Tx) error {
        if err := s.chargeCustomer(ctx, tx, e); err != nil {
            return err
        }
        return tx.Exec(`INSERT INTO processed_events (id) VALUES ($1)`, e.ID)
    })
}

Choreography vs. Orchestration: Which to Choose

Factor Choreography Orchestration
Flow visibility Implicit, spread across services Explicit, in one place
Coupling Loose (event contracts) Tighter (step interfaces)
Debugging Hard — trace events across services Easier — one log to follow
Adding new steps Easy — subscribe to existing events Requires orchestrator change
Central failure point None Orchestrator must be HA
Best for Simple flows, many independent consumers Complex flows, strict ordering

A common real-world choice: start with orchestration for business-critical flows (orders, payments) where visibility matters most, and use choreography for ancillary reactions (sending confirmation emails, updating analytics) where loose coupling is more valuable.


Key Takeaways

The Saga pattern trades the simplicity of ACID transactions for the scalability and resilience of distributed systems. The cost is complexity: you must design and implement compensating transactions, ensure idempotency, and make your saga state durable.

The rules to carry with you:

  1. Every forward step needs a compensating step. If you can’t undo an action, it probably shouldn’t be a saga step.
  2. Always compensate in reverse order. The last executed step is compensated first.
  3. Never stop mid-compensation. If a compensation fails, log it, alert on it, and continue compensating the remaining steps. Stopping halfway leaves the system in a worse state.
  4. Make every handler idempotent. Messages will be redelivered. Design for it from day one.
  5. Use the transactional outbox. It’s the only reliable way to guarantee that a local commit and its downstream event are atomic.
  6. Instrument saga state. Track in-progress, completed, and failed sagas. A saga stuck in COMPENSATING for more than a few minutes is worth an alert.