Saga Pattern in Go: Step-by-Step Implementation Guide for Distributed Transactions
Distributed systems break one of the most comforting guarantees in software: the database transaction. In a monolith, you wrap a sequence of operations in BEGIN / COMMIT and either everything succeeds or nothing does. In a microservices architecture, that sequence spans multiple services, each with its own database. A global transaction lock across all of them — two-phase commit — works in theory but is fragile, slow, and a single point of failure in practice.
The Saga pattern is the industry-standard answer. Instead of one atomic transaction, a saga is a sequence of local transactions, each of which publishes an event or message that triggers the next step. If any step fails, the saga runs compensating transactions in reverse to undo the work already done.
This post walks through both styles of saga — choreography and orchestration — with complete, idiomatic Go implementations.
The Running Example: An E-Commerce Order
Throughout this post we’ll model placing an order, which involves three services:
- Order Service — creates the order record
- Payment Service — charges the customer
- Inventory Service — reserves the stock
All three must succeed. If payment fails after the order is created, we need to cancel the order. If inventory is unavailable after payment, we need to refund the payment and cancel the order. That reversal logic is the heart of the saga.
Core Concepts Before the Code
Local Transactions
Each step in a saga performs a local transaction — an atomic write to a single service’s database. The saga does not hold a lock across services; it sequences independent commits.
Compensating Transactions
Every step that can succeed must have a corresponding compensating transaction that semantically undoes it. “Semantically” is the key word: compensation is a new forward transaction (a refund, a cancellation), not a rollback. The original transaction already committed.
Idempotency
Because compensations may be retried on failure, every operation — forward and compensating — must be idempotent. Running CancelOrder(orderID) twice must produce the same result as running it once.
Approach 1: Choreography
In the choreography style, there is no central coordinator. Each service listens for events, does its work, and emits new events. Services react to each other like dancers following music — no one is giving instructions.
The Event Types
// saga/events.go
package saga
type EventType string
const (
EventOrderCreated EventType = "order.created"
EventPaymentProcessed EventType = "payment.processed"
EventInventoryReserved EventType = "inventory.reserved"
EventOrderCompleted EventType = "order.completed"
// Failure events
EventPaymentFailed EventType = "payment.failed"
EventInventoryFailed EventType = "inventory.failed"
// Compensation events
EventOrderCancelled EventType = "order.cancelled"
EventPaymentRefunded EventType = "payment.refunded"
)
type Event struct {
ID string
Type EventType
SagaID string
Payload map[string]any
Timestamp time.Time
}
The Event Bus
A simple in-process bus is enough to illustrate the pattern. In production this would be Kafka, RabbitMQ, or NATS.
// saga/bus.go
package saga
import "sync"
type Handler func(Event)
type EventBus struct {
mu sync.RWMutex
handlers map[EventType][]Handler
}
func NewEventBus() *EventBus {
return &EventBus{handlers: make(map[EventType][]Handler)}
}
func (b *EventBus) Subscribe(eventType EventType, h Handler) {
b.mu.Lock()
defer b.mu.Unlock()
b.handlers[eventType] = append(b.handlers[eventType], h)
}
func (b *EventBus) Publish(e Event) {
b.mu.RLock()
handlers := b.handlers[e.Type]
b.mu.RUnlock()
for _, h := range handlers {
go h(e) // fire-and-forget; production code would use durable queues
}
}
The Services
Each service handles the events relevant to it and emits the next event in the chain.
// services/order_service.go
package services
type OrderService struct {
bus *saga.EventBus
orders map[string]string // orderID → status
mu sync.Mutex
}
func NewOrderService(bus *saga.EventBus) *OrderService {
svc := &OrderService{bus: bus, orders: make(map[string]string)}
// Listen for compensation events
bus.Subscribe(saga.EventPaymentFailed, svc.onPaymentFailed)
bus.Subscribe(saga.EventInventoryFailed, svc.onInventoryFailed)
return svc
}
func (s *OrderService) CreateOrder(sagaID, orderID, customerID string, amount float64) {
s.mu.Lock()
s.orders[orderID] = "PENDING"
s.mu.Unlock()
log.Printf("[Order] Created order %s for customer %s", orderID, customerID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventOrderCreated,
SagaID: sagaID,
Payload: map[string]any{
"order_id": orderID,
"customer_id": customerID,
"amount": amount,
},
})
}
// Compensating transaction: cancel the order
func (s *OrderService) onPaymentFailed(e saga.Event) {
s.cancelOrder(e.Payload["order_id"].(string), e.SagaID)
}
func (s *OrderService) onInventoryFailed(e saga.Event) {
s.cancelOrder(e.Payload["order_id"].(string), e.SagaID)
}
func (s *OrderService) cancelOrder(orderID, sagaID string) {
s.mu.Lock()
if s.orders[orderID] == "CANCELLED" { // idempotency check
s.mu.Unlock()
return
}
s.orders[orderID] = "CANCELLED"
s.mu.Unlock()
log.Printf("[Order] Cancelled order %s", orderID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventOrderCancelled,
SagaID: sagaID,
Payload: map[string]any{"order_id": orderID},
})
}
// services/payment_service.go
package services
type PaymentService struct {
bus *saga.EventBus
payments map[string]string
mu sync.Mutex
}
func NewPaymentService(bus *saga.EventBus) *PaymentService {
svc := &PaymentService{bus: bus, payments: make(map[string]string)}
bus.Subscribe(saga.EventOrderCreated, svc.onOrderCreated)
bus.Subscribe(saga.EventInventoryFailed, svc.onInventoryFailed)
return svc
}
func (s *PaymentService) onOrderCreated(e saga.Event) {
orderID := e.Payload["order_id"].(string)
amount := e.Payload["amount"].(float64)
// Simulate a payment failure for demonstration
if amount > 10000 {
log.Printf("[Payment] Payment failed for order %s (amount too high)", orderID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventPaymentFailed,
SagaID: e.SagaID,
Payload: map[string]any{"order_id": orderID, "reason": "insufficient funds"},
})
return
}
s.mu.Lock()
s.payments[orderID] = "CHARGED"
s.mu.Unlock()
log.Printf("[Payment] Charged %.2f for order %s", amount, orderID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventPaymentProcessed,
SagaID: e.SagaID,
Payload: map[string]any{"order_id": orderID, "amount": amount},
})
}
// Compensating transaction: refund
func (s *PaymentService) onInventoryFailed(e saga.Event) {
orderID := e.Payload["order_id"].(string)
s.mu.Lock()
if s.payments[orderID] != "CHARGED" {
s.mu.Unlock()
return
}
s.payments[orderID] = "REFUNDED"
s.mu.Unlock()
log.Printf("[Payment] Refunded payment for order %s", orderID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventPaymentRefunded,
SagaID: e.SagaID,
Payload: map[string]any{"order_id": orderID},
})
}
// services/inventory_service.go
package services
type InventoryService struct {
bus *saga.EventBus
stock map[string]int
mu sync.Mutex
}
func NewInventoryService(bus *saga.EventBus, stock map[string]int) *InventoryService {
svc := &InventoryService{bus: bus, stock: stock}
bus.Subscribe(saga.EventPaymentProcessed, svc.onPaymentProcessed)
return svc
}
func (s *InventoryService) onPaymentProcessed(e saga.Event) {
orderID := e.Payload["order_id"].(string)
productID := "PROD-001" // simplified
s.mu.Lock()
defer s.mu.Unlock()
if s.stock[productID] <= 0 {
log.Printf("[Inventory] Out of stock for order %s", orderID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventInventoryFailed,
SagaID: e.SagaID,
Payload: map[string]any{"order_id": orderID, "reason": "out of stock"},
})
return
}
s.stock[productID]--
log.Printf("[Inventory] Reserved stock for order %s", orderID)
s.bus.Publish(saga.Event{
ID: uuid.New().String(),
Type: saga.EventInventoryReserved,
SagaID: e.SagaID,
Payload: map[string]any{"order_id": orderID},
})
}
Wiring It Up
func main() {
bus := saga.NewEventBus()
orderSvc := services.NewOrderService(bus)
services.NewPaymentService(bus)
services.NewInventoryService(bus, map[string]int{"PROD-001": 5})
// Happy path
sagaID := uuid.New().String()
orderSvc.CreateOrder(sagaID, "ORD-001", "CUST-42", 99.99)
// Will trigger compensation (amount > 10000)
sagaID2 := uuid.New().String()
orderSvc.CreateOrder(sagaID2, "ORD-002", "CUST-43", 99999.00)
time.Sleep(500 * time.Millisecond) // wait for async handlers
}
Choreography: Trade-offs
Pros: Simple to add new services (just subscribe to existing events). No single point of failure. Easy to scale individual services.
Cons: The overall flow is implicit — it’s spread across every service. Debugging a failed saga means tracing events across multiple logs. Cyclic dependencies between services can sneak in.
Approach 2: Orchestration
In the orchestration style, a dedicated Saga Orchestrator tells each participant what to do and decides what happens next based on the result. The flow is explicit and centralized.
The Step Interface
// saga/orchestrator.go
package saga
import "context"
type StepResult struct {
Success bool
Data map[string]any
Err error
}
type Step interface {
Name() string
Execute(ctx context.Context, data map[string]any) StepResult
Compensate(ctx context.Context, data map[string]any) StepResult
}
The Orchestrator
type SagaState string
const (
StateRunning SagaState = "RUNNING"
StateCompleted SagaState = "COMPLETED"
StateCompensating SagaState = "COMPENSATING"
StateFailed SagaState = "FAILED"
)
type Orchestrator struct {
id string
steps []Step
executedSteps []int // indices of successfully executed steps
data map[string]any
state SagaState
mu sync.Mutex
}
func NewOrchestrator(id string, steps []Step, initialData map[string]any) *Orchestrator {
return &Orchestrator{
id: id,
steps: steps,
data: initialData,
state: StateRunning,
}
}
func (o *Orchestrator) Execute(ctx context.Context) error {
log.Printf("[Saga %s] Starting with %d steps", o.id, len(o.steps))
for i, step := range o.steps {
log.Printf("[Saga %s] Executing step %d: %s", o.id, i+1, step.Name())
result := step.Execute(ctx, o.data)
if result.Err != nil {
log.Printf("[Saga %s] Step %s failed: %v", o.id, step.Name(), result.Err)
return o.compensate(ctx)
}
if !result.Success {
log.Printf("[Saga %s] Step %s returned failure", o.id, step.Name())
return o.compensate(ctx)
}
// Merge step output into shared data for subsequent steps
for k, v := range result.Data {
o.data[k] = v
}
o.mu.Lock()
o.executedSteps = append(o.executedSteps, i)
o.mu.Unlock()
}
o.mu.Lock()
o.state = StateCompleted
o.mu.Unlock()
log.Printf("[Saga %s] Completed successfully", o.id)
return nil
}
func (o *Orchestrator) compensate(ctx context.Context) error {
o.mu.Lock()
o.state = StateCompensating
executed := make([]int, len(o.executedSteps))
copy(executed, o.executedSteps)
o.mu.Unlock()
log.Printf("[Saga %s] Starting compensation for %d executed steps", o.id, len(executed))
// Compensate in reverse order
var compensationErrors []error
for i := len(executed) - 1; i >= 0; i-- {
stepIdx := executed[i]
step := o.steps[stepIdx]
log.Printf("[Saga %s] Compensating step: %s", o.id, step.Name())
result := step.Compensate(ctx, o.data)
if result.Err != nil {
log.Printf("[Saga %s] Compensation failed for %s: %v", o.id, step.Name(), result.Err)
compensationErrors = append(compensationErrors, result.Err)
// Continue compensating remaining steps — never stop mid-compensation
}
}
o.mu.Lock()
o.state = StateFailed
o.mu.Unlock()
if len(compensationErrors) > 0 {
return fmt.Errorf("saga %s failed with %d compensation errors", o.id, len(compensationErrors))
}
return fmt.Errorf("saga %s failed and was compensated", o.id)
}
Implementing the Steps
// steps/create_order_step.go
package steps
type CreateOrderStep struct {
repo OrderRepository
}
func (s *CreateOrderStep) Name() string { return "CreateOrder" }
func (s *CreateOrderStep) Execute(ctx context.Context, data map[string]any) saga.StepResult {
orderID := uuid.New().String()
err := s.repo.Create(ctx, orderID, data["customer_id"].(string))
if err != nil {
return saga.StepResult{Err: err}
}
log.Printf("[CreateOrder] Created order %s", orderID)
return saga.StepResult{
Success: true,
Data: map[string]any{"order_id": orderID},
}
}
func (s *CreateOrderStep) Compensate(ctx context.Context, data map[string]any) saga.StepResult {
orderID, ok := data["order_id"].(string)
if !ok {
return saga.StepResult{Success: true} // nothing to undo
}
err := s.repo.Cancel(ctx, orderID)
if err != nil {
return saga.StepResult{Err: fmt.Errorf("cancel order: %w", err)}
}
log.Printf("[CreateOrder] Cancelled order %s", orderID)
return saga.StepResult{Success: true}
}
// steps/process_payment_step.go
package steps
type ProcessPaymentStep struct {
gateway PaymentGateway
}
func (s *ProcessPaymentStep) Name() string { return "ProcessPayment" }
func (s *ProcessPaymentStep) Execute(ctx context.Context, data map[string]any) saga.StepResult {
amount := data["amount"].(float64)
customerID := data["customer_id"].(string)
txID, err := s.gateway.Charge(ctx, customerID, amount)
if err != nil {
return saga.StepResult{
Success: false,
Err: fmt.Errorf("payment gateway: %w", err),
}
}
log.Printf("[ProcessPayment] Charged %.2f, transaction %s", amount, txID)
return saga.StepResult{
Success: true,
Data: map[string]any{"transaction_id": txID},
}
}
func (s *ProcessPaymentStep) Compensate(ctx context.Context, data map[string]any) saga.StepResult {
txID, ok := data["transaction_id"].(string)
if !ok {
return saga.StepResult{Success: true}
}
err := s.gateway.Refund(ctx, txID)
if err != nil {
return saga.StepResult{Err: fmt.Errorf("refund transaction %s: %w", txID, err)}
}
log.Printf("[ProcessPayment] Refunded transaction %s", txID)
return saga.StepResult{Success: true}
}
// steps/reserve_inventory_step.go
package steps
type ReserveInventoryStep struct {
inventory InventoryService
}
func (s *ReserveInventoryStep) Name() string { return "ReserveInventory" }
func (s *ReserveInventoryStep) Execute(ctx context.Context, data map[string]any) saga.StepResult {
orderID := data["order_id"].(string)
productID := data["product_id"].(string)
reservationID, err := s.inventory.Reserve(ctx, productID, orderID)
if err != nil {
return saga.StepResult{
Success: false,
Err: fmt.Errorf("reserve inventory: %w", err),
}
}
log.Printf("[ReserveInventory] Reserved %s, reservation %s", productID, reservationID)
return saga.StepResult{
Success: true,
Data: map[string]any{"reservation_id": reservationID},
}
}
func (s *ReserveInventoryStep) Compensate(ctx context.Context, data map[string]any) saga.StepResult {
reservationID, ok := data["reservation_id"].(string)
if !ok {
return saga.StepResult{Success: true}
}
err := s.inventory.Release(ctx, reservationID)
if err != nil {
return saga.StepResult{Err: fmt.Errorf("release reservation %s: %w", reservationID, err)}
}
log.Printf("[ReserveInventory] Released reservation %s", reservationID)
return saga.StepResult{Success: true}
}
Running the Orchestrated Saga
func main() {
ctx := context.Background()
orderRepo := NewOrderRepository()
paymentGateway := NewPaymentGateway()
inventoryService := NewInventoryService()
steps := []saga.Step{
&steps.CreateOrderStep{Repo: orderRepo},
&steps.ProcessPaymentStep{Gateway: paymentGateway},
&steps.ReserveInventoryStep{Inventory: inventoryService},
}
initialData := map[string]any{
"customer_id": "CUST-42",
"product_id": "PROD-001",
"amount": 149.99,
}
orchestrator := saga.NewOrchestrator(uuid.New().String(), steps, initialData)
if err := orchestrator.Execute(ctx); err != nil {
log.Printf("Saga failed: %v", err)
return
}
log.Println("Order placed successfully")
}
Orchestration: Trade-offs
Pros: The entire flow is visible in one place. Failure paths are explicit. Debugging is straightforward — follow the orchestrator’s log. Easier to add retry logic, timeouts, and circuit breakers at the orchestration layer.
Cons: The orchestrator is a central component that must be highly available. It can become a bottleneck if not designed carefully. Services become more tightly coupled to the orchestrator’s interface.
Durability: Surviving Crashes
Both examples above run in memory. A real saga must survive process crashes. The standard technique is the transactional outbox pattern:
- When a service performs its local transaction, it also writes the outgoing event (or command) to an
outboxtable in the same database transaction. - A separate relay process (or CDC connector like Debezium) reads uncommitted outbox rows and publishes them to the message broker.
- Once published and acknowledged, the relay marks the row as sent.
This guarantees that a step’s side-effect and its resulting event are atomic with respect to the service’s own database — even if the process crashes between the commit and the publish.
-- outbox table schema
CREATE TABLE saga_outbox (
id UUID PRIMARY KEY,
saga_id TEXT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
sent_at TIMESTAMPTZ
);
// Inside a service's local transaction
tx.Exec(`
INSERT INTO saga_outbox (id, saga_id, event_type, payload)
VALUES ($1, $2, $3, $4)
`, eventID, sagaID, eventType, payloadJSON)
// Commit both the business write and the outbox row atomically
tx.Commit()
Idempotency Keys
Because messages can be redelivered (network hiccup, consumer restart), every handler must be idempotent. A robust approach is to persist a record of which saga events have already been processed:
func (s *PaymentService) handleOrderCreated(ctx context.Context, e saga.Event) error {
// Check if we already processed this event
exists, err := s.db.EventProcessed(ctx, e.ID)
if err != nil {
return err
}
if exists {
log.Printf("[Payment] Event %s already processed, skipping", e.ID)
return nil
}
// Do the work and record the event in one transaction
return s.db.WithTx(ctx, func(tx *sql.Tx) error {
if err := s.chargeCustomer(ctx, tx, e); err != nil {
return err
}
return tx.Exec(`INSERT INTO processed_events (id) VALUES ($1)`, e.ID)
})
}
Choreography vs. Orchestration: Which to Choose
| Factor | Choreography | Orchestration |
|---|---|---|
| Flow visibility | Implicit, spread across services | Explicit, in one place |
| Coupling | Loose (event contracts) | Tighter (step interfaces) |
| Debugging | Hard — trace events across services | Easier — one log to follow |
| Adding new steps | Easy — subscribe to existing events | Requires orchestrator change |
| Central failure point | None | Orchestrator must be HA |
| Best for | Simple flows, many independent consumers | Complex flows, strict ordering |
A common real-world choice: start with orchestration for business-critical flows (orders, payments) where visibility matters most, and use choreography for ancillary reactions (sending confirmation emails, updating analytics) where loose coupling is more valuable.
Key Takeaways
The Saga pattern trades the simplicity of ACID transactions for the scalability and resilience of distributed systems. The cost is complexity: you must design and implement compensating transactions, ensure idempotency, and make your saga state durable.
The rules to carry with you:
- Every forward step needs a compensating step. If you can’t undo an action, it probably shouldn’t be a saga step.
- Always compensate in reverse order. The last executed step is compensated first.
- Never stop mid-compensation. If a compensation fails, log it, alert on it, and continue compensating the remaining steps. Stopping halfway leaves the system in a worse state.
- Make every handler idempotent. Messages will be redelivered. Design for it from day one.
- Use the transactional outbox. It’s the only reliable way to guarantee that a local commit and its downstream event are atomic.
- Instrument saga state. Track in-progress, completed, and failed sagas. A saga stuck in
COMPENSATINGfor more than a few minutes is worth an alert.