Outbox Pattern vs Dual Writes: A Practical Reliability Guide

Why dual writes fail under real distributed failures and how the outbox pattern provides safer event publishing.

Jan 14, 2026 · 5 min read

The core problem

Many teams need to write to a database and publish an event as part of the same operation. The unsafe approach is a dual write: write to the database and then publish to the broker, or the reverse. This looks fine in local tests but fails under network partitions, process crashes, and broker timeouts. If one write succeeds and the other fails, your system state and event stream diverge.

Why dual writes are fragile

Dual writes fail because there is no single atomic boundary across two independent systems.

  • DB commit succeeds, broker publish fails
  • Broker publish succeeds, DB rollback happens
  • Retry logic publishes duplicate events
  • Partial failures create hard-to-reconcile drift

Even if individual failures are rare, each one leaves an inconsistency that nothing repairs automatically, so the drift accumulates in production.
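To make the first failure mode concrete, here is a toy sketch. The in-memory lists standing in for the database and the broker, and the names <code>dual_write</code> and <code>BrokerDown</code>, are illustrative assumptions, not a real client API.

```python
class BrokerDown(Exception):
    """Raised by the stand-in broker when a publish fails (hypothetical)."""


def dual_write(db: list, broker: list, order_id: str, broker_up: bool) -> None:
    # Write 1: the database "commit" succeeds and is durable.
    db.append(order_id)
    # Write 2: the publish is a separate operation with its own failure modes.
    if not broker_up:
        raise BrokerDown("publish timed out")
    broker.append(order_id)


db, broker = [], []
dual_write(db, broker, "o-1", broker_up=True)
try:
    dual_write(db, broker, "o-2", broker_up=False)
except BrokerDown:
    pass  # the caller sees an error, but the row is already committed

print(db)      # ['o-1', 'o-2']
print(broker)  # ['o-1']
```

The order <code>o-2</code> exists in the database but was never announced, and no later step will notice unless something reconciles the two.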

What the outbox pattern changes

The outbox pattern stores the domain data and the event record in the same database transaction.

  • Step 1: Business write and outbox row are committed atomically
  • Step 2: A background relay reads pending outbox rows
  • Step 3: The relay publishes events to the broker and marks rows as published

This converts a distributed atomicity problem into a local transaction plus asynchronous delivery.
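The atomic write in Step 1 can be sketched with Python's built-in <code>sqlite3</code> module. The <code>orders</code> table, the reduced set of outbox columns, and the <code>place_order</code> helper are assumptions chosen for illustration; a real service would use its own database and schema.

```python
import json
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: one business table and a reduced outbox.
    conn.executescript("""
        CREATE TABLE orders (
            id TEXT PRIMARY KEY,
            total_cents INTEGER NOT NULL
        );
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id TEXT NOT NULL UNIQUE,
            aggregate_id TEXT NOT NULL,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending'
        );
    """)


def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> str:
    """Insert the order and its event in one transaction; return the event_id."""
    event_id = str(uuid.uuid4())
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so either both rows are stored or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?, ?)",
            (event_id, order_id, "OrderCreated", payload),
        )
    return event_id
```

Nothing is sent to the broker here. The request path only records the intent to publish, and the relay picks it up later.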

Design details that matter

Use explicit status and retry metadata in outbox rows.

  • <code>event_id</code> for idempotent consumption
  • <code>aggregate_id</code> and type for routing
  • <code>payload</code> as immutable JSON
  • <code>published_at</code>, <code>retry_count</code>, and <code>next_retry_at</code>

Operationally, the relay should support batch publishing and backoff.
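As one way to fill <code>next_retry_at</code>, the sketch below computes a capped exponential backoff from <code>retry_count</code>. The base delay, the cap, and the function name are assumed values for illustration; many systems also add random jitter, which is left out here to keep the result deterministic.

```python
from datetime import datetime, timedelta, timezone

BASE_DELAY_S = 2    # assumed first retry delay
MAX_DELAY_S = 300   # assumed upper bound on any single delay


def next_retry_at(retry_count: int, now: datetime) -> datetime:
    """Return when a row that has already failed retry_count times may be retried."""
    delay = min(BASE_DELAY_S * (2 ** retry_count), MAX_DELAY_S)
    return now + timedelta(seconds=delay)


now = datetime(2026, 1, 14, tzinfo=timezone.utc)
print(next_retry_at(0, now) - now)   # 0:00:02
print(next_retry_at(3, now) - now)   # 0:00:16
print(next_retry_at(20, now) - now)  # 0:05:00 (capped)
```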

Ordering and exactly-once concerns

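The outbox pattern gives at-least-once delivery from the relay to the broker: a relay that crashes after the broker acknowledges an event but before the row is marked will publish that event again. Consumers therefore still need idempotency.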

  • Preserve ordering per aggregate where required
  • Use consumer dedup keys (<code>event_id</code>)
  • Avoid assuming exactly-once semantics end-to-end

The outbox pattern improves correctness, but it does not remove the need for consumer-side reliability design.
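A consumer-side dedup check on <code>event_id</code> might look like the following sketch. The <code>processed_events</code> and <code>notifications</code> tables and the <code>handle_event</code> function are hypothetical, and the example assumes the consumer's side effect lives in the same database as its dedup table.

```python
import sqlite3

# Hypothetical consumer-side tables.
CONSUMER_DDL = """
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE notifications (order_id TEXT NOT NULL);
"""


def handle_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply the event once; return False if it was already processed."""
    with conn:  # the dedup record and the side effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            return False  # redelivery: this event_id is already recorded
        conn.execute(
            "INSERT INTO notifications (order_id) VALUES (?)",
            (event["order_id"],),
        )
    return True
```

Calling <code>handle_event</code> twice with the same event applies the side effect once. If the side effect is an external call instead of a local write, this transaction no longer covers it and the operation itself has to be idempotent.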

Migration strategy from dual writes

  • Add outbox table and write path first
  • Publish from outbox in shadow mode
  • Compare stream parity with existing producer
  • Cut over consumers gradually
  • Decommission dual-write publisher

This reduces risk during transition.
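The parity comparison can be as simple as diffing two multisets of events collected over the same time window. This sketch assumes both streams can be reduced to an <code>(aggregate_id, event_type)</code> key; that key and the <code>parity_diff</code> name are illustrative choices, not a prescribed format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

EventKey = Tuple[str, str]  # (aggregate_id, event_type), an assumed comparison key


def parity_diff(
    legacy: Iterable[EventKey], shadow: Iterable[EventKey]
) -> Dict[str, Counter]:
    """Compare events from the existing producer with events from the shadow outbox."""
    legacy_counts = Counter(legacy)
    shadow_counts = Counter(shadow)
    # Counter subtraction keeps only positive counts, so each side lists
    # exactly the events the other side lacks (including missing duplicates).
    return {
        "missing_from_outbox": legacy_counts - shadow_counts,
        "extra_in_outbox": shadow_counts - legacy_counts,
    }
```

Two empty counters over a representative window are a reasonable signal that cut-over can begin.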

Suggested outbox schema

A minimal schema keeps relay logic predictable and observable.

  • <code>id</code> (monotonic primary key for batching)
  • <code>event_id</code> (globally unique dedup key)
  • <code>aggregate_type</code> and <code>aggregate_id</code>
  • <code>event_type</code> and <code>payload</code>
  • <code>status</code> (<code>pending</code>, <code>published</code>, <code>failed</code>)
  • <code>created_at</code>, <code>published_at</code>, <code>last_error</code>

Index by <code>status</code> and <code>created_at</code> for efficient relay scans.
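The columns above could be declared as follows. This sketch uses SQLite through Python's <code>sqlite3</code> module so that it runs anywhere; the column types, the <code>CHECK</code> constraint, and the index name are assumptions that would need adapting to your database.

```python
import sqlite3

OUTBOX_DDL = """
CREATE TABLE outbox (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonic, used for batching
    event_id       TEXT NOT NULL UNIQUE,               -- dedup key for consumers
    aggregate_type TEXT NOT NULL,
    aggregate_id   TEXT NOT NULL,
    event_type     TEXT NOT NULL,
    payload        TEXT NOT NULL,                      -- JSON, written once
    status         TEXT NOT NULL DEFAULT 'pending'
                   CHECK (status IN ('pending', 'published', 'failed')),
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published_at   TEXT,
    last_error     TEXT
);
-- Supports the relay's "oldest pending rows first" scan.
CREATE INDEX idx_outbox_status_created ON outbox (status, created_at);
"""


def create_outbox(conn: sqlite3.Connection) -> None:
    conn.executescript(OUTBOX_DDL)
```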

Relay worker design

The relay is where most real-world issues surface, so keep it resilient.

  • Poll in small batches with lease/lock semantics
  • Publish with retry and exponential backoff
  • Mark success only after broker acknowledgment
  • Route events that fail repeatedly to a dead-letter workflow

The relay should be horizontally scalable while preserving per-aggregate ordering where required.
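A single polling pass might be sketched like this. The reduced table, the <code>publish</code> callable (assumed to return only after the broker acknowledges, and to raise otherwise), the <code>MAX_ATTEMPTS</code> limit, and the <code>relay_once</code> name are all illustrative assumptions. The sketch runs as a single worker, so it omits row leasing; with several workers on PostgreSQL, for example, rows are often claimed with <code>SELECT ... FOR UPDATE SKIP LOCKED</code>.

```python
import sqlite3
from typing import Callable

MAX_ATTEMPTS = 5  # assumed limit before a row is parked as 'failed'

# Reduced, hypothetical outbox table for this sketch.
OUTBOX_DDL = """
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT NOT NULL UNIQUE,
    aggregate_id TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    retry_count INTEGER NOT NULL DEFAULT 0,
    published_at TEXT,
    last_error TEXT
);
"""


def relay_once(
    conn: sqlite3.Connection,
    publish: Callable[[str, str], None],
    batch_size: int = 100,
) -> int:
    """Publish one batch of pending rows in id order; return how many succeeded."""
    rows = conn.execute(
        "SELECT id, event_id, aggregate_id, payload, retry_count FROM outbox "
        "WHERE status = 'pending' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    published = 0
    blocked = set()  # aggregates with a failed event earlier in this batch
    for row_id, event_id, aggregate_id, payload, retry_count in rows:
        if aggregate_id in blocked:
            continue  # keep per-aggregate order: do not overtake a failed event
        try:
            publish(event_id, payload)
        except Exception as exc:
            blocked.add(aggregate_id)
            attempts = retry_count + 1
            status = "failed" if attempts >= MAX_ATTEMPTS else "pending"
            with conn:
                conn.execute(
                    "UPDATE outbox SET retry_count = ?, status = ?, last_error = ? "
                    "WHERE id = ?",
                    (attempts, status, str(exc), row_id),
                )
            continue
        # Mark the row only after publish() returned, i.e. after the ack.
        with conn:
            conn.execute(
                "UPDATE outbox SET status = 'published', "
                "published_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row_id,),
            )
        published += 1
    return published
```

A crash between <code>publish</code> and the <code>UPDATE</code> leaves the row pending, so the next pass publishes it again; that is the at-least-once behavior consumers have to tolerate. Whether a row parked as <code>failed</code> should keep blocking later events for the same aggregate is a policy decision this sketch does not settle.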

Common anti-patterns

  • Deleting rows immediately after publish (loses audit trail)
  • No dedup key in payload contract
  • Unbounded retries without poison event handling
  • Running the relay in the same request path as the user API call

Avoiding these mistakes makes outbox behavior stable under incidents.

What teams should monitor

Outbox systems become much easier to trust when the right metrics are visible.

  • Pending outbox row count
  • Oldest unsent event age
  • Relay publish failure rate
  • Retry distribution by event type

These signals help teams detect delivery drift before it becomes a business issue.
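The first two metrics can come from a single query. The sketch below assumes an <code>outbox</code> table whose <code>created_at</code> holds UTC text in SQLite's <code>CURRENT_TIMESTAMP</code> format (<code>YYYY-MM-DD HH:MM:SS</code>); the <code>outbox_backlog</code> function name is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional, Tuple


def outbox_backlog(
    conn: sqlite3.Connection, now: datetime
) -> Tuple[int, Optional[float]]:
    """Return (pending row count, age in seconds of the oldest pending row)."""
    count, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE status = 'pending'"
    ).fetchone()
    if oldest is None:
        return 0, None  # nothing pending
    # The fixed-width text format sorts chronologically, so MIN() is the oldest row.
    oldest_dt = datetime.strptime(oldest, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return count, (now - oldest_dt).total_seconds()
```

Alerting on the age of the oldest pending row tends to be more useful than alerting on the count alone, since a steady backlog of fresh rows is normal under load while a single old row means delivery is stuck.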

Where outbox gives the biggest value

The outbox pattern is especially useful when the database write is the source of truth and other systems depend on accurate follow-up events. Examples include order creation, payment state changes, user signup flows, and inventory updates.

Final takeaway

Dual writes trade short-term simplicity for long-term incidents. The outbox pattern adds controlled complexity but creates a much safer reliability boundary for event-driven systems.