λ fbounded  :: Blog
#distributed-systems #patterns #architecture

The Shared Log: One Sequence to Rule Them All

Why treating an append-only log as the primary source of truth simplifies distributed state, enables replay-based recovery, and maps perfectly onto CQRS.

State Is a Projection

Every stateful system stores state somewhere: a database row, an in-memory object, a cache entry. But that state is always derived. Something happened to produce it: a write, a transition, a computation. The state you see is the end of a story you probably threw away.

Most systems discard the story and keep only the final chapter. That works until the final chapter is corrupted, diverges, or needs to be understood, at which point you have nothing to go back to.

The shared log flips this. It keeps the full story, in order, permanently. Current state becomes a consequence of the log, not the source of it.


The Shared Log

A shared log is an append-only, totally ordered, durable sequence of records. Writers append entries to the tail. Readers consume entries from any offset. Every reader sees the same sequence in the same order.

offset:  0       1       2       3       4       5
       ┌───────┬───────┬───────┬───────┬───────┬───────┐
       │  rec  │  rec  │  rec  │  rec  │  rec  │  rec  │──▶
       └───────┴───────┴───────┴───────┴───────┴───────┘
         ▲                       ▲                ▲
       reader A             reader B           tail

The critical property is not persistence. It is total order. Every record has a globally unique, monotonically increasing offset. No two records share an offset. No reader ever observes records out of sequence. This single guarantee is what makes the log a reliable source of truth across distributed components: everyone agrees on what happened, and in what order.

This is not a message queue. Kafka is the closest analogue: it is also log-structured and offset-based, which is why the confusion is common. But Kafka’s ordering guarantee is per partition. Two records written to different partitions have no defined order relative to each other; consumers that need to correlate them must impose their own sequencing on top. A shared log gives you one total order across every record from every writer, which is what makes it a source of truth rather than a delivery mechanism. vCorfu [1] is a concrete example: a cloud-scale object store built directly on this abstraction.

Traditional message queues (RabbitMQ, SQS) go further in the other direction: messages are consumed and deleted. There is no offset, no replay, no durable history. A shared log is neither of these. Every record is retained indefinitely, every reader holds its own offset and advances independently, and no reader’s progress affects any other.
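The contract is small enough to sketch. Here is a toy in-memory version, with names of my own choosing, not any particular implementation's API; a real shared log adds durability, replication, and failure handling on top of exactly this shape:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory shared log: append-only, totally ordered, offset-addressed.
final class SharedLog<T> {
    private final List<T> records = new ArrayList<>();

    // Append at the tail; the returned offset is unique and increasing.
    synchronized long append(T record) {
        records.add(record);
        return records.size() - 1;
    }

    // Read every record from the given offset onward, in total order.
    // Every reader calling this sees the same sequence.
    synchronized List<T> readFrom(long offset) {
        return List.copyOf(records.subList((int) offset, records.size()));
    }

    // The offset the next append will receive.
    synchronized long tail() {
        return records.size();
    }
}
```

Each reader holds its own offset and calls readFrom at its own pace; nothing a reader does affects the writers or the other readers.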


State Is Derived

If the log is primary, state is derived. Reading every entry from offset 0 and applying each one in order gives you the current state. Not as a restore operation, but as the definition of the state.

State state = State.empty();
for (LogEntry entry : log.readFrom(0)) {
    state = state.apply(entry);
}

The practical consequence: if you lose your state, replay the log. There is no backup to restore, no snapshot to locate, no peer to copy from. The log is the backup.

This changes how you think about the in-memory or database copy of your state. It is not the source of truth. It is a materialized view, a projection of the log optimised for reads. You can throw it away. You can rebuild it at any time from offset 0. You can build a completely different projection tomorrow, pointed at the same log, without touching any writer.


Two Runtimes, One Principle

I have built two runtimes on top of a shared log. They look different on the surface, but the core principle is identical in both.

A workflow runtime (boudin). A workflow is a sequence of steps: external calls, branches, waits. Each step transition is appended to the log as the workflow progresses. On crash, you do not consult a checkpoint store. You replay the log from the beginning, re-derive the current step, and resume from exactly where execution paused. The log is the durable execution state. No external persistence required.
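A minimal sketch of that replay, assuming a hypothetical per-step record; boudin's actual entry format is not shown here and may differ:

```java
import java.util.List;

// Hypothetical log entry: one appended per completed workflow step.
record StepCompleted(String workflowId, int stepIndex) {}

final class WorkflowRecovery {
    // Fold the log to find the highest completed step for this
    // workflow; execution resumes at the step after it.
    static int nextStep(String workflowId, List<StepCompleted> log) {
        int last = -1;
        for (StepCompleted e : log) {
            if (e.workflowId().equals(workflowId)) {
                last = Math.max(last, e.stepIndex());
            }
        }
        return last + 1;
    }
}
```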

An actor runtime (bayou). Each actor has an identity. Messages addressed to it land in the shared log, tagged with its ID. The actor’s state is the fold of all messages it has ever received, in log order. On restart, the actor replays its slice of the log and arrives at the exact same state it had before the crash, deterministically, without coordination.

              ┌─────────────────────────────────┐
              │          Shared Log             │
              │  0   1   2   3   4   5   6   7  │
              └──────────────┬──────────────────┘
                    ┌────────┼──────────┐
                    ▼        ▼          ▼
              Workflow A  Actor #42  Actor #99
              (at off 4) (at off 7) (at off 5)

Neither runtime needs external state storage beyond the log. Both get fault tolerance and replay-based recovery as direct consequences of sitting on top of a totally ordered, durable sequence. Both are built on gumbo [3], a shared log implementation in Java 21 that provides pluggable storage backends and tag-scoped logical streams. Boki [2] applies the same principle to serverless computing, building stateful function runtimes — including workflow execution and a key-value store — on top of a shared log.
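The actor fold can be sketched in a few lines. The types here are illustrative stand-ins, not bayou's or gumbo's actual API, and the integer counter stands in for arbitrary actor state:

```java
import java.util.List;

// Hypothetical message: tagged with the ID of the actor it addresses.
record Msg(String actorId, int delta) {}

final class ActorRecovery {
    // Deterministic replay: same slice of the log, same order,
    // same resulting state, with no coordination required.
    static int replay(String actorId, List<Msg> log) {
        int state = 0; // a counter actor, for illustration
        for (Msg m : log) {
            if (m.actorId().equals(actorId)) {
                state += m.delta(); // apply(state, message)
            }
        }
        return state;
    }
}
```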


CQRS Is Already This

CQRS (Command Query Responsibility Segregation) says: separate the write path from the read path. Commands mutate state through one model; queries read from another.

The shared log maps onto this naturally. The log is the write model, the canonical ordered record of every command that has ever been accepted. The read models are projections built by consuming that log.

  Commands ──▶  [ Shared Log ]
                      │
         ┌────────────┼────────────┐
         ▼            ▼            ▼
    Actor State  Workflow View  Audit Trail
   (projection)  (projection)  (projection)

Take an order service. A client submits a PlaceOrder command. The handler validates it and appends an event to the log:

// write side: one append, nothing else
log.append(new OrderPlaced(orderId, userId, items, timestamp));

That single append is the entire write path. Three projections consume the same event independently:

// projection 1: order lookup by ID
case OrderPlaced e -> orders.put(e.orderId(), new Order(e));

// projection 2: orders per user (for "my orders" page)
case OrderPlaced e -> userOrders.computeIfAbsent(e.userId(), id -> new ArrayList<>()).add(e.orderId());

// projection 3: inventory
case OrderPlaced e -> e.items().forEach(i -> inventory.decrement(i.sku(), i.qty()));

Each projection is a fold: (currentState, event) -> nextState. They run independently, at their own pace, from the same log. Adding a fourth projection (a revenue summary, a fraud signal, a search index) means writing one more consumer. The write path does not change.

What makes this powerful is what adding a new read model costs: nothing on the write side. You open a new consumer at offset 0, replay the full log into whatever data structure you need (a search index, a time-series store, a relational view) and you are done. No migration scripts. No backfill jobs. No coordination with writers.
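As a sketch of how cheap that is, here is a hypothetical revenue projection rebuilt from offset 0. The simplified event type is mine; the OrderPlaced in the post above also carries userId, items, and a timestamp:

```java
import java.util.List;

// Hypothetical simplified event for illustration.
record OrderPlaced(String orderId, long cents) {}

// A fourth read model: total revenue, added without touching any writer.
final class RevenueProjection {
    private long totalCents = 0;

    // The fold: (currentState, event) -> nextState
    void apply(OrderPlaced e) { totalCents += e.cents(); }

    long totalCents() { return totalCents; }

    // A new read model is just a fresh consumer replayed from offset 0.
    static RevenueProjection rebuild(List<OrderPlaced> log) {
        RevenueProjection p = new RevenueProjection();
        log.forEach(p::apply);
        return p;
    }
}
```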

This is also event sourcing, if you want that label. The log stores the sequence of events; every state is derived from that sequence. CQRS and event sourcing describe the same shape from different angles; the shared log is the structure underneath both.


What You Get For Free

Once the log is primary, several capabilities arrive without additional design work.

Time travel. Replay to any offset and you reconstruct state at that exact point in history. Debugging a production issue means replaying the exact sequence of events that produced it, not reconstructing it from scattered application logs and guesses.

Recovery without coordination. A reader that crashes restarts from its last committed offset. No quorum, no leader election, no distributed lock required to restore consistent state. The log is already consistent; the reader just catches up.

Read scaling. Readers are stateless except for their current offset. You can fan out as many readers as needed, all consuming the same log independently, without affecting the write path.

Audit trail by default. The log is the audit trail. There is no separate audit table to maintain, no risk of audit records falling out of sync with system state; they are the same artifact.

Exactly-once effects. Append to the log at most once per logical operation; apply each entry exactly once using idempotent consumer logic. At-least-once delivery from the transport, at-most-once effect from the handler, exactly-once end-to-end.
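A minimal sketch of that handler-side discipline, with an integer standing in for real effects: tracking the last applied offset is what turns at-least-once delivery into at-most-once effect per entry.

```java
// Idempotent consumer: each log entry takes effect at most once,
// however many times the transport redelivers it.
final class IdempotentConsumer {
    private long lastApplied = -1;
    private long state = 0;

    // Returns true if the entry took effect, false if deduplicated.
    // In practice, lastApplied is committed atomically with the effect.
    boolean apply(long offset, long delta) {
        if (offset <= lastApplied) {
            return false; // redelivery: already applied, skip
        }
        state += delta;
        lastApplied = offset;
        return true;
    }

    long state() { return state; }
}
```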


Takeaway

The shared log is not a messaging pattern or a logging strategy. It is a foundational architectural primitive: the commitment to making history the primary artifact and letting everything else be derived from it.

Once you make that commitment, patterns like CQRS, event sourcing, actor recovery, and durable workflow execution stop being separate concerns to bolt together. They become natural consequences of a single underlying structure. The complexity does not disappear; it moves out of application logic and into the log substrate, where it can be solved once and reused everywhere.

Future posts will go into the workflow and actor runtimes in detail: what the log entries look like, how replay is made efficient, and where the real edge cases hide.


References

  1. Michael Wei, Amy Tai, Christopher J. Rossbach, et al. vCorfu: A Cloud-Scale Object Store on a Shared Log. NSDI 2017. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/wei-michael

  2. Zhipeng Jia, Emmett Witchel. Boki: Stateful Serverless Computing with Shared Logs. SOSP 2021. https://dl.acm.org/doi/10.1145/3477132.3483541

  3. CajunSystems. gumbo: A shared log and executor framework. https://github.com/CajunSystems/gumbo


Try It

If you want to see these ideas in action before building anything, I put together a Shared Log Visualizer. You can publish events with a payload and tags, watch them land in the central log, then add consumers with different filter tags and reducers to build materialized views from the same sequence. It is a small sandbox, but it makes the projection model concrete quickly.

PS
Pradeep Samuel, Tech Lead @ Nextiva