Publish/Subscribe (Pub/Sub) in System Design: Topics, Fan-Out & Delivery Guarantees (Visualized)
Publish/subscribe is a messaging pattern where publishers send messages to named topics and subscribers receive them without either side knowing the other exists. This guide covers fan-out, topics vs queues, push vs pull delivery, filtering, ordering and delivery guarantees, and real systems like Kafka, SNS, and Redis, with live animations.
Publish/subscribe (pub/sub) is a messaging pattern in which senders, called publishers, emit messages to a named channel called a topic, and receivers, called subscribers, register interest in that topic to receive copies of those messages. The defining property is decoupling: a publisher does not know who (if anyone) is subscribed, and a subscriber does not know who produced the message.
This indirection is what makes pub/sub the backbone of event-driven architectures. A single "order placed" event can fan out to a billing service, an email service, an analytics pipeline, and a fraud detector at once, and you can add a fifth consumer later without touching the publisher. The broker in the middle handles routing, buffering, and delivery.
How Pub/Sub Works: Publishers, Topics, Subscribers
Three roles make up the pattern. Publishers produce messages and address them to a topic, never to a specific recipient. The topic is a logical channel managed by a broker (Kafka, Google Pub/Sub, SNS, Redis). Subscribers declare interest in one or more topics; the broker delivers a copy of every matching message to each subscriber. Because all communication flows through topics, publishers and subscribers can be deployed, scaled, and restarted independently. This strict separation of concerns means you can add a new data consumer (say, a fraud-detection service) to an existing event stream with zero changes to any existing publisher or subscriber. The broker is the only shared contract, and that contract is simply a topic name and a message schema.
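The three roles can be sketched with a toy in-memory broker. This is an illustration only, not how any real broker is implemented: the `InMemoryBroker` class, the `order.placed` topic name, and the inbox lists are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy broker: maps topic names to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a copy of the message to every subscriber; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(message))  # each subscriber gets its own copy
        return len(handlers)


broker = InMemoryBroker()
billing_inbox: List[dict] = []
email_inbox: List[dict] = []
broker.subscribe("order.placed", billing_inbox.append)
broker.subscribe("order.placed", email_inbox.append)

# The publisher only knows the topic name, not who is listening.
delivered = broker.publish("order.placed", {"order_id": 42, "total": 19.99})

# A new consumer is added later without touching the publisher code.
fraud_inbox: List[dict] = []
broker.subscribe("order.placed", fraud_inbox.append)
broker.publish("order.placed", {"order_id": 43, "total": 250.00})
```

The fraud inbox only sees the second order because this toy broker keeps no history; a durable broker could let a late subscriber catch up.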
Fan-Out: One Message, Many Subscribers
The signature behavior of pub/sub is fan-out: when a publisher sends one message to a topic, every subscriber to that topic receives its own copy, independently of the others. There is no contention between subscribers: each gets the full stream. This is exactly what you want when many distinct services must react to the same event.
Topic vs Queue: Pub/Sub vs Point-to-Point
It is easy to confuse pub/sub with a plain message queue, but they distribute work differently. In a point-to-point queue, each message is delivered to exactly one consumer; multiple consumers compete and the broker load-balances among them. In pub/sub, each message is delivered to every subscriber. Queues spread work; topics broadcast events. A useful mental model: think of a message queue as a shared task list where only one worker picks up each item, and a pub/sub topic as a newspaper, where every subscriber gets their own copy of every edition, regardless of how many others also read it.
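The difference is easy to see side by side. The sketch below assumes simple round-robin load balancing for the queue (real brokers usually hand messages to whichever consumer is free); the function names and the sample messages are made up for illustration.

```python
from itertools import cycle
from typing import Dict, List

messages = ["job-1", "job-2", "job-3", "job-4"]
consumers = ["a", "b"]


def deliver_queue(messages: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Point-to-point: each message goes to exactly one consumer (round-robin)."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    for message, consumer in zip(messages, cycle(consumers)):
        received[consumer].append(message)
    return received


def deliver_topic(messages: List[str], subscribers: List[str]) -> Dict[str, List[str]]:
    """Pub/sub: every subscriber receives its own copy of every message."""
    return {s: list(messages) for s in subscribers}


queue_result = deliver_queue(messages, consumers)
topic_result = deliver_topic(messages, consumers)
print(queue_result)  # {'a': ['job-1', 'job-3'], 'b': ['job-2', 'job-4']}
print(topic_result)  # both 'a' and 'b' hold all four messages
```

With the queue, adding a third consumer would shrink each consumer's share; with the topic, it would add a third full copy of the stream.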
Push vs Pull Delivery
Brokers deliver messages to subscribers in one of two modes. With push, the broker actively sends each message to the subscriber (for example an HTTP POST to a webhook, as Amazon SNS does). With pull, the subscriber polls the broker and fetches messages at its own pace (Kafka consumers and Google Pub/Sub pull subscriptions). Push is low-latency but can overwhelm a slow consumer; pull gives the consumer back-pressure control and lets it batch. In high-throughput systems, pull is usually preferred: a subscriber can read in batches of hundreds or thousands of messages per request, dramatically reducing round-trip overhead, and it can pause consumption entirely during a maintenance window without the broker queueing up retries. Google Cloud Pub/Sub supports both modes: each subscription is configured either as push to an HTTPS endpoint or as pull (including gRPC streaming pull), and a single topic can have subscriptions of both kinds.
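A minimal sketch of the two modes, assuming a hypothetical `Subscription` class that either forwards each message to a callback (push) or buffers it until the consumer asks (pull). Real brokers add acknowledgements, retries, and flow control that are left out here.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Subscription:
    """Toy subscription that supports either push or pull delivery."""

    def __init__(self, push_handler: Optional[Callable[[str], None]] = None) -> None:
        self._push_handler = push_handler
        self._backlog: Deque[str] = deque()

    def deliver(self, message: str) -> None:
        """Called by the broker for each published message."""
        if self._push_handler is not None:
            self._push_handler(message)  # push: the broker drives the pace
        else:
            self._backlog.append(message)  # pull: wait for the consumer

    def pull(self, max_messages: int) -> List[str]:
        """Consumer fetches up to max_messages at its own pace."""
        count = min(max_messages, len(self._backlog))
        return [self._backlog.popleft() for _ in range(count)]


pushed: List[str] = []
push_sub = Subscription(push_handler=pushed.append)
pull_sub = Subscription()

for i in range(5):
    push_sub.deliver(f"event-{i}")
    pull_sub.deliver(f"event-{i}")

first_batch = pull_sub.pull(max_messages=3)   # consumer chooses the batch size
second_batch = pull_sub.pull(max_messages=3)  # only two messages remain
```

The pull consumer decides when and how much to read, which is where the back-pressure control comes from; the push consumer has to keep up with whatever arrives.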
Filtering: Subscribing to a Subset
Subscribers often want only some messages on a topic. Filtering lets the broker evaluate predicates against message attributes and deliver only matches. SNS supports message-filtering policies on attributes; Google Cloud Pub/Sub supports subscription filters on message attributes (not on the message body); many MQTT brokers use hierarchical topic wildcards like sensors/+/temperature. Filtering moves the discard decision to the broker, saving the subscriber from receiving and dropping irrelevant traffic. This is especially valuable in high-fan-out scenarios: if a topic receives 100,000 events per minute but a given subscriber only cares about events where region == "us-east", and those make up, say, 5% of the traffic, the broker can discard the other 95% before they ever cross the network to that subscriber. Done at scale, server-side filtering cuts both egress cost and subscriber CPU load significantly.
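Broker-side filtering can be sketched as a policy check that runs before delivery. The policy shape below (attribute name mapped to a list of allowed values) is loosely modeled on attribute filter policies, but the `FilteringTopic` class and `matches` helper are hypothetical and do not reproduce any real broker's filter syntax.

```python
from typing import Dict, List, Tuple

Attributes = Dict[str, str]
Policy = Dict[str, List[str]]  # attribute name -> allowed values


def matches(policy: Policy, attributes: Attributes) -> bool:
    """True when every attribute named in the policy has an allowed value."""
    return all(attributes.get(name) in allowed for name, allowed in policy.items())


class FilteringTopic:
    """Toy topic that evaluates each subscriber's policy before delivering."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Policy, List[str]]] = []

    def subscribe(self, policy: Policy, inbox: List[str]) -> None:
        self._subscriptions.append((policy, inbox))

    def publish(self, data: str, attributes: Attributes) -> None:
        for policy, inbox in self._subscriptions:
            if matches(policy, attributes):
                inbox.append(data)  # non-matching messages never reach the inbox


topic = FilteringTopic()
us_east_inbox: List[str] = []
everything_inbox: List[str] = []
topic.subscribe({"region": ["us-east"]}, us_east_inbox)
topic.subscribe({}, everything_inbox)  # empty policy: no filtering

topic.publish("order-1", {"region": "us-east"})
topic.publish("order-2", {"region": "eu-west"})
topic.publish("order-3", {"region": "us-east"})
```

The filtered subscriber never sees `order-2`, so it spends no bandwidth or CPU on it, while the unfiltered subscriber still receives all three.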
Delivery Guarantees and Ordering
Brokers offer different delivery guarantees: at-most-once (fire and forget, messages may be lost; classic Redis Pub/Sub falls here), at-least-once (retried until acknowledged, so duplicates are possible; the default for Kafka and Google Pub/Sub), and exactly-once (no loss, no duplicates; the hardest and most expensive, available in Kafka through idempotent producers and transactions, and in Google Cloud Pub/Sub's exactly-once subscriptions). Because at-least-once is the common default, consumers should be idempotent so reprocessing a duplicate is harmless. A practical technique is to assign a unique message ID at publish time and track processed IDs in a fast store like Redis; if the ID is already in the set, skip processing and ack immediately. This keeps your consumers correct under retries without needing expensive distributed transactions.
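To see where duplicates come from, here is a small simulation of at-least-once delivery in which the first acknowledgement is lost. The `deliver_at_least_once` function and the boolean "ack" return value are simplifications invented for this sketch; real brokers use ack deadlines and redelivery timers instead.

```python
from typing import Callable, List


def deliver_at_least_once(
    message: str, handler: Callable[[str], bool], max_attempts: int = 5
) -> int:
    """Redeliver until the handler acknowledges; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if handler(message):  # True stands in for an ack reaching the broker
            return attempt
    raise RuntimeError(f"no ack after {max_attempts} attempts")


processed: List[str] = []
ack_outcomes = iter([False, True])  # first ack is "lost", second succeeds


def flaky_handler(message: str) -> bool:
    processed.append(message)  # the side effect happens on every delivery
    return next(ack_outcomes)


attempts = deliver_at_least_once("order-42", flaky_handler)
# attempts == 2 and processed == ["order-42", "order-42"]:
# the message was not lost, but the consumer handled it twice.
```

The broker did its job (nothing was lost), yet the side effect ran twice, which is exactly the case an idempotent consumer has to absorb.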
Ordering is a separate concern. Global ordering across an entire topic is rarely guaranteed at scale. Kafka preserves order only within a partition; messages sharing a partition key arrive in order, while different keys may interleave. Google Pub/Sub offers ordering keys with the same trade-off. Design your keys so that messages that must be ordered (e.g. all events for one user) share a partition.
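Per-key ordering follows from a stable mapping of key to partition. The sketch below uses `zlib.crc32` as a stand-in hash and a hypothetical partition count of 4; Kafka's default partitioner uses a different hash function, so the partition numbers here would not match a real cluster.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

NUM_PARTITIONS = 4  # hypothetical partition count for this example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the key, so one key always maps to one partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add_to_cart"),
    ("user-2", "logout"),
    ("user-1", "checkout"),
]
for key, event in events:
    partitions[partition_for(key)].append((key, event))


def events_for(key: str) -> List[str]:
    """Read one key's events back from its partition, in append order."""
    return [event for k, event in partitions[partition_for(key)] if k == key]


user_1_events = events_for("user-1")  # publish order is preserved per key
```

Events for different users may land in different partitions and interleave, but each user's own events stay in publish order because they always share a partition.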
```python
# Idempotent at-least-once consumer: dedupe by message id.
seen = set()  # in production: Redis or a database, with a TTL

def handle(message):
    msg_id = message.attributes["id"]
    if msg_id in seen:
        message.ack()  # duplicate delivery: skip the work, but still ack
        return
    process(message.data)  # application-defined; if it raises, nothing is acked
    seen.add(msg_id)  # record the id only after processing succeeded
    message.ack()  # tell the broker delivery succeeded

# An ordering key keeps all events for one user in one partition,
# so they are delivered in the order they were published.
publisher.publish(topic, data, ordering_key=f"user-{user_id}")
```
Pub/Sub vs Message Queue
| | Pub/Sub (Topic) | Message Queue (Point-to-Point) |
|---|---|---|
| Delivery | Copy to every subscriber | Each message to exactly one consumer |
| Consumers | Independent, all get full stream | Compete; broker load-balances |
| Primary use | Broadcast events, fan-out | Distribute / parallelize work |
| Scaling consumers | Adds another full copy of the stream | Adds throughput; work is split |
| Examples | Kafka, SNS, Google Pub/Sub, Redis Pub/Sub | SQS, RabbitMQ queue, Kafka consumer group |
The line blurs in practice. Kafka combines both: a topic broadcasts to multiple consumer groups (pub/sub fan-out), while within a single group partitions are split among members (queue-style work distribution). RabbitMQ models pub/sub with a fanout exchange in front of per-consumer queues.
A simple rule of thumb: choose a queue when you have a single logical job that many workers should share, and choose pub/sub when one event must trigger several unrelated reactions. When you need both durable history and broadcast, reach for a log-based broker like Kafka, where each consumer group keeps its own offset and reads the full stream independently.
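The per-group offset idea can be sketched with a single-partition, in-memory log. The `Log` class and group names are hypothetical, and two simplifications are assumed: new groups start reading from the beginning (in Kafka this depends on consumer configuration), and splitting partitions among members of one group is left out.

```python
from typing import Dict, List


class Log:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: str) -> None:
        self._records.append(record)

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return the next records for this group and advance its offset."""
        start = self._offsets.get(group, 0)  # assumption: new groups start at 0
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = Log()
for record in ["e1", "e2", "e3"]:
    log.append(record)

billing_first = log.poll("billing", max_records=2)  # ['e1', 'e2']
analytics_all = log.poll("analytics")               # ['e1', 'e2', 'e3']
billing_rest = log.poll("billing")                  # ['e3']
```

Reading does not remove records, so each group sees the full stream at its own pace, and a group created later could still replay everything that is retained.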
Common Implementations
Apache Kafka is a durable, partitioned append-only log: subscribers pull, messages are retained for a configurable window (days to weeks), and each consumer group tracks its own read offset independently of every other group. This makes Kafka the broker of choice when new consumers must replay historical data or when multiple teams each need the full event stream without any coordination overhead. Throughput can reach millions of events per second across a well-tuned cluster.
Amazon SNS is a serverless, managed push pub/sub that fans out to SQS queues, Lambda functions, HTTP/HTTPS endpoints, email, and SMS, with JSON attribute filtering so only relevant subscribers receive each message. It pairs naturally with SQS in the SNS+SQS fan-out pattern, where one SNS topic feeds N independent queues that each buffer work for a different downstream system.
Google Cloud Pub/Sub is a globally distributed managed broker with both push and pull delivery, at-least-once and exactly-once subscription modes, dead-letter topics, and ordering keys.
Redis Pub/Sub is in-memory and fire-and-forget (at-most-once): it offers sub-millisecond latency but no persistence, so if no subscriber is connected when a message is published, that message is lost. Redis Streams adds consumer group semantics and durability to Redis, making it far more suitable for production event pipelines that need replay or guaranteed delivery.
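The topic-plus-queues fan-out pattern mentioned above can be illustrated without any cloud SDK. The `FanOutTopic` class is a hypothetical in-memory stand-in: it only shows the shape of the pattern (one publish, one buffered copy per downstream queue) and uses no real SNS or SQS API.

```python
from collections import deque
from typing import Deque, Dict, List


class FanOutTopic:
    """Toy stand-in for a topic that copies each message into attached queues."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = {}

    def attach_queue(self, name: str) -> Deque[str]:
        self.queues[name] = deque()
        return self.queues[name]

    def publish(self, message: str) -> None:
        for queue in self.queues.values():
            queue.append(message)  # one buffered copy per downstream system


topic = FanOutTopic()
billing_queue = topic.attach_queue("billing")
email_queue = topic.attach_queue("email")

for order in ["order-1", "order-2", "order-3"]:
    topic.publish(order)

# Billing drains its queue right away; email is slow and has not started yet.
billing_done: List[str] = [billing_queue.popleft() for _ in range(len(billing_queue))]
```

Because each downstream system has its own queue, a slow email service only delays itself: its backlog waits in `email_queue` while billing has already finished.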
Frequently Asked Questions
What is the difference between pub/sub and a message queue?
A message queue delivers each message to exactly one consumer, so adding consumers spreads the work. Pub/sub delivers a copy of each message to every subscriber, so adding a subscriber gives you another independent stream. Use a queue to parallelize processing of one job stream; use pub/sub to broadcast an event to many distinct services.
Does pub/sub guarantee message ordering?
Usually not globally. Most brokers guarantee ordering only within a partition or ordering key: messages sharing that key arrive in order, but messages across keys may interleave. If end-to-end order matters, route related messages through the same key, and remember that at-least-once delivery still means consumers should be idempotent.
Is Kafka a pub/sub system or a message queue?
Both. Kafka topics fan out to multiple consumer groups like pub/sub, while members within one consumer group split the topic's partitions like a work queue. Its durable, replayable log also lets late or new subscribers read past messages, something classic in-memory pub/sub such as Redis Pub/Sub cannot do.
Pub/sub decouples who produces an event from who reacts to it. Publish once, and any number of services, present or future, can listen without the publisher ever knowing they exist.
- alokknight Engineering
