Event Driven Architecture Guide | Pillai Infotech LLP

Q: What's the difference between event-driven architecture and message-driven architecture?

Event-driven systems publish facts about what happened — producers don't care who consumes them. Message-driven systems send commands to specific recipients. Most real systems use both: events for notifications, messages for commands.

Q: How do I handle event schema evolution without breaking consumers?

Use a schema registry that enforces backward compatibility. Add new fields as optional — never remove or rename existing ones. Version your event types and run old and new consumers in parallel during migration.

Q: Is Kafka overkill for a startup?

Probably yes. If you're processing under 10,000 events per second, Redis Streams, Amazon SQS, or PostgreSQL LISTEN/NOTIFY can handle it. Start simple, migrate when you hit actual limits.

Q: How do I debug issues in an event-driven system?

Three essentials: correlation IDs in every event header, centralized logging with structured JSON, and distributed tracing with OpenTelemetry. Monitor consumer lag religiously.

Q: Can I use event-driven architecture with a monolith?

Absolutely. Use an in-process event bus within your monolith. When you extract a module into its own service, replace the in-process bus with Kafka or RabbitMQ. The event contracts stay the same.

In This Guide

1. Why Event-Driven? The Coupling Problem
2. Core EDA Patterns
3. Message Brokers Compared
4. Event Sourcing — When the Event IS the Data
5. CQRS — Separate Reads from Writes
6. Eventual Consistency and the Saga Pattern
7. Practical Implementation with Kafka
8. Pitfalls That Sink EDA Projects
9. When NOT to Use Event-Driven Architecture
10. Frequently Asked Questions

Event-driven architecture decouples services by having them communicate through events instead of direct API calls. A service publishes an event ("order placed"), and any interested service reacts to it — the publisher doesn't know or care who's listening. We moved one of our client's order processing system from synchronous REST calls to event-driven with Kafka, and what used to be a cascading failure across 6 services when payments went down became a 30-second delay with automatic retry. The system stayed up. But getting there required rethinking how we handle data consistency, debugging, and error recovery.

1. Why Event-Driven? The Coupling Problem

In a traditional request-response system, Service A calls Service B, which calls Service C. If C is down, B times out, which makes A time out, and your user sees an error. This is temporal coupling — services must be available at the same time.

Event-driven architecture breaks this chain. Service A publishes an event and moves on. Services B and C process it whenever they're ready — in milliseconds or in minutes. This isn't just about fault tolerance. It changes how teams work. The payments team can deploy independently because they only consume events — they don't need the orders team to update an API contract.

Dimension	Request-Response	Event-Driven
Temporal coupling	All services must be up simultaneously	Services process at their own pace
Knowledge coupling	Caller knows the callee's API	Publisher doesn't know consumers
Failure propagation	Cascading failures across the chain	Isolated — failed consumers retry independently
Scaling	Scale each service in the call chain	Scale consumers independently per topic
Adding features	Modify existing services, coordinate deploys	Add new consumer — zero changes to publisher
Debugging	Follow the call stack	Correlate across distributed logs (harder)

The debugging row is real. We've spent hours tracing why an order's email didn't send, only to discover the email consumer was 3 minutes behind because a batch job had flooded the topic. With request-response, you'd have gotten an error immediately. That's the trade-off.

2. Core EDA Patterns

Event Notification

The simplest pattern. A service publishes a thin event saying "something happened" — just an ID and event type. Consumers fetch details themselves if needed.

        
// Event notification — thin payload

{

  "event": "order.placed",

  "orderId": "ord_8x92k",

  "timestamp": "2026-01-22T10:30:00Z",

  "version": 1

}

// Consumer fetches details if it needs them

// GET /api/orders/ord_8x92k

Pro: Small messages, publisher doesn't need to know what consumers want. Con: Creates runtime coupling — consumer must call back to the publisher to get details.

Event-Carried State Transfer

The event carries all the data consumers need. No callbacks. This is what we use for most of our systems.

        
// Event-carried state — fat payload

{

  "event": "order.placed",

  "orderId": "ord_8x92k",

  "timestamp": "2026-01-22T10:30:00Z",

  "version": 1,

  "data": {

    "customerId": "cust_4fn2",

    "email": "buyer@example.com",

    "items": [

      { "sku": "WIDGET-01", "qty": 3, "price": 29.99 }

    ],

    "total": 89.97,

    "currency": "USD",

    "shippingAddress": { "city": "Mumbai", "zip": "400001" }

  }

}

Pro: True decoupling — consumers never call back. Con: Larger messages, and consumers might cache stale data. We mitigate this by always including a version number and using idempotency keys.

Domain Events vs Integration Events

This distinction matters more than most teams realize:

Domain events — internal to a bounded context. "InventoryReserved" is meaningful inside the warehouse service. Fine-grained, can change freely.
Integration events — cross service boundaries. "OrderShipped" goes to billing, notifications, analytics. These are your API contract — version them carefully.

We keep domain events on an internal bus (in-process or Redis Streams) and only publish integration events to Kafka. This prevents internal refactoring from breaking downstream consumers.

3. Message Brokers Compared

The broker you choose shapes your architecture more than you'd expect. Here's what actually matters after running each in production:

Broker	Best For	Throughput	Retention	Ops Complexity
Apache Kafka	Event streaming, replay, high volume	1M+ msgs/sec	Configurable (days/forever)	High (ZooKeeper/KRaft)
RabbitMQ	Task queues, routing, RPC	50K msgs/sec	Until consumed (no replay)	Low-Medium
Amazon SQS/SNS	AWS-native, serverless	Nearly unlimited	14 days max (SQS)	Zero (managed)
Redis Streams	Lightweight, already-have-Redis	200K msgs/sec	Memory-limited	Low
NATS JetStream	Edge, IoT, lightweight streaming	500K+ msgs/sec	Configurable	Low
Amazon EventBridge	Event routing, schema registry	High (managed)	Replay up to 24hrs	Zero (managed)

Our recommendation: Start with RabbitMQ if you're under 10K events/second and need routing flexibility. Move to Kafka when you need event replay, stream processing, or you're above 50K events/second. Use SQS/SNS if you're all-in on AWS and want zero ops burden. We've seen too many teams start with Kafka when Redis Streams would have been enough for their first two years.

4. Event Sourcing — When the Event IS the Data

Most applications store current state: "Account balance is $500." Event sourcing stores every change that led to that state: "Deposited $1000, Withdrew $300, Deposited $100, Withdrew $300." The current state is derived by replaying events.

This sounds academic until you need it. A fintech client needed to answer "Why does this account show $347.23?" — with a traditional database, you'd look at the balance column and shrug. With event sourcing, you replay the event stream and see every deposit, withdrawal, fee, and correction that produced that number.

        
Traditional DB (stores current state):

┌─────────────────────────────────────┐

│ accounts table                       │

│ id: acc_001 │ balance: 500.00        │

└─────────────────────────────────────┘

How did we get here? No idea.

Event Store (stores every change):

┌─────────────────────────────────────────────────┐

│ Stream: account-acc_001                          │

├──────┬─────────────────┬────────────┬───────────┤

│ Seq  │ Event Type       │ Amount     │ Balance   │

├──────┼─────────────────┼────────────┼───────────┤

│  1   │ AccountOpened     │ +1000.00  │  1000.00  │

│  2   │ FundsWithdrawn    │  -300.00  │   700.00  │

│  3   │ FundsDeposited    │  +100.00  │   800.00  │

│  4   │ FundsWithdrawn    │  -300.00  │   500.00  │

└──────┴─────────────────┴────────────┴───────────┘

Complete audit trail. Replay to any point in time.

Event Sourcing Implementation (Node.js/TypeScript)

        
// Event types

type AccountEvent =

  | { type: 'AccountOpened'; balance: number }

  | { type: 'FundsDeposited'; amount: number }

  | { type: 'FundsWithdrawn'; amount: number }

  | { type: 'AccountFrozen'; reason: string };

// State derived from events — never stored directly

interface AccountState {

  id: string;

  balance: number;

  status: 'active' | 'frozen';

  version: number;

}

// Pure function: fold events into state

function applyEvent(state: AccountState, event: AccountEvent): AccountState {

  switch (event.type) {

    case 'AccountOpened':

      return { ...state, balance: event.balance, status: 'active' };

    case 'FundsDeposited':

      return { ...state, balance: state.balance + event.amount };

    case 'FundsWithdrawn':

      return { ...state, balance: state.balance - event.amount };

    case 'AccountFrozen':

      return { ...state, status: 'frozen' };

  }

}

// Rebuild state by replaying all events

function rebuildAccount(events: AccountEvent[]): AccountState {

  const initial: AccountState = { id: '', balance: 0, status: 'active', version: 0 };

  return events.reduce((state, event, idx) => ({

    ...applyEvent(state, event),

    version: idx + 1

  }), initial);

}

When Event Sourcing Makes Sense

Use It When	Skip It When
Full audit trail is a requirement (finance, healthcare)	Simple CRUD with no history needs
You need to reconstruct past states ("what was the balance on March 3?")	Current state is all that matters
Multiple read models from the same data (reporting, search, analytics)	Single read pattern
Domain experts think in events ("order placed", "payment received")	Your team hasn't worked with eventual consistency before

Snapshot optimization: Replaying 10,000 events on every read is slow. Create snapshots every N events (we use every 100). To rebuild state: load latest snapshot + replay only events after it. This brings read latency from seconds to single-digit milliseconds.

5. CQRS — Separate Reads from Writes

Command Query Responsibility Segregation means using different models for reading and writing data. Commands change state (PlaceOrder, CancelSubscription). Queries read state (GetOrderDetails, ListActiveSubscriptions). In a traditional app, both use the same database and the same model. CQRS splits them.

        
CQRS Architecture:

  ┌──────────┐             ┌──────────────┐

  │ Commands  │──────────▶│ Write Model  │

  │ (POST/PUT)│             │ (normalized) │

  └──────────┘             └──────┬───────┘

                                  │ Events

                                  ▼

                         ┌──────────────┐

                         │  Event Bus   │

                         └──────┬───────┘

                                  │

                                  ▼

  ┌──────────┐             ┌──────────────┐

  │ Queries   │◀─────────── │ Read Model   │

  │ (GET)     │             │ (denormalized)│

  └──────────┘             └──────────────┘

Write: PostgreSQL (normalized, ACID)

Read: Elasticsearch (search), Redis (dashboard), DynamoDB (API)

Why bother? Because reads and writes have fundamentally different needs. Your write model should be normalized and enforce business rules. Your read model should be denormalized and optimized for the exact queries your UI needs. An e-commerce product page needs: product details, reviews, pricing, inventory, related products. Joining 6 tables on every page load is painful. With CQRS, you pre-compute a single document with everything the page needs.

We use CQRS without event sourcing on most projects. They pair well together, but they're independent patterns. You can have CQRS with a regular database — just use separate tables or databases for reads and writes, with a projection that keeps the read side updated.

6. Eventual Consistency and the Saga Pattern

In distributed event-driven systems, you lose ACID transactions across services. You place an order (Order Service), reserve inventory (Inventory Service), charge the card (Payment Service). What if payment fails after inventory is reserved? You need a way to coordinate rollback across services.

Saga Pattern: Choreography vs Orchestration

	Choreography	Orchestration
How it works	Each service listens for events and reacts	A central orchestrator tells each service what to do
Coupling	Low — services only know about events	Higher — orchestrator knows all participants
Visibility	Hard to see the full flow	Clear — saga state machine in one place
Best for	3-4 steps, independent teams	5+ steps, complex compensation logic
Our pick	Simple flows, early-stage systems	Order processing, payments, anything with money

Orchestrated Saga Example (Order Flow)

        
Order Saga — Happy Path:

OrderSaga ──▶ InventoryService: ReserveItems

             ◀── ItemsReserved

         ──▶ PaymentService: ChargeCard

             ◀── PaymentSucceeded

         ──▶ ShippingService: CreateShipment

             ◀── ShipmentCreated

         ──▶ OrderService: ConfirmOrder

Order Saga — Payment Failed (compensating actions):

OrderSaga ──▶ InventoryService: ReserveItems

             ◀── ItemsReserved

         ──▶ PaymentService: ChargeCard

             ◀── PaymentFailed

         ──▶ InventoryService: ReleaseItems (compensate)

             ◀── ItemsReleased

         ──▶ OrderService: RejectOrder

        
// Saga orchestrator (simplified TypeScript)

class OrderSaga {

  private steps = [

    {

      action: (ctx) => this.inventory.reserve(ctx.items),

      compensate: (ctx) => this.inventory.release(ctx.items)

    },

    {

      action: (ctx) => this.payment.charge(ctx.total, ctx.paymentMethod),

      compensate: (ctx) => this.payment.refund(ctx.paymentId)

    },

    {

      action: (ctx) => this.shipping.create(ctx.orderId, ctx.address),

      compensate: (ctx) => this.shipping.cancel(ctx.shipmentId)

    }

  ];

  async execute(ctx: OrderContext) {

    const completed: number[] = [];

    for (let i = 0; i < this.steps.length; i++) {

      try {

        await this.steps[i].action(ctx);

        completed.push(i);

      } catch (error) {

        // Compensate in reverse order

        for (const j of completed.reverse()) {

          await this.steps[j].compensate(ctx);

        }

        throw new SagaFailedError(i, error);

      }

    }

  }

}

7. Practical Implementation with Kafka

Here's a real implementation pattern we use for order processing. This handles idempotency, dead letter queues, and schema evolution — the three things most tutorials skip.

Producer: Publishing Events with Outbox Pattern

Never publish events directly to Kafka from your application. If the DB write succeeds but the Kafka publish fails, you've got an inconsistent state. Use the outbox pattern: write the event to a database table in the same transaction as the state change, then a separate process publishes from the outbox to Kafka.

        
-- Outbox table (PostgreSQL)

CREATE TABLE event_outbox (

  id           BIGSERIAL PRIMARY KEY,

  aggregate_type VARCHAR(100) NOT NULL,

  aggregate_id  VARCHAR(100) NOT NULL,

  event_type    VARCHAR(100) NOT NULL,

  payload       JSONB NOT NULL,

  created_at    TIMESTAMPTZ DEFAULT NOW(),

  published_at  TIMESTAMPTZ NULL

);

-- In the same transaction as the order insert:

BEGIN;

  INSERT INTO orders (id, customer_id, total, status)

  VALUES ('ord_8x92k', 'cust_4fn2', 89.97, 'placed');

  INSERT INTO event_outbox (aggregate_type, aggregate_id, event_type, payload)

  VALUES ('Order', 'ord_8x92k', 'order.placed', '{

    "orderId": "ord_8x92k",

    "customerId": "cust_4fn2",

    "total": 89.97,

    "items": [{"sku": "WIDGET-01", "qty": 3}]

  }'::jsonb);

COMMIT;

Consumer: Idempotent Processing

Consumers WILL receive duplicate events. Kafka guarantees at-least-once delivery, not exactly-once. Your consumer must be idempotent — processing the same event twice should produce the same result.

        
// Idempotent consumer (Node.js + KafkaJS)

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['kafka:9092'] });

const consumer = kafka.consumer({ groupId: 'email-service' });

await consumer.subscribe({ topic: 'order-events', fromBeginning: false });

await consumer.run({

  eachMessage: async ({ topic, partition, message }) => {

    const event = JSON.parse(message.value.toString());

    const eventId = message.headers['event-id']?.toString();

    // Idempotency check — skip if already processed

    const existing = await db.query(

      'SELECT 1 FROM processed_events WHERE event_id = $1', [eventId]

    );

    if (existing.rows.length > 0) return; // duplicate, skip

    try {

      await db.query('BEGIN');

      // Process the event

      if (event.type === 'order.placed') {

        await sendOrderConfirmationEmail(event.data);

      }

      // Record as processed (same transaction)

      await db.query(

        'INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())',

        [eventId]

      );

      await db.query('COMMIT');

    } catch (error) {

      await db.query('ROLLBACK');

      // Don't commit offset — message will be retried

      throw error;

    }

  }

});

Dead Letter Queue (DLQ)

After N retries, move failed events to a DLQ for manual inspection. Don't let a single bad event block the entire partition.

        
// DLQ wrapper

async function processWithDLQ(event, handler, maxRetries = 3) {

  const retryCount = parseInt(event.headers['retry-count'] || '0');

  try {

    await handler(event);

  } catch (error) {

    if (retryCount >= maxRetries) {

      // Send to dead letter topic

      await producer.send({

        topic: `${event.topic}.dlq`,

        messages: [{

          key: event.key,

          value: event.value,

          headers: {

            ...event.headers,

            'error-message': error.message,

            'failed-at': new Date().toISOString()

          }

        }]

      });

      console.error(`Event sent to DLQ after ${maxRetries} retries`, event);

    } else {

      // Re-publish with incremented retry count

      await producer.send({

        topic: event.topic,

        messages: [{

          key: event.key,

          value: event.value,

          headers: { ...event.headers, 'retry-count': String(retryCount + 1) }

        }]

      });

    }

  }

}

8. Pitfalls That Sink EDA Projects

We've built event-driven systems for multiple clients. Here are the mistakes that hurt most:

Pitfall	What Happens	Prevention
No schema registry	Producer changes event format, consumers break silently	Use Confluent Schema Registry or AWS Glue Schema Registry
No correlation IDs	Can't trace a request across 8 services	Generate correlation ID at entry point, propagate in every event header
Ordering assumptions	"OrderPaid" arrives before "OrderPlaced"	Use partition keys (order ID) for ordering guarantees within a partition
No dead letter queue	Poison message blocks the partition forever	DLQ after 3-5 retries with exponential backoff
Event explosion	100+ event types, nobody knows what triggers what	Event catalog + AsyncAPI spec. Review new events like you review API changes.
Testing in isolation	Unit tests pass, integration fails at 2am	Contract tests (Pact) + end-to-end event flow tests with Testcontainers
No backpressure	Fast producer overwhelms slow consumer, lag grows unbounded	Monitor consumer lag, auto-scale consumer groups, set rate limits on producers

The biggest mistake we see: Teams adopt event-driven architecture but keep synchronous mindsets. They publish an event and immediately query the read model expecting updated data. Eventual consistency means there's a delay — milliseconds usually, but sometimes seconds under load. Your UI needs to handle this: show optimistic updates, use "processing" states, or poll for confirmation.

9. When NOT to Use Event-Driven Architecture

EDA adds complexity. It's justified when you get decoupling, scalability, and resilience in return. It's not justified when a simple REST call would do the job.

Small team, single service — If 3 developers maintain one monolith, events between internal modules add complexity with no decoupling benefit. Use function calls.
Strong consistency required — Banking transfers between accounts within the same bank should be ACID transactions, not eventual consistency. Don't fight the requirement.
Simple CRUD apps — A content management system that creates, reads, updates, and deletes articles doesn't need Kafka. PostgreSQL triggers or simple webhooks work fine.
Real-time request-response — User clicks "checkout" and needs immediate confirmation? That's a synchronous flow. You can publish events after the synchronous confirmation, but the primary path should be direct.
Team doesn't understand eventual consistency — If your team has never debugged a distributed system, introducing EDA under deadline pressure will end badly. Train first, adopt gradually.

The hybrid approach we recommend: Use synchronous REST/gRPC for the primary user-facing flow (place order -> confirm). Then publish events asynchronously for side effects (send email, update analytics, trigger fulfillment). This gives you immediate user feedback AND decoupled downstream processing.

10. Frequently Asked Questions

What's the difference between event-driven architecture and message-driven architecture?

Event-driven systems publish facts about what happened ("OrderPlaced") — producers don't care who consumes them. Message-driven systems send commands to specific recipients ("ProcessPayment") — the sender expects a particular receiver to act. Most real systems use both: events for notifications, messages for commands. Kafka is event-oriented, RabbitMQ is message-oriented, though both can do either.

How do I handle event schema evolution without breaking consumers?

Use a schema registry (Confluent or AWS Glue) that enforces backward compatibility. Add new fields as optional — never remove or rename existing ones. Version your event types (order.placed.v2) and run old and new consumers in parallel during migration. We use Avro schemas for strict contracts and JSON Schema for lighter use cases.

Is Kafka overkill for a startup?

Probably, yes. If you're processing under 10,000 events per second, Redis Streams, Amazon SQS, or even PostgreSQL LISTEN/NOTIFY can handle it. Kafka shines at massive scale, stream processing, and event replay — features most startups don't need in year one. Start simple, migrate when you hit actual limits. We've seen too many seed-stage companies spend months on Kafka infrastructure instead of shipping features.

How do I debug issues in an event-driven system?

Three essentials: correlation IDs in every event header (trace a request across all services), centralized logging with structured JSON (we use ELK stack), and distributed tracing (OpenTelemetry with Jaeger). Monitor consumer lag religiously — if a consumer falls behind, you'll get stale data and complaints before you notice. Kafka Manager or Confluent Control Center make lag visible.

Can I use event-driven architecture with a monolith?

Absolutely — and this is often the smartest starting point. Use an in-process event bus (like MediatR in .NET or a simple EventEmitter in Node.js) within your monolith. Modules publish and subscribe to events without HTTP calls. When you eventually extract a module into its own service, you replace the in-process bus with Kafka or RabbitMQ. The event contracts stay the same. We call this "monolith with event seams."

Pillai Infotech Engineering Team

We build distributed systems for clients ranging from fintech startups to enterprise logistics platforms. Our event-driven implementations process millions of events daily across Kafka, RabbitMQ, and AWS-native stacks. This guide reflects patterns we've validated in production — not theoretical ideals.

Microservices vs Monolith: When to Make the Switch → Domain-Driven Design (DDD): A Practical Guide for Developers → API Gateway Patterns for Microservices Architecture →

Event-Driven Architecture: Design Patterns and Implementation

In This Guide

1. Why Event-Driven? The Coupling Problem

2. Core EDA Patterns

Event Notification

Event-Carried State Transfer

Domain Events vs Integration Events

3. Message Brokers Compared

4. Event Sourcing — When the Event IS the Data

Event Sourcing Implementation (Node.js/TypeScript)

When Event Sourcing Makes Sense

5. CQRS — Separate Reads from Writes

6. Eventual Consistency and the Saga Pattern

Saga Pattern: Choreography vs Orchestration

Orchestrated Saga Example (Order Flow)

7. Practical Implementation with Kafka

Producer: Publishing Events with Outbox Pattern

Consumer: Idempotent Processing

Dead Letter Queue (DLQ)

8. Pitfalls That Sink EDA Projects

9. When NOT to Use Event-Driven Architecture

10. Frequently Asked Questions

What's the difference between event-driven architecture and message-driven architecture?

How do I handle event schema evolution without breaking consumers?

Is Kafka overkill for a startup?

How do I debug issues in an event-driven system?

Can I use event-driven architecture with a monolith?

Related Articles

Event-Driven Architecture: Design Patterns and Implementation

In This Guide

1. Why Event-Driven? The Coupling Problem

2. Core EDA Patterns

Event Notification

Event-Carried State Transfer

Domain Events vs Integration Events

3. Message Brokers Compared

4. Event Sourcing — When the Event IS the Data

Event Sourcing Implementation (Node.js/TypeScript)

When Event Sourcing Makes Sense

5. CQRS — Separate Reads from Writes

6. Eventual Consistency and the Saga Pattern

Saga Pattern: Choreography vs Orchestration

Orchestrated Saga Example (Order Flow)

7. Practical Implementation with Kafka

Producer: Publishing Events with Outbox Pattern

Consumer: Idempotent Processing

Dead Letter Queue (DLQ)

8. Pitfalls That Sink EDA Projects

9. When NOT to Use Event-Driven Architecture

10. Frequently Asked Questions

What's the difference between event-driven architecture and message-driven architecture?

How do I handle event schema evolution without breaking consumers?

Is Kafka overkill for a startup?

How do I debug issues in an event-driven system?

Can I use event-driven architecture with a monolith?

Related Articles

Book a Free Consultation

Your Details

Pick a 30-min Slot

Thank You!