Ideas Engineered for Tomorrow
We Engineer Services & Solutions for Your Business Needs
Home About
Products
Services
Hire
Industries
Consulting
Partners
Articles Careers Contact
Software Development

Event-Driven Architecture: Design Patterns and Implementation

Your request-response architecture hits a wall around 10,000 concurrent users. Here's how event-driven systems handle 10 million — and the hidden complexity you should prepare for.

🔗 Architecture January 22, 2026 14 min read

In This Guide

Event-driven architecture decouples services by having them communicate through events instead of direct API calls. A service publishes an event ("order placed"), and any interested service reacts to it — the publisher doesn't know or care who's listening. We moved one of our client's order processing system from synchronous REST calls to event-driven with Kafka, and what used to be a cascading failure across 6 services when payments went down became a 30-second delay with automatic retry. The system stayed up. But getting there required rethinking how we handle data consistency, debugging, and error recovery.

1. Why Event-Driven? The Coupling Problem

In a traditional request-response system, Service A calls Service B, which calls Service C. If C is down, B times out, which makes A time out, and your user sees an error. This is temporal coupling — services must be available at the same time.

Event-driven architecture breaks this chain. Service A publishes an event and moves on. Services B and C process it whenever they're ready — in milliseconds or in minutes. This isn't just about fault tolerance. It changes how teams work. The payments team can deploy independently because they only consume events — they don't need the orders team to update an API contract.

Dimension Request-Response Event-Driven
Temporal couplingAll services must be up simultaneouslyServices process at their own pace
Knowledge couplingCaller knows the callee's APIPublisher doesn't know consumers
Failure propagationCascading failures across the chainIsolated — failed consumers retry independently
ScalingScale each service in the call chainScale consumers independently per topic
Adding featuresModify existing services, coordinate deploysAdd new consumer — zero changes to publisher
DebuggingFollow the call stackCorrelate across distributed logs (harder)

The debugging row is real. We've spent hours tracing why an order's email didn't send, only to discover the email consumer was 3 minutes behind because a batch job had flooded the topic. With request-response, you'd have gotten an error immediately. That's the trade-off.

2. Core EDA Patterns

Event Notification

The simplest pattern. A service publishes a thin event saying "something happened" — just an ID and event type. Consumers fetch details themselves if needed.

// Event notification — thin payload
{
  "event": "order.placed",
  "orderId": "ord_8x92k",
  "timestamp": "2026-01-22T10:30:00Z",
  "version": 1
}

// Consumer fetches details if it needs them
// GET /api/orders/ord_8x92k

Pro: Small messages, publisher doesn't need to know what consumers want. Con: Creates runtime coupling — consumer must call back to the publisher to get details.

Event-Carried State Transfer

The event carries all the data consumers need. No callbacks. This is what we use for most of our systems.

// Event-carried state — fat payload
{
  "event": "order.placed",
  "orderId": "ord_8x92k",
  "timestamp": "2026-01-22T10:30:00Z",
  "version": 1,
  "data": {
    "customerId": "cust_4fn2",
    "email": "buyer@example.com",
    "items": [
      { "sku": "WIDGET-01", "qty": 3, "price": 29.99 }
    ],
    "total": 89.97,
    "currency": "USD",
    "shippingAddress": { "city": "Mumbai", "zip": "400001" }
  }
}

Pro: True decoupling — consumers never call back. Con: Larger messages, and consumers might cache stale data. We mitigate this by always including a version number and using idempotency keys.

Domain Events vs Integration Events

This distinction matters more than most teams realize:

We keep domain events on an internal bus (in-process or Redis Streams) and only publish integration events to Kafka. This prevents internal refactoring from breaking downstream consumers.

3. Message Brokers Compared

The broker you choose shapes your architecture more than you'd expect. Here's what actually matters after running each in production:

Broker Best For Throughput Retention Ops Complexity
Apache KafkaEvent streaming, replay, high volume1M+ msgs/secConfigurable (days/forever)High (ZooKeeper/KRaft)
RabbitMQTask queues, routing, RPC50K msgs/secUntil consumed (no replay)Low-Medium
Amazon SQS/SNSAWS-native, serverlessNearly unlimited14 days max (SQS)Zero (managed)
Redis StreamsLightweight, already-have-Redis200K msgs/secMemory-limitedLow
NATS JetStreamEdge, IoT, lightweight streaming500K+ msgs/secConfigurableLow
Amazon EventBridgeEvent routing, schema registryHigh (managed)Replay up to 24hrsZero (managed)
Our recommendation: Start with RabbitMQ if you're under 10K events/second and need routing flexibility. Move to Kafka when you need event replay, stream processing, or you're above 50K events/second. Use SQS/SNS if you're all-in on AWS and want zero ops burden. We've seen too many teams start with Kafka when Redis Streams would have been enough for their first two years.

4. Event Sourcing — When the Event IS the Data

Most applications store current state: "Account balance is $500." Event sourcing stores every change that led to that state: "Deposited $1000, Withdrew $300, Deposited $100, Withdrew $300." The current state is derived by replaying events.

This sounds academic until you need it. A fintech client needed to answer "Why does this account show $347.23?" — with a traditional database, you'd look at the balance column and shrug. With event sourcing, you replay the event stream and see every deposit, withdrawal, fee, and correction that produced that number.

Traditional DB (stores current state):
┌─────────────────────────────────────┐
│ accounts table │
│ id: acc_001 │ balance: 500.00 │
└─────────────────────────────────────┘
How did we get here? No idea.

Event Store (stores every change):
┌─────────────────────────────────────────────────┐
│ Stream: account-acc_001 │
├──────┬─────────────────┬────────────┬───────────┤
│ Seq │ Event Type │ Amount │ Balance │
├──────┼─────────────────┼────────────┼───────────┤
│ 1 │ AccountOpened │ +1000.00 │ 1000.00 │
│ 2 │ FundsWithdrawn │ -300.00 │ 700.00 │
│ 3 │ FundsDeposited │ +100.00 │ 800.00 │
│ 4 │ FundsWithdrawn │ -300.00 │ 500.00 │
└──────┴─────────────────┴────────────┴───────────┘
Complete audit trail. Replay to any point in time.

Event Sourcing Implementation (Node.js/TypeScript)

// Event types
type AccountEvent =
  | { type: 'AccountOpened'; balance: number }
  | { type: 'FundsDeposited'; amount: number }
  | { type: 'FundsWithdrawn'; amount: number }
  | { type: 'AccountFrozen'; reason: string };

// State derived from events — never stored directly
interface AccountState {
  id: string;
  balance: number;
  status: 'active' | 'frozen';
  version: number;
}

// Pure function: fold events into state
function applyEvent(state: AccountState, event: AccountEvent): AccountState {
  switch (event.type) {
    case 'AccountOpened':
      return { ...state, balance: event.balance, status: 'active' };
    case 'FundsDeposited':
      return { ...state, balance: state.balance + event.amount };
    case 'FundsWithdrawn':
      return { ...state, balance: state.balance - event.amount };
    case 'AccountFrozen':
      return { ...state, status: 'frozen' };
  }
}

// Rebuild state by replaying all events
function rebuildAccount(events: AccountEvent[]): AccountState {
  const initial: AccountState = { id: '', balance: 0, status: 'active', version: 0 };
  return events.reduce((state, event, idx) => ({
    ...applyEvent(state, event),
    version: idx + 1
  }), initial);
}

When Event Sourcing Makes Sense

Use It When Skip It When
Full audit trail is a requirement (finance, healthcare)Simple CRUD with no history needs
You need to reconstruct past states ("what was the balance on March 3?")Current state is all that matters
Multiple read models from the same data (reporting, search, analytics)Single read pattern
Domain experts think in events ("order placed", "payment received")Your team hasn't worked with eventual consistency before
Snapshot optimization: Replaying 10,000 events on every read is slow. Create snapshots every N events (we use every 100). To rebuild state: load latest snapshot + replay only events after it. This brings read latency from seconds to single-digit milliseconds.

5. CQRS — Separate Reads from Writes

Command Query Responsibility Segregation means using different models for reading and writing data. Commands change state (PlaceOrder, CancelSubscription). Queries read state (GetOrderDetails, ListActiveSubscriptions). In a traditional app, both use the same database and the same model. CQRS splits them.

CQRS Architecture:

  ┌──────────┐             ┌──────────────┐
  │ Commands │──────────▶│ Write Model │
  │ (POST/PUT)│             │ (normalized) │
  └──────────┘             └──────┬───────┘
                                  │ Events
                                  ▼
                         ┌──────────────┐
                         │ Event Bus │
                         └──────┬───────┘
                                  │
                                  ▼
  ┌──────────┐             ┌──────────────┐
  │ Queries │◀─────────── │ Read Model │
  │ (GET) │             │ (denormalized)│
  └──────────┘             └──────────────┘

Write: PostgreSQL (normalized, ACID)
Read: Elasticsearch (search), Redis (dashboard), DynamoDB (API)

Why bother? Because reads and writes have fundamentally different needs. Your write model should be normalized and enforce business rules. Your read model should be denormalized and optimized for the exact queries your UI needs. An e-commerce product page needs: product details, reviews, pricing, inventory, related products. Joining 6 tables on every page load is painful. With CQRS, you pre-compute a single document with everything the page needs.

We use CQRS without event sourcing on most projects. They pair well together, but they're independent patterns. You can have CQRS with a regular database — just use separate tables or databases for reads and writes, with a projection that keeps the read side updated.

6. Eventual Consistency and the Saga Pattern

In distributed event-driven systems, you lose ACID transactions across services. You place an order (Order Service), reserve inventory (Inventory Service), charge the card (Payment Service). What if payment fails after inventory is reserved? You need a way to coordinate rollback across services.

Saga Pattern: Choreography vs Orchestration

Choreography Orchestration
How it worksEach service listens for events and reactsA central orchestrator tells each service what to do
CouplingLow — services only know about eventsHigher — orchestrator knows all participants
VisibilityHard to see the full flowClear — saga state machine in one place
Best for3-4 steps, independent teams5+ steps, complex compensation logic
Our pickSimple flows, early-stage systemsOrder processing, payments, anything with money

Orchestrated Saga Example (Order Flow)

Order Saga — Happy Path:

OrderSaga ──▶ InventoryService: ReserveItems
             ◀── ItemsReserved
         ──▶ PaymentService: ChargeCard
             ◀── PaymentSucceeded
         ──▶ ShippingService: CreateShipment
             ◀── ShipmentCreated
         ──▶ OrderService: ConfirmOrder

Order Saga — Payment Failed (compensating actions):

OrderSaga ──▶ InventoryService: ReserveItems
             ◀── ItemsReserved
         ──▶ PaymentService: ChargeCard
             ◀── PaymentFailed
         ──▶ InventoryService: ReleaseItems (compensate)
             ◀── ItemsReleased
         ──▶ OrderService: RejectOrder
// Saga orchestrator (simplified TypeScript)
class OrderSaga {
  private steps = [
    {
      action: (ctx) => this.inventory.reserve(ctx.items),
      compensate: (ctx) => this.inventory.release(ctx.items)
    },
    {
      action: (ctx) => this.payment.charge(ctx.total, ctx.paymentMethod),
      compensate: (ctx) => this.payment.refund(ctx.paymentId)
    },
    {
      action: (ctx) => this.shipping.create(ctx.orderId, ctx.address),
      compensate: (ctx) => this.shipping.cancel(ctx.shipmentId)
    }
  ];

  async execute(ctx: OrderContext) {
    const completed: number[] = [];

    for (let i = 0; i < this.steps.length; i++) {
      try {
        await this.steps[i].action(ctx);
        completed.push(i);
      } catch (error) {
        // Compensate in reverse order
        for (const j of completed.reverse()) {
          await this.steps[j].compensate(ctx);
        }
        throw new SagaFailedError(i, error);
      }
    }
  }
}

7. Practical Implementation with Kafka

Here's a real implementation pattern we use for order processing. This handles idempotency, dead letter queues, and schema evolution — the three things most tutorials skip.

Producer: Publishing Events with Outbox Pattern

Never publish events directly to Kafka from your application. If the DB write succeeds but the Kafka publish fails, you've got an inconsistent state. Use the outbox pattern: write the event to a database table in the same transaction as the state change, then a separate process publishes from the outbox to Kafka.

-- Outbox table (PostgreSQL)
CREATE TABLE event_outbox (
  id           BIGSERIAL PRIMARY KEY,
  aggregate_type VARCHAR(100) NOT NULL,
  aggregate_id  VARCHAR(100) NOT NULL,
  event_type    VARCHAR(100) NOT NULL,
  payload       JSONB NOT NULL,
  created_at    TIMESTAMPTZ DEFAULT NOW(),
  published_at  TIMESTAMPTZ NULL
);

-- In the same transaction as the order insert:
BEGIN;
  INSERT INTO orders (id, customer_id, total, status)
  VALUES ('ord_8x92k', 'cust_4fn2', 89.97, 'placed');

  INSERT INTO event_outbox (aggregate_type, aggregate_id, event_type, payload)
  VALUES ('Order', 'ord_8x92k', 'order.placed', '{
    "orderId": "ord_8x92k",
    "customerId": "cust_4fn2",
    "total": 89.97,
    "items": [{"sku": "WIDGET-01", "qty": 3}]
  }'::jsonb);
COMMIT;

Consumer: Idempotent Processing

Consumers WILL receive duplicate events. Kafka guarantees at-least-once delivery, not exactly-once. Your consumer must be idempotent — processing the same event twice should produce the same result.

// Idempotent consumer (Node.js + KafkaJS)
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'email-service' });

await consumer.subscribe({ topic: 'order-events', fromBeginning: false });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    const event = JSON.parse(message.value.toString());
    const eventId = message.headers['event-id']?.toString();

    // Idempotency check — skip if already processed
    const existing = await db.query(
      'SELECT 1 FROM processed_events WHERE event_id = $1', [eventId]
    );
    if (existing.rows.length > 0) return; // duplicate, skip

    try {
      await db.query('BEGIN');

      // Process the event
      if (event.type === 'order.placed') {
        await sendOrderConfirmationEmail(event.data);
      }

      // Record as processed (same transaction)
      await db.query(
        'INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())',
        [eventId]
      );

      await db.query('COMMIT');
    } catch (error) {
      await db.query('ROLLBACK');
      // Don't commit offset — message will be retried
      throw error;
    }
  }
});

Dead Letter Queue (DLQ)

After N retries, move failed events to a DLQ for manual inspection. Don't let a single bad event block the entire partition.

// DLQ wrapper
async function processWithDLQ(event, handler, maxRetries = 3) {
  const retryCount = parseInt(event.headers['retry-count'] || '0');

  try {
    await handler(event);
  } catch (error) {
    if (retryCount >= maxRetries) {
      // Send to dead letter topic
      await producer.send({
        topic: `${event.topic}.dlq`,
        messages: [{
          key: event.key,
          value: event.value,
          headers: {
            ...event.headers,
            'error-message': error.message,
            'failed-at': new Date().toISOString()
          }
        }]
      });
      console.error(`Event sent to DLQ after ${maxRetries} retries`, event);
    } else {
      // Re-publish with incremented retry count
      await producer.send({
        topic: event.topic,
        messages: [{
          key: event.key,
          value: event.value,
          headers: { ...event.headers, 'retry-count': String(retryCount + 1) }
        }]
      });
    }
  }
}

8. Pitfalls That Sink EDA Projects

We've built event-driven systems for multiple clients. Here are the mistakes that hurt most:

Pitfall What Happens Prevention
No schema registryProducer changes event format, consumers break silentlyUse Confluent Schema Registry or AWS Glue Schema Registry
No correlation IDsCan't trace a request across 8 servicesGenerate correlation ID at entry point, propagate in every event header
Ordering assumptions"OrderPaid" arrives before "OrderPlaced"Use partition keys (order ID) for ordering guarantees within a partition
No dead letter queuePoison message blocks the partition foreverDLQ after 3-5 retries with exponential backoff
Event explosion100+ event types, nobody knows what triggers whatEvent catalog + AsyncAPI spec. Review new events like you review API changes.
Testing in isolationUnit tests pass, integration fails at 2amContract tests (Pact) + end-to-end event flow tests with Testcontainers
No backpressureFast producer overwhelms slow consumer, lag grows unboundedMonitor consumer lag, auto-scale consumer groups, set rate limits on producers
The biggest mistake we see: Teams adopt event-driven architecture but keep synchronous mindsets. They publish an event and immediately query the read model expecting updated data. Eventual consistency means there's a delay — milliseconds usually, but sometimes seconds under load. Your UI needs to handle this: show optimistic updates, use "processing" states, or poll for confirmation.

9. When NOT to Use Event-Driven Architecture

EDA adds complexity. It's justified when you get decoupling, scalability, and resilience in return. It's not justified when a simple REST call would do the job.

The hybrid approach we recommend: Use synchronous REST/gRPC for the primary user-facing flow (place order -> confirm). Then publish events asynchronously for side effects (send email, update analytics, trigger fulfillment). This gives you immediate user feedback AND decoupled downstream processing.

10. Frequently Asked Questions

What's the difference between event-driven architecture and message-driven architecture?

Event-driven systems publish facts about what happened ("OrderPlaced") — producers don't care who consumes them. Message-driven systems send commands to specific recipients ("ProcessPayment") — the sender expects a particular receiver to act. Most real systems use both: events for notifications, messages for commands. Kafka is event-oriented, RabbitMQ is message-oriented, though both can do either.

How do I handle event schema evolution without breaking consumers?

Use a schema registry (Confluent or AWS Glue) that enforces backward compatibility. Add new fields as optional — never remove or rename existing ones. Version your event types (order.placed.v2) and run old and new consumers in parallel during migration. We use Avro schemas for strict contracts and JSON Schema for lighter use cases.

Is Kafka overkill for a startup?

Probably, yes. If you're processing under 10,000 events per second, Redis Streams, Amazon SQS, or even PostgreSQL LISTEN/NOTIFY can handle it. Kafka shines at massive scale, stream processing, and event replay — features most startups don't need in year one. Start simple, migrate when you hit actual limits. We've seen too many seed-stage companies spend months on Kafka infrastructure instead of shipping features.

How do I debug issues in an event-driven system?

Three essentials: correlation IDs in every event header (trace a request across all services), centralized logging with structured JSON (we use ELK stack), and distributed tracing (OpenTelemetry with Jaeger). Monitor consumer lag religiously — if a consumer falls behind, you'll get stale data and complaints before you notice. Kafka Manager or Confluent Control Center make lag visible.

Can I use event-driven architecture with a monolith?

Absolutely — and this is often the smartest starting point. Use an in-process event bus (like MediatR in .NET or a simple EventEmitter in Node.js) within your monolith. Modules publish and subscribe to events without HTTP calls. When you eventually extract a module into its own service, you replace the in-process bus with Kafka or RabbitMQ. The event contracts stay the same. We call this "monolith with event seams."

PI
Pillai Infotech Engineering Team

We build distributed systems for clients ranging from fintech startups to enterprise logistics platforms. Our event-driven implementations process millions of events daily across Kafka, RabbitMQ, and AWS-native stacks. This guide reflects patterns we've validated in production — not theoretical ideals.

Related Articles

Microservices vs Monolith: When to Make the Switch → Domain-Driven Design (DDD): A Practical Guide for Developers → API Gateway Patterns for Microservices Architecture →