API Gateway Patterns for Microservices Architecture

Your mobile app makes 12 API calls on one screen, each to a different microservice. An API gateway turns that into a single call — and handles auth, rate limiting, and caching along the way.

🔌 Architecture January 19, 2026 12 min read

In This Guide

1. What an API Gateway Does (and Doesn't Do)
2. Gateway Patterns: Edge, BFF, and Mesh
3. Gateway Solutions Compared
4. Routing and Request Transformation
5. Authentication at the Gateway
6. Rate Limiting and Throttling
7. Circuit Breakers and Resilience
8. Implementation: Kong + Nginx
9. Frequently Asked Questions

An API gateway sits between your clients and your microservices. It's a single entry point that handles cross-cutting concerns: authentication, rate limiting, request routing, response caching, and protocol translation. Without it, every microservice must implement these concerns independently, and every client must know how to reach every service. We moved a client from direct client-to-service calls to a Kong gateway, and their mobile app went from 14 API calls per screen to 2, with 60% lower latency because the gateway handles aggregation server-side.

1. What an API Gateway Does (and Doesn't Do)

Gateway Responsibility	Without Gateway	With Gateway
Authentication	Every service validates JWT tokens	Gateway validates once, forwards user context
Rate limiting	Each service implements its own limits	Centralized rate limiting per client/API key
Request routing	Client knows every service URL	Client hits one URL, gateway routes internally
Response caching	Each service manages its own cache headers	Gateway caches responses, reducing backend load
Protocol translation	Client must speak gRPC, GraphQL, REST	Client speaks REST; gateway translates to gRPC internally
Request aggregation	Client makes N calls for one screen	Gateway composes responses from multiple services

What a gateway should NOT do:

Business logic — keep it in services. The gateway routes and transforms; it doesn't decide.
Data joins across services — that's an anti-pattern. Use a BFF (Backend for Frontend) instead.
Become a "god service" that everything depends on — keep it thin and fast.

2. Gateway Patterns: Edge, BFF, and Mesh

Edge Gateway (Single Entry Point)

One gateway handles all external traffic. It's the simplest pattern and works well when you have a single client type (just a web app) or when all clients need the same API shape.

        
Edge Gateway Pattern:

 Web App ──┐
           ├──▶ API Gateway ──┬──▶ [User Service]
 Mobile ───┤                  ├──▶ [Order Service]
           │                  ├──▶ [Product Service]
 3rd Party ┘                  └──▶ [Payment Service]

Backend for Frontend (BFF)

A separate gateway per client type. The mobile BFF returns smaller payloads and fewer images. The web BFF returns richer data. The admin BFF has different auth requirements. This is the pattern we use most — different clients have fundamentally different needs.

        
BFF Pattern:

 Web App ────▶ Web BFF ────────┐
               (rich payloads) ├──▶ [User Service]
                               ├──▶ [Order Service]
 Mobile ─────▶ Mobile BFF ─────┤     ...
               (slim payloads) │
                               │
 Admin ──────▶ Admin BFF ──────┘
               (internal APIs)
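To make the BFF idea concrete, here is a minimal Python sketch of one screen endpoint that fans out to two services concurrently and shapes the result per client type. The service functions, field names, and the `profile_screen` handler are hypothetical stand-ins, not part of any real gateway product; a real BFF would make HTTP or gRPC calls where the stubs return canned data.

```python
import asyncio

# Hypothetical stand-ins for downstream services; a real BFF would make
# HTTP or gRPC calls here instead of returning canned data.
async def fetch_user(user_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": user_id, "name": "Ada", "email": "ada@example.com",
            "avatar_url": "https://cdn.example.com/ada.png"}

async def fetch_orders(user_id: str) -> list:
    await asyncio.sleep(0.01)
    return [{"id": "o_1", "total": 42.0, "items": 3, "status": "shipped"}]

# Fields each client type receives (assumed for illustration).
MOBILE_USER_FIELDS = ("id", "name")
WEB_USER_FIELDS = ("id", "name", "email", "avatar_url")

def pick(record: dict, fields: tuple) -> dict:
    """Keep only the listed fields of a record."""
    return {k: record[k] for k in fields if k in record}

async def profile_screen(user_id: str, client: str) -> dict:
    """Fan out to both services concurrently, then shape per client."""
    user, orders = await asyncio.gather(fetch_user(user_id),
                                        fetch_orders(user_id))
    if client == "mobile":
        # Slim payload: fewer fields per record.
        return {"user": pick(user, MOBILE_USER_FIELDS),
                "orders": [pick(o, ("id", "status")) for o in orders]}
    # Web gets the richer payload.
    return {"user": pick(user, WEB_USER_FIELDS), "orders": orders}

if __name__ == "__main__":
    print(asyncio.run(profile_screen("u_123", "mobile")))
```

Because both calls run under `asyncio.gather`, the screen waits for the slower of the two services rather than the sum of both, which is where most of the aggregation latency win comes from.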

Service Mesh (Internal Gateway)

A service mesh (Istio, Linkerd) adds gateway-like functionality to internal service-to-service communication — mTLS, retries, circuit breakers, observability. It's a sidecar proxy next to every service, not a central gateway. Use this when you have 20+ microservices and need fine-grained traffic control internally.

Our recommendation: Start with a single edge gateway. Add BFF gateways when mobile and web teams diverge. Add a service mesh only when internal traffic management becomes painful (usually 20+ services). Most teams don't need Istio — it adds significant operational complexity. A simple HTTP client library with retries and circuit breakers covers 80% of what a mesh does.

3. Gateway Solutions Compared

Solution	Type	Best For	Cost	Complexity
Kong	Open-source / Enterprise	Plugin ecosystem, Kubernetes-native	Free (OSS) / $35K+/yr	Medium
AWS API Gateway	Managed (serverless)	AWS-native, Lambda integration	$3.50/M requests	Low
Nginx + Lua	Self-hosted	High performance, full control	Free (ops cost)	High
Traefik	Open-source	Docker/K8s auto-discovery, Let's Encrypt	Free (OSS)	Low-Medium
Envoy	Open-source (CNCF)	Service mesh sidecar, gRPC-native	Free (ops cost)	High
Express/Fastify (custom)	Build your own	BFF pattern, custom aggregation	Dev time	Medium

4. Routing and Request Transformation

The gateway maps public API paths to internal service URLs. This decouples your public API contract from your internal service structure — you can split, merge, or rename services without changing the public API.

        
# Kong declarative config (kong.yml)
_format_version: "3.0"   # required header for declarative config on Kong 3.x

services:
  - name: user-service
    url: http://user-svc.internal:3001
    routes:
      - name: users-api
        paths: ["/api/v1/users"]
        strip_path: true   # drop the matched prefix before forwarding
        methods: [GET, POST, PUT, DELETE]

  - name: order-service
    url: http://order-svc.internal:3002
    routes:
      - name: orders-api
        paths: ["/api/v1/orders"]
        strip_path: true

  - name: product-service
    url: http://product-svc.internal:3003
    routes:
      - name: products-api
        paths: ["/api/v1/products"]
        strip_path: true

# Client calls: api.example.com/api/v1/orders/123
# Gateway routes to: order-svc.internal:3002/123
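The prefix-matching and `strip_path` behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not Kong's actual matching algorithm: the `ROUTES` table and the `resolve` function are hypothetical, and the sketch only matches on whole path segments.

```python
from typing import Optional

# Route table mirroring the config above: public prefix -> internal base URL.
ROUTES = {
    "/api/v1/users": "http://user-svc.internal:3001",
    "/api/v1/orders": "http://order-svc.internal:3002",
    "/api/v1/products": "http://product-svc.internal:3003",
}

def resolve(path: str, strip_path: bool = True) -> Optional[str]:
    """Return the upstream URL for a public path, or None if no route matches."""
    # Try the longest prefix first so more specific routes win.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            # With strip_path, the matched prefix is removed before forwarding.
            rest = path[len(prefix):] if strip_path else path
            return ROUTES[prefix] + (rest or "/")
    return None

if __name__ == "__main__":
    print(resolve("/api/v1/orders/123"))
```

Since clients only ever see the public prefixes, renaming or splitting `order-svc` means editing the route table, not shipping a new client.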

Request/Response Transformation

Gateways can modify requests and responses in flight. Common transformations:

Header injection: Add X-Request-ID, X-User-ID (from JWT) to forwarded requests
Response filtering: Strip internal fields (database IDs, debug info) from public responses
API versioning: Route /v1/ to old service, /v2/ to new service
Payload shaping: Mobile BFF returns 5 fields, web BFF returns 20 from the same service
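Two of the transformations listed above, header injection and response filtering, are easy to sketch in Python. The function names and the `INTERNAL_FIELDS` set are assumptions made for illustration; in a Kong deployment the same work is typically done by plugins rather than hand-written code.

```python
import uuid

# Assumed names of fields that must never leave the internal network.
INTERNAL_FIELDS = {"_db_id", "debug"}

def inject_headers(headers: dict, user_id: str) -> dict:
    """Return a copy of the request headers with gateway-owned headers set."""
    # Drop any client-supplied identity header so it cannot be spoofed.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "x-user-id"}
    forwarded["X-User-ID"] = user_id
    # Keep an existing request ID for tracing, otherwise mint one.
    if not any(k.lower() == "x-request-id" for k in forwarded):
        forwarded["X-Request-ID"] = f"req_{uuid.uuid4().hex[:12]}"
    return forwarded

def filter_response(body):
    """Recursively remove internal fields from a JSON-like structure."""
    if isinstance(body, dict):
        return {k: filter_response(v) for k, v in body.items()
                if k not in INTERNAL_FIELDS}
    if isinstance(body, list):
        return [filter_response(item) for item in body]
    return body

if __name__ == "__main__":
    print(inject_headers({"Accept": "application/json"}, "u_123"))
    print(filter_response({"id": 1, "_db_id": 99, "debug": {"sql": "..."}}))
```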

5. Authentication at the Gateway

Gateway authentication validates tokens once and forwards user context to downstream services. This eliminates redundant validation and centralizes your auth logic.

        
Auth flow through gateway:

Client ──▶ Gateway ──▶ Downstream Service

1. Client sends: Authorization: Bearer eyJhbG...
2. Gateway validates JWT (signature, expiry, issuer)
3. Gateway extracts claims: { userId: "u_123", role: "admin" }
4. Gateway forwards:
   X-User-ID: u_123
   X-User-Role: admin
   X-Request-ID: req_abc123
5. Downstream trusts these headers (internal network only)
6. Downstream skips JWT validation entirely
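Steps 2 to 4 of the flow can be sketched with the Python standard library alone. This sketch assumes HS256 (shared-secret) tokens with `sub`, `role`, `iss`, and `exp` claims; the function names are hypothetical, and a production gateway would rely on a maintained JWT library or plugin and often on asymmetric keys instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint a test token; an identity provider does this in practice."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"

def validate(token: str, secret: bytes, issuer: str,
             now: Optional[float] = None) -> dict:
    """Check signature, expiry, and issuer; return claims or raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
        if not isinstance(header, dict) or not isinstance(claims, dict):
            raise ValueError("header and payload must be JSON objects")
    except Exception as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm so a token cannot choose a weaker one.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    return claims

def forwarded_headers(claims: dict, request_id: str) -> dict:
    """Headers the gateway forwards downstream (step 4)."""
    return {"X-User-ID": claims["sub"], "X-User-Role": claims["role"],
            "X-Request-ID": request_id}
```

Downstream services can only trust these headers if the gateway overwrites any client-supplied `X-User-*` values and the services are unreachable from outside the internal network.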

        
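# Kong JWT plugin configuration
plugins:
  - name: jwt
    config:
      key_claim_name: iss          # claim used to look up the consumer credential
      claims_to_verify: [exp]      # reject expired tokens
      header_names: [Authorization]
  - name: request-transformer
    config:
      add:
        headers:
          # Placeholder syntax: request-transformer templates document
          # $(headers.*), $(query_params.*), and $(uri_captures.*). Check how
          # your Kong version exposes JWT claims; a small custom plugin is a
          # common way to forward them.
          - "X-User-ID:$(jwt.payload.sub)"
          - "X-User-Role:$(jwt.payload.role)"

# Public endpoints (no auth required)
routes:
  - name: public-routes
    paths: ["/api/v1/health", "/api/v1/products"]
    plugins:
      - name: jwt
        # Confirm this overrides a globally enabled jwt plugin on your Kong
        # version; otherwise attach jwt only to the protected routes.
        enabled: false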

Auth Strategy	When to Use	Gateway Handles
JWT validation	Stateless, most common	Signature check, expiry, forwarding claims
API key	Machine-to-machine, B2B	Key lookup, rate limit per key
OAuth2 / OIDC	Social login, SSO	Token exchange, redirect flows
mTLS	Service-to-service	Certificate validation, service identity

6. Rate Limiting and Throttling

Rate limiting protects your services from abuse and ensures fair usage. The gateway is the natural place for it — one central enforcement point instead of every service implementing its own.

Algorithm	How It Works	Best For
Fixed Window	100 requests per minute, counter resets at :00	Simple, but allows bursts at window boundaries
Sliding Window	Weighted average of current and previous windows	Smooth limiting, no burst at boundaries
Token Bucket	Bucket fills at fixed rate; each request takes a token	Allows controlled bursts (bucket can hold N tokens)
Leaky Bucket	Requests queue up and drain at a fixed rate	Smooth output rate, good for downstream protection
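As an illustration of one row in the table, here is a minimal single-process token bucket in Python. The class name and the injectable `clock` parameter are assumptions for this sketch; a gateway running several instances would keep the bucket state in a shared store such as Redis rather than in memory.

```python
import time

class TokenBucket:
    """Token bucket: refills at a fixed rate, allows bursts up to capacity."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock            # injectable so tests can control time
        self.tokens = capacity        # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

if __name__ == "__main__":
    bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
    print([bucket.allow() for _ in range(4)])
```

The capacity sets the burst size and the refill rate sets the sustained rate, so the two can be tuned independently.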

        
# Kong rate limiting plugin
plugins:
  - name: rate-limiting
    config:
      minute: 60           # 60 requests per minute
      hour: 1000           # 1000 per hour
      policy: redis        # shared counter across gateway instances
      redis_host: redis.internal
      limit_by: credential # per API key (or: ip, consumer, header)
      hide_client_headers: false

# Response headers clients receive:
# X-RateLimit-Limit-Minute: 60
# X-RateLimit-Remaining-Minute: 42
# Retry-After: 18 (when limit exceeded)
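To show what those response headers mean, here is a small in-memory fixed-window limiter in Python that produces them. It is a sketch of the general technique, not Kong's implementation; the `FixedWindowLimiter` class is hypothetical, and the in-memory dictionary stands in for the shared Redis counters.

```python
import math
import time

class FixedWindowLimiter:
    """Per-credential fixed-window counter that reports rate-limit headers."""

    def __init__(self, limit_per_minute: int, clock=time.time):
        self.limit = limit_per_minute
        self.clock = clock
        # (credential, window index) -> requests seen. Old windows are never
        # evicted here; a shared store would expire them with a TTL.
        self.counters = {}

    def check(self, credential: str):
        """Return (status_code, headers) for one incoming request."""
        now = self.clock()
        window = int(now // 60)              # counter resets every full minute
        key = (credential, window)
        used = self.counters.get(key, 0)
        headers = {"X-RateLimit-Limit-Minute": str(self.limit)}
        if used >= self.limit:
            headers["X-RateLimit-Remaining-Minute"] = "0"
            # Seconds until the next window starts.
            headers["Retry-After"] = str(math.ceil((window + 1) * 60 - now))
            return 429, headers
        self.counters[key] = used + 1
        headers["X-RateLimit-Remaining-Minute"] = str(self.limit - used - 1)
        return 200, headers

if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit_per_minute=60)
    print(limiter.check("api-key-1"))
```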

7. Circuit Breakers and Resilience

When a downstream service is failing, the gateway should fail fast instead of waiting for timeouts. A circuit breaker tracks failure rates and "opens" when too many requests fail — subsequent requests return immediately with an error instead of queuing up and exhausting connections.

        
Circuit Breaker States:

┌──────────┐  failures > threshold  ┌──────────┐
│  CLOSED  │ ──────────────────────▶│   OPEN   │
│ (normal) │                        │ (reject) │
└──────────┘                        └────┬─────┘
     ▲                                   │
     │ success             after timeout │
     │                                   ▼
     │                           ┌───────────────┐
     └────────────────────────── │   HALF-OPEN   │
                                 │ (test 1 req)  │
                                 └───────────────┘
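The state machine in the diagram can be sketched as a small Python class. This is an illustrative, single-threaded sketch with hypothetical names; the defaults of 5 consecutive failures and a 30-second reset match the values this guide uses, and a failed trial request in the half-open state reopens the circuit.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the upstream."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0          # consecutive failures seen while closed
        self.opened_at = None      # None means the circuit is CLOSED

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"     # timeout elapsed: allow a trial request
        return "open"

    def call(self, func, *args, **kwargs):
        state = self.state
        if state == "open":
            raise CircuitOpenError("upstream unavailable, failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # (re)open and restart the timer
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A concurrent gateway would also need a lock or atomic counters, and would limit the half-open state to a single in-flight trial request.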

Resilience patterns we implement at the gateway layer:

Timeouts: 5-second max per upstream call. Never let a slow service block the gateway.
Retries: 1-2 retries with exponential backoff for 5xx errors, and only for idempotent requests. NOT for 4xx (those are client errors).
Circuit breaker: Open after 5 consecutive failures. Half-open after 30 seconds.
Bulkhead: Limit connections per upstream service. One slow service shouldn't consume all gateway connections.
Fallback: Return cached data or a degraded response when a service is down.
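The retry and fallback rules from the list above fit in one short Python helper. `UpstreamError`, `call_with_retries`, and the `send` callable are hypothetical names for this sketch; `send` is assumed to enforce its own timeout (for example 5 seconds) and to be safe to repeat.

```python
import random
import time

class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status

def call_with_retries(send, retries: int = 2, base_delay: float = 0.1,
                      sleep=time.sleep, fallback=None):
    """Call send(); retry 5xx with exponential backoff, never 4xx."""
    for attempt in range(retries + 1):
        try:
            return send()
        except UpstreamError as err:
            if not 500 <= err.status < 600:
                raise                      # 4xx: the client must fix the request
            if attempt == retries:
                if fallback is not None:
                    return fallback()      # degraded or cached response
                raise
            # Exponential backoff (base, 2x base, ...) with jitter so that
            # many gateway workers do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))

if __name__ == "__main__":
    def always_down():
        raise UpstreamError(503)

    print(call_with_retries(always_down, fallback=lambda: {"cached": True}))
```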

8. Implementation: Kong + Nginx

Here's the production gateway stack we deploy for most clients: Kong (built on Nginx/OpenResty) running in Docker with PostgreSQL for configuration storage and Redis for rate limiting state.

        
# docker-compose.yml: Kong API Gateway
version: '3.8'

services:
  kong-db:
    image: postgres:15
    environment:
      POSTGRES_DB: kong
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: ${KONG_DB_PASSWORD}
    volumes:
      - kong_data:/var/lib/postgresql/data

  kong:
    image: kong:3.6
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-db
      KONG_PG_PASSWORD: ${KONG_DB_PASSWORD}
      KONG_PROXY_LISTEN: "0.0.0.0:8000, 0.0.0.0:8443 ssl"
      KONG_ADMIN_LISTEN: "0.0.0.0:8001"   # not published below; keep the Admin API off the public network
      KONG_LOG_LEVEL: info
    ports:
      - "80:8000"
      - "443:8443"
    depends_on: [kong-db, redis]
    # Run `kong migrations bootstrap` once against kong-db before the first start.

  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

# Named volumes must be declared at the top level.
volumes:
  kong_data:

Our standard plugin stack: JWT auth + rate-limiting (Redis-backed) + request-transformer (add headers) + cors + prometheus (metrics) + file-log (structured JSON). This covers 95% of gateway needs without custom code. The remaining 5% we handle with custom Lua plugins or a thin BFF service.

9. Frequently Asked Questions

Isn't an API gateway a single point of failure?

Yes, if you run a single instance. In production, run at least two gateway instances behind a cloud load balancer (ALB, NLB) with health checks. Kong handles this natively: multiple instances share the same configuration database. Nginx instances are stateless, so each one simply loads the same configuration files. We typically run 3 gateway instances across availability zones for production workloads.

Should I use GraphQL as my API gateway?

GraphQL federation (Apollo Gateway) can work as a gateway layer: it composes schemas from multiple services into one graph. This works well when your frontend team wants flexible queries. But it adds complexity: schema composition, query planning, and performance concerns with deeply nested queries. We use it for B2C products where frontend flexibility matters. For B2B APIs, REST with an edge gateway is simpler and better documented.

How much latency does an API gateway add?

Typically 1-5ms for routing and authentication. Kong adds about 1-2ms per request for basic routing. JWT validation adds another 1-2ms. Rate limiting with Redis adds under 1ms. The total overhead is negligible compared to the 50-200ms your services spend on business logic and database queries. The latency savings from response caching at the gateway usually more than offset the added hop.

Do I need an API gateway if I only have a monolith?

Probably not a full gateway, but a reverse proxy (Nginx, Caddy) gives you most benefits with less complexity: SSL termination, static file serving, gzip compression, rate limiting, and basic auth. When you split your monolith into services later, upgrading the reverse proxy to a full gateway (Kong, Traefik) is straightforward.

Pillai Infotech Engineering Team

We've deployed API gateways for clients ranging from early-stage startups (Nginx reverse proxy) to enterprise platforms (Kong clusters handling 50K+ RPS). Our approach: start with the simplest solution that works, add complexity when the metrics demand it.
