In This Guide
- 1. What an API Gateway Does (and Doesn't Do)
- 2. Gateway Patterns: Edge, BFF, and Mesh
- 3. Gateway Solutions Compared
- 4. Routing and Request Transformation
- 5. Authentication at the Gateway
- 6. Rate Limiting and Throttling
- 7. Circuit Breakers and Resilience
- 8. Implementation: Kong + Nginx
- 9. Frequently Asked Questions
An API gateway sits between your clients and your microservices. It's a single entry point that handles cross-cutting concerns — authentication, rate limiting, request routing, response caching, and protocol translation. Without it, every microservice must implement these concerns independently, and every client must know how to reach every service. We moved a client from direct service-to-service calls to a Kong gateway, and their mobile app went from 14 API calls per screen to 2, with 60% lower latency because the gateway handles aggregation server-side.
1. What an API Gateway Does (and Doesn't Do)
| Gateway Responsibility | Without Gateway | With Gateway |
|---|---|---|
| Authentication | Every service validates JWT tokens | Gateway validates once, forwards user context |
| Rate limiting | Each service implements its own limits | Centralized rate limiting per client/API key |
| Request routing | Client knows every service URL | Client hits one URL, gateway routes internally |
| Response caching | Each service manages its own cache headers | Gateway caches responses, reducing backend load |
| Protocol translation | Client must speak gRPC, GraphQL, REST | Client speaks REST; gateway translates to gRPC internally |
| Request aggregation | Client makes N calls for one screen | Gateway composes responses from multiple services |
What a gateway should NOT do:
- Business logic — keep it in services. The gateway routes and transforms; it doesn't decide.
- Data joins across services. Doing this in a shared edge gateway is an anti-pattern; put client-specific aggregation in a BFF (Backend for Frontend) instead.
- Become a "god service" that everything depends on — keep it thin and fast.
2. Gateway Patterns: Edge, BFF, and Mesh
Edge Gateway (Single Entry Point)
One gateway handles all external traffic. It's the simplest pattern and works well when you have a single client type (just a web app) or when all clients need the same API shape.
Edge Gateway Pattern:
Web App ──┐
          ├──▶ API Gateway ──▶ [User Service]
Mobile ───┤               ├──▶ [Order Service]
          │               ├──▶ [Product Service]
3rd Party ┘               └──▶ [Payment Service]
Backend for Frontend (BFF)
A separate gateway per client type. The mobile BFF returns smaller payloads and fewer images. The web BFF returns richer data. The admin BFF has different auth requirements. This is the pattern we use most — different clients have fundamentally different needs.
BFF Pattern:
Web App ────▶ Web BFF ────────┐
              (rich payloads) ├──▶ [User Service]
                              ├──▶ [Order Service]
Mobile ─────▶ Mobile BFF ─────┤    ...
              (slim payloads) │
                              │
Admin ──────▶ Admin BFF ──────┘
              (internal APIs)
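To make the BFF idea concrete, here is a minimal Python sketch of a mobile BFF handler that fans out to two services concurrently and returns a slim payload. The functions `fetch_user`, `fetch_orders`, and `mobile_order_screen`, along with every field name, are hypothetical stand-ins for real HTTP calls; treat it as an illustration under those assumptions rather than the stack described in this guide.

```python
import asyncio

# Hypothetical stand-ins for HTTP calls to internal services.
async def fetch_user(user_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"id": user_id, "name": "Ada", "email": "ada@example.com",
            "internal_flags": ["beta"]}

async def fetch_orders(user_id: str) -> list:
    await asyncio.sleep(0)  # placeholder for network I/O
    return [
        {"id": "o_1", "total": 42.0, "status": "shipped", "warehouse_id": "w_9"},
        {"id": "o_2", "total": 13.5, "status": "pending", "warehouse_id": "w_3"},
    ]

async def mobile_order_screen(user_id: str) -> dict:
    """One client call fans out to two services and returns a slim payload."""
    # Both upstream calls run concurrently instead of one after the other.
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    # Mobile only needs a handful of fields; everything else is dropped.
    return {
        "userName": user["name"],
        "orders": [{"id": o["id"], "status": o["status"]} for o in orders],
    }

def run_example() -> dict:
    return asyncio.run(mobile_order_screen("u_123"))
```

A web BFF could reuse the same fetchers and simply return more fields, which is the main reason to split handlers per client type.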
Service Mesh (Internal Gateway)
A service mesh (Istio, Linkerd) adds gateway-like functionality to internal service-to-service communication — mTLS, retries, circuit breakers, observability. It's a sidecar proxy next to every service, not a central gateway. Use this when you have 20+ microservices and need fine-grained traffic control internally.
3. Gateway Solutions Compared
| Solution | Type | Best For | Cost | Complexity |
|---|---|---|---|---|
| Kong | Open-source / Enterprise | Plugin ecosystem, Kubernetes-native | Free (OSS) / $35K+/yr | Medium |
| AWS API Gateway | Managed (serverless) | AWS-native, Lambda integration | $3.50/M requests | Low |
| Nginx + Lua | Self-hosted | High performance, full control | Free (ops cost) | High |
| Traefik | Open-source | Docker/K8s auto-discovery, Let's Encrypt | Free (OSS) | Low-Medium |
| Envoy | Open-source (CNCF) | Service mesh sidecar, gRPC-native | Free (ops cost) | High |
| Express/Fastify (custom) | Build your own | BFF pattern, custom aggregation | Dev time | Medium |
4. Routing and Request Transformation
The gateway maps public API paths to internal service URLs. This decouples your public API contract from your internal service structure — you can split, merge, or rename services without changing the public API.
# Kong declarative config (kong.yml)
_format_version: "3.0"   # required by Kong's declarative format
services:
  - name: user-service
    url: http://user-svc.internal:3001
    routes:
      - name: users-api
        paths: ["/api/v1/users"]
        strip_path: true
        methods: [GET, POST, PUT, DELETE]
  - name: order-service
    url: http://order-svc.internal:3002
    routes:
      - name: orders-api
        paths: ["/api/v1/orders"]
        strip_path: true
  - name: product-service
    url: http://product-svc.internal:3003
    routes:
      - name: products-api
        paths: ["/api/v1/products"]
        strip_path: true
# Client calls: api.example.com/api/v1/orders/123
# Gateway routes to: order-svc.internal:3002/123
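The effect of `strip_path: true` can be sketched in a few lines of Python. This is a simplified prefix matcher written for illustration, not Kong's actual routing algorithm; `ROUTES` and `resolve_upstream` are hypothetical names, and the table just mirrors the three services in the config above.

```python
from typing import Optional

# Public path prefix -> (internal upstream, strip_path flag)
ROUTES = {
    "/api/v1/users": ("http://user-svc.internal:3001", True),
    "/api/v1/orders": ("http://order-svc.internal:3002", True),
    "/api/v1/products": ("http://product-svc.internal:3003", True),
}

def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream URL for a public path, or None if no route matches."""
    # Check longer prefixes first so the most specific route wins.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            upstream, strip_path = ROUTES[prefix]
            # With strip_path, the matched prefix is removed before forwarding.
            remainder = path[len(prefix):] if strip_path else path
            return upstream + (remainder or "/")
    return None
```

Calling `resolve_upstream("/api/v1/orders/123")` returns `http://order-svc.internal:3002/123`, matching the comment at the end of the config. Because clients only ever see the public prefix, the right-hand side of the table can change freely.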
Request/Response Transformation
Gateways can modify requests and responses in flight. Common transformations:
- Header injection: Add X-Request-ID, X-User-ID (from JWT) to forwarded requests
- Response filtering: Strip internal fields (database IDs, debug info) from public responses
- API versioning: Route /v1/ to old service, /v2/ to new service
- Payload shaping: Mobile BFF returns 5 fields, web BFF returns 20 from the same service
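Two of the transformations above, header injection and response filtering, are easy to picture as plain functions. The sketch below is a minimal Python illustration; `INTERNAL_FIELDS`, `inject_headers`, and `filter_response` are hypothetical names, and it treats headers as a case-sensitive dict, which a real gateway would not.

```python
import uuid

# Hypothetical names of fields that must never leave the internal network.
INTERNAL_FIELDS = {"db_id", "debug", "internal_notes"}

def inject_headers(headers: dict, user_id: str) -> dict:
    """Return a copy of the request headers with tracing and identity added."""
    forwarded = dict(headers)
    # Keep an existing request ID so a trace stays intact across hops.
    forwarded.setdefault("X-Request-ID", str(uuid.uuid4()))
    # Always overwrite: a client-supplied X-User-ID must never be trusted.
    forwarded["X-User-ID"] = user_id
    return forwarded

def filter_response(body: dict) -> dict:
    """Drop internal-only fields before the response reaches the client."""
    return {k: v for k, v in body.items() if k not in INTERNAL_FIELDS}
```

Overwriting the identity header unconditionally matters: if the gateway only added it when missing, a client could impersonate another user by sending the header itself.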
5. Authentication at the Gateway
Gateway authentication validates tokens once and forwards user context to downstream services. This eliminates redundant validation and centralizes your auth logic.
Auth flow through gateway:
Client ──▶ Gateway ──▶ Downstream Service
1. Client sends: Authorization: Bearer eyJhbG...
2. Gateway validates JWT (signature, expiry, issuer)
3. Gateway extracts claims: { userId: "u_123", role: "admin" }
4. Gateway forwards:
X-User-ID: u_123
X-User-Role: admin
X-Request-ID: req_abc123
5. Downstream trusts these headers (internal network only)
6. Downstream skips JWT validation entirely
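Steps 2 to 4 of the flow above can be sketched with only the Python standard library. This assumes an HS256 (shared-secret) token and the `userId` and `role` claim names from step 3; `sign_hs256` and `validate_and_forward` are hypothetical helpers, and a production gateway would normally rely on a maintained JWT library or the gateway's own plugin instead of hand-rolled checks.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build an HS256 token (included only so the example is self-contained)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"

def validate_and_forward(auth_header: str, secret: bytes, issuer: str) -> dict:
    """Verify signature, expiry, and issuer; return the headers to forward."""
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        raise ValueError("malformed Authorization header")
    header_b64, payload_b64, sig_b64 = token.split(".")
    # Pin the algorithm so a token cannot choose its own verification method.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    return {"X-User-ID": claims["userId"], "X-User-Role": claims["role"]}
```

Step 5 only holds if downstream services are unreachable from outside the internal network; otherwise anyone could send `X-User-ID` directly and skip the gateway.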
# Kong JWT plugin configuration
plugins:
  - name: jwt
    config:
      key_claim_name: iss
      claims_to_verify: [exp]
      header_names: [Authorization]
  - name: request-transformer
    config:
      add:
        headers:
          # Illustrative placeholders: check which template variables the
          # request-transformer in your Kong version exposes before relying on these.
          - "X-User-ID:$(jwt.payload.sub)"
          - "X-User-Role:$(jwt.payload.role)"
# Public endpoints (no auth required)
routes:
  - name: public-routes
    paths: ["/api/v1/health", "/api/v1/products"]
    plugins:
      - name: jwt
        enabled: false
| Auth Strategy | When to Use | Gateway Handles |
|---|---|---|
| JWT validation | Stateless, most common | Signature check, expiry, forwarding claims |
| API key | Machine-to-machine, B2B | Key lookup, rate limit per key |
| OAuth2 / OIDC | Social login, SSO | Token exchange, redirect flows |
| mTLS | Service-to-service | Certificate validation, service identity |
6. Rate Limiting and Throttling
Rate limiting protects your services from abuse and ensures fair usage. The gateway is the natural place for it — one central enforcement point instead of every service implementing its own.
| Algorithm | How It Works | Best For |
|---|---|---|
| Fixed Window | 100 requests per minute, counter resets at :00 | Simple, but allows bursts at window boundaries |
| Sliding Window | Weighted average of current and previous windows | Smooth limiting, no burst at boundaries |
| Token Bucket | Bucket fills at fixed rate; each request takes a token | Allows controlled bursts (bucket can hold N tokens) |
| Leaky Bucket | Requests queue up and drain at a fixed rate | Smooth output rate, good for downstream protection |
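Of the four algorithms in the table, the token bucket is the easiest to misread, so here is a minimal single-process Python sketch. The `TokenBucket` class and its `now` parameter (an injectable clock, useful for testing) are illustrative assumptions; a gateway running several instances would keep this state in a shared store such as Redis.

```python
import time

class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: int, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._now = now
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = now()

    def allow(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        current = self._now()
        elapsed = current - self.updated
        self.updated = current
        # Refill lazily based on elapsed time instead of running a timer.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=1.0` and `capacity=3`, a client can send three requests at once, is then rejected, and earns one new request per second afterwards. That is the "controlled burst" behavior the table describes.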
# Kong rate limiting plugin
plugins:
  - name: rate-limiting
    config:
      minute: 60               # 60 requests per minute
      hour: 1000               # 1000 per hour
      policy: redis            # shared counter across gateway instances
      redis_host: redis.internal
      limit_by: credential     # per API key (or: ip, consumer, header)
      hide_client_headers: false
# Response headers clients receive:
# X-RateLimit-Limit-Minute: 60
# X-RateLimit-Remaining-Minute: 42
# Retry-After: 18 (when limit exceeded)
7. Circuit Breakers and Resilience
When a downstream service is failing, the gateway should fail fast instead of waiting for timeouts. A circuit breaker tracks failure rates and "opens" when too many requests fail — subsequent requests return immediately with an error instead of queuing up and exhausting connections.
Circuit Breaker States:
┌──────────┐  failures > threshold  ┌──────────┐
│  CLOSED  │ ──────────────────────▶│   OPEN   │
│ (normal) │                        │ (reject) │
└──────────┘                        └────┬─────┘
     ▲                                   │
     │ probe succeeds      after timeout │
     │                                   ▼
     │                           ┌───────────────┐
     └────────────────────────── │   HALF-OPEN   │
                                 │ (test 1 req)  │
                                 └───────────────┘
(A failed test request in HALF-OPEN reopens the circuit.)
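The state machine in the diagram fits in a small class. The sketch below is a minimal, single-threaded Python illustration: `CircuitBreaker`, `CircuitOpenError`, and the injectable `now` clock are hypothetical names, and the defaults (5 consecutive failures, 30-second reset) follow the thresholds quoted in this section.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the upstream."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, now=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._now = now
        self.state = "closed"
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self._now() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half-open"  # timeout elapsed: let one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self._now()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

A real gateway would keep one breaker per upstream and guard the state with a lock; both are omitted here to keep the transitions easy to follow.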
Resilience patterns we implement at the gateway layer:
- Timeouts: 5-second max per upstream call. Never let a slow service block the gateway.
- Retries: 1-2 retries with exponential backoff for 5xx errors on idempotent requests. NOT for 4xx (those are client errors).
- Circuit breaker: Open after 5 consecutive failures. Half-open after 30 seconds.
- Bulkhead: Limit connections per upstream service. One slow service shouldn't consume all gateway connections.
- Fallback: Return cached data or a degraded response when a service is down.
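The retry rule from the list above can be sketched as a small wrapper. `UpstreamError` and `call_with_retries` are hypothetical names, the base delay of 0.1 seconds is an assumed value, and the random jitter is an addition that the list does not mention; it is there to stop many clients from retrying in lockstep.

```python
import random
import time

class UpstreamError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status

def call_with_retries(fn, max_retries=2, base_delay=0.1, sleep=time.sleep):
    """Call fn(); retry 5xx failures with exponential backoff plus jitter."""
    attempt = 0
    while True:
        try:
            return fn()
        except UpstreamError as err:
            retryable = 500 <= err.status < 600  # 4xx means the request itself is wrong
            if not retryable or attempt >= max_retries:
                raise
            delay = base_delay * (2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
            sleep(delay + random.uniform(0, delay))  # jitter (assumption, see above)
            attempt += 1
```

Retries multiply load on a struggling service, so this wrapper is best placed inside a circuit breaker and used only for requests that are safe to repeat.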
8. Implementation: Kong + Nginx
Here's the production gateway stack we deploy for most clients: Kong (built on Nginx/OpenResty) running in Docker with PostgreSQL for configuration storage and Redis for rate limiting state.
# docker-compose.yml: Kong API Gateway
# Before first start, run `kong migrations bootstrap` once against kong-db.
version: '3.8'
services:
  kong-db:
    image: postgres:15
    environment:
      POSTGRES_DB: kong
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: ${KONG_DB_PASSWORD}
    volumes:
      - kong_data:/var/lib/postgresql/data
  kong:
    image: kong:3.6
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-db
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: ${KONG_DB_PASSWORD}
      KONG_PROXY_LISTEN: "0.0.0.0:8000, 0.0.0.0:8443 ssl"
      KONG_ADMIN_LISTEN: "0.0.0.0:8001"   # port 8001 is not published; keep the Admin API internal
      KONG_LOG_LEVEL: info
    ports:
      - "80:8000"
      - "443:8443"
    depends_on: [kong-db, redis]
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
# Named volumes must be declared at the top level
volumes:
  kong_data:
9. Frequently Asked Questions
Isn't an API gateway a single point of failure?
Yes, if you run a single instance. In production, run at least two gateway instances behind a cloud load balancer (ALB, NLB) with health checks. Both Kong and Nginx scale horizontally: Kong instances share the same configuration database, while Nginx instances are stateless and load identical configuration files. We typically run 3 gateway instances across availability zones for production workloads.
Should I use GraphQL as my API gateway?
GraphQL federation (Apollo Gateway) can work as a gateway layer: it composes schemas from multiple services into one graph. This works well when your frontend team wants flexible queries. But it adds complexity: schema composition, query planning, and performance concerns with deeply nested queries. We use it for B2C products where frontend flexibility matters. For B2B APIs, REST with an edge gateway is simpler and better documented.
How much latency does an API gateway add?
Typically 1-5ms for routing and authentication. Kong adds about 1-2ms per request for basic routing. JWT validation adds another 1-2ms. Rate limiting with Redis adds under 1ms. The total overhead is negligible compared to the 50-200ms your services spend on business logic and database queries. The latency savings from response caching at the gateway usually more than offset the added hop.
Do I need an API gateway if I only have a monolith?
Probably not a full gateway, but a reverse proxy (Nginx, Caddy) gives you most benefits with less complexity: SSL termination, static file serving, gzip compression, rate limiting, and basic auth. When you split your monolith into services later, upgrading the reverse proxy to a full gateway (Kong, Traefik) is straightforward.
We've deployed API gateways for clients ranging from early-stage startups (Nginx reverse proxy) to enterprise platforms (Kong clusters handling 50K+ RPS). Our approach: start with the simplest solution that works, add complexity when the metrics demand it.