System Design Interview Guide | Pillai Infotech LLP

In This Guide

1. The Fundamentals That Actually Matter
2. Scaling — Vertical vs Horizontal
3. Load Balancing Strategies
4. Caching — The Biggest Performance Win
5. Database Selection and Sharding
6. Message Queues and Async Processing
7. CDN and Edge Architecture
8. Design Walkthrough: URL Shortener at Scale
9. Common System Design Mistakes
10. Frequently Asked Questions

System design is the art of making architecture decisions under uncertainty. You're choosing databases before you know your query patterns, picking queue systems before you understand your throughput, and designing APIs before the product team finalizes requirements. We've designed systems for clients that scale from zero to millions of requests — and the biggest lesson is that premature optimization kills more projects than lack of optimization. This guide covers the building blocks you actually need, when to use them, and when to wait.

1. The Fundamentals That Actually Matter

Before diving into specific components, understand the constraints every distributed system faces:

Principle	What It Means	Real Impact
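CAP Theorem	Of Consistency, Availability, and Partition tolerance, a distributed system can guarantee only two at once	Network partitions WILL happen, so the real choice is between consistency and availability while a partition lasts: CP (banking) or AP (social media feeds).
Latency numbers	L1 cache: ~1ns, RAM: ~100ns, SSD read: ~100µs, cross-continent network round trip: ~150ms	An SSD read is roughly 100,000x slower than an L1 cache hit. Cache everything you can.
Back-of-envelope math	Estimate before building: QPS, storage, bandwidth	1M DAU × 10 req/user × 1KB = 10GB/day of traffic, or about 115 requests per second on average. Can one server handle it? Yes.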
Single point of failure	Every component that isn't redundant is a SPOF	One database, one load balancer, one DNS provider = downtime when any fails
Stateless > Stateful	Servers that hold no session state are trivially scalable	Move state to external stores (Redis, DB). Any server can handle any request.
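
The back-of-envelope row above can be turned into a few lines of arithmetic. This is a minimal sketch: the function name, the field names, and the 3x peak factor are our own assumptions for illustration, not measured values.

```typescript
// Back-of-envelope load estimate. All names and the peak factor are illustrative.
interface LoadEstimate {
  requestsPerDay: number;
  averageRps: number;
  peakRps: number;
  gigabytesPerDay: number;
}

function estimateLoad(
  dailyActiveUsers: number,
  requestsPerUser: number,
  bytesPerRequest: number,
  peakFactor: number = 3, // assumption: peak traffic is about 3x the daily average
): LoadEstimate {
  const requestsPerDay = dailyActiveUsers * requestsPerUser;
  const averageRps = requestsPerDay / 86400; // seconds in a day
  return {
    requestsPerDay,
    averageRps,
    peakRps: averageRps * peakFactor,
    gigabytesPerDay: (requestsPerDay * bytesPerRequest) / 1e9,
  };
}

// The example from the table: 1M DAU, 10 requests each, 1KB per request.
const estimate = estimateLoad(1000000, 10, 1000);
console.log(estimate);
```

The average works out to roughly 116 requests per second, which is why the table answers "yes" to the single-server question. Rerun the numbers with your own peak factor before trusting them.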

2. Scaling — Vertical vs Horizontal

Vertical scaling (bigger server) is simpler and should be your first move. A single modern server with 64 cores, 256GB RAM, and NVMe SSDs handles more traffic than most startups will ever see. Don't add complexity until you need it.

	Vertical (Scale Up)	Horizontal (Scale Out)
How	More CPU, RAM, disk on one machine	More machines behind a load balancer
Complexity	Zero code changes	Requires stateless design, distributed systems knowledge
Limit	Hardware ceiling (~256 cores, ~24TB RAM)	Practically unlimited
Cost curve	Superlinear (2x specs often costs 3-4x)	Linear (2x servers = 2x cost)
When to use	< 10K RPS, databases, caches	> 10K RPS, stateless app servers, read replicas

        
Scaling progression (what we recommend to clients):

Stage 1: Single Server (0-1K users)
App + DB + Cache on one machine
Cost: $50-200/mo

Stage 2: Separate DB (1K-10K users)
App Server <--> Database Server + Redis
Cost: $200-500/mo

Stage 3: Load Balanced (10K-100K users)
LB --> [App1, App2, App3] --> DB Primary + Read Replica + Redis
Cost: $500-2K/mo

Stage 4: Full Distribution (100K+ users)
CDN --> LB --> [App Pool] --> [DB Cluster + Sharding] + [Cache Cluster]
                          --> [Message Queue] --> [Worker Pool]
Cost: $2K-20K/mo

3. Load Balancing Strategies

A load balancer distributes incoming requests across multiple servers. The algorithm you choose affects performance, session handling, and failure behavior.

Algorithm	How It Works	Best For	Watch Out
Round Robin	Request 1 → Server A, Request 2 → Server B, ...	Uniform request cost, stateless servers	Ignores server health and load
Least Connections	Send to server with fewest active connections	Variable request duration (WebSockets, uploads)	New servers get hammered initially
Weighted Round Robin	Bigger servers get proportionally more requests	Mixed hardware, gradual canary deploys	Weights need manual tuning
IP Hash	Same client IP always goes to same server	Sticky sessions without cookies	Uneven distribution behind NAT/proxy
Consistent Hashing	Hash key determines server, minimal redistribution on changes	Cache clusters, database sharding	Need virtual nodes for even distribution

        
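# Nginx load balancer config (our standard setup)
upstream app_servers {
  least_conn;  # omit this line for the default round robin, or use ip_hash
  server app1.internal:3000 weight=3;  # 8-core, weight 3 of 7
  server app2.internal:3000 weight=2;  # 4-core
  server app3.internal:3000 weight=2;  # 4-core
  server app4.internal:3000 backup;    # only when others are down
  # Passive health checks use each server's max_fails/fail_timeout
  # (defaults: 1 failure, 10s). keepalive is connection pooling, not a health check.
  keepalive 32;  # idle upstream connections kept open per worker
}
server {
  listen 443 ssl http2;
  ssl_certificate     /etc/nginx/certs/example.crt;  # placeholder path
  ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder path
  location / {
    proxy_pass http://app_servers;
    proxy_http_version 1.1;          # needed for upstream keepalive
    proxy_set_header Connection "";  # needed for upstream keepalive
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Request-ID $request_id;
    proxy_next_upstream error timeout http_503;
    proxy_connect_timeout 5s;
    proxy_read_timeout 30s;
  }
}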
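
The consistent hashing row in the table above is the least intuitive of the five, so here is a minimal sketch of a hash ring with virtual nodes. The class name, the node names, and the choice of FNV-1a as the hash are our assumptions for illustration; production systems usually rely on a tested library.

```typescript
// FNV-1a 32-bit: a simple non-cryptographic hash, chosen here only for illustration.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

class HashRing {
  private ring: { point: number; node: string }[] = [];
  private readonly virtualNodes: number;

  constructor(nodes: string[], virtualNodes: number = 100) {
    this.virtualNodes = virtualNodes;
    for (const node of nodes) this.add(node);
  }

  // Each physical node is placed on the ring many times to even out the load.
  add(node: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: fnv1a(`${node}#${v}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(node: string): void {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }

  // Binary search for the first virtual node at or after the key's hash,
  // wrapping around to the start of the ring if there is none.
  lookup(key: string): string {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].node;
  }
}

const ring = new HashRing(["cache-a", "cache-b", "cache-c"]);
console.log(ring.lookup("product:42"));
```

When a node is removed, only the keys that lived on that node move. Every other key keeps its server, which is the "minimal redistribution" property the table refers to.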

4. Caching — The Biggest Performance Win

Caching is the single most effective way to improve system performance. A cache hit returns data in microseconds (in-process) or around a millisecond (a distributed cache such as Redis), instead of the milliseconds a database query takes or the seconds an external API can take. We've reduced P99 latency from 800ms to 12ms for a client's product catalog by adding a two-layer cache.

Cache Layer	Latency	Tool	What to Cache
Browser	0ms (already loaded)	Cache-Control headers	Static assets, API responses
CDN	5-50ms	Cloudflare, Fastly	Images, CSS/JS, public API responses
Application (in-process)	~1µs	Node-cache, Guava, in-memory map	Config, feature flags, hot data
Distributed cache	0.5-2ms	Redis, Memcached	Sessions, DB query results, computed data
Database buffer/query cache	0.1-1ms	PostgreSQL shared_buffers, InnoDB buffer pool (MySQL's query cache was removed in 8.0)	Hot pages behind repeated queries

Cache Invalidation Strategies

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He was right about the first one.

Strategy	How It Works	Trade-off
Cache-Aside	App checks cache first, falls back to DB, writes to cache	Cache miss = extra latency. Most common pattern — we use this by default.
Write-Through	Every write goes to cache AND DB simultaneously	Always consistent, but writes are slower
Write-Behind	Write to cache immediately, batch-write to DB later	Fast writes but risk data loss if cache crashes
TTL-based	Data expires after N seconds, re-fetched on next request	Simple but stale data for up to TTL duration

        
// Cache-aside pattern (Node.js + Redis)
// Assumes `redis` is an ioredis client and `db` is a node-postgres Pool;
// Product and NotFoundError are application-defined.
async function getProduct(id: string): Promise<Product> {
  // 1. Check cache first
  const cached = await redis.get(`product:${id}`);
  if (cached) return JSON.parse(cached);
  // 2. Cache miss: query the database (node-postgres returns matches in result.rows)
  const { rows } = await db.query('SELECT * FROM products WHERE id = $1', [id]);
  const product = rows[0];
  if (!product) throw new NotFoundError('Product', id);
  // 3. Write to cache with a 1-hour TTL
  await redis.setex(`product:${id}`, 3600, JSON.stringify(product));
  return product;
}

// Invalidate on update
async function updateProduct(id: string, data: Partial<Product>): Promise<void> {
  // Column assignments are elided here; build them from `data` in real code
  await db.query('UPDATE products SET ... WHERE id = $1', [id]);
  await redis.del(`product:${id}`);  // delete, don't update
}
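
The in-process layer and the TTL-based strategy from the tables above can be combined into a small cache that sits in front of Redis. This is a minimal sketch, not the implementation behind the catalog numbers quoted earlier: the class name and the injectable clock are our assumptions, added so that expiry can be demonstrated without waiting.

```typescript
// In-process TTL cache. The `now` parameter is an illustrative hook for testing.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it so the caller re-fetches
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

// Usage with a fake clock so the expiry is visible immediately.
let fakeTime = 0;
const hotProducts = new TtlCache<string>(1000, () => fakeTime);
hotProducts.set("product:42", "cached value");
console.log(hotProducts.get("product:42")); // still fresh
fakeTime = 1000;
console.log(hotProducts.get("product:42")); // expired, so undefined
```

Each app server holds its own copy, so a Redis delete on update does not clear these entries. Keep the TTL short enough that the resulting staleness is acceptable.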

5. Database Selection and Sharding

The database you choose limits your architecture more than any other decision. Here's how we think about it:

Need	Database	Why
Transactions + relationships	PostgreSQL	ACID, JSON support, extensions. Our default choice for 90% of projects.
Flexible schema + documents	MongoDB	Schema-less, horizontal scaling. Good for content, catalogs, logs.
Key-value + caching	Redis	Sub-millisecond reads. Sessions, rate limiting, leaderboards.
Full-text search	Elasticsearch	Inverted index, relevance scoring, faceted search.
Time series data	TimescaleDB / InfluxDB	Optimized for write-heavy, time-ordered data. Metrics, IoT, logs.
Graph relationships	Neo4j	Traversal queries. Social networks, recommendations, fraud detection.
Massive write throughput	Cassandra / ScyllaDB	Write-optimized, multi-region. Messaging, activity feeds.

Database Sharding

Sharding splits your database across multiple servers. Each shard holds a subset of the data. It's necessary when a single database can't handle your write volume or storage needs — but it adds enormous complexity. Don't shard until you've exhausted read replicas, query optimization, and caching.

Shard Key Strategy	Example	Risk
Hash-based	`hash(user_id) % num_shards`	Even distribution, but re-sharding requires data migration
Range-based	Users A-M → Shard 1, N-Z → Shard 2	Hot spots if distribution is uneven
Geography-based	US users → US shard, EU → EU shard	Cross-region queries are slow. Good for data residency.
Tenant-based	Each company gets its own shard	Large tenants become hot shards. Works well for multi-tenant SaaS.
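
The re-sharding risk in the hash-based row is easy to underestimate. The sketch below routes users with `hash(user_id) % num_shards` and measures how many of them change shard when a fifth shard is added. The hash function (FNV-1a) and the generated user IDs are assumptions for illustration.

```typescript
// FNV-1a 32-bit hash, used here only as a stand-in for whatever hash you choose.
function hashUserId(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function shardFor(userId: string, numShards: number): number {
  return hashUserId(userId) % numShards;
}

// Fraction of users whose shard changes when the shard count changes.
function fractionMoved(userIds: string[], fromShards: number, toShards: number): number {
  const moved = userIds.filter(
    (id) => shardFor(id, fromShards) !== shardFor(id, toShards),
  ).length;
  return moved / userIds.length;
}

const sampleUserIds: string[] = [];
for (let i = 0; i < 10000; i++) sampleUserIds.push(`user-${i}`);

console.log(`moved when going from 4 to 5 shards: ${fractionMoved(sampleUserIds, 4, 5)}`);
```

With plain modulo routing, most keys land on a different shard after the count changes, which is why the table lists data migration as the cost. Consistent hashing or a lookup directory reduces that movement at the price of extra machinery.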

6. Message Queues and Async Processing

Message queues decouple producers from consumers. The user clicks "send email" — your API returns 200ms later while the actual email is processed asynchronously. This pattern turns a 3-second synchronous operation into a 200ms API response plus a background task.

        
Without queue (synchronous):
User → API → Generate PDF → Send Email → Log Analytics → Response
Total: 3200ms (user waits the whole time)

With queue (asynchronous):
User → API → Enqueue Tasks → Response (200ms)
               ↓
             Queue → Worker 1: Generate PDF (1500ms)
                   → Worker 2: Send Email (800ms)
                   → Worker 3: Log Analytics (200ms)

When to use a message queue:

Long-running tasks (PDF generation, image processing, report building)
Third-party API calls that might be slow or fail (email, SMS, payment webhooks)
Smoothing traffic spikes (queue absorbs burst, workers process at constant rate)
Decoupling services so they can scale independently
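
The producer and worker split described above can be sketched with an in-memory queue that retries failed jobs and parks the ones that keep failing. Everything here is an illustration under assumptions: the class name, the synchronous handlers, and the three-attempt limit are ours, and a real deployment would use a durable broker instead of process memory.

```typescript
interface Job<T> {
  id: number;
  payload: T;
  attempts: number;
}

class InMemoryQueue<T> {
  private pending: Job<T>[] = [];
  readonly deadLetter: Job<T>[] = [];
  private nextId = 1;
  private readonly maxAttempts: number;

  constructor(maxAttempts: number = 3) {
    this.maxAttempts = maxAttempts;
  }

  // Producer side: cheap, returns immediately so the API can respond.
  enqueue(payload: T): number {
    const id = this.nextId++;
    this.pending.push({ id, payload, attempts: 0 });
    return id;
  }

  // Worker side: run one job. On failure, re-queue it until maxAttempts
  // is reached, then move it to the dead-letter list for inspection.
  processNext(handler: (payload: T) => void): boolean {
    const job = this.pending.shift();
    if (!job) return false;
    job.attempts++;
    try {
      handler(job.payload);
    } catch {
      if (job.attempts < this.maxAttempts) this.pending.push(job);
      else this.deadLetter.push(job);
    }
    return true;
  }
}

// Usage: the email provider fails twice, then succeeds on the third attempt.
const queue = new InMemoryQueue<string>(3);
queue.enqueue("send-email");
queue.enqueue("generate-pdf");

let emailFailures = 0;
const completed: string[] = [];
const handler = (task: string): void => {
  if (task === "send-email" && emailFailures < 2) {
    emailFailures++;
    throw new Error("email provider timeout");
  }
  completed.push(task);
};

while (queue.processNext(handler)) {
  // keep draining until the queue is empty
}
console.log(completed, queue.deadLetter.length);
```

The slow or flaky call never blocks the request that enqueued it, and a job that fails on every attempt ends up somewhere visible instead of disappearing.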

For broker comparison, see our Event-Driven Architecture guide which covers Kafka, RabbitMQ, SQS, and Redis Streams in detail.

7. CDN and Edge Architecture

A CDN (Content Delivery Network) caches your content on servers worldwide. A user in Mumbai gets your site from a Mumbai edge server (5ms) instead of your US origin server (150ms). For a global audience, this is the single biggest performance improvement you can make.

What to Cache at Edge	TTL	Invalidation
Static assets (JS, CSS, images)	1 year (with content hash in filename)	New filename on deploy — no purge needed
Public API responses	30-300 seconds	Purge API on data change
HTML pages	60-3600 seconds (depends on freshness needs)	Stale-while-revalidate for best UX
User-specific data	DON'T cache at CDN	Use Cache-Control: private
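
The table above maps onto `Cache-Control` response headers. Below is a minimal sketch of that mapping. The function name, the content categories, and the specific TTLs (picked from inside the ranges in the table) are our assumptions, and CDN support for `stale-while-revalidate` varies by provider and plan, so check yours.

```typescript
type ContentKind = "hashed-asset" | "public-api" | "html" | "user-specific";

// Returns a Cache-Control value for each row of the table above.
function cacheControlFor(kind: ContentKind): string {
  switch (kind) {
    case "hashed-asset":
      // 1 year; safe because a new deploy produces a new filename.
      return "public, max-age=31536000, immutable";
    case "public-api":
      // s-maxage applies to shared caches such as a CDN, not to browsers.
      return "public, s-maxage=60";
    case "html":
      // Serve a stale copy for up to 60s while the edge refetches in the background.
      return "public, s-maxage=300, stale-while-revalidate=60";
    case "user-specific":
      // Never store in a shared cache.
      return "private";
  }
}

console.log(cacheControlFor("hashed-asset"));
console.log(cacheControlFor("user-specific"));
```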

8. Design Walkthrough: URL Shortener at Scale

Let's apply these principles to a real system. We'll design a URL shortener that handles 100M URLs and 1B redirects per month.

Back-of-Envelope Math

Writes: 100M URLs/month = ~40 URLs/second
Reads: 1B redirects/month = ~400 reads/second (10:1 read-write ratio)
Storage: 100M URLs/month × 500 bytes avg = 50GB/month (~600GB/year)
Bandwidth: 400 RPS × 500 bytes = 200KB/s (trivial)

This is modest. A single PostgreSQL instance handles this easily. The bottleneck is redirect latency — every millisecond of delay is a bad user experience.

        
URL Shortener Architecture:

              ┌──────────────┐
  Client ───▶ │  Cloudflare  │  (CDN cache for popular URLs)
              └──────┬───────┘
                     │ cache miss
              ┌──────▼───────┐
              │ Load Balancer│
              └──────┬───────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    ┌───────┐    ┌───────┐    ┌───────┐
    │ App 1 │    │ App 2 │    │ App 3 │  (stateless API servers)
    └───┬───┘    └───┬───┘    └───┬───┘
        │            │            │
        └────────────┼────────────┘
                     │
            ┌────────┴────────┐
            ▼                 ▼
       ┌─────────┐      ┌────────────┐
       │  Redis  │      │ PostgreSQL │
       │ (cache) │      │  (source)  │
       └─────────┘      └────────────┘

Redirect flow: CDN (80% hit) → Redis (15%) → PostgreSQL (5%)

P50 latency: 5ms (CDN hit), P99: 25ms

Key Design Decisions

Short code generation: Base62 encoding of auto-increment ID. 6 characters = 56B combinations. No collision checking needed.
Cache hot URLs at CDN: The top 20% of URLs account for 80% of traffic. CDN handles most redirects without hitting your servers.
Redis for warm URLs: Cache the most recent 10M URLs in Redis (5GB). Sub-millisecond lookups.
Analytics via Kafka: Log every redirect to Kafka. A separate consumer aggregates click stats. Never slow down the redirect path for analytics.
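
The short code decision above can be sketched as a pair of functions that convert between an auto-increment ID and a Base62 string. The alphabet order (digits, lowercase, uppercase) and the function names are our assumptions; any fixed ordering of the 62 characters works as long as encode and decode agree.

```typescript
const BASE62_ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Convert a database ID into a short code.
function encodeBase62(id: number): string {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError("id must be a non-negative safe integer");
  }
  if (id === 0) return BASE62_ALPHABET.charAt(0);
  let code = "";
  let n = id;
  while (n > 0) {
    code = BASE62_ALPHABET.charAt(n % 62) + code;
    n = Math.floor(n / 62);
  }
  return code;
}

// Convert a short code back into the database ID for lookup.
function decodeBase62(code: string): number {
  let id = 0;
  for (let i = 0; i < code.length; i++) {
    const digit = BASE62_ALPHABET.indexOf(code.charAt(i));
    if (digit === -1) throw new RangeError(`invalid character: ${code.charAt(i)}`);
    id = id * 62 + digit;
  }
  return id;
}

console.log(encodeBase62(123456789), decodeBase62(encodeBase62(123456789)));
console.log(Math.pow(62, 6)); // number of codes that fit in 6 characters
```

Because every ID is unique, every code is unique, which is why no collision check is needed. The trade-off is that sequential IDs produce guessable codes, so add a keyed shuffle or a random component if that matters for your use case.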

9. Common System Design Mistakes

Mistake	Why It Happens	What to Do Instead
Premature microservices	"Netflix does it"	Start with a monolith. Extract when you hit specific scaling or team bottlenecks.
Sharding too early	Fear of future scale	Read replicas + caching handle 10x more than you think. Shard when you must.
No caching strategy	"We'll add caching later"	Design cache keys from day one. Retrofitting is 10x harder.
Synchronous everything	Simpler to reason about	Any operation over 200ms should be async. Users shouldn't wait for emails to send.
Ignoring failure modes	"It probably won't fail"	Every external call fails eventually. Implement timeouts, retries, circuit breakers.
Single database for everything	Simplicity	Use the right DB for each job: PostgreSQL for transactions, Redis for cache, ES for search.
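
For the "ignoring failure modes" row above, a circuit breaker is the piece people most often skip. The sketch below is a simplified, synchronous version under stated assumptions: the class name, the thresholds, and the injectable clock are ours, and real external calls would be awaited and combined with timeouts and retries.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private current: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number = 3,
    resetAfterMs: number = 10000,
    now: () => number = Date.now,
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // After the cool-down, an open breaker lets one trial call through.
  get state(): BreakerState {
    if (this.current === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.current = "half-open";
    }
    return this.current;
  }

  call<T>(fn: () => T): T {
    if (this.state === "open") throw new Error("circuit open: failing fast");
    try {
      const result = fn();
      this.failures = 0;
      this.current = "closed";
      return result;
    } catch (err) {
      this.failures++;
      // A failed trial call, or too many consecutive failures, opens the breaker.
      if (this.current === "half-open" || this.failures >= this.failureThreshold) {
        this.current = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

const paymentBreaker = new CircuitBreaker(3, 10000);
console.log(paymentBreaker.call(() => "charged"));
```

While the breaker is open, callers fail immediately instead of piling up behind a dependency that is already struggling.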

Our biggest lesson: The system that works is better than the system that's perfectly designed. We've seen beautifully architected systems that never shipped, and ugly monoliths that make millions. Start simple, measure everything, and add complexity only when the metrics tell you to. The best architecture is the simplest one that solves your current problem — not the one that solves problems you might have in three years.

10. Frequently Asked Questions

How many users can a single server handle?

More than most people think. A well-optimized Node.js or Go server on a 4-core machine handles 5,000-10,000 requests per second. With proper caching and database optimization, that's 50,000-100,000 daily active users. Don't over-engineer until you're actually hitting limits — measure first.

When should I switch from a monolith to microservices?

When specific problems demand it: teams are blocking each other on deployments, one module needs different scaling than others, or the codebase is too large for any developer to understand. Don't switch because of hype. Our article on Microservices vs Monolith covers the decision framework in detail.

SQL or NoSQL — how do I decide?

Default to PostgreSQL. It handles 90% of use cases with transactions, JSON support, full-text search, and excellent performance. Use NoSQL when you have specific needs: DynamoDB for serverless at scale, MongoDB for truly schema-less data, Redis for caching. The "SQL can't scale" myth died years ago — Instagram runs on PostgreSQL.

How do I handle real-time features like notifications or chat?

WebSockets for persistent connections (chat, live collaboration), Server-Sent Events for one-way updates (notifications, live feeds), and long-polling as a fallback. We use Socket.io or native WebSockets with Redis Pub/Sub for horizontal scaling. For push notifications, Firebase Cloud Messaging or AWS SNS handles the device delivery.
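
For the one-way case, Server-Sent Events is a plain-text format sent over a long-lived HTTP response with the `text/event-stream` content type. A minimal sketch of the wire format follows; the function and interface names are our assumptions, and the surrounding HTTP server is left out.

```typescript
interface SseEvent {
  event?: string; // optional event name the client can listen for
  id?: string;    // optional ID the browser sends back on reconnect
  data: string;
}

// Serialize one event in the text/event-stream format.
function formatSseEvent({ event, id, data }: SseEvent): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  // Each line of a multi-line payload needs its own "data:" field.
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n"; // a blank line terminates the event
}

const notification = formatSseEvent({
  event: "notification",
  id: "42",
  data: JSON.stringify({ text: "Your report is ready" }),
});
console.log(notification);
```

On the client, the browser's built-in `EventSource` parses this format and reconnects automatically, which is why SSE is often simpler than WebSockets when the server is the only side sending.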

What's the cheapest way to run a scalable system?

One VPS ($20/month) with PostgreSQL, Redis, and Nginx handles more than most startups need. Add Cloudflare's free CDN tier. Use managed databases only when you can't afford DBA time. Serverless (Lambda/Cloud Functions) is cheapest at very low traffic but gets expensive at scale. We've run production systems serving 50K DAU on $100/month infrastructure.

Pillai Infotech Engineering Team

We design and build scalable systems for startups and enterprises — from zero-to-one MVPs to platforms handling millions of requests. This guide reflects real architecture decisions we've made, including the ones that didn't work out the first time.

Related Articles

Event-Driven Architecture: Design Patterns and Implementation
Microservices vs Monolith: When to Make the Switch
Distributed Systems Fundamentals Every Developer Should Know
