In This Guide
- 1. Signs You Need to Scale
- 2. Vertical Scaling — The Quick Fix
- 3. Read Replicas — 80% of Your Reads, Offloaded
- 4. Connection Pooling — The Overlooked Bottleneck
- 5. Caching Layers — Don't Hit the Database at All
- 6. Sharding — The Nuclear Option
- 7. Query Optimization — Free Performance
- 8. The Scaling Playbook — What to Do When
- 9. Frequently Asked Questions
Every database hits a wall. The question isn't if — it's when, and whether you'll be ready. We've watched startups go from "our PostgreSQL handles everything" to "why are queries taking 30 seconds?" in a matter of weeks once they hit product-market fit. The good news: database scaling is a solved problem. The bad news: most teams reach for the wrong solution first.
This guide walks through every scaling strategy in the order you should consider them — from free optimizations to architectural overhauls. Most teams never need to go past step 5.
1. Signs You Need to Scale
Before you redesign anything, confirm you actually have a scaling problem — not a bad query problem.
| Symptom | Likely Cause | First Action |
|---|---|---|
| Single query takes > 5s | Missing index or bad query | EXPLAIN ANALYZE |
| All queries slow during peak | CPU/memory saturated | Vertical scale or read replicas |
| Connection errors / timeouts | Connection pool exhausted | Connection pooler (PgBouncer) |
| Write throughput plateaued | Single-writer bottleneck | Partitioning, then sharding |
| Storage growing > 500GB | Data retention / archival needed | Partitioning + cold storage |
| Replication lag > 1s | Replica overloaded | Add replicas or split workloads |
2. Vertical Scaling — The Quick Fix
Vertical scaling means giving your existing database a bigger machine — more CPU, RAM, faster SSDs. It's unglamorous, but it works, and it's the simplest scaling strategy.
| Cloud Provider | Max Instance | vCPUs | RAM | Cost/Month |
|---|---|---|---|---|
| AWS RDS | db.r7g.16xlarge | 64 | 512 GB | ~$8,500 |
| GCP Cloud SQL | db-perf-optimized | 128 | 864 GB | ~$12,000 |
| Azure Flexible | E96ds_v5 | 96 | 672 GB | ~$9,200 |
When to vertical scale: Your CPU or RAM is consistently above 70% utilization, you're under 1TB data, and you haven't optimized queries yet. Vertical scaling buys you time to implement proper horizontal scaling.
When to stop: When doubling machine size gives less than 30% improvement, you've hit the vertical ceiling. Time for horizontal strategies.
3. Read Replicas — 80% of Your Reads, Offloaded
Most applications are read-heavy — often 80-90% reads vs 10-20% writes. Read replicas let you distribute that read load across multiple database copies while writes go to a single primary.
How Read Replicas Work
Application
│
├── WRITE (INSERT/UPDATE/DELETE)
│ └── Primary Database
│ │
│ ├── Async replication ──→ Replica 1 (analytics queries)
│ ├── Async replication ──→ Replica 2 (API read traffic)
│ └── Async replication ──→ Replica 3 (search/reporting)
│
└── READ (SELECT)
└── Load Balancer → Replica 1, 2, or 3
| Database | Max Replicas | Replication Type | Typical Lag |
|---|---|---|---|
| PostgreSQL | Unlimited (practical: 5-10) | Streaming (async/sync) | < 100ms |
| MySQL | Unlimited (practical: 5-10) | Binary log (async/semi-sync) | < 100ms |
| Aurora PostgreSQL | 15 | Storage-level replication | < 20ms |
| Cloud SQL (GCP) | 20 | Managed async | < 100ms |
Application-Level Read/Write Splitting
// Database router — Laravel example
class DatabaseRouter {
public function select($query) {
// Route reads to replica
return DB::connection('replica')->select($query);
}
public function insert($query, $bindings) {
// Route writes to primary
return DB::connection('primary')->insert($query, $bindings);
}
}
// Django — built-in router
DATABASES = {
'default': {'HOST': 'primary.db.internal'},
'replica': {'HOST': 'replica.db.internal'},
}
DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']
// Spring Boot — @Transactional(readOnly = true) routes to replica
4. Connection Pooling — The Overlooked Bottleneck
Each database connection consumes memory (5-10 MB in PostgreSQL). Serverless architectures and microservices create thousands of connections. A connection pooler sits between your app and database, multiplexing many app connections onto fewer database connections.
| Pooler | Database | Mode | Best For |
|---|---|---|---|
| PgBouncer | PostgreSQL | Transaction pooling | High-connection serverless |
| Pgcat | PostgreSQL | Transaction + sharding | Multi-tenant sharded setups |
| ProxySQL | MySQL | Query routing + pooling | Read/write split + failover |
| Supavisor | PostgreSQL | Tenant-aware pooling | Multi-tenant SaaS |
PgBouncer Configuration
[databases]
myapp = host=primary.db.internal port=5432 dbname=myapp
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
; Transaction pooling — connection returned after each transaction
pool_mode = transaction
; 20 actual database connections serve 2,000 app connections
default_pool_size = 20
max_client_conn = 2000
; Reserve connections for admin
reserve_pool_size = 5
reserve_pool_timeout = 3
5. Caching Layers — Don't Hit the Database at All
The fastest database query is the one you never make. Caching stores frequently accessed data in memory (Redis, Memcached) so your application skips the database entirely for repeat reads.
| Cache Pattern | How It Works | Best For | Staleness Risk |
|---|---|---|---|
| Cache-Aside | App checks cache → miss → query DB → store in cache | General purpose, user profiles | TTL-dependent |
| Write-Through | Write to cache + DB simultaneously | Data that's read immediately after write | Low |
| Write-Behind | Write to cache → async flush to DB | High-write workloads (counters, analytics) | Data loss risk on crash |
| Read-Through | Cache itself fetches from DB on miss | CDN patterns, materialized views | TTL-dependent |
Cache-Aside Pattern — Redis Example
async function getUser(userId) {
const cacheKey = `user:${userId}`;
// 1. Check cache
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// 2. Cache miss — query database
const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
// 3. Store in cache with TTL (5 minutes)
await redis.setex(cacheKey, 300, JSON.stringify(user));
return user;
}
// Invalidate on write
async function updateUser(userId, data) {
await db.query('UPDATE users SET ... WHERE id = $1', [userId, ...data]);
await redis.del(`user:${userId}`); // Delete, don't update — avoids race conditions
}
6. Sharding — The Nuclear Option
Sharding splits your data across multiple database instances. Each shard holds a subset of data. It's the most powerful scaling technique — and the most complex to implement and operate.
| Sharding Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Hash-Based | shard = hash(key) % N | Even distribution | Resharding requires data migration |
| Range-Based | Shard by date, ID range, region | Range queries stay on one shard | Hotspots (recent data shard) |
| Directory-Based | Lookup table maps key → shard | Flexible, can rebalance easily | Lookup table is a SPOF |
| Tenant-Based | Each tenant/customer on own shard | Natural isolation, easy compliance | Uneven load if tenants vary |
Consistent Hashing — Minimizing Resharding Pain
// Naive hashing — adding a shard moves ALL data
shard = hash(user_id) % 4 // user 12345 → shard 1
shard = hash(user_id) % 5 // user 12345 → shard 0 (DIFFERENT!)
// Consistent hashing — adding a shard moves ~1/N data
class ConsistentHash {
constructor(nodes, virtualNodes = 150) {
this.ring = new SortedMap();
nodes.forEach(node => {
for (let i = 0; i < virtualNodes; i++) {
const hash = md5(`${node}:${i}`);
this.ring.set(hash, node);
}
});
}
getShard(key) {
const hash = md5(key);
// Find first node clockwise on the ring
const entry = this.ring.firstAfter(hash) || this.ring.first();
return entry.value;
}
}
What Sharding Breaks
Before you shard, understand what you lose:
- JOINs across shards — queries spanning multiple shards require scatter-gather, which is slow
- Transactions — ACID guarantees don't extend across database instances without distributed transactions (2PC)
- Unique constraints — uniqueness is per-shard, not global. Need a separate uniqueness service
- Aggregations — COUNT, SUM, AVG require querying all shards and merging results
- Schema migrations — every shard needs the migration, and they can't all run simultaneously
7. Query Optimization — Free Performance
Before scaling infrastructure, optimize what you have. We've seen 10-100x improvements from query optimization alone.
Index Strategy — The Biggest Win
-- BEFORE: Full table scan (120 seconds on 50M rows)
SELECT * FROM orders WHERE customer_id = 12345 AND status = 'active';
-- EXPLAIN shows: Seq Scan on orders (cost=0.00..1847293.00 rows=50000000)
-- FIX: Composite index matching the WHERE clause
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- AFTER: Index scan (2ms)
-- EXPLAIN shows: Index Scan using idx_orders_customer_status (cost=0.56..8.72 rows=15)
-- RULE: Column order in composite index matters!
-- Put equality conditions first, range conditions last
CREATE INDEX idx_orders_date ON orders(status, customer_id, created_at DESC);
-- Works for: WHERE status = 'active' AND customer_id = 123 AND created_at > '2026-01-01'
| Optimization | Impact | Effort | When to Use |
|---|---|---|---|
| Add missing indexes | 10-1000x | Low | Always first |
| Rewrite N+1 → JOIN/batch | 5-50x | Medium | ORM-heavy code |
| Pagination (cursor-based) | 2-10x | Low | OFFSET > 10,000 |
| Partitioning (by date) | 2-20x | Medium | Tables > 100M rows with date queries |
| Materialized views | 10-100x | Low | Complex aggregation dashboards |
| SELECT only needed columns | 1.5-5x | Low | Wide tables, SELECT * |
8. The Scaling Playbook — What to Do When
Follow this decision tree based on your current scale. Each step should be fully implemented before moving to the next.
| Stage | Scale | Strategy | Effort | Headroom Gained |
|---|---|---|---|---|
| 0 | Any | Query optimization + indexing | Hours | 10-100x for bad queries |
| 1 | < 1TB | Vertical scaling (bigger instance) | Minutes | 2-4x |
| 2 | Read-heavy | Read replicas + connection pooling | Days | 3-10x reads |
| 3 | Repeat reads | Redis/Memcached caching layer | Days | 10-50x for cached data |
| 4 | > 100M rows | Table partitioning | Days | 2-20x for partition-pruned queries |
| 5 | > 1TB / write-heavy | Sharding (or NewSQL: CockroachDB, Vitess) | Months | Near-linear horizontal scaling |
Frequently Asked Questions
When should I shard my database?
Only after you've exhausted vertical scaling, read replicas, caching, and query optimization — and you have a genuine write throughput problem or your data exceeds what a single instance can store (typically > 1-2TB). Most applications never need sharding. If you're unsure, you probably don't need it yet.
Can I scale without application changes?
Vertical scaling and managed read replicas (like Aurora) require minimal code changes. Connection pooling (PgBouncer) requires zero code changes. Caching and sharding always require application-level modifications. The more impactful the strategy, the more code changes it needs.
Redis or Memcached for caching?
Redis in almost all cases. It supports data structures (sorted sets, lists, hashes), persistence, pub/sub, and Lua scripting. Memcached is simpler and slightly faster for pure key-value caching, but the feature gap has widened. Pick Memcached only if you need multi-threaded performance at extreme scale and don't need data structures.
How do I handle database scaling with microservices?
Each microservice owns its database — this is natural sharding by domain (orders service has orders DB, users service has users DB). Cross-service queries go through APIs, not database joins. Use eventual consistency via events (Kafka) for data that spans services. This pattern scales well but adds complexity to transactions.
Should I use a managed database or self-hosted?
Managed (RDS, Cloud SQL, Aurora) unless you have a dedicated DBA team. The operational overhead of backups, failover, patching, and monitoring is substantial. Managed databases cost 20-40% more but save hundreds of engineering hours. Self-host only if you need specific configurations that managed services don't support, or at massive scale where the cost premium becomes prohibitive.
Pillai Infotech LLP
We design and implement database scaling strategies for growing applications — from query optimization to multi-region sharding. Let's scale your database.