Ideas Engineered for Tomorrow
We Engineer Services & Solutions for Your Business Needs
Home About
Products
Services
Hire
Industries
Consulting
Partners
Articles Careers Contact
Software Development

Database Migration Strategies: Zero-Downtime Approaches

Migrating a production database is like changing the engine of a car while it's driving. Here's how to do it without crashing — or losing a single row of data.

🗄️ Database & Data February 7, 2026 12 min read

In This Guide

Database migrations are among the riskiest operations in software engineering. A bad migration can take down your application, corrupt data, or create inconsistencies that take weeks to untangle. We've been through enough of these — both smooth and painful — to know what works and what doesn't.

This guide covers every migration scenario: schema changes that don't lock your tables, engine swaps that don't lose data, and rollback plans that actually work when things go wrong. The common thread: never do anything you can't undo.

1. Types of Database Migrations

Migration Type Example Risk Level Typical Downtime
Schema changeAdd column, rename tableLow–Medium0 (if done right)
Data transformationBackfill new column, split tableMedium0 (batched)
Version upgradePostgreSQL 15 → 16MediumMinutes (replica promotion)
Engine migrationMySQL → PostgreSQLHigh0 (dual-write), minutes (cutover)
Cloud migrationSelf-hosted → RDS/Cloud SQLHigh0 (with DMS/replication)

2. Schema Migrations — The Expand-Contract Pattern

The expand-contract pattern is the gold standard for zero-downtime schema changes. Instead of modifying in place, you expand (add new), migrate data, then contract (remove old).

Example: Renaming a Column (the safe way)

-- WRONG: This breaks all existing queries instantly
ALTER TABLE users RENAME COLUMN name TO full_name;  -- 💥 Downtime

-- RIGHT: Expand-Contract (3 deployments, 0 downtime)

-- Deploy 1: EXPAND — add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Backfill (batched to avoid locks):
UPDATE users SET full_name = name WHERE id BETWEEN 1 AND 10000;
UPDATE users SET full_name = name WHERE id BETWEEN 10001 AND 20000;
-- ...continue in batches

-- Deploy 2: MIGRATE — app writes to both, reads from new
-- Application code:
-- INSERT INTO users (name, full_name, ...) VALUES ($1, $1, ...);
-- SELECT full_name FROM users WHERE ...;

-- Deploy 3: CONTRACT — remove old column (after verifying all reads use new)
ALTER TABLE users DROP COLUMN name;

Dangerous Schema Operations (and Safe Alternatives)

Operation Risk Safe Alternative
ADD COLUMN with DEFAULTTable lock (MySQL < 8.0)ADD COLUMN, then backfill in batches
ADD NOT NULL constraintFull table scan to validateAdd CHECK constraint with NOT VALID, then VALIDATE separately
CREATE INDEXTable lock during buildCREATE INDEX CONCURRENTLY (PostgreSQL)
Change column typeTable rewriteAdd new column, backfill, swap reads, drop old
DROP COLUMNBreaks code still reading itRemove all reads first, then drop in separate deploy
RENAME TABLEBreaks all queriesCreate new table, dual-write, migrate reads, drop old
Tools That Help: gh-ost (GitHub) and pt-online-schema-change (Percona) for MySQL — they create shadow tables and swap atomically. pgroll and pg_repack for PostgreSQL. Rails has strong_migrations gem that blocks dangerous migrations. Django has django-pg-zero-downtime-migrations.

3. Engine Migrations — Changing Databases Entirely

Switching from MySQL to PostgreSQL, or from PostgreSQL to a NoSQL database, is a major undertaking. Here's the proven four-phase approach.

The Four Phases

Phase 1: PREPARE (2-4 weeks)
├── Audit all queries — identify incompatible syntax
├── Build abstraction layer (Repository pattern)
├── Set up new database with equivalent schema
├── Build data validation scripts (row counts, checksums)
└── Create rollback plan

Phase 2: DUAL-WRITE (1-2 weeks)
├── Write to BOTH old and new databases
├── Read from OLD database only
├── Verify new database has consistent data
└── Fix any sync discrepancies

Phase 3: SHADOW-READ (1-2 weeks)
├── Read from BOTH databases
├── Compare results, log mismatches
├── Primary reads still from old database
└── Fix all mismatches before proceeding

Phase 4: CUTOVER (1 day)
├── Switch reads to new database
├── Keep dual-writes for rollback window (24-48 hours)
├── Monitor error rates, latency, data consistency
└── Disable old database writes after rollback window

4. The Dual-Write Pattern

Dual-write means every write goes to both the old and new database. It's the foundation of zero-downtime engine migrations, but it has subtle pitfalls.

Dual-Write Implementation

class OrderRepository {
    constructor(
        private oldDb: MySQLConnection,
        private newDb: PostgreSQLConnection,
        private migrationPhase: 'dual-write' | 'shadow-read' | 'cutover'
    ) {}

    async createOrder(order: Order): Promise<Order> {
        // Always write to old (source of truth during migration)
        const result = await this.oldDb.insert('orders', order);

        // Write to new database — async, don't block on failure
        try {
            await this.newDb.insert('orders', this.transformForNewDb(order));
        } catch (error) {
            // Log for reconciliation, DON'T fail the request
            logger.error('Dual-write to new DB failed', { orderId: order.id, error });
            await this.reconciliationQueue.add({ table: 'orders', id: order.id });
        }

        return result;
    }

    async getOrder(id: string): Promise<Order> {
        if (this.migrationPhase === 'cutover') {
            return this.newDb.query('SELECT * FROM orders WHERE id = $1', [id]);
        }

        const oldResult = await this.oldDb.query('SELECT * FROM orders WHERE id = ?', [id]);

        if (this.migrationPhase === 'shadow-read') {
            // Read from new DB too, compare results, log mismatches
            const newResult = await this.newDb.query('SELECT * FROM orders WHERE id = $1', [id]);
            if (!this.deepEqual(oldResult, newResult)) {
                logger.warn('Shadow read mismatch', { id, old: oldResult, new: newResult });
            }
        }

        return oldResult;  // Always return from old during migration
    }
}
Dual-Write Pitfalls:
  • Ordering issues — if write to old succeeds but new fails, then retries, you can get out-of-order writes. Use idempotent writes.
  • Increased latency — writing to two databases doubles write time. Make the secondary write async.
  • Transaction consistency — you can't have ACID transactions across two databases. Accept eventual consistency during migration.

5. Shadow Database Strategy

An alternative to dual-write: use Change Data Capture (CDC) to replicate from old to new. This is less invasive — no application code changes for the write path.

Application ──→ Old Database (MySQL)
                    │
                    │  CDC (Debezium / DMS)
                    ▼
              Shadow Database (PostgreSQL)
                    │
                    │  Validation job (compares row counts, checksums)
                    ▼
              Dashboard: "99.97% consistent, 12 rows divergent"

When consistency reaches 100% and stays there for 48 hours → safe to cut over
Approach Pros Cons Best For
Dual-WriteFull control, works with any DBApp code changes, consistency riskDifferent DB engines
CDC (Debezium)No app changes, real-time syncSetup complexity, transform limitationsSame schema, different engine
AWS DMSManaged, handles schema conversionAWS lock-in, limited customizationCloud migrations

6. Large-Scale Data Migration

Moving terabytes of data without locking tables or consuming all your database's resources requires careful batching.

Batched Data Backfill

#!/bin/bash
# Batched migration — 1000 rows at a time with throttling

BATCH_SIZE=1000
LAST_ID=0
TOTAL=0

while true; do
    # Migrate next batch
    COUNT=$(psql -t -A -c "
        INSERT INTO new_orders (id, customer_id, amount, status, created_at)
        SELECT id, customer_id, amount, status, created_at
        FROM old_orders
        WHERE id > $LAST_ID
        ORDER BY id
        LIMIT $BATCH_SIZE
        ON CONFLICT (id) DO NOTHING
        RETURNING 1
    " | wc -l)

    # Update cursor
    LAST_ID=$(psql -t -A -c "
        SELECT COALESCE(MAX(id), $LAST_ID) FROM new_orders WHERE id > $LAST_ID
    ")

    TOTAL=$((TOTAL + COUNT))
    echo "Migrated $TOTAL rows (last_id: $LAST_ID)"

    # No more rows — done
    if [ "$COUNT" -lt "$BATCH_SIZE" ]; then
        echo "Migration complete: $TOTAL rows"
        break
    fi

    # Throttle — sleep between batches to avoid overwhelming the database
    sleep 0.1
done
Data Volume Recommended Approach Estimated Time
< 1 GBpg_dump / mysqldumpMinutes
1–100 GBBatched INSERT with throttlingHours
100 GB – 1 TBParallel workers + CDC for ongoing changesHours to days
> 1 TBInitial bulk load (COPY) + CDC for delta syncDays

7. Rollback Strategies

Every migration needs a tested rollback plan. The best rollback is one you never need — but the second-best is one that's been practiced.

Migration Type Rollback Strategy Rollback Time Data Loss Risk
Add columnDrop column (safe if not read yet)SecondsNone
Engine migrationKeep dual-writes active, switch reads backMinutes (config change)None (both DBs have data)
Version upgradePromote old replica back to primaryMinutesWrites since cutover
Data transformationKeep old column until verified, revert readsSeconds (deploy revert)None (old data preserved)
Our Rollback Rule: Never delete old data, columns, or tables until at least one full business cycle (usually one week) after migration. Keep the old path available. The cost of extra storage is trivial compared to the cost of not being able to roll back. We've seen teams delete the old column on Friday, discover a bug on Monday, and spend the next week reconstructing data.

8. Migration Checklist

Use this checklist for any non-trivial database migration. Skip items at your own risk.

Pre-Migration

  • Document rollback plan — specific commands, not "revert if needed"
  • Test migration on staging with production-size data
  • Measure migration duration on staging (multiply by 1.5x for production)
  • Notify on-call team and stakeholders of migration window
  • Take a backup (and verify you can restore from it)
  • Set up monitoring dashboards for error rates and latency

During Migration

  • Monitor database CPU, connections, replication lag
  • Watch application error rates in real-time
  • Run data validation queries continuously
  • Have rollback command ready to execute

Post-Migration

  • Verify row counts match (old vs new)
  • Run checksums on critical tables
  • Test all critical application flows end-to-end
  • Keep old database running for rollback window (minimum 48 hours)
  • Schedule cleanup of old resources after rollback window

Frequently Asked Questions

Is zero-downtime migration really possible?

Yes, for most migration types. Schema changes use expand-contract. Engine migrations use dual-write + shadow reads. The tradeoff is increased complexity and migration duration — what could be a 5-minute maintenance window becomes a multi-week gradual rollout. For most production systems, the extra complexity is worth it.

How do I migrate a 500GB database?

Initial bulk load with COPY or pg_dump (hours, can run during low traffic), then CDC (Debezium or AWS DMS) for ongoing changes. Once CDC catches up and consistency is verified, cut over. The initial load is the slow part — CDC keeps it in sync after that.

What's the biggest migration mistake teams make?

Not testing with production-volume data. A migration that takes 2 seconds on a dev database with 1,000 rows can lock your table for 45 minutes on a production table with 100 million rows. Always test on a staging environment with realistic data volumes before touching production.

Should I use an ORM migration tool or raw SQL?

ORM migration tools (Django migrations, Alembic, Rails migrations) are fine for simple schema changes. For complex migrations — data transformations, engine switches, performance-sensitive operations — write raw SQL. ORM-generated SQL often includes unnecessary locks or suboptimal patterns. Always review the generated SQL before running in production.

How do I handle foreign keys during migration?

Migrate parent tables first, then child tables. If using expand-contract, add the new foreign key as NOT VALID first (PostgreSQL), backfill data, then VALIDATE CONSTRAINT separately. In MySQL, temporarily disable foreign key checks during bulk loads (SET FOREIGN_KEY_CHECKS=0), but re-enable and validate immediately after.

🗄️

Pillai Infotech LLP

We plan and execute database migrations for production systems — from schema changes to full engine swaps. Let's plan your migration.

Related Articles

PostgreSQL vs MySQL: Database Comparison for 2026 → Database Scaling Strategies: Sharding, Replication, and Caching → Data Engineering Fundamentals: Building Modern Data Pipelines →