Skip to content

Migrations & Schema Changes

← DBRE Home | ← Main


Why Migrations Are Risky [B]

Schema changes on live databases can: - Lock tables for seconds or minutes (blocking all reads/writes) - Cause application errors if new/old code expects different schema - Fill up disk during table rewrites - Cause replication lag on replicas

The goal: zero-downtime, backward-compatible schema changes.


Safe vs Unsafe Operations [B]

PostgreSQL

Operation Risk Notes
ADD COLUMN (nullable, no default) Safe Instant metadata change
ADD COLUMN with DEFAULT (Postgres 11+) Safe Stored as default, instant
ADD COLUMN with DEFAULT (Postgres < 11) Unsafe Rewrites entire table
DROP COLUMN Safe (after code deploy) Hides column, no rewrite
CREATE INDEX CONCURRENTLY Safe Non-blocking
CREATE INDEX Unsafe Locks table writes
ADD NOT NULL Unsafe Scans entire table
ALTER COLUMN TYPE Unsafe Rewrites table
DROP TABLE Unsafe Permanent data loss
RENAME TABLE/COLUMN Unsafe Breaks existing code
ADD CONSTRAINT Unsafe Validates entire table

MySQL / Aurora

Operation INSTANT INPLACE (online) Needs pt-osc/gh-ost
ADD COLUMN (end, any default) ✓ 8.0.29+ No
ADD COLUMN (with GENERATED) No
DROP COLUMN ✓ 8.0.29+ No
RENAME COLUMN No
ADD SECONDARY INDEX ✓ (lock=NONE) No — INPLACE is online
DROP INDEX No
ADD FULLTEXT INDEX (first) No
CHANGE COLUMN TYPE ✗ (COPY) Yes
ADD / DROP PRIMARY KEY ✗ (COPY) Yes
CONVERT CHARACTER SET ✗ (COPY) Yes
ADD FOREIGN KEY No
OPTIMIZE TABLE No (but slow — consider pt-osc)

INSTANT = zero-lock, metadata-only. INPLACE (online) = brief MDL at start and end, no table copy. COPY = full table rewrite, reads blocked.


Expand/Contract Pattern [I]

The safest way to make breaking schema changes:

Phase 1: EXPAND  — add new structure alongside old
Phase 2: MIGRATE — backfill data to new structure
Phase 3: SWITCH  — update code to use new structure
Phase 4: CONTRACT — remove old structure

Example: Renaming a Column

❌ Wrong: RENAME COLUMN user_name TO username
         → breaks running apps instantly

✓ Correct:
Phase 1: ADD COLUMN username VARCHAR(255)
         + write to BOTH columns in application code

Phase 2: UPDATE users SET username = user_name WHERE username IS NULL
         (run in batches, see batching below)

Phase 3: Deploy code that reads from username only
         + verify no reads of user_name in production

Phase 4: DROP COLUMN user_name

Example: Adding NOT NULL Constraint

-- Phase 1: Add column as nullable
ALTER TABLE orders ADD COLUMN customer_email VARCHAR(255);

-- Phase 2: Backfill (in batches)
UPDATE orders SET customer_email = (
    SELECT email FROM customers WHERE customers.id = orders.customer_id
)
WHERE customer_email IS NULL AND id BETWEEN :start AND :end;

-- Phase 3: Add NOT NULL (after 100% backfill verified)
-- PostgreSQL: avoid full scan with NOT VALID + VALIDATE
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_email_not_null
  CHECK (customer_email IS NOT NULL) NOT VALID;

ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_email_not_null;
-- VALIDATE acquires ShareUpdateExclusiveLock (allows reads/writes)

Batching Large Updates [I]

Never run UPDATE ... WHERE that touches millions of rows in a single transaction: - Holds locks for minutes - Generates huge WAL / binlog - Can cause replication lag

# Python batching example
def backfill_in_batches(batch_size=1000, sleep_seconds=0.1):
    last_id = 0
    while True:
        result = db.execute("""
            UPDATE orders
            SET customer_email = (
                SELECT email FROM customers WHERE customers.id = orders.customer_id
            )
            WHERE id > :last_id
              AND customer_email IS NULL
            LIMIT :batch_size
            RETURNING id
        """, last_id=last_id, batch_size=batch_size)

        if not result:
            break

        last_id = max(row['id'] for row in result)
        time.sleep(sleep_seconds)  # Be gentle on the DB
        print(f"Processed up to id={last_id}")
-- Pure SQL batching (PostgreSQL)
DO $$
DECLARE
  batch_id BIGINT := 0;
  max_id BIGINT;
BEGIN
  SELECT MAX(id) INTO max_id FROM orders;

  WHILE batch_id < max_id LOOP
    UPDATE orders
    SET processed = TRUE
    WHERE id > batch_id AND id <= batch_id + 1000
      AND processed IS FALSE;

    batch_id := batch_id + 1000;
    PERFORM pg_sleep(0.1);  -- small delay between batches
  END LOOP;
END $$;

Non-Blocking Index Creation [I]

-- Always use CONCURRENTLY in production
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- If CONCURRENTLY fails, it leaves an INVALID index:
-- Check for invalid indexes
SELECT indexname, indisvalid
FROM pg_indexes
JOIN pg_index ON pg_index.indexrelid = pg_class.oid
JOIN pg_class ON pg_class.relname = pg_indexes.indexname
WHERE NOT indisvalid;

-- Drop and retry
DROP INDEX CONCURRENTLY idx_orders_customer_id;
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

Migration Tools [I]

Flyway

-- V1__create_orders.sql
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- V2__add_status_to_orders.sql
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
flyway -url=jdbc:postgresql://localhost/mydb migrate
flyway info   # show migration status

Alembic (Python/SQLAlchemy)

# alembic/versions/abc123_add_customer_email.py
def upgrade():
    op.add_column('orders',
        sa.Column('customer_email', sa.String(255), nullable=True)
    )
    # Note: create index CONCURRENTLY separately or use op.execute()
    op.execute("""
        CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_email
        ON orders(customer_email)
    """)

def downgrade():
    op.drop_index('idx_orders_customer_email', table_name='orders')
    op.drop_column('orders', 'customer_email')

gh-ost (GitHub Online Schema Change — MySQL)

gh-ost \
  --user="dbre" \
  --password="..." \
  --host=primary.db.example.com \
  --database="myapp" \
  --table="orders" \
  --alter="ADD COLUMN customer_email VARCHAR(255)" \
  --execute

Uses binary log to apply changes incrementally, no long table locks.


Rollback Strategies [I]

Not all migrations are reversible in the same way. Know your rollback before you run.

Migration type Rollback procedure Reversible?
ADD COLUMN (nullable) DROP COLUMN Yes — instant
ADD COLUMN with DEFAULT DROP COLUMN Yes — instant (Postgres 11+)
CREATE INDEX CONCURRENTLY DROP INDEX CONCURRENTLY Yes
Backfill (UPDATE in batches) Reverse backfill (run opposite UPDATE) Yes — but slow
DROP COLUMN Restore from snapshot No — must snapshot before
DROP TABLE Restore from snapshot No — must snapshot before
RENAME COLUMN (via expand/contract) Remove new column, restore old writes Yes — if expand phase only
ALTER COLUMN TYPE Restore from snapshot No
Constraint (NOT VALID + VALIDATE) DROP CONSTRAINT Yes

Rule: any migration that touches existing data or removes structure needs a pre-migration snapshot.

# AWS RDS: snapshot before every risky migration
aws rds create-db-snapshot \
  --db-instance-identifier prod-db \
  --db-snapshot-identifier "pre-migration-$(date +%Y%m%d-%H%M)"

# Verify snapshot is available before proceeding
aws rds describe-db-snapshots \
  --db-snapshot-identifier "pre-migration-$(date +%Y%m%d-%H%M)" \
  --query 'DBSnapshots[0].Status'

What to Do When a Migration Goes Wrong [I]

Step 1: Assess the situation

-- Is the migration still running?
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;

-- How many rows remain? (estimate from last_id progress in backfill)
SELECT COUNT(*) FROM orders WHERE customer_email IS NULL;

-- Is anything blocked behind it?
SELECT
  blocking.pid AS blocking_pid,
  blocked.pid  AS blocked_pid,
  blocking.query AS blocker,
  blocked.query  AS blocked,
  now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));

Step 2: Decision — wait or cancel?

Duration estimate acceptable AND no critical queries blocked?
  → Wait. Monitor.

Duration unacceptable OR critical queries blocked?
  → Cancel the migration.

Step 3: Cancel safely

-- Soft cancel (preferred — waits for safe point)
SELECT pg_cancel_backend(:migration_pid);

-- Hard terminate (use if soft cancel doesn't work after 30s)
SELECT pg_terminate_backend(:migration_pid);

Step 4: Clean up after a failed CREATE INDEX CONCURRENTLY

A failed CREATE INDEX CONCURRENTLY leaves an INVALID index that continues to consume space and slow writes without helping reads:

-- Find invalid indexes
SELECT indexname, tablename
FROM pg_indexes
JOIN pg_class ON pg_class.relname = pg_indexes.indexname
JOIN pg_index ON pg_index.indexrelid = pg_class.oid
WHERE NOT pg_index.indisvalid;

-- Drop the invalid index, then retry
DROP INDEX CONCURRENTLY idx_orders_customer_email;
CREATE INDEX CONCURRENTLY idx_orders_customer_email ON orders(customer_email);

Step 5: If rollback is needed

# Restore from pre-migration snapshot (RDS)
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-db-restored \
  --db-snapshot-identifier "pre-migration-20240315-1400"

# Point application at restored instance, verify data
# Then cut back DNS / connection string

What to Watch During a Migration [I]

Open these dashboards before starting any migration with risk > Low:

-- Terminal 1: watch lock waits in real time
SELECT
  now() - query_start AS wait_duration,
  pid,
  wait_event_type,
  wait_event,
  LEFT(query, 80) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY wait_duration DESC;

-- Terminal 2: watch replication lag
SELECT
  client_addr,
  state,
  sent_lsn - replay_lsn AS lag_bytes,
  replay_lag
FROM pg_stat_replication;

-- Terminal 3: disk usage (table rewrites can temporarily double disk)
SELECT
  pg_size_pretty(pg_total_relation_size('orders')) AS table_size,
  pg_size_pretty(pg_relation_size('orders')) AS heap_size;

Alert thresholds during migration: - Replication lag > 60s → pause batching, let replica catch up - Lock wait > 30s → assess if migration is blocking critical app queries - Disk usage growth > 20% during migration → check for unexpected table rewrite


Migration Checklist [A]

Before running a migration in production:

  • [ ] Migration tested on a production-sized copy of data
  • [ ] Estimated duration known (tested in staging)
  • [ ] Pre-migration snapshot taken (aws rds create-db-snapshot)
  • [ ] Rollback procedure documented (see table above)
  • [ ] Application code is backward-compatible with both old and new schema
  • [ ] Batching used for large data updates
  • [ ] CONCURRENTLY used for index creation
  • [ ] NOT VALID + VALIDATE CONSTRAINT for PostgreSQL constraints
  • [ ] Monitoring terminals open (locks, replication lag, disk)
  • [ ] DBA / DBRE aware (for SEV1-risk changes)
  • [ ] Runbook updated

MySQL Schema Changes — Planning & Execution [I]

Step 1: Determine which algorithm MySQL will use

Always test the algorithm before running on production. Add the algorithm hint — if MySQL can't honour it, the statement errors immediately rather than running a dangerous operation silently:

-- Test each tier without executing (ALGORITHM clause makes it a dry-run contract)

-- Tier 1: no lock, no copy, instant metadata change
ALTER TABLE orders ADD COLUMN notes TEXT, ALGORITHM=INSTANT;

-- Tier 2: no table copy, brief MDL at start/end only
ALTER TABLE orders ADD COLUMN notes TEXT, ALGORITHM=INPLACE, LOCK=NONE;

-- If both error → table must be copied. Use pt-osc or gh-ost (not native ALTER).
-- Never run ALGORITHM=COPY on a large table in production.

What each error means:

ERROR 1845: ALGORITHM=INSTANT is not supported  → try INPLACE
ERROR 1846: LOCK=NONE is not supported          → operation needs COPY → use pt-osc/gh-ost

Step 2: Tool selection

┌─────────────────────────────────────────────────────────────────────┐
│                   Which tool should I use?                          │
│                                                                     │
│  ALGORITHM=INSTANT works?                                           │
│    YES → run native ALTER (zero risk)                               │
│    NO  ↓                                                            │
│                                                                     │
│  ALGORITHM=INPLACE, LOCK=NONE works?                                │
│    YES → run native ALTER (brief MDL only, typically <1s)           │
│          acceptable for most tables regardless of size              │
│    NO  ↓  (operation requires COPY)                                 │
│                                                                     │
│  Table size < 1 GB AND write rate < 100 writes/sec?                 │
│    YES → pt-online-schema-change (simpler, good enough)             │
│    NO  → gh-ost (better throttling, binlog-based, pauseable)        │
└─────────────────────────────────────────────────────────────────────┘
pt-osc gh-ost
Mechanism Triggers on original table Reads binlog, no triggers
Trigger overhead ~10% write overhead None
Pause mid-migration No (must kill) Yes — echo throttle > /tmp/gh-ost.sock
Progress visibility Limited ETA, rows%, lag display
Works with triggers already on table No (pre-8.0) Yes
Replica lag throttling Yes (--max-lag) Yes (--max-lag-millis)
Best for Simpler ops, smaller tables Large tables, high-write primaries

Step 3: Estimate duration before you run

-- 1. Measure table size and row count
SELECT
  table_name,
  table_rows                                              AS est_rows,
  ROUND(data_length  / 1024 / 1024, 0)                  AS data_mb,
  ROUND(index_length / 1024 / 1024, 0)                  AS index_mb,
  ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY data_length DESC;
-- 2. Measure current write rate on the target table (run for 60s)
SELECT variable_value INTO @before FROM performance_schema.global_status
  WHERE variable_name = 'Innodb_rows_inserted';
SELECT SLEEP(60);
SELECT variable_value - @before AS inserts_per_min FROM performance_schema.global_status
  WHERE variable_name = 'Innodb_rows_inserted';

Duration estimates (rough; varies by disk I/O, row size, write load):

Tool Typical speed 10M rows 100M rows 500M rows
pt-osc (default chunk 1000) 200k–1M rows/min 10–50 min 1–8 h 8–40 h
gh-ost (chunk 1000, auto-tune) 300k–2M rows/min 5–30 min 1–6 h 5–30 h
INPLACE online DDL 5–50 GB/h minutes 1–3 h 3–15 h

Rule: always measure on a production-sized replica before running on primary. 1 hour on a 64GB RAM replica ≠ 1 hour on a loaded primary.

# Dry-run on replica to measure actual throughput (no writes to production)
gh-ost \
  --host=replica1 --user=root --password=rootpass \
  --database=shopdb --table=orders \
  --alter="ADD COLUMN notes TEXT" \
  --test-on-replica \           # runs against replica, stops at cutover
  --exact-rowcount \            # COUNT(*) for accurate ETA
  --chunk-size=1000 \
  --execute
# Watch output: "ETA: 2h14m  rows: 3,821,000/45,000,000 (8.5%)"

Step 4: Run pt-online-schema-change

# Dry run first — prints what it would do, touches nothing
pt-online-schema-change \
  --host=mysql-primary --user=root --password=rootpass \
  --database=shopdb \
  --table=orders \
  --alter="ADD COLUMN notes TEXT" \
  --dry-run

# Actual run with safety flags
pt-online-schema-change \
  --host=mysql-primary --user=root --password=rootpass \
  --database=shopdb \
  --table=orders \
  --alter="ADD COLUMN notes TEXT" \
  --chunk-size=1000 \           # rows per chunk (start conservative)
  --chunk-time=0.5 \            # target 0.5s per chunk (auto-adjusts chunk-size)
  --max-lag=5 \                 # pause if any replica is >5s behind
  --check-interval=5 \          # check replica lag every 5s
  --critical-load="Threads_running=50" \   # abort if load too high
  --max-load="Threads_running=25" \         # pause if load too high
  --set-vars="lock_wait_timeout=5" \        # don't wait >5s for MDL on cutover
  --no-drop-old-table \         # keep _orders_old for manual verification
  --execute

# Progress output:
# Copied rows: 50000/458000 (10.9%), 8000 rows/s, ETA 51s
# Copying rows...  100% 00:00 remain
# Swapping tables...
# Dropped triggers...

pt-osc what happens under the hood:

1. Creates _orders_new (copy of schema + your ALTER applied)
2. Adds 3 triggers on orders: AFTER INSERT/UPDATE/DELETE → replicate to _orders_new
3. Copies existing rows in chunks: INSERT INTO _orders_new SELECT ... FROM orders LIMIT chunk-size
4. When copy is 100%, atomically renames: orders → _orders_old, _orders_new → orders
5. Drops triggers. Optionally drops _orders_old.

Abort pt-osc mid-run: Ctrl+C — it cleans up triggers and temp table automatically.


Step 5: Run gh-ost

# Full production run
gh-ost \
  --host=mysql-primary --user=root --password=rootpass \
  --database=shopdb \
  --table=orders \
  --alter="ADD COLUMN notes TEXT" \
  --chunk-size=1000 \
  --max-lag-millis=1500 \         # pause if any replica lag > 1.5s
  --throttle-control-replicas="replica1,replica2" \  # watch these replicas
  --max-load="Threads_running=30" \
  --critical-load="Threads_running=80" \
  --serve-socket-file=/tmp/gh-ost.sock \   # control socket
  --postpone-cut-over-flag-file=/tmp/gh-ost.postpone \  # delay final cutover
  --ok-to-drop-table \
  --exact-rowcount \
  --verbose \
  --execute

# Progress output:
# [migrating] 2026/04/05 12:00:00 copy iteration 4500/45231; eta: 01:12:33; ETA: 13:12:33;
#   copied rows: 4,500,000; backlog: 0/100; lag: 0.3s

Control gh-ost mid-run via the socket:

# Pause migration (e.g. during peak traffic)
echo throttle | nc -U /tmp/gh-ost.sock

# Resume
echo no-throttle | nc -U /tmp/gh-ost.sock

# Change chunk-size on the fly
echo chunk-size=500 | nc -U /tmp/gh-ost.sock

# Delay final table cutover (gives you time to verify)
touch /tmp/gh-ost.postpone          # create flag file before running
# migration copies rows but waits at 100% — verify app still works
rm /tmp/gh-ost.postpone             # remove flag → cutover proceeds

# Abort completely (drops ghost table, no changes to original)
echo panic | nc -U /tmp/gh-ost.sock

gh-ost what happens under the hood:

1. Creates _orders_ghc (changelog table) and _orders_gho (shadow table with ALTER)
2. Connects to binlog stream on primary
3. Copies existing rows in chunks to _orders_gho
4. Simultaneously applies binlog events to _orders_gho (no triggers)
5. When copy is done, waits for binlog to catch up (lag → 0)
6. Atomic cutover: brief MDL (~0.5s), renames orders → _orders_del, _orders_gho → orders
7. Drops _orders_del and _orders_ghc

Step 6: Monitor during the change

Open these terminals while any large schema change runs:

-- Terminal 1: watch MDL (metadata lock) waits — anything waiting on the migrated table?
SELECT
  r.trx_mysql_thread_id                        AS waiting_thread,
  r.trx_query                                  AS waiting_query,
  b.trx_mysql_thread_id                        AS blocked_by,
  b.trx_query                                  AS blocker,
  TIMESTAMPDIFF(SECOND, r.trx_wait_started, NOW()) AS wait_sec
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;
-- Terminal 2: replication lag on all replicas (pause pt-osc/gh-ost if > 5s)
SELECT @@hostname, Seconds_Behind_Source FROM performance_schema.replication_applier_status_by_worker\G
-- or simpler:
SHOW REPLICA STATUS\G   -- look for Seconds_Behind_Source
-- Terminal 3: disk usage — schema changes can temporarily double table size
SELECT
  table_name,
  ROUND((data_length + index_length) / 1024 / 1024, 0) AS total_mb,
  ROUND(data_free              / 1024 / 1024, 0) AS free_mb
FROM information_schema.tables
WHERE table_schema = 'shopdb'
ORDER BY data_length DESC;
# Terminal 4: watch disk at OS level
iostat -xm 2   # watch %util and await on MySQL data disk
df -h /var/lib/mysql

Stop thresholds: - Replication lag > 10s → pause (gh-ost auto-pauses; pt-osc auto-pauses with --max-lag) - Threads_running > 30 sustained → pause - Disk usage > 85% → stop immediately, free space before continuing


Schema Change Checklist (MySQL) [A]

  • [ ] Checked ALGORITHM=INSTANT — fastest, no risk
  • [ ] Checked ALGORITHM=INPLACE, LOCK=NONE — brief MDL only
  • [ ] If COPY needed: chose pt-osc or gh-ost based on table size/write rate
  • [ ] Table row count and size measured (information_schema.tables)
  • [ ] Duration estimated on staging with production-sized data
  • [ ] Pre-migration snapshot taken (aws rds create-db-snapshot or mysqldump)
  • [ ] Application code backward-compatible with both old and new schema
  • [ ] gh-ost postpone flag set — do NOT auto-cutover on first attempt
  • [ ] Replica lag monitoring terminal open
  • [ ] Disk usage terminal open
  • [ ] Replication lag thresholds configured (--max-lag / --max-lag-millis)
  • [ ] Runbook documented: what to do if abort is needed