DBRE — Database Reliability Engineering

DBRE applies SRE principles to databases. Databases are the most critical and least replaceable part of most systems. Your goal: keep them fast, available, correct, and operable.


Topics

Topic                           What you'll learn
Database Landscape              Every major engine (relational, NoSQL, cloud-native); pros/cons, decision framework
Fundamentals                    DB reliability concepts, connection pooling, monitoring
SQL Best Practices              Query writing, anti-patterns, optimization
Anti-Patterns                   Design and query anti-patterns (sqlcheck reference)
Security                        Access control, encryption, secrets management, audit logging, PII/compliance
Performance Tuning              Query optimization, indexes, execution plans
Migrations & Schema Changes     INSTANT/INPLACE/pt-osc/gh-ost selection, duration estimation, zero-downtime execution
Backup & Recovery               mysqldump, XtraBackup, mydumper, PITR with GTIDs, zero-downtime table swap
Observability                   Prometheus + mysqld_exporter, Grafana dashboards, alert hierarchy, replication lag
HA & Failover                   HAProxy, ProxySQL routing, replica promotion, failover runbooks
Load Testing & VM Optimization  sysbench, mysqlslap, InnoDB tuning, OS settings, GCP VM sizing
Scaling Databases               Read replicas, ProxySQL connection pooling, sharding
Lab                             Hands-on Docker lab: everything below in a running cluster
Best Practices                  Schema design, queries, indexing, tools, naming conventions
External Links                  PostgreSQL docs, Don't Do This wiki, sqlblog bad habits, modern-sql
Postmortem Template             Operational failures: outage, failover, replication break, backup failure
DEA Template                    Defect Escape Analysis: defects that slipped through quality gates to production

Lab

The lab is a self-contained Docker environment covering every topic end-to-end. No cloud account needed.

mysql-primary ──GTID──► mysql-replica1
              └─GTID──► mysql-replica2

HAProxy    :3306 writes → primary
           :3307 reads  → replicas (round-robin)

ProxySQL   :6033 auto read/write split by query pattern
           :6032 admin (MySQL protocol)
           :6080 web UI (HTTPS, Digest auth)

Monitoring  mysqld_exporter ×3 → Prometheus :9090 → Grafana :3000
            MySQL Overview dashboard   — time-series metrics
            MySQL Processlist dashboard — live processlist, top queries, locks

toolkit container  pt-*, gh-ost, xtrabackup, mydumper
mysql-tools        mysqlbinlog, full MySQL 8.0.23 client suite
mysql57            MySQL 5.7 source for migration lab
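
The difference between the two proxies above is where the read/write decision is made: with HAProxy the client picks a port, while ProxySQL inspects each query. A minimal Python sketch of pattern-based routing follows. It is an illustration of the idea only, not ProxySQL configuration or its actual matching logic. The rule list, the "writer"/"reader" labels, and the function names classify and route are assumptions made for this example; only the host names come from the topology above.

```python
import itertools
import re

# Host names from the lab topology. Everything else here is hypothetical.
PRIMARY = "mysql-primary"
REPLICAS = ["mysql-replica1", "mysql-replica2"]

# Ordered rules, first match wins. Locking reads must go to the primary,
# so that rule is checked before the generic SELECT rule.
RULES = [
    (re.compile(r"^\s*SELECT\b.*\bFOR\s+(UPDATE|SHARE)\b",
                re.IGNORECASE | re.DOTALL), "writer"),
    (re.compile(r"^\s*SELECT\b", re.IGNORECASE), "reader"),
]

# Round-robin over replicas, similar in spirit to the HAProxy read port.
_replica_cycle = itertools.cycle(REPLICAS)


def classify(query: str) -> str:
    """Return "reader" or "writer" for a SQL statement."""
    for pattern, target in RULES:
        if pattern.search(query):
            return target
    # Anything unmatched (INSERT, UPDATE, DDL, ...) defaults to the writer.
    return "writer"


def route(query: str) -> str:
    """Pick a backend host for the statement."""
    if classify(query) == "reader":
        return next(_replica_cycle)
    return PRIMARY
```

Defaulting unmatched statements to the writer is the conservative choice in this sketch: a misrouted read costs some primary capacity, while a write sent to a read-only replica fails.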

Lab scripts

Script                       Topic
01-setup-replication.sh      GTID replication, auth-compat for ProxySQL + HAProxy
02-test-replication.sh       Verify replication, lag, read-only enforcement
03-test-haproxy.sh           Static read/write split, health checks, failover
04-test-proxysql.sh          Query-aware routing, admin interface, web UI
05-backups.sh                mysqldump, restore, PITR walkthrough, RENAME TABLE swap
06-fast-backups.sh           XtraBackup (full + incremental), mydumper, snapshot approach
07-failover.sh               Crash simulation, replica promotion, re-topology
08-parallel-writes.sh        Lock contention, deadlocks, INNODB STATUS
09-percona-toolkit.sh        pt-mysql-summary, pt-duplicate-key-checker, pt-table-checksum, pt-table-sync, pt-osc, pt-query-digest
10-schema-changes.sh         INSTANT/INPLACE dry runs, pt-osc, gh-ost with postponed cutover
11-mysql5-to-8-migration.sh  Zero-downtime major version upgrade via cross-version replication + ProxySQL cutover

→ Lab runbook — full setup, commands, one-liners, tear down.


Learning Path

[B] Beginner: Database Landscape → Fundamentals → SQL Best Practices → Backup & Recovery → Security
[I] Intermediate: Performance Tuning → Migrations → Observability → HA & Failover → Best Practices
[A] Advanced: Scaling → Load Testing → Lab (hands-on) → Postmortem practice

Key Resources

