# DBRE: Database Reliability Engineering
DBRE applies SRE principles to databases. Databases are the most critical and least replaceable part of most systems. Your goal: keep them fast, available, correct, and operable.
## Topics
| Topic | What you'll learn |
|---|---|
| Database Landscape | Every major engine — relational, NoSQL, cloud-native; pros/cons, decision framework |
| Fundamentals | DB reliability concepts, connection pooling, monitoring |
| SQL Best Practices | Query writing, anti-patterns, optimization |
| Anti-Patterns | Design and query anti-patterns (sqlcheck reference) |
| Security | Access control, encryption, secrets management, audit logging, PII/compliance |
| Performance Tuning | Query optimization, indexes, execution plans |
| Migrations & Schema Changes | INSTANT/INPLACE/pt-osc/gh-ost selection, duration estimation, zero-downtime execution |
| Backup & Recovery | mysqldump, XtraBackup, mydumper, PITR with GTIDs, zero-downtime table swap |
| Observability | Prometheus + mysqld_exporter, Grafana dashboards, alert hierarchy, replication lag |
| HA & Failover | HAProxy, ProxySQL routing, replica promotion, failover runbooks |
| Load Testing & VM Optimization | sysbench, mysqlslap, InnoDB tuning, OS settings, GCP VM sizing |
| Scaling Databases | Read replicas, ProxySQL connection pooling, sharding |
| Lab | Hands-on Docker lab: the topics above in a running cluster |
| Best Practices | Schema design, queries, indexing, tools, naming conventions |
| External Links | PostgreSQL docs, Don't Do This wiki, sqlblog bad habits, modern-sql |
| Postmortem Template | Operational failures — outage, failover, replication break, backup failure |
| DEA Template | Defect Escape Analysis — defects that slipped through quality gates to production |
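The Migrations & Schema Changes topic includes duration estimation. At its core that is extrapolation from a trial run. The sketch below is minimal and rests on two assumptions. First, the copy rate measured on a sample holds for the whole table. Second, throttling (for example, pausing while replicas catch up) takes a fixed share of wall-clock time. `estimate_copy_seconds` and its parameters are hypothetical and are not a pt-osc or gh-ost API.

```python
def estimate_copy_seconds(
    total_rows: int,
    sampled_rows: int,
    sampled_seconds: float,
    throttled_fraction: float = 0.0,
) -> float:
    """Extrapolate the wall-clock time of a row-copy schema change.

    Hypothetical helper, not part of pt-osc or gh-ost. Assumes the copy rate
    seen on a sample holds for the whole table, and that throttling (e.g.
    pausing on replica lag) takes up a fixed fraction of wall-clock time.
    """
    if sampled_rows <= 0 or sampled_seconds <= 0:
        raise ValueError("the sample must copy some rows in positive time")
    if not 0.0 <= throttled_fraction < 1.0:
        raise ValueError("throttled_fraction must be in [0, 1)")
    rows_per_second = sampled_rows / sampled_seconds
    copy_seconds = total_rows / rows_per_second
    # Only (1 - throttled_fraction) of wall-clock time is spent copying rows.
    return copy_seconds / (1.0 - throttled_fraction)


if __name__ == "__main__":
    # 50M-row table, 100k rows copied in 20 s during a trial, 25% of time throttled.
    seconds = estimate_copy_seconds(50_000_000, 100_000, 20.0, throttled_fraction=0.25)
    print(f"~{seconds / 3600:.1f} h")  # prints ~3.7 h
```

The estimate covers only the row copy, not the cut-over step. Treat it as a lower bound when write traffic on the table is heavy.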
## Lab
The lab is a self-contained Docker environment covering every topic end-to-end. No cloud account needed.
```text
mysql-primary ──GTID──► mysql-replica1
              └─GTID──► mysql-replica2

HAProxy       :3306  writes → primary
              :3307  reads  → replicas (round-robin)

ProxySQL      :6033  auto read/write split by query pattern
              :6032  admin (MySQL protocol)
              :6080  web UI (HTTPS, Digest auth)

Monitoring    mysqld_exporter ×3 → Prometheus :9090 → Grafana :3000
              MySQL Overview dashboard:    time-series metrics
              MySQL Processlist dashboard: live processlist, top queries, locks

toolkit container   pt-*, gh-ost, xtrabackup, mydumper
mysql-tools         mysqlbinlog, full MySQL 8.0.23 client suite
mysql57             MySQL 5.7 source for migration lab
```
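GTID replication lets you check whether a replica has executed every transaction the primary has, for example before promoting it. MySQL does this server-side with `GTID_SUBSET(set1, set2)` and `GTID_SUBTRACT(set1, set2)`. The Python sketch below re-implements the same set logic offline to show how it works. It is not taken from the lab scripts. It assumes well-formed `gtid_executed` strings in the plain `uuid:interval[:interval...]` form. It also expands intervals into Python sets, which suits a lab but not a server with billions of transactions.

```python
from __future__ import annotations


def parse_gtid_set(gtid_set: str) -> dict[str, set[int]]:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: {transaction ids}}."""
    result: dict[str, set[int]] = {}
    # gtid_executed output may wrap members onto new lines after commas.
    for member in gtid_set.replace("\n", "").split(","):
        member = member.strip()
        if not member:
            continue
        uuid, *intervals = member.split(":")
        ids = result.setdefault(uuid.strip().lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single id like '7' has no '-', so end falls back to start.
            ids.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(a: str, b: str) -> dict[str, set[int]]:
    """Transactions in a but not in b (same idea as MySQL's GTID_SUBTRACT)."""
    left, right = parse_gtid_set(a), parse_gtid_set(b)
    missing: dict[str, set[int]] = {}
    for uuid, ids in left.items():
        diff = ids - right.get(uuid, set())
        if diff:
            missing[uuid] = diff
    return missing


def gtid_subset(a: str, b: str) -> bool:
    """True if every transaction in a is also in b (like GTID_SUBSET(a, b))."""
    return not gtid_subtract(a, b)


if __name__ == "__main__":
    uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
    primary = f"{uuid}:1-100"
    replica = f"{uuid}:1-97"
    print(gtid_subset(replica, primary))    # True: replica has nothing extra
    print(gtid_subset(primary, replica))    # False: replica is behind
    print(gtid_subtract(primary, replica))  # the three transactions still missing
```

The direction matters. `gtid_subset(replica, primary)` confirms that the replica holds no errant transactions of its own. `gtid_subset(primary, replica)` confirms that it has caught up.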
### Lab scripts

| Script | Topic |
|---|---|
| 01-setup-replication.sh | GTID replication, auth compatibility for ProxySQL + HAProxy |
| 02-test-replication.sh | Verify replication, lag, read-only enforcement |
| 03-test-haproxy.sh | Static read/write split, health checks, failover |
| 04-test-proxysql.sh | Query-aware routing, admin interface, web UI |
| 05-backups.sh | mysqldump, restore, PITR walkthrough, RENAME TABLE swap |
| 06-fast-backups.sh | XtraBackup (full + incremental), mydumper, snapshot approach |
| 07-failover.sh | Crash simulation, replica promotion, re-topology |
| 08-parallel-writes.sh | Lock contention, deadlocks, SHOW ENGINE INNODB STATUS |
| 09-percona-toolkit.sh | pt-mysql-summary, pt-duplicate-key-checker, pt-table-checksum, pt-table-sync, pt-osc, pt-query-digest |
| 10-schema-changes.sh | INSTANT/INPLACE dry runs, pt-osc, gh-ost with postponed cutover |
| 11-mysql5-to-8-migration.sh | Zero-downtime major-version upgrade via cross-version replication + ProxySQL cutover |
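`04-test-proxysql.sh` exercises ProxySQL's query-aware routing. ProxySQL drives this from its `mysql_query_rules` table. Rules are evaluated in `rule_id` order, a matching rule sets `destination_hostgroup`, and `apply=1` stops further evaluation. The Python model below is a simplified sketch of that behavior using the common "`SELECT ... FOR UPDATE` to the writer, other `SELECT`s to readers" pattern. Two caveats apply. The hostgroup IDs 10 and 20 are hypothetical, and the lab's actual rules may differ. Real ProxySQL also matches `match_digest` against the normalized query digest, whereas this sketch matches the raw query text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRule:
    rule_id: int
    match_digest: str           # regex, matched case-insensitively
    destination_hostgroup: int
    apply: bool = True          # stop evaluating later rules on a match


WRITER_HG = 10  # hypothetical hostgroup ids, not the lab's configuration
READER_HG = 20

RULES = [
    QueryRule(1, r"^SELECT.*FOR UPDATE", WRITER_HG),
    QueryRule(2, r"^SELECT", READER_HG),
]


def route(query: str, rules: list[QueryRule] = RULES,
          default_hostgroup: int = WRITER_HG) -> int:
    """Return the hostgroup a query would be sent to under these rules."""
    destination = default_hostgroup  # unmatched queries (writes) go to the default
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if re.search(rule.match_digest, query.strip(), re.IGNORECASE | re.DOTALL):
            destination = rule.destination_hostgroup
            if rule.apply:
                break
    return destination


if __name__ == "__main__":
    print(route("SELECT * FROM orders WHERE id = 1"))             # 20 (reader)
    print(route("SELECT * FROM orders WHERE id = 1 FOR UPDATE"))  # 10 (writer)
    print(route("UPDATE orders SET status = 'paid' WHERE id = 1"))  # 10 (writer)
```

Pattern routing alone would send a `SELECT` inside an open transaction to a replica. ProxySQL guards against this with the `transaction_persistent` user setting, which this sketch does not model.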
→ Lab runbook: full setup, commands, one-liners, teardown.
## Learning Path
- [B] Beginner: Database Landscape → Fundamentals → SQL Best Practices → Backup & Recovery → Security
- [I] Intermediate: Performance Tuning → Migrations & Schema Changes → Observability → HA & Failover → Best Practices
- [A] Advanced: Scaling Databases → Load Testing & VM Optimization → Lab (hands-on) → Postmortem practice
## Key Resources
- percona-toolkit — Battle-tested MySQL/PostgreSQL tools
- sql-guide — SQL interview Q&A
- sqlcheck — SQL anti-pattern detection
- sql-tips-and-tricks — Practical SQL tips and tricks
- awesome-mysql — MySQL queries, commands and snippets
- awesome-postgres — PostgreSQL resources and tools
- awesome-mongodb — MongoDB resources and tools
- awesome-redis — Redis resources and tools
- awesome-nosql-guides — NoSQL patterns and guides
- sqlstyle-guide — SQL style guide for consistent formatting
- data-engineer-handbook — Data engineering context
- awesome-scalability — DB scaling patterns
- atlassian-incident-handbook — Postmortem framework, Five Whys, blameless culture