Scalability¶
Scalability Fundamentals [B]¶
Scalability = the ability of a system to handle growing load.
Two dimensions:

- Vertical scaling (scale up) — bigger machine (more CPU/RAM)
- Horizontal scaling (scale out) — more machines
| | Vertical | Horizontal |
|---|---|---|
| Limit | Hardware ceiling | Theoretically unlimited |
| Complexity | Simple | Requires statelessness, load balancing |
| Cost | Expensive at scale | More predictable |
| Downtime | Often requires restart | Rolling updates possible |
Capacity Planning [I]¶
Capacity planning = knowing how much headroom you have before hitting limits.
Process:

1. Define resource metrics (CPU, memory, connections, QPS)
2. Measure current utilization and growth rate
3. Project when you'll hit limits
4. Plan scaling actions before you get there
```text
Current QPS:   10,000
Growth:        +15% / month
System limit:  25,000 QPS
Time to limit: ~6 months
Action:        start horizontal scaling project in 3 months
```
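The "~6 months" figure above follows from compound growth; a quick sketch of the projection (function name and the steady-15%-growth assumption are mine):

```python
import math

def months_to_limit(current_qps: float, limit_qps: float, monthly_growth: float) -> float:
    """Months until QPS crosses the limit, assuming steady compound growth."""
    return math.log(limit_qps / current_qps) / math.log(1 + monthly_growth)

months = months_to_limit(current_qps=10_000, limit_qps=25_000, monthly_growth=0.15)
print(f"~{months:.1f} months to limit")  # roughly 6.6 months
```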
SLO-based capacity planning:

- Run load tests to find breaking points
- Set capacity targets at 70% utilization (30% headroom)
→ See DBRE: Scaling Databases for database-specific capacity planning.
Load Testing [I]¶
Test your system before traffic tests it for you.
Types:

- Load test — expected peak traffic
- Stress test — beyond expected peak (find breaking point)
- Soak test — sustained load over time (find memory leaks)
- Spike test — sudden traffic surge
Tools: k6, Locust, Apache JMeter, Gatling, wrk, hey
```bash
# k6 example
k6 run --vus 100 --duration 30s load-test.js

# hey example
hey -n 10000 -c 200 https://api.example.com/endpoint
```
What to watch during load tests:

- Error rate (should stay < SLO threshold)
- p99 latency
- CPU and memory trend
- Connection pool exhaustion
- Garbage collection pauses
Scalability Patterns [I]¶
Caching¶
Reduce load on downstream services by caching responses.
| Strategy | When | Tools |
|---|---|---|
| CDN | Static assets, public API responses | CloudFront, Fastly |
| Application cache | Repeated computation | Redis, Memcached |
| DB query cache | Expensive reads | Redis, query result cache |
Cache invalidation is hard. Consider TTL, write-through, or cache-aside patterns.
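A minimal cache-aside sketch with TTL, where a plain dict stands in for Redis and the caller supplies the loader (`CacheAside` and `load_fn` are illustrative names, not a specific library API):

```python
import time

class CacheAside:
    """Cache-aside with TTL; an in-process dict stands in for Redis here."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key, load_fn):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]          # cache hit
        value = load_fn(key)         # miss: read from the source of truth
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        self.store.pop(key, None)    # drop on write; next read refills

calls = []
cache = CacheAside(ttl_seconds=60)
cache.get("user:1", lambda k: calls.append(k) or {"id": 1})
cache.get("user:1", lambda k: calls.append(k) or {"id": 1})
print(len(calls))  # 1 -> the second read was served from cache
```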
Rate Limiting¶
Protect services from overload:

- Per-user or per-IP rate limits
- Token bucket / leaky bucket algorithms
- 429 Too Many Requests responses
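The token bucket mentioned above fits in a few lines (class and method names are illustrative; a real distributed limiter would keep the bucket state in a shared store such as Redis):

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; each request spends one token."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top up based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 Too Many Requests

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(12)]
print(results.count(True))  # the initial burst (~capacity) passes; the rest are throttled
```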
Circuit Breakers¶
Prevent cascade failures:

- If downstream is failing, stop calling it
- Return cached/default response instead
- Allow periodic probes to check recovery
Tools: Hystrix, Resilience4j, Envoy, Istio
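A minimal sketch of the open/probe behavior above, not the Hystrix or Resilience4j API (`CircuitBreaker`, the threshold, and the fallback convention are assumptions for illustration):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `reset_timeout` s."""
    def __init__(self, threshold: int = 3, reset_timeout: float = 30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()     # open: skip the downstream call entirely
            self.opened_at = None     # half-open: let one probe through
        try:
            result = fn()
            self.failures = 0         # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback()

def flaky():
    raise ConnectionError("downstream down")

breaker = CircuitBreaker(threshold=2, reset_timeout=30)
out = [breaker.call(flaky, fallback=lambda: "cached") for _ in range(5)]
print(out)  # every call falls back; after 2 failures the downstream isn't even tried
```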
Bulkhead Pattern¶
Isolate failures to parts of the system:

- Separate thread pools per downstream
- Separate connection pools per tenant
- Prevents one bad service from exhausting all resources
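The bulkhead idea above can be sketched with one semaphore per downstream (names and limits here are illustrative, not a library API):

```python
from threading import BoundedSemaphore

class Bulkhead:
    """Cap concurrent calls per downstream so one slow service can't hog every thread."""
    def __init__(self, limits: dict[str, int]):
        self.pools = {name: BoundedSemaphore(n) for name, n in limits.items()}

    def try_call(self, name: str, fn):
        sem = self.pools[name]
        if not sem.acquire(blocking=False):
            return None  # pool exhausted: shed load instead of queueing forever
        try:
            return fn()
        finally:
            sem.release()

bulkhead = Bulkhead({"payments": 2, "search": 10})
# Exhaust the payments pool; search's slots are untouched
held = [bulkhead.pools["payments"].acquire(blocking=False) for _ in range(2)]
print(bulkhead.try_call("payments", lambda: "ok"))  # None: payments pool is full
print(bulkhead.try_call("search", lambda: "ok"))    # "ok": search is unaffected
```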
Database Scalability [I]¶
Quick reference — see DBRE: Scaling for depth.
- Read replicas — scale reads horizontally
- Connection pooling — PgBouncer, ProxySQL
- Sharding — partition data across multiple DBs
- Caching layer — Redis in front of DB
- CQRS — separate read/write models
Latency Numbers Every Engineer Should Know [I]¶
```text
L1 cache reference:            0.5 ns
L2 cache reference:            7 ns
Main memory reference:         100 ns
SSD random read:               150 µs
Network round trip (same DC):  500 µs
Network round trip (cross-DC): 5-150 ms
SSD sequential read (1 MB):    1 ms
HDD seek:                      10 ms
```
Practical implications:

- DB call in same DC: ~1 ms (network + query)
- Avoid cross-region DB calls in hot paths
- In-process cache hit (L1/L2): essentially free vs a memory lookup
Distributed Systems Concepts [A]¶
CAP Theorem¶
A distributed system can only guarantee 2 of 3:

- Consistency — all nodes see the same data
- Availability — every request gets a response
- Partition tolerance — system works despite network splits
In practice: partitions happen, so choose between C and A.
PACELC Extension¶
Even when there is no partition (the "Else" in PACELC), a system still trades off latency (L) against consistency (C).
Consistency Models¶
| Model | Guarantee | Example |
|---|---|---|
| Strong | All reads see latest write | Single leader DB |
| Eventual | All nodes converge eventually | DNS, Cassandra |
| Causal | Causally related ops ordered | MongoDB (causal sessions) |
Consistent Hashing¶
Used in distributed caches and sharded systems to minimize reshuffling when nodes are added/removed.
- Traditional hashing (hash mod N): add node → nearly all keys remap
- Consistent hashing: add node → only ~1/N of keys remap (N = node count)
Used by: Amazon DynamoDB, Apache Cassandra, Redis Cluster, CDN routing.
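A toy ring illustrating the remap claim, using virtual nodes (the vnode count and MD5 hash are arbitrary choices for this sketch):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes; adding a node moves only ~1/N of keys."""
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First vnode clockwise from the key's position owns the key
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["a", "b", "c"])
before = {k: ring.node_for(k) for k in (f"key{i}" for i in range(1000))}
ring.add("d")
moved = sum(1 for k, v in before.items() if ring.node_for(k) != v)
print(f"{moved / 1000:.0%} of keys moved")  # close to 1/4 with 4 nodes
```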
ACID vs BASE¶
| | ACID | BASE |
|---|---|---|
| Stands for | Atomicity, Consistency, Isolation, Durability | Basically Available, Soft state, Eventually consistent |
| Approach | Strong guarantees, pessimistic | High availability, optimistic |
| Examples | PostgreSQL, MySQL | Cassandra, DynamoDB, CouchDB |
| Use for | Financial data, orders | User sessions, analytics, recommendations |
System Design Checklist [A]¶
When designing for scale, ask:
- [ ] What's the expected QPS? Peak? p99 latency requirement?
- [ ] Which components are stateful? How is state managed?
- [ ] What happens if service X goes down? (Graceful degradation)
- [ ] Where are the synchronous call chains? (Latency cliff)
- [ ] Where is the data? Can it be cached?
- [ ] What are the DB bottlenecks? (Connections, locks, hot rows)
- [ ] Are there single points of failure?
- [ ] How does the system behave at 10x current load?
Real-World Scalability Patterns [A]¶
From awesome-scalability and howtheysre:
| Company | Scale Problem | Solution |
|---|---|---|
| Netflix | Global video streaming | Regionalized architecture, EVCache (Memcached at scale), Hystrix circuit breaker |
| WhatsApp | 2B users, messaging | Erlang, 2M connections/server, minimal moving parts |
| Discord | 2.5M concurrent voice users | Go → Rust for hot path, sharded guild model |
| Instagram | 1B users on PostgreSQL | Sharding by user_id, Django + PostgreSQL (no NoSQL) |
| Twitter | 150M users, timeline fanout | Redis timelines, hybrid push/pull model |
| Cloudflare | 50M+ req/s | Anycast routing, edge caching, eBPF for DDoS mitigation |
Key lesson: Most companies solved scale with boring technology + smart architecture, not exotic databases.
Related Topics¶
- SLOs / SLIs / SLAs
- Observability
- DBRE: Scaling Databases
- Platform: Kubernetes
- awesome-scalability — deep resource library
- howtheysre — how companies scaled