Scalability¶

Scalability Fundamentals `[B]`¶

Scalability = the ability of a system to handle growing load.

Two dimensions: - Vertical scaling (scale up) — bigger machine (more CPU/RAM) - Horizontal scaling (scale out) — more machines

	Vertical	Horizontal
Limit	Hardware ceiling	Theoretically unlimited
Complexity	Simple	Requires statelessness, load balancing
Cost	Expensive at scale	More predictable
Downtime	Often requires restart	Rolling updates possible

Capacity Planning `[I]`¶

Capacity planning = knowing how much headroom you have before hitting limits.

Process: 1. Define resource metrics (CPU, memory, connections, QPS) 2. Measure current utilization and growth rate 3. Project when you'll hit limits 4. Plan scaling actions before you get there

Current QPS: 10,000
Growth: +15% / month
System limit: 25,000 QPS
Time to limit: ~6 months
Action: start horizontal scaling project in 3 months

SLO-based capacity planning: - Run load tests to find breaking points - Set capacity targets at 70% utilization (30% headroom)

→ See DBRE: Scaling Databases for database-specific capacity planning.

Load Testing `[I]`¶

Test your system before traffic tests it for you.

Types: - Load test — expected peak traffic - Stress test — beyond expected peak (find breaking point) - Soak test — sustained load over time (find memory leaks) - Spike test — sudden traffic surge

Tools: k6, Locust, Apache JMeter, Gatling, wrk, hey

# k6 example
k6 run --vus 100 --duration 30s load-test.js

# hey example
hey -n 10000 -c 200 https://api.example.com/endpoint

What to watch during load tests: - Error rate (should stay < SLO threshold) - p99 latency - CPU and memory trend - Connection pool exhaustion - Garbage collection pauses

Scalability Patterns `[I]`¶

Caching¶

Reduce load on downstream services by caching responses.

Strategy	When	Tools
CDN	Static assets, public API responses	CloudFront, Fastly
Application cache	Repeated computation	Redis, Memcached
DB query cache	Expensive reads	Redis, query result cache

Cache invalidation is hard. Consider TTL, write-through, or cache-aside patterns.

Rate Limiting¶

Protect services from overload: - Per-user or per-IP rate limits - Token bucket / leaky bucket algorithms - 429 Too Many Requests responses

Circuit Breakers¶

Prevent cascade failures: - If downstream is failing, stop calling it - Return cached/default response instead - Allow periodic probes to check recovery

Closed → Open → Half-Open → Closed
(normal) (failing) (testing)  (recovered)

Tools: Hystrix, Resilience4j, Envoy, Istio

Bulkhead Pattern¶

Isolate failures to parts of the system: - Separate thread pools per downstream - Separate connection pools per tenant - Prevents one bad service from exhausting all resources

Database Scalability `[I]`¶

Quick reference — see DBRE: Scaling for depth.

Read replicas — scale reads horizontally
Connection pooling — PgBouncer, ProxySQL
Sharding — partition data across multiple DBs
Caching layer — Redis in front of DB
CQRS — separate read/write models

Latency Numbers Every Engineer Should Know `[I]`¶

L1 cache reference:              0.5 ns
L2 cache reference:              7 ns
Main memory reference:           100 ns
SSD random read:                 150 µs
Network round trip (same DC):    500 µs
Network round trip (cross-DC):   5-150 ms
SSD sequential read (1MB):       1 ms
HDD seek:                        10 ms

Practical implications: - DB call in same DC: ~1ms (network + query) - Avoid cross-region DB calls in hot paths - In-process cache (L1/L2): essentially free vs memory lookup

Distributed Systems Concepts `[A]`¶

CAP Theorem¶

A distributed system can only guarantee 2 of 3: - Consistency — all nodes see the same data - Availability — every request gets a response - Partition tolerance — system works despite network splits

In practice: partitions happen, so choose between C and A.

PACELC Extension¶

Even without partitions, tradeoff between latency (L) and consistency (C).

Consistency Models¶

Model	Guarantee	Example
Strong	All reads see latest write	Single leader DB
Eventual	All nodes converge eventually	DNS, Cassandra
Causal	Causally related ops ordered	DynamoDB

Consistent Hashing¶

Used in distributed caches and sharded systems to minimize reshuffling when nodes are added/removed.

Traditional: add node → remap ~50% of keys
Consistent hashing: add node → remap only 1/N keys (N = node count)

Used by: Amazon DynamoDB, Apache Cassandra, Redis Cluster, CDN routing.

ACID vs BASE¶

	ACID	BASE
Stands for	Atomicity, Consistency, Isolation, Durability	Basically Available, Soft state, Eventually consistent
Approach	Strong guarantees, pessimistic	High availability, optimistic
Examples	PostgreSQL, MySQL	Cassandra, DynamoDB, CouchDB
Use for	Financial data, orders	User sessions, analytics, recommendations

System Design Checklist `[A]`¶

When designing for scale, ask:

[ ] What's the expected QPS? Peak? p99 latency requirement?
[ ] Which components are stateful? How is state managed?
[ ] What happens if service X goes down? (Graceful degradation)
[ ] Where are the synchronous call chains? (Latency cliff)
[ ] Where is the data? Can it be cached?
[ ] What are the DB bottlenecks? (Connections, locks, hot rows)
[ ] Are there single points of failure?
[ ] How does the system behave at 10x current load?

Real-World Scalability Patterns `[A]`¶

From awesome-scalability and howtheysre:

Company	Scale Problem	Solution
Netflix	Global video streaming	Regionalized architecture, EVCache (Memcached at scale), Hystrix circuit breaker
WhatsApp	2B users, messaging	Erlang, 2M connections/server, minimal moving parts
Discord	2.5M concurrent voice users	Go → Rust for hot path, sharded guild model
Instagram	1B users on PostgreSQL	Sharding by user_id, Django + PostgreSQL (no NoSQL)
Twitter	150M users, timeline fanout	Redis timelines, hybrid push/pull model
Cloudflare	50M+ req/s	Anycast routing, edge caching, eBPF for DDoS mitigation

Key lesson: Most companies solved scale with boring technology + smart architecture, not exotic databases.

SLOs / SLIs / SLAs
Observability
DBRE: Scaling Databases
Platform: Kubernetes
awesome-scalability — deep resource library
howtheysre — how companies scaled

Scalability¶

Scalability Fundamentals [B]¶

Capacity Planning [I]¶

Load Testing [I]¶

Scalability Patterns [I]¶