Database Anti-Patterns¶
Based on sqlcheck — detects these automatically. Run it on your SQL before deploying.
# Install and run sqlcheck
sqlcheck -f queries.sql
sqlcheck -f queries.sql -v # verbose, explains each anti-pattern
Category 1: Logical Design Anti-Patterns [I]¶
Problems in how you model data — hard to fix later.
Multi-Valued Attributes¶
Storing multiple values in a single column:
-- BAD: comma-separated tags in one column
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
tags VARCHAR(500) -- "python,postgres,sre"
);
-- Can't index, can't query efficiently, hard to join
-- GOOD: separate table
CREATE TABLE article_tags (
article_id INTEGER REFERENCES articles(id),
tag VARCHAR(50) NOT NULL,
PRIMARY KEY (article_id, tag)
);
Entity-Attribute-Value (EAV)¶
-- BAD: EAV pattern — looks flexible, is a nightmare
CREATE TABLE user_properties (
user_id INTEGER,
property_name VARCHAR(100),
property_value TEXT
);
-- No type safety, terrible query performance, no constraints
-- GOOD: explicit columns, or JSONB for truly dynamic attributes
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
metadata JSONB -- for genuinely flexible attributes
);
CREATE INDEX CONCURRENTLY idx_users_metadata ON users USING GIN(metadata);
Missing Primary Keys¶
Every table must have a primary key. Without it: - Replication can break (MySQL/Postgres) - Duplicate rows can silently accumulate - Joins become ambiguous
Generic Primary Keys¶
-- BAD: generic column name obscures meaning
CREATE TABLE orders (
id INTEGER PRIMARY KEY, -- id of what?
-- ...
);
-- BETTER: use id conventionally but be consistent
-- If you use "id" everywhere, it's fine — just be consistent
-- Avoid: entity_id, record_id, key, pk
Recursive Dependencies¶
Self-referencing foreign keys without care:
-- Hierarchical data (categories, org charts)
-- Fine to do, but needs care with deletes and queries
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INTEGER REFERENCES categories(id) ON DELETE SET NULL
);
-- Query with recursive CTE (see sql.md)
Category 2: Physical Design Anti-Patterns [I]¶
How data is stored and indexed.
Imprecise Data Types¶
-- BAD: storing money as float (floating point errors)
price FLOAT
-- GOOD: use NUMERIC/DECIMAL for money
price NUMERIC(10, 2)
-- BAD: storing dates as strings
created_at VARCHAR(20) -- "2024-01-15"
-- GOOD: use proper date types
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
-- BAD: storing booleans as 0/1 integers
is_active INTEGER
-- GOOD:
is_active BOOLEAN NOT NULL DEFAULT TRUE
Too Many Indexes¶
Every index: - Slows down writes (INSERT/UPDATE/DELETE must update all indexes) - Consumes disk space and memory
-- Audit index usage periodically
SELECT indexname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
-- Drop indexes with 0 scans (after confirming not needed)
DROP INDEX CONCURRENTLY idx_orders_old_column;
Rule of thumb: If an index hasn't been used in 30 days, it's probably not needed.
Wrong Index Column Order¶
-- Query pattern: WHERE status = 'active' AND created_at > '2024-01-01'
-- Equality column FIRST, range column SECOND
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
-- The reverse index (created_at, status) is far less useful for this query
Category 3: Query Anti-Patterns [B]¶
Problems in how you write SQL — catchable in code review.
See SQL Best Practices for full treatment. Quick reference:
SELECT * (Implicit Columns)¶
Why: Wastes network, prevents index-only scans, breaks if columns change.
NULL Misuse¶
-- BAD: this doesn't catch NULLs
WHERE status <> 'cancelled'
-- GOOD
WHERE status <> 'cancelled' OR status IS NULL
-- BAD: NULL comparison always returns NULL (never TRUE)
WHERE last_login = NULL
-- GOOD
WHERE last_login IS NULL
ORDER BY RAND()¶
-- BAD: full table scan + sort every time
SELECT * FROM products ORDER BY RAND() LIMIT 10;
-- GOOD: keyset random sampling
SELECT * FROM products
WHERE id >= (SELECT FLOOR(RANDOM() * MAX(id)) FROM products)
LIMIT 10;
Pattern Matching with Leading Wildcard¶
-- BAD: can't use index
WHERE email LIKE '%@gmail.com'
-- GOOD: use pg_trgm for arbitrary pattern matching with indexing
CREATE EXTENSION pg_trgm;
CREATE INDEX CONCURRENTLY idx_users_email_trgm ON users USING GIN(email gin_trgm_ops);
-- Now LIKE '%@gmail.com' can use the trigram index
Excessive JOINs¶
-- If you have > 5 JOINs in a query, ask:
-- 1. Is the schema denormalized enough for the query pattern?
-- 2. Should this be a materialized view?
-- 3. Is this OLAP query running against an OLTP database?
Unnecessary DISTINCT¶
-- BAD: DISTINCT masking a JOIN problem (cartesian product)
SELECT DISTINCT o.id
FROM orders o
JOIN order_items oi ON oi.order_id = o.id;
-- DISTINCT here hides that the JOIN multiplies rows
-- GOOD: use EXISTS or subquery
SELECT o.id
FROM orders o
WHERE EXISTS (SELECT 1 FROM order_items oi WHERE oi.order_id = o.id);
Nested Subqueries (Correlated)¶
-- BAD: correlated subquery runs once per row (N×M)
SELECT o.id,
(SELECT SUM(quantity) FROM order_items oi WHERE oi.order_id = o.id) AS item_count
FROM orders o;
-- GOOD: JOIN with aggregation
SELECT o.id, COALESCE(oi.item_count, 0)
FROM orders o
LEFT JOIN (
SELECT order_id, SUM(quantity) AS item_count
FROM order_items
GROUP BY order_id
) oi ON oi.order_id = o.id;
Category 4: Application Anti-Patterns [B]¶
Readable Passwords¶
-- NEVER store plaintext passwords
CREATE TABLE users (
password VARCHAR(255) -- if storing the actual password text, THIS IS WRONG
);
-- ALWAYS store hashed passwords
-- Use bcrypt, argon2, or scrypt at the application layer
-- Never hash in SQL (MD5/SHA are not password hashes)
No Connection Pooling¶
Connecting directly from app to DB without pooling: - Each connection costs ~5-10MB RAM on the DB - Connection establishment adds ~10-50ms latency - DB crashes under connection storms
→ See Fundamentals: Connection Pooling
String Building SQL (SQL Injection)¶
# BAD: SQL injection vulnerability
query = f"SELECT * FROM users WHERE name = '{user_input}'"
# GOOD: parameterized queries
cursor.execute("SELECT * FROM users WHERE name = %s", (user_input,))
Running SQLCheck¶
# Run against a file
sqlcheck -f schema.sql
sqlcheck -f queries.sql -v
# Common findings and what they mean:
# [critical] SELECT * usage → specify columns
# [critical] NULL comparison → use IS NULL / IS NOT NULL
# [major] No primary key → add PK to every table
# [major] Multi-valued attribute → normalize the data model
# [minor] Implicit column usage → be explicit in INSERTs
→ sqlcheck for full documentation and installation.
Related Topics¶
- SQL Best Practices
- Performance Tuning — indexing in depth
- Migrations & Schema Changes — fixing anti-patterns safely
- sqlcheck submodule
- sql-guide submodule