PostgreSQL Performance Tuning: Indexing, Vacuum, Partitioning, and pgBadger
When to tune
Default PostgreSQL config is conservative — safe for any hardware but underuses modern VPS resources. Tune when:
- Queries suddenly slow after table grew past 1M rows
pg_stat_activityshows manywaitingsessions- CPU spikes during business hours
- Monthly bandwidth climbs (often: bloat + lack of VACUUM)
- Long-running maintenance tasks blocking writes
If your site is small (<100K rows) and fast, don't tune. Default PG works.
Level 1 — Right-size memory settings
/var/lib/pgsql/data/postgresql.conf (or /etc/postgresql/16/main/postgresql.conf on Ubuntu):
| Setting | Default | 4 GB VPS | 8 GB VPS | 16 GB VPS |
|---|---|---|---|---|
shared_buffers | 128MB | 1GB | 2GB | 4GB |
effective_cache_size | 4GB | 3GB | 6GB | 12GB |
work_mem | 4MB | 8MB | 16MB | 32MB |
maintenance_work_mem | 64MB | 256MB | 512MB | 1GB |
max_connections | 100 | 100 | 200 | 300 |
random_page_cost | 4.0 | 1.1 (SSD) | 1.1 | 1.1 |
effective_io_concurrency | 1 | 200 (SSD) | 200 | 200 |
Apply and restart:
sudo systemctl restart postgresqlpgtune online tool gives a starting config for your RAM/workload.
Level 2 — Indexing strategies
b-tree (default) — equality + range
-- Fast for:
-- WHERE email = ...
-- WHERE created_at > '2026-01-01'
-- ORDER BY price
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);Composite — multi-column
-- Fast for WHERE status='paid' AND user_id=42 ORDER BY created_at
CREATE INDEX idx_orders_status_user_created
ON orders(status, user_id, created_at DESC);Column order matters. Most selective first, then by query pattern.
Partial — index only relevant rows
Huge win for sparse columns:
-- 95% of orders are 'paid' — only index unusual ones
CREATE INDEX idx_orders_pending
ON orders(id)
WHERE status != 'paid';Index is smaller → faster writes, smaller reads, better cache use.
GIN — full-text search, JSONB, arrays
-- Search JSON metadata
CREATE INDEX idx_products_meta
ON products USING gin(metadata);
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
-- Full-text search
ALTER TABLE articles ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED;
CREATE INDEX idx_articles_tsv ON articles USING gin(tsv);
SELECT * FROM articles WHERE tsv @@ plainto_tsquery('english', 'hosting tips');BRIN — huge append-only tables
-- Log table with 100M rows by time
CREATE INDEX idx_logs_created_brin
ON logs USING brin(created_at);BRIN index is tiny (KB) vs GB for b-tree. Best when data is physically clustered by the indexed column (append-only).
Covering — index-only scans
-- Query: SELECT id, email FROM users WHERE status = 'active'
CREATE INDEX idx_users_active_covering
ON users(status) INCLUDE(id, email)
WHERE status = 'active';PostgreSQL returns results without touching the table — huge speedup.
Level 3 — EXPLAIN ANALYZE
Before tuning, measure. Always.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > now() - interval '30 days'
GROUP BY u.name;Look for:
Seq Scanon big tables = missing indexNested Loopon big tables = wrong join strategy; force withenable_hashjoinor add stats- Actual rows >> estimated rows = outdated statistics (run
ANALYZE) Buffers: shared read=10000= data not cached; largeshared_buffershelps
Visualise plans at explain.depesz.com or pgmustard.com.
Level 4 — Vacuum strategy
PostgreSQL's MVCC keeps "dead tuples" after UPDATE/DELETE. Autovacuum removes them. Without it: bloat grows, queries slow down.
Check bloat:
SELECT schemaname, relname,
pg_size_pretty(pg_relation_size(relid)) AS size,
n_dead_tup,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;If n_dead_tup > 20% of rowcount, autovacuum is behind. Tune:
# postgresql.conf
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.05 # vacuum at 5% dead (default 20%)
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_limit = 2000 # aggressiveFor high-write tables, set per-table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.02,
autovacuum_analyze_scale_factor = 0.01
);Manual vacuum for recovery:
VACUUM (VERBOSE, ANALYZE) orders;
-- Nuclear option (locks table briefly, reclaims disk):
VACUUM FULL orders;Don't VACUUM FULL on live production large tables — it takes an exclusive lock for potentially hours.
Level 5 — Partitioning huge tables
When a single table > 50M rows, partitioning by time often helps.
-- Range partition by month
CREATE TABLE events (
id bigserial,
created_at timestamptz NOT NULL,
user_id uuid,
data jsonb
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_04 PARTITION OF events
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE TABLE events_2026_05 PARTITION OF events
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
-- Indexes on each partition
CREATE INDEX ON events_2026_04 (user_id);
CREATE INDEX ON events_2026_05 (user_id);Use pg_partman extension to auto-create future partitions + drop old ones.
Benefits:
- Query for "last 30 days" hits 1 partition, not all
DROP TABLE events_2023_01instantly reclaims old data- Individual partition vacuum/index maintenance
Level 6 — Connection pooling
Every new connection costs 1-2 MB RAM + startup time. Use PgBouncer for pooling:
sudo dnf install pgbouncer
# /etc/pgbouncer/pgbouncer.ini[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20App connects to localhost:6432 instead of :5432. 1000 app connections → 20 real PG connections.
Level 7 — Slow query log + pgBadger
Log slow queries:
# postgresql.conf
log_min_duration_statement = 500 # ms — log queries slower than this
log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '
log_statement = 'ddl'
log_duration = offAnalyse logs with pgBadger (HTML reports):
sudo dnf install pgbadger
pgbadger /var/log/postgresql/postgresql-*.log -o /var/www/html/pgbadger.htmlServe via nginx with basic auth. See: slowest queries, most-called, peak hours, errors.
Level 8 — Replication + read replicas
For read-heavy apps, add a streaming replica:
# On primary
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
# On replica, base backup + start:
pg_basebackup -h primary-ip -D /var/lib/pgsql/17/data -U replicator -P -R
sudo systemctl start postgresqlRoute read queries to replica in your app:
write_engine = create_engine(PRIMARY_URL)
read_engine = create_engine(REPLICA_URL)
# Use read_engine for SELECTs, write_engine for writesCommon pitfalls
FAQ
If you have no slow queries and CPU is <30% during peak — don't tune. If queries occasionally exceed 1 second or CPU >70% sustained — start tuning.
For modern apps: Postgres. Better JSON, strong consistency guarantees, GIN indexes, stored procedures, materialised views. MySQL is simpler but less capable. Use MySQL only if you're locked in (WordPress, older apps).
pg_repack rebuilds tables without exclusive lock — production-safe. VACUUM FULL locks — emergency only.
pgbench (ships with Postgres) for generic OLTP load. For real workload, replay slow query log against staging.
For <1 TB data + <1K req/sec, an engineer + these tuning patterns is enough. At scale (multi-TB, 10K+ req/sec), a DBA pays for themselves.
Run production Postgres on a DomainIndia VPS. Start with VPS