
By the Domain India Team · DomainIndia Engineering · 24 Apr 2026 · 6 min read
# PostgreSQL Performance Tuning: Indexing, Vacuum, Partitioning, and pgBadger
**TL;DR:** A well-tuned PostgreSQL instance can often sustain several times the throughput of a default install. This guide covers the four levers that matter: correct indexing (b-tree, partial, GIN, BRIN), vacuum + autovacuum strategy, partitioning huge tables, and analysing slow queries with pgBadger — all on a DomainIndia VPS.
## When to tune

The default PostgreSQL config is conservative — safe on any hardware, but it underuses modern VPS resources. Tune when:

- Queries suddenly slow down after a table grows past ~1M rows
- `pg_stat_activity` shows many waiting sessions
- CPU spikes during business hours
- Disk usage climbs month over month (often: bloat from a lack of VACUUM)
- Long-running maintenance tasks block writes

If your site is small (<100K rows) and fast, don't tune. The defaults work.

## Level 1 — Right-size memory settings

Edit `/var/lib/pgsql/data/postgresql.conf` (or `/etc/postgresql/16/main/postgresql.conf` on Ubuntu):
| Setting | Default | 4 GB VPS | 8 GB VPS | 16 GB VPS |
|---|---|---|---|---|
| `shared_buffers` | 128MB | 1GB | 2GB | 4GB |
| `effective_cache_size` | 4GB | 3GB | 6GB | 12GB |
| `work_mem` | 4MB | 8MB | 16MB | 32MB |
| `maintenance_work_mem` | 64MB | 256MB | 512MB | 1GB |
| `max_connections` | 100 | 100 | 200 | 300 |
| `random_page_cost` | 4.0 | 1.1 (SSD) | 1.1 | 1.1 |
| `effective_io_concurrency` | 1 | 200 (SSD) | 200 | 200 |
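As a sketch, the 8 GB column above could also be applied with `ALTER SYSTEM`, which writes to `postgresql.auto.conf` rather than editing the main config file (adjust the values to your own column):

```sql
-- Sketch: apply the 8 GB VPS values via ALTER SYSTEM
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;

-- shared_buffers only changes on restart; the rest take effect on reload:
SELECT pg_reload_conf();
```

Note that `shared_buffers` (and `max_connections`) still require a full restart to take effect.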
Apply and restart:

```bash
sudo systemctl restart postgresql
```

The online `pgtune` tool generates a starting config for your RAM and workload.

## Level 2 — Indexing strategies

### b-tree (default) — equality + range

```sql
-- Fast for:
--   WHERE email = ...
--   WHERE created_at > '2026-01-01'
--   ORDER BY price
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);
```

### Composite — multi-column

```sql
-- Fast for WHERE status='paid' AND user_id=42 ORDER BY created_at
CREATE INDEX idx_orders_status_user_created
  ON orders(status, user_id, created_at DESC);
```

Column order matters: put the most selective equality columns first, then follow your query pattern.

### Partial — index only relevant rows

A huge win for sparse columns:

```sql
-- 95% of orders are 'paid' — only index the unusual ones
CREATE INDEX idx_orders_pending ON orders(id) WHERE status != 'paid';
```

The index is smaller → faster writes, smaller reads, better cache use.

### GIN — full-text search, JSONB, arrays

```sql
-- Search JSON metadata
CREATE INDEX idx_products_meta ON products USING gin(metadata);
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';

-- Full-text search
ALTER TABLE articles ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED;
CREATE INDEX idx_articles_tsv ON articles USING gin(tsv);
SELECT * FROM articles WHERE tsv @@ plainto_tsquery('english', 'hosting tips');
```

### BRIN — huge append-only tables

```sql
-- Log table with 100M rows, ordered by time
CREATE INDEX idx_logs_created_brin ON logs USING brin(created_at);
```

A BRIN index is tiny (kilobytes, versus gigabytes for b-tree). It works best when data is physically clustered by the indexed column (append-only).

### Covering — index-only scans

```sql
-- Query: SELECT id, email FROM users WHERE status = 'active'
CREATE INDEX idx_users_active_covering
  ON users(status) INCLUDE (id, email)
  WHERE status = 'active';
```

PostgreSQL returns results without touching the table — a huge speedup.
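Indexes aren't free — every write has to maintain them. Before adding more, it's worth checking which existing ones are never used. A sketch against the `pg_stat_user_indexes` view (counters reset with the statistics, so check after a representative period of traffic):

```sql
-- Indexes never scanned since statistics were last reset
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Anything large with `idx_scan = 0` is a candidate for dropping (after ruling out replicas and rarely-run reports).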
## Level 3 — EXPLAIN ANALYZE

Before tuning, measure. Always.

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > now() - interval '30 days'
GROUP BY u.name;
```

Look for:

- `Seq Scan` on big tables = missing index
- `Nested Loop` on big tables = wrong join strategy; experiment with `SET enable_nestloop = off`, or fix the row estimates
- Actual rows >> estimated rows = outdated statistics (run `ANALYZE`)
- `Buffers: shared read=10000` = data not cached; a larger `shared_buffers` helps

Visualise plans at [explain.depesz.com](https://explain.depesz.com) or [pgmustard.com](https://pgmustard.com).

## Level 4 — Vacuum strategy

PostgreSQL's MVCC keeps "dead tuples" around after UPDATE/DELETE. Autovacuum removes them. Without it, bloat grows and queries slow down.

Check bloat:

```sql
SELECT schemaname, relname,
       pg_size_pretty(pg_relation_size(relid)) AS size,
       n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

If `n_dead_tup` exceeds ~20% of the row count, autovacuum is falling behind. Tune:

```ini
# postgresql.conf
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.05    # vacuum at 5% dead (default 20%)
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_limit = 2000      # aggressive
```

For high-write tables, set it per table:

```sql
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);
```

**Manual vacuum for recovery:**

```sql
VACUUM (VERBOSE, ANALYZE) orders;
-- Nuclear option (rewrites the table under an exclusive lock, reclaims disk):
VACUUM FULL orders;
```

Don't run `VACUUM FULL` on large live production tables — it holds an exclusive lock for potentially hours.

## Level 5 — Partitioning huge tables

When a single table exceeds ~50M rows, partitioning by time often helps.
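To find candidate tables, list the largest ones first (total size includes indexes and TOAST):

```sql
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```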
```sql
-- Range partition by month
CREATE TABLE events (
  id bigserial,
  created_at timestamptz NOT NULL,
  user_id uuid,
  data jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_04 PARTITION OF events
  FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE TABLE events_2026_05 PARTITION OF events
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Indexes on each partition
CREATE INDEX ON events_2026_04 (user_id);
CREATE INDEX ON events_2026_05 (user_id);
```

Use the `pg_partman` extension to auto-create future partitions and drop old ones.

Benefits:

- A query for "the last 30 days" hits one partition, not all of them
- `DROP TABLE events_2023_01` instantly reclaims old data
- Vacuum and index maintenance run per partition

## Level 6 — Connection pooling

Every new connection costs 1-2 MB of RAM plus startup time. Use **PgBouncer** for pooling:

```bash
sudo dnf install pgbouncer
```

```ini
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

The app connects to `localhost:6432` instead of `:5432`. 1000 app connections map onto 20 real PostgreSQL connections.

## Level 7 — Slow query log + pgBadger

Log slow queries:

```ini
# postgresql.conf
log_min_duration_statement = 500   # ms — log queries slower than this
log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '
log_statement = 'ddl'
log_duration = off
```

Analyse the logs with pgBadger (HTML reports):

```bash
sudo dnf install pgbadger
pgbadger /var/log/postgresql/postgresql-*.log -o /var/www/html/pgbadger.html
```

Serve it via nginx with basic auth. You'll see the slowest queries, the most-called ones, peak hours, and errors.
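If you'd rather query than parse logs, the `pg_stat_statements` extension tracks per-statement timing in a view. A sketch, assuming the extension is listed in `shared_preload_libraries` and the server restarted; `mean_exec_time`/`total_exec_time` are the PostgreSQL 13+ column names (earlier versions use `mean_time`/`total_time`):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by average execution time
SELECT calls,
       round(mean_exec_time::numeric, 1)  AS avg_ms,
       round(total_exec_time::numeric, 0) AS total_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

This catches frequently-run "fast" queries whose aggregate cost is high — ones a 500 ms slow-query threshold never logs.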
## Level 8 — Replication + read replicas

For read-heavy apps, add a streaming replica:

```ini
# On the primary (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
```

```bash
# On the replica: take a base backup, then start
pg_basebackup -h primary-ip -D /var/lib/pgsql/17/data -U replicator -P -R
sudo systemctl start postgresql
```

Route read queries to the replica in your app:

```python
from sqlalchemy import create_engine

# PRIMARY_URL / REPLICA_URL are your connection strings
write_engine = create_engine(PRIMARY_URL)
read_engine = create_engine(REPLICA_URL)
# Use read_engine for SELECTs, write_engine for writes
```

## FAQ
**Q: How do I know if I need tuning?**

If you have no slow queries and CPU stays under 30% at peak, don't tune. If queries occasionally exceed 1 second, or CPU sits above 70% sustained, start tuning.

**Q: Postgres or MySQL?**

For modern apps: Postgres. Better JSON support, strong consistency guarantees, GIN indexes, stored procedures, materialised views. MySQL is simpler but less capable — use it only if you're locked in (WordPress, older apps).

**Q: pg_repack or VACUUM FULL for bloat?**

`pg_repack` rebuilds tables without a long exclusive lock — production-safe. `VACUUM FULL` locks the table — emergency use only.

**Q: How do I benchmark?**

Use `pgbench` (ships with Postgres) for a generic OLTP load. For your real workload, replay the slow query log against staging.

**Q: Do I need a DBA?**

Below ~1 TB of data and ~1K req/sec, an engineer armed with these tuning patterns is enough. At scale (multi-TB, 10K+ req/sec), a DBA pays for themselves.

Run production Postgres on a DomainIndia VPS.
