Client Area

PostgreSQL Performance Tuning: Indexing, Vacuum, Partitioning, and pgBadger

ByDomain India Team·DomainIndia Engineering
6 min readPublished 21 Apr 2026Updated 23 Jun 2026339 views

In this article

  • 1When to tune
  • 2Level 1 — Right-size memory settings
  • 3Level 2 — Indexing strategies
  • 4b-tree (default) — equality + range
  • 5Composite — multi-column

PostgreSQL Performance Tuning: Indexing, Vacuum, Partitioning, and pgBadger

TL;DR
PostgreSQL handles 10× its apparent throughput if tuned right. This guide covers the four levers that matter: correct indexing (b-tree, partial, GIN, BRIN), vacuum + autovacuum strategy, partitioning huge tables, and analysing slow queries with pgBadger — all on DomainIndia VPS.

When to tune

Default PostgreSQL config is conservative — safe for any hardware but underuses modern VPS resources. Tune when:

  • Queries suddenly slow after table grew past 1M rows
  • pg_stat_activity shows many waiting sessions
  • CPU spikes during business hours
  • Monthly bandwidth climbs (often: bloat + lack of VACUUM)
  • Long-running maintenance tasks blocking writes

If your site is small (<100K rows) and fast, don't tune. Default PG works.

Level 1 — Right-size memory settings

/var/lib/pgsql/data/postgresql.conf (or /etc/postgresql/16/main/postgresql.conf on Ubuntu):

SettingDefault4 GB VPS8 GB VPS16 GB VPS
shared_buffers128MB1GB2GB4GB
effective_cache_size4GB3GB6GB12GB
work_mem4MB8MB16MB32MB
maintenance_work_mem64MB256MB512MB1GB
max_connections100100200300
random_page_cost4.01.1 (SSD)1.11.1
effective_io_concurrency1200 (SSD)200200

Apply and restart:

bash
sudo systemctl restart postgresql

pgtune online tool gives a starting config for your RAM/workload.

Level 2 — Indexing strategies

b-tree (default) — equality + range

sql
-- Fast for:
--   WHERE email = ...
--   WHERE created_at > '2026-01-01'
--   ORDER BY price
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);

Composite — multi-column

sql
-- Fast for WHERE status='paid' AND user_id=42 ORDER BY created_at
CREATE INDEX idx_orders_status_user_created
    ON orders(status, user_id, created_at DESC);

Column order matters. Most selective first, then by query pattern.

Partial — index only relevant rows

Huge win for sparse columns:

sql
-- 95% of orders are 'paid' — only index unusual ones
CREATE INDEX idx_orders_pending
    ON orders(id)
    WHERE status != 'paid';

Index is smaller → faster writes, smaller reads, better cache use.

GIN — full-text search, JSONB, arrays

sql
-- Search JSON metadata
CREATE INDEX idx_products_meta
    ON products USING gin(metadata);

SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';

-- Full-text search
ALTER TABLE articles ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED;
CREATE INDEX idx_articles_tsv ON articles USING gin(tsv);

SELECT * FROM articles WHERE tsv @@ plainto_tsquery('english', 'hosting tips');

BRIN — huge append-only tables

sql
-- Log table with 100M rows by time
CREATE INDEX idx_logs_created_brin
    ON logs USING brin(created_at);

BRIN index is tiny (KB) vs GB for b-tree. Best when data is physically clustered by the indexed column (append-only).

Covering — index-only scans

sql
-- Query: SELECT id, email FROM users WHERE status = 'active'
CREATE INDEX idx_users_active_covering
    ON users(status) INCLUDE(id, email)
    WHERE status = 'active';

PostgreSQL returns results without touching the table — huge speedup.

Level 3 — EXPLAIN ANALYZE

Before tuning, measure. Always.

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > now() - interval '30 days'
GROUP BY u.name;

Look for:

  • Seq Scan on big tables = missing index
  • Nested Loop on big tables = wrong join strategy; force with enable_hashjoin or add stats
  • Actual rows >> estimated rows = outdated statistics (run ANALYZE)
  • Buffers: shared read=10000 = data not cached; large shared_buffers helps

Visualise plans at explain.depesz.com or pgmustard.com.

Level 4 — Vacuum strategy

PostgreSQL's MVCC keeps "dead tuples" after UPDATE/DELETE. Autovacuum removes them. Without it: bloat grows, queries slow down.

Check bloat:

sql
SELECT schemaname, relname,
       pg_size_pretty(pg_relation_size(relid)) AS size,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

If n_dead_tup > 20% of rowcount, autovacuum is behind. Tune:

ini
# postgresql.conf
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.05    # vacuum at 5% dead (default 20%)
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_limit = 2000      # aggressive

For high-write tables, set per-table:

sql
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);

Manual vacuum for recovery:

sql
VACUUM (VERBOSE, ANALYZE) orders;

-- Nuclear option (locks table briefly, reclaims disk):
VACUUM FULL orders;

Don't VACUUM FULL on live production large tables — it takes an exclusive lock for potentially hours.

Level 5 — Partitioning huge tables

When a single table > 50M rows, partitioning by time often helps.

sql
-- Range partition by month
CREATE TABLE events (
    id bigserial,
    created_at timestamptz NOT NULL,
    user_id uuid,
    data jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_04 PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

CREATE TABLE events_2026_05 PARTITION OF events
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Indexes on each partition
CREATE INDEX ON events_2026_04 (user_id);
CREATE INDEX ON events_2026_05 (user_id);

Use pg_partman extension to auto-create future partitions + drop old ones.

Benefits:

  • Query for "last 30 days" hits 1 partition, not all
  • DROP TABLE events_2023_01 instantly reclaims old data
  • Individual partition vacuum/index maintenance

Level 6 — Connection pooling

Every new connection costs 1-2 MB RAM + startup time. Use PgBouncer for pooling:

bash
sudo dnf install pgbouncer
# /etc/pgbouncer/pgbouncer.ini
ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20

App connects to localhost:6432 instead of :5432. 1000 app connections → 20 real PG connections.

Level 7 — Slow query log + pgBadger

Log slow queries:

ini
# postgresql.conf
log_min_duration_statement = 500  # ms — log queries slower than this
log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '
log_statement = 'ddl'
log_duration = off

Analyse logs with pgBadger (HTML reports):

bash
sudo dnf install pgbadger
pgbadger /var/log/postgresql/postgresql-*.log -o /var/www/html/pgbadger.html

Serve via nginx with basic auth. See: slowest queries, most-called, peak hours, errors.

Level 8 — Replication + read replicas

For read-heavy apps, add a streaming replica:

ini
# On primary
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB

# On replica, base backup + start:
pg_basebackup -h primary-ip -D /var/lib/pgsql/17/data -U replicator -P -R
sudo systemctl start postgresql

Route read queries to replica in your app:

python
write_engine = create_engine(PRIMARY_URL)
read_engine = create_engine(REPLICA_URL)
# Use read_engine for SELECTs, write_engine for writes

Common pitfalls

FAQ

Q How do I know if I need tuning?

If you have no slow queries and CPU is <30% during peak — don't tune. If queries occasionally exceed 1 second or CPU >70% sustained — start tuning.

Q Postgres or MySQL?

For modern apps: Postgres. Better JSON, strong consistency guarantees, GIN indexes, stored procedures, materialised views. MySQL is simpler but less capable. Use MySQL only if you're locked in (WordPress, older apps).

Q pg_repack or VACUUM FULL for bloat?

pg_repack rebuilds tables without exclusive lock — production-safe. VACUUM FULL locks — emergency only.

Q How do I benchmark?

pgbench (ships with Postgres) for generic OLTP load. For real workload, replay slow query log against staging.

Q Do I need a DBA?

For <1 TB data + <1K req/sec, an engineer + these tuning patterns is enough. At scale (multi-TB, 10K+ req/sec), a DBA pays for themselves.

Run production Postgres on a DomainIndia VPS. Start with VPS

Was this article helpful?

Your feedback helps us improve our documentation

Still need help? Submit a support ticket