
By the Domain India Team · DomainIndia Engineering · 24 Apr 2026 · 6 min read
# PostgreSQL Performance Tuning: Indexing, Vacuum, Partitioning, and pgBadger
**TL;DR:** A well-tuned PostgreSQL instance can often sustain several times the throughput of a default install. This guide covers the four levers that matter: correct indexing (b-tree, partial, GIN, BRIN), vacuum + autovacuum strategy, partitioning huge tables, and analysing slow queries with pgBadger — all on a DomainIndia VPS.
## When to tune

The default PostgreSQL config is conservative — safe on any hardware, but it underuses modern VPS resources. Tune when:

- Queries suddenly slow down after a table grows past ~1M rows
- `pg_stat_activity` shows many waiting sessions
- CPU spikes during business hours
- Disk usage climbs month over month (often: bloat from a lack of VACUUM)
- Long-running maintenance tasks block writes

If your site is small (<100K rows) and fast, don't tune. The defaults work.

## Level 1 — Right-size memory settings

Edit `/var/lib/pgsql/data/postgresql.conf` (or `/etc/postgresql/16/main/postgresql.conf` on Ubuntu):
| Setting | Default | 4 GB VPS | 8 GB VPS | 16 GB VPS |
|---|---|---|---|---|
| `shared_buffers` | 128MB | 1GB | 2GB | 4GB |
| `effective_cache_size` | 4GB | 3GB | 6GB | 12GB |
| `work_mem` | 4MB | 8MB | 16MB | 32MB |
| `maintenance_work_mem` | 64MB | 256MB | 512MB | 1GB |
| `max_connections` | 100 | 100 | 200 | 300 |
| `random_page_cost` | 4.0 | 1.1 (SSD) | 1.1 | 1.1 |
| `effective_io_concurrency` | 1 | 200 (SSD) | 200 | 200 |
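As a sketch, the 8 GB column above could also be applied with `ALTER SYSTEM`, which writes to `postgresql.auto.conf` rather than editing the main config file (adjust the values to your own column):

```sql
-- Sketch: apply the 8 GB VPS values via ALTER SYSTEM
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;

-- shared_buffers only changes on restart; the rest take effect on reload:
SELECT pg_reload_conf();
```

Note that `shared_buffers` (and `max_connections`) still require a full restart to take effect.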
Apply and restart:

```bash
sudo systemctl restart postgresql
```

The online `pgtune` tool generates a starting config for your RAM and workload.

## Level 2 — Indexing strategies

### b-tree (default) — equality + range

```sql
-- Fast for:
--   WHERE email = ...
--   WHERE created_at > '2026-01-01'
--   ORDER BY price
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);
```

### Composite — multi-column

```sql
-- Fast for WHERE status='paid' AND user_id=42 ORDER BY created_at
CREATE INDEX idx_orders_status_user_created
  ON orders(status, user_id, created_at DESC);
```

Column order matters: put the most selective equality columns first, then follow your query pattern.

### Partial — index only relevant rows

A huge win for sparse columns:

```sql
-- 95% of orders are 'paid' — only index the unusual ones
CREATE INDEX idx_orders_pending ON orders(id) WHERE status != 'paid';
```

The index is smaller → faster writes, smaller reads, better cache use.

### GIN — full-text search, JSONB, arrays

```sql
-- Search JSON metadata
CREATE INDEX idx_products_meta ON products USING gin(metadata);
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';

-- Full-text search
ALTER TABLE articles ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED;
CREATE INDEX idx_articles_tsv ON articles USING gin(tsv);
SELECT * FROM articles WHERE tsv @@ plainto_tsquery('english', 'hosting tips');
```

### BRIN — huge append-only tables

```sql
-- Log table with 100M rows, ordered by time
CREATE INDEX idx_logs_created_brin ON logs USING brin(created_at);
```

A BRIN index is tiny (kilobytes, versus gigabytes for b-tree). It works best when data is physically clustered by the indexed column (append-only).

### Covering — index-only scans

```sql
-- Query: SELECT id, email FROM users WHERE status = 'active'
CREATE INDEX idx_users_active_covering
  ON users(status) INCLUDE (id, email)
  WHERE status = 'active';
```

PostgreSQL returns results without touching the table — a huge speedup.
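Indexes aren't free — every write has to maintain them. Before adding more, it's worth checking which existing ones are never used. A sketch against the `pg_stat_user_indexes` view (counters reset with the statistics, so check after a representative period of traffic):

```sql
-- Indexes never scanned since statistics were last reset
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Anything large with `idx_scan = 0` is a candidate for dropping (after ruling out replicas and rarely-run reports).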
## Level 3 — EXPLAIN ANALYZE

Before tuning, measure. Always.

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > now() - interval '30 days'
GROUP BY u.name;
```

Look for:

- `Seq Scan` on big tables = missing index
- `Nested Loop` on big tables = wrong join strategy; experiment with `SET enable_nestloop = off`, or fix the row estimates
- Actual rows >> estimated rows = outdated statistics (run `ANALYZE`)
- `Buffers: shared read=10000` = data not cached; a larger `shared_buffers` helps

Visualise plans at [explain.depesz.com](https://explain.depesz.com) or [pgmustard.com](https://pgmustard.com).

## Level 4 — Vacuum strategy

PostgreSQL's MVCC keeps "dead tuples" around after UPDATE/DELETE. Autovacuum removes them. Without it, bloat grows and queries slow down.

Check bloat:

```sql
SELECT schemaname, relname,
       pg_size_pretty(pg_relation_size(relid)) AS size,
       n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

If `n_dead_tup` exceeds ~20% of the row count, autovacuum is falling behind. Tune:

```ini
# postgresql.conf
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.05    # vacuum at 5% dead (default 20%)
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_limit = 2000      # aggressive
```

For high-write tables, set it per table:

```sql
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);
```

**Manual vacuum for recovery:**

```sql
VACUUM (VERBOSE, ANALYZE) orders;
-- Nuclear option (rewrites the table under an exclusive lock, reclaims disk):
VACUUM FULL orders;
```

Don't run `VACUUM FULL` on large live production tables — it holds an exclusive lock for potentially hours.

## Level 5 — Partitioning huge tables

When a single table exceeds ~50M rows, partitioning by time often helps.
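To find candidate tables, list the largest ones first (total size includes indexes and TOAST):

```sql
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```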
```sql
-- Range partition by month
CREATE TABLE events (
  id bigserial,
  created_at timestamptz NOT NULL,
  user_id uuid,
  data jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_04 PARTITION OF events
  FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE TABLE events_2026_05 PARTITION OF events
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Indexes on each partition
CREATE INDEX ON events_2026_04 (user_id);
CREATE INDEX ON events_2026_05 (user_id);
```

Use the `pg_partman` extension to auto-create future partitions and drop old ones.

Benefits:

- A query for "the last 30 days" hits one partition, not all of them
- `DROP TABLE events_2023_01` instantly reclaims old data
- Vacuum and index maintenance run per partition

## Level 6 — Connection pooling

Every new connection costs 1-2 MB of RAM plus startup time. Use **PgBouncer** for pooling:

```bash
sudo dnf install pgbouncer
```

```ini
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

The app connects to `localhost:6432` instead of `:5432`. 1000 app connections map onto 20 real PostgreSQL connections.

## Level 7 — Slow query log + pgBadger

Log slow queries:

```ini
# postgresql.conf
log_min_duration_statement = 500   # ms — log queries slower than this
log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '
log_statement = 'ddl'
log_duration = off
```

Analyse the logs with pgBadger (HTML reports):

```bash
sudo dnf install pgbadger
pgbadger /var/log/postgresql/postgresql-*.log -o /var/www/html/pgbadger.html
```

Serve it via nginx with basic auth. You'll see the slowest queries, the most-called ones, peak hours, and errors.
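If you'd rather query than parse logs, the `pg_stat_statements` extension tracks per-statement timing in a view. A sketch, assuming the extension is listed in `shared_preload_libraries` and the server restarted; `mean_exec_time`/`total_exec_time` are the PostgreSQL 13+ column names (earlier versions use `mean_time`/`total_time`):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by average execution time
SELECT calls,
       round(mean_exec_time::numeric, 1)  AS avg_ms,
       round(total_exec_time::numeric, 0) AS total_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

This catches frequently-run "fast" queries whose aggregate cost is high — ones a 500 ms slow-query threshold never logs.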
## Level 8 — Replication + read replicas

For read-heavy apps, add a streaming replica:

```ini
# On the primary (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
```

```bash
# On the replica: take a base backup, then start
pg_basebackup -h primary-ip -D /var/lib/pgsql/17/data -U replicator -P -R
sudo systemctl start postgresql
```

Route read queries to the replica in your app:

```python
from sqlalchemy import create_engine

# PRIMARY_URL / REPLICA_URL are your connection strings
write_engine = create_engine(PRIMARY_URL)
read_engine = create_engine(REPLICA_URL)
# Use read_engine for SELECTs, write_engine for writes
```

## FAQ
**Q: How do I know if I need tuning?**

If you have no slow queries and CPU stays under 30% at peak, don't tune. If queries occasionally exceed 1 second, or CPU sits above 70% sustained, start tuning.

**Q: Postgres or MySQL?**

For modern apps: Postgres. Better JSON support, strong consistency guarantees, GIN indexes, stored procedures, materialised views. MySQL is simpler but less capable — use it only if you're locked in (WordPress, older apps).

**Q: pg_repack or VACUUM FULL for bloat?**

`pg_repack` rebuilds tables without a long exclusive lock — production-safe. `VACUUM FULL` locks the table — emergency use only.

**Q: How do I benchmark?**

Use `pgbench` (ships with Postgres) for a generic OLTP load. For your real workload, replay the slow query log against staging.

**Q: Do I need a DBA?**

Below ~1 TB of data and ~1K req/sec, an engineer armed with these tuning patterns is enough. At scale (multi-TB, 10K+ req/sec), a DBA pays for themselves.

Run production Postgres on a DomainIndia VPS.
