Knowledge Base

PostgreSQL Performance Guides

Practical guides from the engineers at PGFlare. Everything you need to diagnose, fix, and prevent the most expensive PostgreSQL performance problems on AWS RDS.

Performance 8 min read

Autovacuum Tuning for Large RDS Instances

The default autovacuum settings were designed for small tables. Here's how to configure cost_delay, vacuum_scale_factor, and analyze_scale_factor for production RDS instances with hundreds of millions of rows.

Cost 6 min read

Why Your RDS Bill Doubled After a Schema Migration

Table rewrites from ALTER TABLE, bloated pg_toast tables, and unnecessary IOPS. How schema migrations silently inflate your AWS bill and how to avoid it.

Performance 10 min read

EXPLAIN ANALYZE: Reading the Query Plan Without a PhD

Rows=10000 vs actual rows=1.5M. Seq Scan on a 200M-row table. How to read the actual vs estimated rows mismatch and what to do about stale statistics.

Cost 5 min read

Right-Sizing Your RDS Instance Without Downtime

How to validate that your workload supports a smaller instance class, prepare the migration, and flip the instance type during a maintenance window. Real numbers from 4 migrations.

Operations 7 min read

Connection Pool Configuration for pgBouncer on RDS

Why raising max_connections is the wrong fix. Pool mode selection (transaction vs session), pool_size calculation, and the hidden memory cost of idle connections.

Emergency 4 min read

P1 Runbook: High RDS CPU During Peak Traffic

Step-by-step incident response when RDS CPU hits 80% and queries are timing out. What to check first, how to identify the offending query, and how to get back to green in under 30 minutes.

Performance 9 min read

Index Bloat: When Your Indexes Are Making Things Worse

Dead tuples in indexes. Bloat that defeats the purpose of the index. How to identify bloated indexes, reindex concurrently, and set maintenance_work_mem to prevent recurrence.

Compliance 5 min read

PostgreSQL Audit Logging for PCI DSS and ISO 27001

Enabling pgaudit on RDS, what to log vs. what not to log, log retention to S3, and tooling to alert on suspicious query patterns. Tested against PCI DSS v4 controls.

Operations 6 min read

CloudWatch Alarms That Actually Matter for PostgreSQL

Stop alarming on CPU alone. The five CloudWatch metrics that predict PostgreSQL performance problems 24 hours before they become incidents — with specific threshold recommendations per instance family.


Performance 8 min read · Autovacuum · AWS RDS

Autovacuum Tuning for Large RDS Instances

PostgreSQL's autovacuum daemon exists to reclaim storage from dead tuples — rows that have been updated or deleted and are no longer visible to any active transaction. When autovacuum falls behind, tables bloat, IOPS spike, and queries slow down as scans read through ever-larger data files to find the same live rows. On large RDS production instances with millions of rows per table, the default autovacuum settings will almost certainly be inadequate.

This guide explains the key parameters, how to identify when autovacuum is failing to keep up, and what to change in your RDS parameter group.

Why the defaults are wrong for production

By default, autovacuum is throttled aggressively to avoid impacting production workloads on modest hardware. The critical setting is autovacuum_vacuum_cost_delay, which defaults to 2ms in PostgreSQL 12+ (it was 20ms before that). Vacuum accrues a cost for every page it touches (more for pages it must read from disk or dirty); each time the accumulated cost reaches the cost limit (200 units by default), it sleeps for this duration. The problem: on a modern NVMe-backed RDS instance (gp3, io1), your disks can handle thousands of IOPS, yet autovacuum stops to sleep after every 200 cost units of work.
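
Before touching the parameter group, it's worth confirming what your instance is actually running, since RDS parameter groups often override the engine defaults:

SHOW autovacuum_vacuum_cost_delay;
SHOW autovacuum_vacuum_cost_limit;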

Rule of thumb: If your instance processes more than ~50,000 row updates/deletes per minute, the default autovacuum will be unable to keep pace. On db.r5.xlarge+ instances, this threshold is routinely exceeded in normal e-commerce or SaaS workloads.
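
If you want a rough read on whether your workload crosses that threshold, the statistics views already track cumulative update and delete counts. The query below is a sketch: it averages over the time since the last statistics reset, so a recent reset or a very bursty workload will skew the number.

-- Averages since the last stats reset; if stats_reset is NULL this degrades to the raw cumulative total.
SELECT round(
         sum(t.n_tup_upd + t.n_tup_del)
         / greatest(extract(epoch FROM now() - d.stats_reset) / 60, 1)
       ) AS upd_del_per_minute
FROM pg_stat_user_tables t
CROSS JOIN pg_stat_database d
WHERE d.datname = current_database();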

Diagnosing autovacuum lag

Run this query to find tables where autovacuum is falling behind:

SELECT
  schemaname || '.' || relname                          AS table_name,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;

A dead_pct above 10% is concerning. Above 25% means autovacuum is significantly lagging and you'll be seeing table bloat affect query plans. Above 50% is a critical situation requiring immediate manual VACUUM ANALYZE.
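
For a table in that critical band, a manual pass looks like this (the table name is a placeholder; plain VACUUM does not block reads or writes, but it cannot run inside a transaction block):

-- Run one table at a time, worst dead_pct first.
VACUUM (ANALYZE, VERBOSE) public.orders;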

Key parameters — and why the defaults fail

There are six parameters that control how aggressively autovacuum runs. Each has a PostgreSQL default tuned for modest, general-purpose hardware — not the NVMe-backed, high-write RDS instances that run production SaaS or e-commerce workloads. Understanding what each parameter controls tells you where the problem is; calibrating the right values requires profiling your specific workload.

autovacuum_vacuum_cost_delay (default 2ms): Vacuum sleeps 2ms after every cost cycle. On gp3/io1 storage that can handle thousands of IOPS, autovacuum barely scratches the surface of accumulated dead tuples.
autovacuum_vacuum_scale_factor (default 0.2): Vacuum only triggers once roughly 20% of a table's rows are dead. On a 100M-row table that means 20 million dead rows accumulate before vacuum even starts, by which point bloat is already affecting query plans.
autovacuum_analyze_scale_factor (default 0.1): Statistics go stale relative to the actual data distribution, so the query planner picks bad execution plans, often surfacing as unexplained query regressions.
autovacuum_vacuum_threshold (default 50): The absolute floor of dead tuples before vacuum fires. It is added to scale_factor × live rows to form the trigger point, so the scale_factor term dominates on large tables while the threshold dominates on small ones; setting either too high delays vacuum further.
autovacuum_max_workers (default 3): Only three tables can be vacuumed in parallel, and the cost budget is shared across active workers, so high-write workloads with dozens of active tables see a backlog build faster than workers can clear it, regardless of how the other parameters are set.
autovacuum_vacuum_cost_limit (default -1, inheriting vacuum_cost_limit = 200): A low cost budget per cycle means vacuum hits its limit quickly and sleeps. On dedicated RDS instances the CPU and I/O headroom exists for far more aggressive work; the default leaves most of it unused.
Why you can't just copy published values: The right settings for cost_delay, scale_factor, and max_workers depend on your instance class, storage type, write throughput per table, row width, and current bloat level. Values that eliminate bloat on a db.r5.2xlarge with gp3 storage will over-vacuum a db.t3.large and spike IOPS costs. Calibration requires profiling your pg_stat_user_tables, not applying a generic recipe.
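
To see every autovacuum-related value your instance is actually using before you start profiling, pull them from pg_settings; the source column tells you whether a value comes from the engine default or from your RDS parameter group:

SELECT name, setting, unit, source
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;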

Per-table calibration matters more than instance-wide defaults

PostgreSQL supports per-table autovacuum parameters set directly via ALTER TABLE ... SET (…) storage options. This is where real-world tuning lives — instance-wide parameter group changes set a baseline, but your highest-write tables (order items, events, audit logs) almost always need their own tighter thresholds, while low-write reference tables need looser ones to avoid wasting IOPS.
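
As a sketch of what per-table overrides look like, the statements below use illustrative table names and values, not recommendations; the right numbers come out of the profiling described next:

-- Hypothetical high-write table: trigger vacuum at ~1% dead rows instead of 20%.
ALTER TABLE public.order_items SET (
  autovacuum_vacuum_scale_factor  = 0.01,
  autovacuum_vacuum_threshold     = 1000,
  autovacuum_analyze_scale_factor = 0.005
);

-- Hypothetical low-write reference table: loosen the trigger so workers aren't wasted on it.
ALTER TABLE public.country_codes SET (
  autovacuum_vacuum_scale_factor = 0.4
);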

Getting per-table overrides right requires knowing each table's write volume, average row width, vacuum history from pg_stat_user_tables, and how it competes with other tables for the autovacuum worker pool. Set them too aggressively across the board and you trade bloat-driven IOPS for vacuum-driven IOPS — often with no net saving on io1/io2 provisioned storage.

How to validate the diagnostic output

The diagnostic query above tells you that you have a problem. To understand how severe it is, re-run it after noting the current dead_pct and last_autovacuum timestamp for your top tables. If the dead count is growing faster than autovacuum is clearing it — or if last_autovacuum is hours old on a high-write table — your autovacuum configuration is falling behind your workload.
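
One way to quantify "growing faster than autovacuum is clearing it" is to snapshot the dead-tuple counts, wait through a stretch of normal traffic, and diff. A minimal sketch, using a session-local temp table and an arbitrary 20-row limit:

-- 1. Capture current counts.
CREATE TEMP TABLE dead_tup_snapshot AS
SELECT relid, relname, n_dead_tup, now() AS captured_at
FROM pg_stat_user_tables;

-- 2. Wait 15-30 minutes, then, in the same session:
SELECT t.relname,
       t.n_dead_tup - s.n_dead_tup AS dead_tup_growth,
       t.n_dead_tup               AS dead_tup_now,
       now() - s.captured_at      AS elapsed
FROM pg_stat_user_tables t
JOIN dead_tup_snapshot s USING (relid)
ORDER BY dead_tup_growth DESC
LIMIT 20;

A consistently positive dead_tup_growth on a table whose last_autovacuum is also stale is the clearest sign that the current settings cannot keep up.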

In CloudWatch, a rising Read IOPS trend on a stable-traffic instance is the signature of accumulating bloat: PostgreSQL is scanning increasingly large data files to find live rows. Once you've confirmed the pattern, the fix requires calibrated parameter changes — not a generic recipe.

Your autovacuum settings need to match your specific workload — not a blog post.
During a Remediation+ session, PGFlare profiles every table's vacuum history, write rate, and bloat level, then delivers exact parameter values for your RDS instance and per-table overrides for your highest-write tables — with the changes applied safely in a maintenance window.

Book a Diagnostic — £497 →