Knowledge Base

PostgreSQL Performance Guides

Practical guides from the engineers at PGFlare. Everything you need to diagnose, fix, and prevent the most expensive PostgreSQL performance problems on AWS RDS.

Performance 8 min read

Autovacuum Tuning for Large RDS Instances

The default autovacuum settings were designed for small tables. Here's how to configure cost_delay, vacuum_scale_factor, and analyze_scale_factor for production RDS instances with hundreds of millions of rows.

Cost 6 min read

Why Your RDS Bill Doubled After a Schema Migration

Table rewrites from ALTER TABLE, bloated pg_toast tables, and unnecessary IOPS. How schema migrations silently inflate your AWS bill and how to avoid it.

Performance 10 min read

EXPLAIN ANALYZE: Reading the Query Plan Without a PhD

Rows=10000 vs actual rows=1.5M. Seq Scan on a 200M-row table. How to read the actual vs estimated rows mismatch and what to do about stale statistics.

Cost 5 min read

Right-Sizing Your RDS Instance Without Downtime

How to validate that your workload supports a smaller instance class, prepare the migration, and flip the instance type during a maintenance window. Real numbers from 4 migrations.

Operations 7 min read

Connection Pool Configuration for pgBouncer on RDS

Why raising max_connections is the wrong fix. Pool mode selection (transaction vs session), pool_size calculation, and the hidden memory cost of idle connections.

Emergency 4 min read

P1 Runbook: High RDS CPU During Peak Traffic

Step-by-step incident response when RDS CPU hits 80% and queries are timing out. What to check first, how to identify the offending query, and how to get back to green in under 30 minutes.

Performance 9 min read

Index Bloat: When Your Indexes Are Making Things Worse

Dead tuples in indexes. Bloat that defeats the purpose of the index. How to identify bloated indexes, reindex concurrently, and set maintenance_work_mem to prevent recurrence.

Compliance 5 min read

PostgreSQL Audit Logging for PCI DSS and ISO 27001

Enabling pgaudit on RDS, what to log vs. what not to log, log retention to S3, and tooling to alert on suspicious query patterns. Tested against PCI DSS v4 controls.

Operations 6 min read

CloudWatch Alarms That Actually Matter for PostgreSQL

Stop alarming on CPU alone. The five CloudWatch metrics that predict PostgreSQL performance problems 24 hours before they become incidents — with specific threshold recommendations per instance family.


Performance 8 min read · Autovacuum · AWS RDS

Autovacuum Tuning for Large RDS Instances

PostgreSQL's autovacuum daemon exists to reclaim storage from dead tuples — rows that have been updated or deleted and are no longer visible to any active transaction. When autovacuum falls behind, tables bloat, IOPS spike, and queries slow down as scans read through ever-larger data files to find the same live rows. On large RDS production instances with millions of rows per table, the default autovacuum settings will almost certainly be inadequate.

This guide explains the key parameters, how to identify when autovacuum is failing to keep up, and what to change in your RDS parameter group.

Why the defaults are wrong for production

By default, autovacuum is throttled aggressively to avoid impacting production workloads on modest hardware. The critical setting is autovacuum_vacuum_cost_delay, which defaults to 2ms in PostgreSQL 12+ (it was 20ms before that). Vacuum accrues a cost for every page it touches (more for pages it must read from disk or dirty); each time the accumulated cost reaches the cost limit (200 units by default), it sleeps for this duration. The problem: on a modern NVMe-backed RDS instance (gp3, io1), your disks can handle thousands of IOPS, yet autovacuum stops to sleep after every 200 cost units of work.
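
Before touching the parameter group, it's worth confirming what your instance is actually running, since RDS parameter groups often override the engine defaults:

SHOW autovacuum_vacuum_cost_delay;
SHOW autovacuum_vacuum_cost_limit;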

Rule of thumb: If your instance processes more than ~50,000 row updates/deletes per minute, the default autovacuum will be unable to keep pace. On db.r5.xlarge+ instances, this threshold is routinely exceeded in normal e-commerce or SaaS workloads.
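
If you want a rough read on whether your workload crosses that threshold, the statistics views already track cumulative update and delete counts. The query below is a sketch: it averages over the time since the last statistics reset, so a recent reset or a very bursty workload will skew the number.

-- Averages since the last stats reset; if stats_reset is NULL this degrades to the raw cumulative total.
SELECT round(
         sum(t.n_tup_upd + t.n_tup_del)
         / greatest(extract(epoch FROM now() - d.stats_reset) / 60, 1)
       ) AS upd_del_per_minute
FROM pg_stat_user_tables t
CROSS JOIN pg_stat_database d
WHERE d.datname = current_database();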

Diagnosing autovacuum lag

Run this query to find tables where autovacuum is falling behind:

SELECT
  schemaname || '.' || relname                          AS table_name,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;

A dead_pct above 10% is concerning. Above 25% means autovacuum is significantly lagging and you'll be seeing table bloat affect query plans. Above 50% is a critical situation requiring immediate manual VACUUM ANALYZE.
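
For a table in that critical band, a manual pass looks like this (the table name is a placeholder; plain VACUUM does not block reads or writes, but it cannot run inside a transaction block):

-- Run one table at a time, worst dead_pct first.
VACUUM (ANALYZE, VERBOSE) public.orders;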

Key parameters — and why the defaults fail

There are six parameters that control how aggressively autovacuum runs. Each has a PostgreSQL default tuned for modest, general-purpose hardware — not the NVMe-backed, high-write RDS instances that run production SaaS or e-commerce workloads. Understanding what each parameter controls tells you where the problem is; calibrating the right values requires profiling your specific workload.

autovacuum_vacuum_cost_delay (default 2ms): Vacuum sleeps 2ms after every cost cycle. On gp3/io1 storage that can handle thousands of IOPS, autovacuum barely scratches the surface of accumulated dead tuples.
autovacuum_vacuum_scale_factor (default 0.2): Vacuum only triggers once roughly 20% of a table's rows are dead. On a 100M-row table that means 20 million dead rows accumulate before vacuum even starts, by which point bloat is already affecting query plans.
autovacuum_analyze_scale_factor (default 0.1): Statistics go stale relative to the actual data distribution, so the query planner picks bad execution plans, often surfacing as unexplained query regressions.
autovacuum_vacuum_threshold (default 50): The absolute floor of dead tuples before vacuum fires. It is added to scale_factor × live rows to form the trigger point, so the scale_factor term dominates on large tables while the threshold dominates on small ones; setting either too high delays vacuum further.
autovacuum_max_workers (default 3): Only three tables can be vacuumed in parallel, and the cost budget is shared across active workers, so high-write workloads with dozens of active tables see a backlog build faster than workers can clear it, regardless of how the other parameters are set.
autovacuum_vacuum_cost_limit (default -1, inheriting vacuum_cost_limit = 200): A low cost budget per cycle means vacuum hits its limit quickly and sleeps. On dedicated RDS instances the CPU and I/O headroom exists for far more aggressive work; the default leaves most of it unused.
Why you can't just copy published values: The right settings for cost_delay, scale_factor, and max_workers depend on your instance class, storage type, write throughput per table, row width, and current bloat level. Values that eliminate bloat on a db.r5.2xlarge with gp3 storage will over-vacuum a db.t3.large and spike IOPS costs. Calibration requires profiling your pg_stat_user_tables, not applying a generic recipe.
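
To see every autovacuum-related value your instance is actually using before you start profiling, pull them from pg_settings; the source column tells you whether a value comes from the engine default or from your RDS parameter group:

SELECT name, setting, unit, source
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;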

Per-table calibration matters more than instance-wide defaults

PostgreSQL supports per-table autovacuum parameters set directly via ALTER TABLE ... SET (…) storage options. This is where real-world tuning lives — instance-wide parameter group changes set a baseline, but your highest-write tables (order items, events, audit logs) almost always need their own tighter thresholds, while low-write reference tables need looser ones to avoid wasting IOPS.
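
As a sketch of what per-table overrides look like, the statements below use illustrative table names and values, not recommendations; the right numbers come out of the profiling described next:

-- Hypothetical high-write table: trigger vacuum at ~1% dead rows instead of 20%.
ALTER TABLE public.order_items SET (
  autovacuum_vacuum_scale_factor  = 0.01,
  autovacuum_vacuum_threshold     = 1000,
  autovacuum_analyze_scale_factor = 0.005
);

-- Hypothetical low-write reference table: loosen the trigger so workers aren't wasted on it.
ALTER TABLE public.country_codes SET (
  autovacuum_vacuum_scale_factor = 0.4
);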

Getting per-table overrides right requires knowing each table's write volume, average row width, vacuum history from pg_stat_user_tables, and how it competes with other tables for the autovacuum worker pool. Set them too aggressively across the board and you trade bloat-driven IOPS for vacuum-driven IOPS — often with no net saving on io1/io2 provisioned storage.

How to validate the diagnostic output

The diagnostic query above tells you that you have a problem. To understand how severe it is, re-run it after noting the current dead_pct and last_autovacuum timestamp for your top tables. If the dead count is growing faster than autovacuum is clearing it — or if last_autovacuum is hours old on a high-write table — your autovacuum configuration is falling behind your workload.
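
One way to quantify "growing faster than autovacuum is clearing it" is to snapshot the dead-tuple counts, wait through a stretch of normal traffic, and diff. A minimal sketch, using a session-local temp table and an arbitrary 20-row limit:

-- 1. Capture current counts.
CREATE TEMP TABLE dead_tup_snapshot AS
SELECT relid, relname, n_dead_tup, now() AS captured_at
FROM pg_stat_user_tables;

-- 2. Wait 15-30 minutes, then, in the same session:
SELECT t.relname,
       t.n_dead_tup - s.n_dead_tup AS dead_tup_growth,
       t.n_dead_tup               AS dead_tup_now,
       now() - s.captured_at      AS elapsed
FROM pg_stat_user_tables t
JOIN dead_tup_snapshot s USING (relid)
ORDER BY dead_tup_growth DESC
LIMIT 20;

A consistently positive dead_tup_growth on a table whose last_autovacuum is also stale is the clearest sign that the current settings cannot keep up.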

In CloudWatch, a rising Read IOPS trend on a stable-traffic instance is the signature of accumulating bloat: PostgreSQL is scanning increasingly large data files to find live rows. Once you've confirmed the pattern, the fix requires calibrated parameter changes — not a generic recipe.

Your autovacuum settings need to match your specific workload — not a blog post.
During a Remediation+ session, PGFlare profiles every table's vacuum history, write rate, and bloat level, then delivers exact parameter values for your RDS instance and per-table overrides for your highest-write tables — with the changes applied safely in a maintenance window.

Book a Diagnostic — £497 →