PostgreSQL's autovacuum daemon exists to reclaim storage from dead tuples: rows that have been updated or deleted and are no longer visible to any active transaction. When autovacuum falls behind, tables bloat, IOPS spike, and queries slow down as every scan has to wade through ever-larger data files. On large RDS production instances with millions of rows per table, the default autovacuum settings will almost certainly be inadequate.
This guide explains the key parameters, how to identify when autovacuum is failing to keep up, and what to change in your RDS parameter group.
Why the defaults are wrong for production
By default, autovacuum is throttled aggressively to avoid impacting production workloads on modest hardware. The critical setting is autovacuum_vacuum_cost_delay, which defaults to 2ms in PostgreSQL 12+ (it was 20ms before that). Vacuum accrues a "cost unit" for every page it reads or dirties; once the accumulated cost reaches autovacuum_vacuum_cost_limit (200 by default), it sleeps for this delay before continuing. The problem: on a modern NVMe-backed RDS instance (gp3, io1), your disks can handle thousands of IOPS, but autovacuum is pausing 2ms after every 200 cost units of work.
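To confirm what a given instance is actually running with, the live values can be read from pg_settings; on RDS, parameter group changes show up here once applied:

```sql
-- Current autovacuum-related settings as seen by the running instance.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name LIKE 'autovacuum%' OR name LIKE 'vacuum_cost%'
ORDER BY name;
```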
Diagnosing autovacuum lag
Run this query to find tables where autovacuum is falling behind:
```sql
SELECT schemaname || '.' || relname AS table_name,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       last_autovacuum,
       last_autoanalyze,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
```
A dead_pct above 10% is concerning. Above 25%, autovacuum is significantly lagging and table bloat is likely already affecting query plans. Above 50% is a critical situation requiring an immediate manual VACUUM ANALYZE.
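If a table is in that critical zone, a manual vacuum can be run right away; the table name below is only a placeholder for one of the offenders surfaced by the query above:

```sql
-- Placeholder table name: substitute a table from the diagnostic query.
-- VERBOSE prints per-pass progress so you can watch how much it reclaims.
VACUUM (ANALYZE, VERBOSE) public.orders;
```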
Key parameters — and why the defaults fail
There are six parameters that control how aggressively autovacuum runs. Each has a PostgreSQL default tuned for modest, general-purpose hardware — not the NVMe-backed, high-write RDS instances that run production SaaS or e-commerce workloads. Understanding what each parameter controls tells you where the problem is; calibrating the right values requires profiling your specific workload.
| Parameter | Default | What goes wrong |
|---|---|---|
| autovacuum_vacuum_cost_delay | 2ms | Vacuum sleeps 2ms after every cost cycle. On gp3/io1 storage that can handle thousands of IOPS, autovacuum barely scratches the surface of accumulated dead tuples. |
| autovacuum_vacuum_scale_factor | 0.2 | Vacuum only triggers after 20% of a table's rows are dead. On a 100M-row table that means 20 million dead rows accumulate before vacuum even starts, by which point bloat is already affecting query plans. |
| autovacuum_analyze_scale_factor | 0.1 | Statistics become stale relative to the actual data distribution, causing the query planner to choose wrong execution plans, often manifesting as unexplained query regressions. |
| autovacuum_vacuum_threshold | 50 | The absolute floor of dead tuples before vacuum fires. It adds to the scale_factor term: vacuum triggers once dead tuples exceed threshold + scale_factor × row count (see the query after this table), so miscalibrating either value delays vacuum on smaller tables. |
| autovacuum_max_workers | 3 | Only three tables can be vacuumed in parallel. High-write workloads with dozens of active tables will see a backlog build faster than workers can process it, regardless of how the other parameters are set. |
| autovacuum_vacuum_cost_limit | 200 | Low cost budget per cycle means vacuum hits its limit quickly and sleeps. On dedicated RDS instances the CPU and I/O headroom exists for far more aggressive work; the default leaves most of it unused. |
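As a rough way to see how the threshold and scale factor combine, the query below computes the default trigger point for each table; it ignores any per-table reloptions, so treat it as an approximation:

```sql
-- Dead tuples needed before autovacuum fires, assuming instance-wide defaults.
-- Trigger = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * live rows.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       (current_setting('autovacuum_vacuum_threshold')::int
        + current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup)::bigint
         AS vacuum_trigger
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```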
The right values for cost_delay, scale_factor, and max_workers depend on your instance class, storage type, write throughput per table, row width, and current bloat level. Values that eliminate bloat on a db.r5.2xlarge with gp3 storage will over-vacuum a db.t3.large and spike IOPS costs. Calibration requires profiling your pg_stat_user_tables, not applying a generic recipe.
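One starting point for that profiling is the per-table write and vacuum counters PostgreSQL already keeps; the numbers are cumulative since the last stats reset, so read them against the instance's uptime:

```sql
-- Cumulative write activity and autovacuum runs per table since the last stats reset.
SELECT relname,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       n_tup_hot_upd,
       autovacuum_count,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_tup_ins + n_tup_upd + n_tup_del DESC
LIMIT 20;
```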
Per-table calibration matters more than instance-wide defaults
PostgreSQL supports per-table autovacuum parameters set directly via ALTER TABLE ... SET (…)
storage options. This is where real-world tuning lives — instance-wide parameter group changes set a
baseline, but your highest-write tables (order items, events, audit logs) almost always need their own
tighter thresholds, while low-write reference tables need looser ones to avoid wasting IOPS.
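A sketch of the syntax only; the table names are hypothetical and the values would need to come from profiling your own workload, not from this post:

```sql
-- Hypothetical high-write table: vacuum after ~2% dead rows instead of 20%.
ALTER TABLE public.order_items SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_vacuum_threshold     = 5000,
    autovacuum_analyze_scale_factor = 0.01
);

-- Hypothetical low-write reference table: loosen the threshold to stop
-- spending worker time and IOPS on a table that barely changes.
ALTER TABLE public.country_codes SET (
    autovacuum_vacuum_scale_factor = 0.4
);
```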
Getting per-table overrides right requires knowing each table's write volume, average row width,
vacuum history from pg_stat_user_tables, and how it competes with other tables for
the autovacuum worker pool. Set them too aggressively across the board and you trade bloat-driven
IOPS for vacuum-driven IOPS — often with no net saving on io1/io2 provisioned storage.
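To see how that worker pool is being spent at any given moment, vacuum progress is exposed per backend (PostgreSQL 9.6 and later); this includes manual VACUUMs as well as autovacuum workers:

```sql
-- Which relations currently hold vacuum workers and how far each pass has progressed.
SELECT p.pid,
       p.relid::regclass AS table_name,
       p.phase,
       p.heap_blks_scanned,
       p.heap_blks_total,
       a.query_start
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a ON a.pid = p.pid;
```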
How to validate the diagnostic output
The diagnostic query above tells you that you have a problem. To understand whether it is getting worse, note the current dead_pct and last_autovacuum timestamps for your top tables, then re-run it after an interval of normal traffic. If the dead count is growing faster than autovacuum is clearing it, or if last_autovacuum is hours old on a high-write table, your autovacuum configuration is falling behind your workload.
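One way to make that comparison from a single session is to snapshot the dead-tuple counters, wait through a stretch of normal traffic, and diff; the 20-minute wait below is only a suggestion:

```sql
-- Snapshot the current dead-tuple counts.
CREATE TEMP TABLE dead_tup_snapshot AS
SELECT relid, relname, n_dead_tup, now() AS taken_at
FROM pg_stat_user_tables;

-- ...wait ~20 minutes of normal traffic, then compare...
SELECT c.relname,
       s.n_dead_tup                AS dead_before,
       c.n_dead_tup                AS dead_now,
       c.n_dead_tup - s.n_dead_tup AS delta
FROM dead_tup_snapshot s
JOIN pg_stat_user_tables c ON c.relid = s.relid
ORDER BY delta DESC
LIMIT 20;
```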
In CloudWatch, a rising Read IOPS trend on a stable-traffic instance is the signature of accumulating bloat: PostgreSQL is scanning increasingly large data files to find live rows. Once you've confirmed the pattern, the fix requires calibrated parameter changes — not a generic recipe.
Your autovacuum settings need to match your specific workload — not a blog
post.
During a Remediation+ session, PGFlare profiles
every table's vacuum history, write rate, and bloat level, then delivers exact parameter values
for your RDS instance and per-table overrides for your highest-write tables — with the changes
applied safely in a maintenance window.