The most common scenario when teams report "autovacuum isn't running" is actually this: autovacuum is running — it's just running far too slowly to keep up with the write rate of a production workload. The symptoms are identical: dead tuple counts climb, table bloat grows, query plans degrade. Before concluding that autovacuum is broken, the first step is to confirm whether it's absent or just throttled.
Step 1 — Check if autovacuum is actually running
On RDS, autovacuum is always enabled and cannot be disabled via parameter group. But it can appear silent. Check for active autovacuum workers:
SELECT pid, wait_event_type, wait_event, state, query FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
If this returns zero rows, autovacuum has no active workers at this moment — not necessarily a problem. Autovacuum is event-driven: it fires when a table crosses its threshold, does its work, and exits. The absence of a worker right now doesn't mean autovacuum is broken.
To see when autovacuum last ran on your tables:
SELECT schemaname || '.' || relname AS table_name, n_dead_tup, last_autovacuum, last_autoanalyze, autovacuum_count, pg_size_pretty(pg_total_relation_size(relid)) AS total_size FROM pg_stat_user_tables WHERE n_dead_tup > 1000 ORDER BY n_dead_tup DESC LIMIT 20;
If last_autovacuum is NULL or hours old on a high-write table, autovacuum
is not keeping pace. The question becomes: why?
The four most common causes on RDS
1. autovacuum_vacuum_cost_delay is throttling vacuum to a crawl.
The default is 2ms per cost cycle (200 cost units). On an io1 or gp3
RDS instance capable of 3000+ IOPS, this is like parking a lorry on the motorway. Autovacuum
runs — just nowhere near fast enough. Check with
SHOW autovacuum_vacuum_cost_delay. The fix is to reduce it in the parameter group
(0ms = no throttle, but only safe on dedicated instances with headroom).
2. The table hasn't crossed its vacuum threshold yet.
Autovacuum fires when dead tuples exceed
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × n_live_tup.
With the default scale_factor of 0.2, a table with 50 million live rows won't
trigger autovacuum until 10 million dead tuples accumulate. That can take a long time — and look
identical to "not running" during the accumulation phase.
3. Long-running transactions are blocking vacuum.
Autovacuum cannot remove dead tuples that might still be visible to an open transaction.
A single idle-in-transaction connection that's been open for hours prevents vacuum from
cleaning up anything created after that transaction began. Check:
SELECT pid, usename, state, wait_event, now() - xact_start AS txn_age, left(query, 60) AS query FROM pg_stat_activity WHERE state IN ('idle in transaction', 'active') AND xact_start < now() - INTERVAL '5 minutes' ORDER BY xact_start;
KILL. To terminate a blocking
connection use SELECT pg_terminate_backend(pid). For read replicas, the blocking
transaction is on the primary — the replica has no ability to terminate it.
4. autovacuum_max_workers is exhausted by other tables.
The default of 3 workers means only 3 tables can be vacuumed simultaneously. In a schema
with hundreds of high-write tables, the worker pool queues up. Any given table may wait
a long time before a worker becomes available — even if the overall autovacuum process is
healthy. Check with SHOW autovacuum_max_workers and watch the worker count
in pg_stat_activity over time.
Emergency relief — manual VACUUM
If dead tuples are critically high and autovacuum cannot keep pace, run a manual VACUUM to clear the backlog immediately. This runs in addition to autovacuum, not instead of it:
-- Target a specific table: VACUUM (ANALYZE, VERBOSE) schema_name.table_name; -- If the table is critically bloated, VACUUM FULL reclaims disk space -- but takes an ACCESS EXCLUSIVE lock — causes downtime: VACUUM FULL schema_name.table_name; -- only in a maintenance window
VACUUM (without FULL) marks
dead tuples as reusable but does not shrink the physical file. VACUUM FULL
rewrites the table to disk — it reclaims actual storage but takes a full table lock for
the duration. On large tables this can be hours. Use it only in a planned maintenance window.
Permanent fix — parameter group tuning
The right parameter changes depend on your instance class, storage type, and write rate per table. The general direction on production RDS instances:
- Reduce
autovacuum_vacuum_cost_delayfrom 2ms toward 0–1ms on io1/gp3 storage - Reduce
autovacuum_vacuum_scale_factorfrom 0.2 toward 0.01–0.05 on large tables - Increase
autovacuum_max_workersfrom 3 toward 5–8 on multi-table schemas - Set per-table storage parameters on your highest-write tables via
ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = 0.01)
Getting these values wrong in the other direction (too aggressive) will spike your CloudWatch IOPS and potentially increase your RDS storage bill. Calibration requires profiling your specific workload — not copying generic values from a blog post.
Autovacuum tuning is one of the highest-leverage fixes in PostgreSQL.
PGFlare's Remediation+ session includes full
autovacuum calibration: we profile every table's dead tuple rate, write throughput, and vacuum
history, then deliver exact parameter values for your parameter group and per-table ALTER TABLE
overrides — applied safely in a maintenance window.