Autovacuum Not Running on AWS RDS — Diagnosis and Fix

The most common scenario when teams report "autovacuum isn't running" is actually this: autovacuum is running — it's just running far too slowly to keep up with the write rate of a production workload. The symptoms are identical: dead tuple counts climb, table bloat grows, query plans degrade. Before concluding that autovacuum is broken, the first step is to confirm whether it's absent or just throttled.

Step 1 — Check if autovacuum is actually running

On RDS, autovacuum is always enabled and cannot be disabled via parameter group. But it can appear silent. Check for active autovacuum workers:

SELECT pid, wait_event_type, wait_event, state, query
FROM   pg_stat_activity
WHERE  backend_type = 'autovacuum worker';

If this returns zero rows, autovacuum has no active workers at this moment — not necessarily a problem. Autovacuum is event-driven: it fires when a table crosses its threshold, does its work, and exits. The absence of a worker right now doesn't mean autovacuum is broken.

To see when autovacuum last ran on your tables:

SELECT
  schemaname || '.' || relname           AS table_name,
  n_dead_tup,
  last_autovacuum,
  last_autoanalyze,
  autovacuum_count,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM   pg_stat_user_tables
WHERE  n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT  20;

If last_autovacuum is NULL or hours old on a high-write table, autovacuum is not keeping pace. The question becomes: why?

The four most common causes on RDS

1. autovacuum_vacuum_cost_delay is throttling vacuum to a crawl.
The default is 2ms per cost cycle (200 cost units). On an io1 or gp3 RDS instance capable of 3000+ IOPS, this is like parking a lorry on the motorway. Autovacuum runs — just nowhere near fast enough. Check with SHOW autovacuum_vacuum_cost_delay. The fix is to reduce it in the parameter group (0ms = no throttle, but only safe on dedicated instances with headroom).

2. The table hasn't crossed its vacuum threshold yet.
Autovacuum fires when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × n_live_tup. With the default scale_factor of 0.2, a table with 50 million live rows won't trigger autovacuum until 10 million dead tuples accumulate. That can take a long time — and look identical to "not running" during the accumulation phase.

3. Long-running transactions are blocking vacuum.
Autovacuum cannot remove dead tuples that might still be visible to an open transaction. A single idle-in-transaction connection that's been open for hours prevents vacuum from cleaning up anything created after that transaction began. Check:

SELECT
  pid,
  usename,
  state,
  wait_event,
  now() - xact_start                      AS txn_age,
  left(query, 60)                           AS query
FROM   pg_stat_activity
WHERE  state IN ('idle in transaction', 'active')
  AND  xact_start < now() - INTERVAL '5 minutes'
ORDER BY xact_start;

RDS note: On RDS you cannot run KILL. To terminate a blocking connection use SELECT pg_terminate_backend(pid). For read replicas, the blocking transaction is on the primary — the replica has no ability to terminate it.

4. autovacuum_max_workers is exhausted by other tables.
The default of 3 workers means only 3 tables can be vacuumed simultaneously. In a schema with hundreds of high-write tables, the worker pool queues up. Any given table may wait a long time before a worker becomes available — even if the overall autovacuum process is healthy. Check with SHOW autovacuum_max_workers and watch the worker count in pg_stat_activity over time.

Emergency relief — manual VACUUM

If dead tuples are critically high and autovacuum cannot keep pace, run a manual VACUUM to clear the backlog immediately. This runs in addition to autovacuum, not instead of it:

-- Target a specific table:
VACUUM (ANALYZE, VERBOSE) schema_name.table_name;

-- If the table is critically bloated, VACUUM FULL reclaims disk space
-- but takes an ACCESS EXCLUSIVE lock — causes downtime:
VACUUM FULL schema_name.table_name; -- only in a maintenance window

VACUUM ANALYZE vs VACUUM FULL: VACUUM (without FULL) marks dead tuples as reusable but does not shrink the physical file. VACUUM FULL rewrites the table to disk — it reclaims actual storage but takes a full table lock for the duration. On large tables this can be hours. Use it only in a planned maintenance window.

Permanent fix — parameter group tuning

The right parameter changes depend on your instance class, storage type, and write rate per table. The general direction on production RDS instances:

Reduce autovacuum_vacuum_cost_delay from 2ms toward 0–1ms on io1/gp3 storage
Reduce autovacuum_vacuum_scale_factor from 0.2 toward 0.01–0.05 on large tables
Increase autovacuum_max_workers from 3 toward 5–8 on multi-table schemas
Set per-table storage parameters on your highest-write tables via ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = 0.01)

Getting these values wrong in the other direction (too aggressive) will spike your CloudWatch IOPS and potentially increase your RDS storage bill. Calibration requires profiling your specific workload — not copying generic values from a blog post.

Autovacuum tuning is one of the highest-leverage fixes in PostgreSQL.
PGFlare's Remediation+ session includes full autovacuum calibration: we profile every table's dead tuple rate, write throughput, and vacuum history, then deliver exact parameter values for your parameter group and per-table ALTER TABLE overrides — applied safely in a maintenance window.

Book a Remediation+ Session — £897 →