Free RDS PostgreSQL Diagnostic Toolkit

How to use this: Connect to your RDS instance via psql or your preferred client with a read-only monitoring user (see the IAM role setup guide if you don't have one). Run queries 1–4 first — they give a snapshot of what is happening right now. Queries 5–8 give the historical picture. None of these queries modify data.

Top slow queries by total CPU time

The single most useful starting point. Shows which query shapes have consumed the most cumulative execution time since the last pg_stat_statements_reset().

SELECT
  round(total_exec_time::numeric, 1)    AS total_ms,
  calls,
  round(mean_exec_time::numeric, 1)     AS mean_ms,
  round(stddev_exec_time::numeric, 1)   AS stddev_ms,
  rows,
  round(100.0 * total_exec_time /
    sum(total_exec_time) OVER (), 1)     AS pct_of_total,
  left(query, 120)                        AS query
FROM   pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT  20;

Requires pg_stat_statements extension. → How to enable it on RDS

Active connections — state summary

Connection state breakdown. High idle in transaction counts are a leading indicator of lock contention and autovacuum blockage.

SELECT
  state,
  wait_event_type,
  count(*)                                                  AS cnt,
  count(*) FILTER(WHERE query_start < now() - INTERVAL '5m') AS older_5m
FROM   pg_stat_activity
WHERE  backend_type = 'client backend'
GROUP BY state, wait_event_type
ORDER BY cnt DESC;

older_5m highlights connections stuck in a state for more than 5 minutes.

Long-running queries and idle-in-transaction sessions

The queries currently running or holding open transactions. Idle-in-transaction sessions older than a few minutes will block autovacuum and accumulate locks.

SELECT
  pid,
  usename,
  application_name,
  state,
  wait_event_type,
  wait_event,
  now() - xact_start            AS txn_age,
  now() - query_start           AS query_age,
  left(query, 100)               AS query
FROM   pg_stat_activity
WHERE  state IN ('active', 'idle in transaction')
  AND  xact_start < now() - INTERVAL '30 seconds'
ORDER BY xact_start ASC;

To cancel a query without killing the connection: SELECT pg_cancel_backend(pid)

Lock contention — blocking chain

Shows which sessions are blocking other sessions. The blocking_pid column identifies the root blocker in a lock chain.

SELECT
  blocked.pid                            AS blocked_pid,
  blocked.usename                        AS blocked_user,
  blocking.pid                           AS blocking_pid,
  blocking.usename                       AS blocking_user,
  blocking.state                         AS blocking_state,
  now() - blocked.query_start           AS blocked_duration,
  left(blocked.query, 80)                AS blocked_query,
  left(blocking.query, 80)               AS blocking_query
FROM   pg_stat_activity blocked
JOIN   pg_stat_activity blocking
       ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE  cardinality(pg_blocking_pids(blocked.pid)) > 0
ORDER BY blocked_duration DESC;

Returns no rows when there is no lock contention — which is the result you want.

Autovacuum status — tables with high dead tuple counts

Tables where dead tuple accumulation is ahead of autovacuum. last_autovacuum being NULL or hours old on a high-write table indicates autovacuum is not keeping up.

SELECT
  schemaname || '.' || relname         AS table_name,
  n_live_tup,
  n_dead_tup,
  round(n_dead_tup * 100.0 /
    nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze,
  autovacuum_count,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM   pg_stat_user_tables
WHERE  n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT  20;

dead_pct above 20% on a large table warrants investigation. → Autovacuum diagnosis guide

Unused indexes — candidates for removal

Indexes that have never been scanned (or rarely) since the last statistics reset. Every unused index adds write overhead on INSERT/UPDATE/DELETE and wastes storage.

SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
  AND  pg_relation_size(indexrelid) > 5 * 1024 * 1024   -- > 5MB only
  AND  indexname NOT LIKE '%_pkey'                       -- keep primary keys
  AND  indexname NOT LIKE '%_unique'                     -- keep unique constraints
ORDER BY pg_relation_size(indexrelid) DESC;

Don't drop without checking — statistics reset after a restart. Verify with a week of data before removing.

Cache hit rate — buffer and index

Buffer cache hit rate below 95% on a memory-adequate instance often indicates a query doing a sequential scan on a large table. Index hit rate below 99% suggests index bloat or missing indexes.

SELECT
  sum(heap_blks_hit)  AS heap_hit,
  sum(heap_blks_read) AS heap_read,
  round(sum(heap_blks_hit) * 100.0 /
    nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS heap_hit_pct,
  round(sum(idx_blks_hit) * 100.0 /
    nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0), 2) AS idx_hit_pct
FROM   pg_statio_user_tables;

Target: heap_hit_pct > 95%, idx_hit_pct > 99%. Below 90% on either — check for seq scans on large tables.

Table sizes — largest tables and indexes

A rapid view of which tables and their indexes are consuming the most storage. The ratio of table_size to index_size being unexpectedly high often indicates index bloat.

SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_table_size(schemaname || '.' || tablename))      AS table_size,
  pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename))    AS index_size,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size
FROM   information_schema.tables
WHERE  table_schema NOT IN ('pg_catalog', 'information_schema')
  AND  table_type = 'BASE TABLE'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT  25;

When index_size is larger than table_size, run Query 6 to check for unused indexes. → Index bloat diagnosis guide

Results pointing at something you can't explain?

These queries surface the symptoms. Knowing what to do about them — which indexes to add, which parameters to change, which connections to terminate — is a different skill. That is what a PGFlare session delivers.

Book a Diagnostic Session — £497 →

RDS PostgreSQL Diagnostic Queries