Free Toolkit

RDS PostgreSQL Diagnostic Queries

These are the 8 SQL queries we open in the first five minutes of every audit. After running several hundred RDS diagnostic sessions, these are the ones that always run — regardless of what the client describes as the symptom.

View on GitHub — DevOpsDBAMaverick/rds-postgres-quick-check
How to use this: Connect to your RDS instance via psql or your preferred client with a read-only monitoring user (see the IAM role setup guide if you don't have one). Run queries 1–4 first — they give a snapshot of what is happening right now. Queries 5–8 give the historical picture. None of these queries modify data.
1

Top slow queries by total CPU time

The single most useful starting point. Shows which query shapes have consumed the most cumulative execution time since the last pg_stat_statements_reset().

SELECT
  round(total_exec_time::numeric, 1)    AS total_ms,
  calls,
  round(mean_exec_time::numeric, 1)     AS mean_ms,
  round(stddev_exec_time::numeric, 1)   AS stddev_ms,
  rows,
  round(100.0 * total_exec_time /
    sum(total_exec_time) OVER (), 1)     AS pct_of_total,
  left(query, 120)                        AS query
FROM   pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT  20;
2

Active connections — state summary

Connection state breakdown. High idle in transaction counts are a leading indicator of lock contention and autovacuum blockage.

SELECT
  state,
  wait_event_type,
  count(*)                                                  AS cnt,
  count(*) FILTER(WHERE query_start < now() - INTERVAL '5m') AS older_5m
FROM   pg_stat_activity
WHERE  backend_type = 'client backend'
GROUP BY state, wait_event_type
ORDER BY cnt DESC;
3

Long-running queries and idle-in-transaction sessions

The queries currently running or holding open transactions. Idle-in-transaction sessions older than a few minutes will block autovacuum and accumulate locks.

SELECT
  pid,
  usename,
  application_name,
  state,
  wait_event_type,
  wait_event,
  now() - xact_start            AS txn_age,
  now() - query_start           AS query_age,
  left(query, 100)               AS query
FROM   pg_stat_activity
WHERE  state IN ('active', 'idle in transaction')
  AND  xact_start < now() - INTERVAL '30 seconds'
ORDER BY xact_start ASC;
4

Lock contention — blocking chain

Shows which sessions are blocking other sessions. The blocking_pid column identifies the root blocker in a lock chain.

SELECT
  blocked.pid                            AS blocked_pid,
  blocked.usename                        AS blocked_user,
  blocking.pid                           AS blocking_pid,
  blocking.usename                       AS blocking_user,
  blocking.state                         AS blocking_state,
  now() - blocked.query_start           AS blocked_duration,
  left(blocked.query, 80)                AS blocked_query,
  left(blocking.query, 80)               AS blocking_query
FROM   pg_stat_activity blocked
JOIN   pg_stat_activity blocking
       ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE  cardinality(pg_blocking_pids(blocked.pid)) > 0
ORDER BY blocked_duration DESC;
5

Autovacuum status — tables with high dead tuple counts

Tables where dead tuple accumulation is ahead of autovacuum. last_autovacuum being NULL or hours old on a high-write table indicates autovacuum is not keeping up.

SELECT
  schemaname || '.' || relname         AS table_name,
  n_live_tup,
  n_dead_tup,
  round(n_dead_tup * 100.0 /
    nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze,
  autovacuum_count,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM   pg_stat_user_tables
WHERE  n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT  20;
6

Unused indexes — candidates for removal

Indexes that have never been scanned (or rarely) since the last statistics reset. Every unused index adds write overhead on INSERT/UPDATE/DELETE and wastes storage.

SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
  AND  pg_relation_size(indexrelid) > 5 * 1024 * 1024   -- > 5MB only
  AND  indexname NOT LIKE '%_pkey'                       -- keep primary keys
  AND  indexname NOT LIKE '%_unique'                     -- keep unique constraints
ORDER BY pg_relation_size(indexrelid) DESC;
7

Cache hit rate — buffer and index

Buffer cache hit rate below 95% on a memory-adequate instance often indicates a query doing a sequential scan on a large table. Index hit rate below 99% suggests index bloat or missing indexes.

SELECT
  sum(heap_blks_hit)  AS heap_hit,
  sum(heap_blks_read) AS heap_read,
  round(sum(heap_blks_hit) * 100.0 /
    nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS heap_hit_pct,
  round(sum(idx_blks_hit) * 100.0 /
    nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0), 2) AS idx_hit_pct
FROM   pg_statio_user_tables;
8

Table sizes — largest tables and indexes

A rapid view of which tables and their indexes are consuming the most storage. The ratio of table_size to index_size being unexpectedly high often indicates index bloat.

SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_table_size(schemaname || '.' || tablename))      AS table_size,
  pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename))    AS index_size,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size
FROM   information_schema.tables
WHERE  table_schema NOT IN ('pg_catalog', 'information_schema')
  AND  table_type = 'BASE TABLE'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT  25;

Results pointing at something you can't explain?

These queries surface the symptoms. Knowing what to do about them — which indexes to add, which parameters to change, which connections to terminate — is a different skill. That is what a PGFlare session delivers.

Book a Diagnostic Session — £497 →