EXPLAIN ANALYZE on AWS RDS PostgreSQL — A Practical Guide

EXPLAIN ANALYZE is the definitive tool for understanding why a PostgreSQL query is slow. Unlike EXPLAIN alone (which shows the planner's estimate), EXPLAIN ANALYZE actually executes the query and shows both the estimated and actual costs — making it possible to identify exactly where the planner's model diverges from reality. On AWS RDS PostgreSQL 16, 17, and 18, the same tool works identically to community PostgreSQL, with a few RDS-specific considerations.

EXPLAIN ANALYZE executes the query. On SELECT statements this is safe. On INSERT, UPDATE, and DELETE, the statement runs and modifies data. Wrap in a transaction and rollback if you don't want changes committed: BEGIN; EXPLAIN ANALYZE UPDATE ...; ROLLBACK;

The full diagnostic form

For production diagnosis, always use all four options:

EXPLAIN (
  ANALYZE,     -- actually execute and show real timings
  BUFFERS,     -- show shared buffer hits/misses (disk vs cache)
  VERBOSE,     -- show output columns and schema-qualified names
  FORMAT TEXT  -- TEXT is readable; JSON is machine-parseable
)
SELECT ... -- your query;

Reading the output — what each line means

A typical plan node looks like:

Index Scan using orders_user_id_idx on orders
  (cost=0.43..8.45 rows=1 width=72)
  (actual time=0.032..0.033 rows=1 loops=1)
  Index Cond: (user_id = 12345)
  Buffers: shared hit=3

Field	Meaning
`cost=0.43..8.45`	Planner's estimated startup cost .. total cost (arbitrary units). Not wall-clock time.
`rows=1` (estimate)	Planner's estimate of rows returned. Divergence from actual is the key signal.
`actual time=0.032..0.033`	Real startup ms .. total ms for this node. Cumulative — includes child nodes.
`rows=1` (actual)	Rows actually returned. Compare to estimate — large gaps mean stale statistics.
`loops=1`	How many times this node executed. In nested loops, multiply time × loops for true cost.
`shared hit=3`	Pages served from shared_buffers (fast, no disk I/O).
`shared read=N`	Pages read from disk (slow). High values = table/index not in cache on RDS.

The most important signal: estimated vs actual rows

When the planner estimates 1 row but the query actually returns 50,000 — or vice versa — it will choose the wrong execution strategy. This is the root cause of most unexplained query slowdowns on RDS.

The fix is almost always ANALYZE on the relevant table(s) to update statistics. For highly skewed distributions (e.g. a status column where 99% of rows have status = 'completed'), increase the statistics target on that column:

-- Default statistics target is 100. Increase for skewed columns:
ALTER TABLE orders ALTER COLUMN status
  SET STATISTICS 500;

-- Then refresh:
ANALYZE orders;

Spotting a Seq Scan that should be an Index Scan

A Seq Scan on a large table (millions of rows) where you'd expect an index scan is a common finding. Causes:

No index exists on the filter column(s)
The planner estimates the query returns a large fraction of the table (sequential is faster than random index I/O above ~10% of table size)
The index exists but is bloated — check with pgstattuple
Implicit type cast prevents index use (e.g. WHERE id = '12345' on an integer column)

Using auto_explain on RDS to capture production plans

auto_explain logs query plans automatically for queries exceeding a time threshold — without requiring developer intervention. Enable it in the RDS parameter group (requires a reboot to add to shared_preload_libraries):

# In RDS parameter group:
shared_preload_libraries    = pg_stat_statements,auto_explain

auto_explain.log_min_duration = 1000   # log plans for queries taking >1s
auto_explain.log_analyze      = on     # include actual timings (executes query)
auto_explain.log_buffers      = on     # include buffer statistics
auto_explain.log_nested_statements = on  # capture plans inside functions

Plans appear in the RDS PostgreSQL log (available in CloudWatch Logs if log export is enabled). Search for duration: followed by Query Text: and the QUERY PLAN block.

auto_explain.log_analyze = on executes each query twice (once to get the plan, once to execute normally). This doubles CPU for logged queries. Set log_min_duration high enough (1000–5000ms) to avoid logging the majority of fast queries, or use auto_explain.sample_rate (PG12+) to log a percentage of all queries rather than only slow ones.

Reading one plan is easy. Reading fifty of them systematically is where the value is.
PGFlare's Diagnostic session analyses pg_stat_statements output to identify your highest-impact queries, runs EXPLAIN ANALYZE on each, and delivers a prioritised remediation plan — query rewrites, index additions, and statistics fixes — with expected impact estimates per change.

Book a Diagnostic — £497 →