PostgreSQL Query Planner Regressions on AWS RDS 16, 17, and 18

A planner regression is when a PostgreSQL upgrade — including a minor version patch like 16.1 → 16.4 or a major version upgrade from 15 to 16 — causes a query that previously ran fast to suddenly become slow. The query hasn't changed. The data hasn't changed significantly. But PostgreSQL's query planner is now choosing a different execution plan.

These are among the most disorienting performance problems on RDS because they appear suddenly, correlate with maintenance events, and are invisible to standard monitoring until queries are already timing out in production.

Why planner regressions happen on RDS minor version upgrades

RDS applies minor version patches during maintenance windows. Between minor versions, PostgreSQL ships planner cost model adjustments, statistics collection improvements, and occasionally new join strategy heuristics. Any of these can cause the planner to re-evaluate a query and choose a different plan than before.

Common regression triggers on PostgreSQL 16, 17, and 18:

Statistics reset after upgrade: RDS minor version upgrades may clear the pg_statistic table. The planner falls back to default row estimates until ANALYZE runs — which may choose suboptimal plans on tables with skewed data distributions.
enable_memoize (PG14+): Memoisation of inner-loop results was introduced in PostgreSQL 14. On certain join patterns, memoize can be slower than the planner predicts. It can be disabled per-session with SET enable_memoize = off.
Parallel query threshold changes: PostgreSQL 16 and 17 adjusted the cost thresholds for parallel query initiation. A query that previously ran in a single backend process may now spawn parallel workers — adding overhead on small result sets.
JIT compilation defaults: JIT was enabled by default from PostgreSQL 12. On short-running OLTP queries, JIT compilation overhead exceeds execution time savings. Check SHOW jit and consider SET jit = off for OLTP workloads.

Step 1 — Identify regressed queries

Compare pg_stat_statements before and after a maintenance event to surface regressions:

-- Queries where average execution time is significantly higher than stddev suggests
-- (indicating a plan change rather than data volume change):
SELECT
  left(query, 80)                                 AS query_snippet,
  calls,
  round(mean_exec_time::numeric, 2)              AS avg_ms,
  round(stddev_exec_time::numeric, 2)            AS stddev_ms,
  round(max_exec_time::numeric, 2)               AS max_ms
FROM   pg_stat_statements
WHERE  calls > 50
  AND  mean_exec_time > 100   -- queries averaging over 100ms
ORDER BY mean_exec_time DESC
LIMIT  25;

Use CloudWatch QueryExecutionTime via Performance Insights to spot queries that changed cost profile around the time of the maintenance event. A step-change in average execution time for a specific query fingerprint is the signature of a plan change.

Step 2 — Confirm a plan change with EXPLAIN

Run EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) on the suspect query and compare the plan to what you'd expect. Look for:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ... -- your suspect query here;

Key signals of a regression:

Seq Scan on a large table where you'd expect an index scan
Estimated rows wildly different from actual rows (planner is guessing wrong)
Hash Join replaced by Nested Loop on large tables
High Buffers: shared read (disk reads) where previous plan was all-cache

Step 3 — Fix without rewriting the query

Before rewriting queries or adding hints (PostgreSQL doesn't have native hints), try these in order:

Run ANALYZE to refresh statistics:

ANALYZE VERBOSE schema_name.table_name;
-- Or analyze the whole database (takes time on large schemas):
ANALYZE VERBOSE;

Disable JIT for the regressed query pattern:

-- At session level (useful for testing):
SET jit = off;
-- In the parameter group for all connections:
jit = off   -- set in RDS parameter group

Disable memoize if the regression is on a nested-loop join:

SET enable_memoize = off;

Force a specific join strategy for a session:

SET enable_hashjoin = off;    -- force nested loop or merge join
SET enable_nestloop = off;   -- force hash or merge join
SET enable_seqscan = off;    -- force index usage (use with caution)

Caution with enable_xxx = off: These are blunt instruments. Disabling a join type instance-wide affects every query, not just the regressed one. Use them at session level for diagnosis, or apply targeted per-function settings if your ORM allows it. A proper fix is either updated statistics, an adjusted index, or a targeted parameter group change after confirming the impact.

PostgreSQL 16 and 17 specific changes to be aware of

PostgreSQL 16 introduced incremental sort improvements and changed the cost model for merge joins in certain cases. PostgreSQL 17 refined the planner's handling of parameterised nested loops and adjusted partition pruning logic. Both versions improved statistics for range types and multi-column statistics — but the improvements assume fresh statistics. Running ANALYZE immediately after any minor version upgrade is good practice on RDS.

Planner regressions need a before/after comparison that most teams don't have.
PGFlare's Diagnostic session uses pg_stat_statements and Performance Insights data to identify regressed query fingerprints, produce before/after EXPLAIN comparisons, and recommend targeted fixes — including parameter group changes, statistics updates, and index additions — with the lowest-risk mitigation path first.

Book a Diagnostic — £497 →