Client Outcomes

What PGFlare delivers in practice

Real results from engineering teams across FinTech, HealthTech, and e-commerce — from emergency incident response to structured cost reduction programmes.

Client details are anonymised by mutual agreement. Instance classes, cost figures, and timelines are accurate as reported at project close. Specific metrics available under NDA on request.

FinTech & Payments

UK Payment Processor Cuts RDS Bill by 65%

Diagnostic + Remediation+ session pair on a db.r5.4xlarge Multi-AZ PostgreSQL 16.9 cluster

65% RDS cost reduction
14× Query speedup (P95)
£3.1k Monthly saving

Challenge

Transaction reporting queries were hitting 8–12 second latency during peak hours. The team had scaled the instance up twice in six months (from r5.2xlarge to r5.4xlarge) without improvement. AWS support recommended moving to r5.8xlarge at an additional £2,400/month.

What PGFlare Found

  • Three sequential scans on a 180 GB transactions table (missing composite index) — see the diagnostic sketch below
  • Autovacuum dead tuple threshold 4× too high — 38 GB of table bloat
  • No connection pooling — 900+ direct connections exhausting connection slots and per-backend memory
  • Two N+1 query patterns in the ORM layer (87,000 queries per report run)
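
Findings like the first two usually surface from PostgreSQL's own statistics views rather than anything client-specific. A minimal sketch of the kind of check involved, using only the standard pg_stat_user_tables view:

    -- Tables ranked by rows read via sequential scans; a large table that is
    -- repeatedly seq-scanned is the classic signature of a missing index
    SELECT relname,
           seq_scan,
           seq_tup_read,
           idx_scan,
           pg_size_pretty(pg_relation_size(relid)) AS table_size
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read DESC
    LIMIT 10;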

Results Timeline

Day 1 — Diagnostic: findings report delivered. 12 priority issues ranked by impact. Immediate: indexes created CONCURRENTLY — query time 8s → 1.4s
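
The exact index definition is not public; a sketch of what an online build of that kind looks like, with hypothetical column names on the transactions table mentioned above:

    -- CONCURRENTLY builds without blocking writes (slower, and it cannot run
    -- inside a transaction block); (account_id, created_at) are illustrative
    -- columns, not the client's actual schema
    CREATE INDEX CONCURRENTLY idx_transactions_account_created
        ON transactions (account_id, created_at);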

Day 2 — Remediation+: autovacuum tuned, bloat cleared, PgBouncer configured. CPU utilisation: 78% → 22%

Week 3 — Downsize to db.r5.xlarge completed. SLA maintained. Monthly bill: £4,820 → £1,690

"We'd been told the instance was too small. PGFlare showed us in one day that the queries were the problem, not the hardware. We saved more in the first month than the entire engagement cost."
— CTO, UK Payment Processor (anonymised)

HealthTech

P1 Database Incident Resolved: 4-Hour Emergency Response

Emergency response + follow-up Diagnostic on a db.m5.2xlarge PostgreSQL 15.6 patient records system

4h Time to resolution
0 Data loss
47% RDS cost reduction

Challenge

At 07:42 on a Tuesday, all write operations began failing. A patient scheduling module serving 12 NHS-connected clinics went dark. The engineering team had no PostgreSQL specialist in-house and couldn't identify the root cause. Escalation to the ICB (Integrated Care Board) was expected within two hours.

What PGFlare Found

  • Long-running VACUUM blocking autovacuum — table bloat at 180% of live data
  • Lock chain: one stale connection holding an AccessExclusiveLock for 4.2 hours (see the blocking-locks query below)
  • pg_toast table grown to 94 GB (large JSONB documents pushed out of line into TOAST storage)
  • Dead connection accumulation exhausting the max_connections limit (set to 100)
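
The lock chain in the second finding is visible directly from the standard catalogue views; a minimal sketch of the kind of query used to walk it (nothing here is client-specific):

    -- Each blocked session paired with the session blocking it, plus how long
    -- the blocking transaction has been open
    SELECT blocked.pid                 AS blocked_pid,
           blocked.query               AS blocked_query,
           blocking.pid                AS blocking_pid,
           blocking.state              AS blocking_state,
           now() - blocking.xact_start AS blocking_xact_age
    FROM pg_stat_activity AS blocked
    JOIN pg_stat_activity AS blocking
      ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
    WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;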

Results Timeline

08:15 — PGFlare emergency engaged. IAM access granted. Root cause identified within 22 minutes

09:40 — Writes restored after controlled lock termination + connection limit tuning. JSONB overflow fix deployed without schema change
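
"Controlled lock termination" means ending the one stale backend rather than failing over or restarting the instance. A sketch, assuming the blocking PID came from a query like the one above (12345 is a placeholder, not a PID from the incident):

    -- Ask the backend to cancel its current statement first...
    SELECT pg_cancel_backend(12345);
    -- ...and terminate it outright if the AccessExclusiveLock is still held
    SELECT pg_terminate_backend(12345);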

Following week — Full Diagnostic session: structural fixes implemented. Instance right-sized from m5.2xlarge to m5.large by week three

"We were staring at a regulatory incident. PGFlare was on the call within minutes, had a diagnosis in 22 minutes, and writes were back in under 2 hours. That's exactly what you need at 8am on a Tuesday."
— Lead Platform Engineer, HealthTech SaaS (anonymised)

E-Commerce & Retail

Black Friday Prep: Autovacuum Bloat Crisis Averted

Diagnostic session on a db.r6g.2xlarge PostgreSQL 16.9 order management system — 6 weeks before peak

83% Table bloat reduction
11× Checkout query speedup
£1.8k Monthly saving

Challenge

Six weeks before Black Friday, checkout query P99 latency had grown from 380ms the prior year to 2.1 seconds. A Grafana alert showed autovacuum running continuously but never catching up. Storage had grown by 180 GB in three months with no corresponding increase in data volume.

What PGFlare Found

  • Autovacuum cost_delay set to 20ms (the pre-PostgreSQL 12 default; the current default is 2ms) — far too slow for the write-heavy orders table
  • orders table: 74% dead tuples by row count (2.1 GB live, 8.7 GB dead)
  • Missing partial index on (status, created_at) — all active-order queries doing sequential scans
  • Three FK constraints with no backing indexes — UPDATE cascades triggering seq scans
  • pg_stat_user_tables showed no autovacuum runs completing cleanly in 14 days (see the check below)
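
The dead-tuple and autovacuum findings come straight from the standard statistics views; a minimal sketch of that check:

    -- Dead-tuple ratio and last completed autovacuum per table
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
           last_autovacuum,
           autovacuum_count
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;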

Results Timeline

Day 1 — Diagnostic report delivered. Autovacuum tuned: cost_delay 0ms, vacuum_scale_factor reduced to 0.01. Bloat cleared within 6 hours
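
Both settings are per-table storage parameters rather than cluster-wide changes. A sketch of the form the change takes, using the values reported above on the orders table:

    -- Remove vacuum throttling and trigger autovacuum at roughly 1% dead rows,
    -- for this one write-heavy table only
    ALTER TABLE orders SET (
        autovacuum_vacuum_cost_delay   = 0,
        autovacuum_vacuum_scale_factor = 0.01
    );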

Day 1-2 — Partial index and FK indexes added CONCURRENTLY during low-traffic window. Checkout latency: 2.1s → 190ms P99
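
A sketch of the two kinds of index build described, assuming illustrative status values and a hypothetical order_items child table (the actual definitions were not published):

    -- Partial index covering only orders still in flight
    CREATE INDEX CONCURRENTLY idx_orders_active_status_created
        ON orders (status, created_at)
        WHERE status IN ('pending', 'processing');

    -- Backing index for one of the previously unindexed FK constraints
    CREATE INDEX CONCURRENTLY idx_order_items_order_id
        ON order_items (order_id);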

Black Friday — Record transaction volume processed without incident. CPU peaked at 34% vs. 89% the prior year

"PGFlare found things in one day that our team had been chasing for weeks. The autovacuum config alone halved our weekly storage growth. Black Friday went smoothly for the first time in three years."
— VP Engineering, UK E-Commerce Platform (anonymised)

Ready to see what PGFlare finds in your RDS instance?

Every engagement starts with a free 30-minute technical review — real analysis of your actual workload, no sales pitch.

View Pricing & Enquire →

Estimate Your Savings