Optimize a slow SQL query

intermediateClaude SonnetEngineeringCodingsqldatabaseperformanceoptimizationengineering

Use case

Use this prompt when a query is too slow for production, when you're adding a feature that needs to be database-efficient, or when you're reviewing a colleague's PR that includes database queries. Works best when you include the query plan and table schema.

The prompt

You are a database performance expert. Analyze and optimize the SQL query below.

**Database system:** {{database}} (e.g., PostgreSQL 15, MySQL 8, SQLite, BigQuery)
**Table schemas (include column types):**
```sql
{{table_schemas}}
```
**The slow query:**
```sql
{{slow_query}}
```
**Query plan output (EXPLAIN / EXPLAIN ANALYZE):** {{query_plan}}
**Approximate row counts:** {{row_counts}} (e.g., "users: 2M rows, orders: 15M rows")
**Query frequency:** {{query_frequency}} (e.g., "runs on every page load for logged-in users, ~500/sec peak")
**Current performance:** {{current_performance}} (e.g., "average 4.2 seconds, P99 12 seconds")
**Target performance:** {{target_performance}} (e.g., "under 100ms", "fast enough for a real-time search box")

Provide:

## 1. Performance Diagnosis
- What is making this query slow? (full table scan, missing index, N+1, bad join order, etc.)
- Identify the specific bottleneck in the query plan
- Explain *why* the database is executing it this way

## 2. Optimized Query
```sql
-- Optimized version with inline comments explaining each change
[optimized query]
```

## 3. Required Index Changes
For each index to add or remove:
```sql
-- What this index does and why it's needed
CREATE INDEX ...
```
- Estimated write-overhead impact of new indexes
- Any existing indexes that are now redundant and can be dropped

## 4. Change-by-Change Explanation
For every meaningful change to the query or schema:
- What changed
- Why it's faster
- Any trade-offs (e.g., index write cost, less readable query)

## 5. Expected Performance Impact
- Estimated improvement (order of magnitude, not a precise number)
- What workload conditions make this improvement most significant
- What conditions could reduce the benefit

## 6. How to Verify the Improvement
- The EXPLAIN/EXPLAIN ANALYZE command to run
- What to look for in the new query plan (seq scan → index scan, rows estimate accuracy, etc.)
- How to benchmark in production safely

## 7. Further Optimizations (If Still Not Fast Enough)
If the query-level optimizations aren't enough, what architectural changes would help?
- Caching strategy (Redis, materialized views, etc.)
- Data model changes
- Application-level changes (pagination, lazy loading, denormalization)

Variables

{{{{database}}}}Replace with your {{database}}
{{{{table_schemas}}}}Replace with your {{table schemas}}
{{{{slow_query}}}}Replace with your {{slow query}}
{{{{query_plan}}}}Replace with your {{query plan}}
{{{{row_counts}}}}Replace with your {{row counts}}
{{{{query_frequency}}}}Replace with your {{query frequency}}
{{{{current_performance}}}}Replace with your {{current performance}}
{{{{target_performance}}}}Replace with your {{target performance}}

Example

Input

database: PostgreSQL 15
table_schemas: |
  CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255),
    plan VARCHAR(50) DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ
  );

  CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT REFERENCES users(id),
    event_type VARCHAR(100) NOT NULL,
    properties JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
  );

  -- Existing indexes: users(email), events(user_id)
slow_query: |
  SELECT
    u.id,
    u.email,
    u.name,
    COUNT(e.id) AS event_count,
    MAX(e.created_at) AS last_event_at
  FROM users u
  LEFT JOIN events e ON e.user_id = u.id
  WHERE u.plan = 'pro'
    AND u.deleted_at IS NULL
    AND e.event_type = 'page_view'
    AND e.created_at > NOW() - INTERVAL '30 days'
  GROUP BY u.id, u.email, u.name
  ORDER BY last_event_at DESC NULLS LAST
  LIMIT 100;
query_plan: |
  Hash Left Join (cost=189234.55..412876.32 rows=847 width=72) (actual time=3821.442..4102.887 rows=100 loops=1)
    Hash Cond: (e.user_id = u.id)
    -> Seq Scan on events e (cost=0.00..198234.12 rows=1247832 width=24) (actual time=0.021..1876.543 rows=1284912 loops=1)
          Filter: ((event_type = 'page_view') AND (created_at > (now() - '30 days'::interval)))
          Rows Removed by Filter: 13715088
    -> Hash (cost=10234.55..10234.55 rows=12345 width=48) (actual time=234.123..234.123 rows=12456 loops=1)
          -> Seq Scan on users u (cost=0.00..10234.55 rows=12345 width=48)
                Filter: ((plan = 'pro') AND (deleted_at IS NULL))
                Rows Removed by Filter: 187544
row_counts: "users: 200K rows, events: 15M rows"
query_frequency: "Powers a dashboard widget, runs once per dashboard load per account manager, ~50/min during business hours"
current_performance: "Average 4.1 seconds, P99 over 8 seconds"
target_performance: "Under 500ms average"

Output

## 1. Performance Diagnosis

**Root cause: Two sequential scans on large tables — the query reads every row in both tables.**

From the query plan:
- `Seq Scan on events`: Reading all 15M event rows, keeping only 1.28M (91% discarded). This is the primary bottleneck — over 3.8 seconds just for this scan.
- `Seq Scan on users`: Reading all 200K users, keeping only ~12K (94% discarded). A secondary problem.

**Why the database chooses sequential scans:** PostgreSQL uses sequential scans when there's no index that covers the filter columns. `events` has an index on `user_id`, but the filter is on `event_type` and `created_at` — neither is indexed. Similarly, `users` has no index on `plan` or `deleted_at`.

**Additional issue:** The LEFT JOIN is semantically wrong. Because `e.event_type = 'page_view'` is in the WHERE clause (not the JOIN ON clause), it implicitly converts the LEFT JOIN to an INNER JOIN — users with no matching events are excluded. This is a logic bug AND it makes query planning harder.

## 2. Optimized Query

```sql
SELECT
u.id,
u.email,
u.name,
COUNT(e.id) AS event_count,
MAX(e.created_at) AS last_event_at
FROM users u
-- Moved event filters into JOIN condition so LEFT JOIN semantics are correct
-- and the planner can use nested loop with index if selectivity is high
LEFT JOIN events e
ON e.user_id = u.id
AND e.event_type = 'page_view'
AND e.created_at > NOW() - INTERVAL '30 days'
WHERE u.plan = 'pro'
AND u.deleted_at IS NULL
GROUP BY u.id, u.email, u.name
ORDER BY last_event_at DESC NULLS LAST
LIMIT 100;
```

## 3. Required Index Changes

```sql
-- Covers the events filter: event_type + created_at, with user_id for the join
-- PostgreSQL will use this for: WHERE event_type = 'page_view' AND created_at > ...
-- user_id included so the index covers the join without a table heap lookup
CREATE INDEX idx_events_type_created_user
ON events (event_type, created_at DESC, user_id)
WHERE created_at > NOW() - INTERVAL '90 days'; -- partial index: only recent events

-- Covers the users filter: plan + deleted_at
CREATE INDEX idx_users_plan_active
ON users (plan)
WHERE deleted_at IS NULL; -- partial index: only non-deleted users
```

**Write overhead:** Both indexes add ~5–15ms to INSERT/UPDATE on their respective tables. At typical event ingestion rates, this is negligible. The partial index on events covers only ~3M rows (30-day window out of 15M total) — it stays small and fast to update.

**Existing indexes:** `events(user_id)` is now superseded by the composite index for this query pattern. Keep it if other queries use it; otherwise it's safe to drop.

## 4. Change-by-Change Explanation

**Composite index on events (event_type, created_at, user_id):** Instead of scanning 15M rows and filtering, PostgreSQL can seek directly to `event_type = 'page_view'` entries within the 30-day window — roughly 1.3M rows of 15M. The partial index (`WHERE created_at > NOW() - '90 days'`) keeps the index small and avoids indexing historical data nobody queries.

**Partial index on users (plan WHERE deleted_at IS NULL):** Reduces the users scan from 200K rows to ~12K active pro users. Partial indexes are more selective and smaller than full indexes on low-cardinality columns.

**Moved event filters to JOIN ON clause:** Fixes the logic bug (restores true LEFT JOIN behavior) and gives the planner flexibility to choose a nested loop join — look up each pro user's events via index — instead of a hash join on the full events table.

## 5. Expected Performance Impact

- **Estimated improvement:** 10–30x — from ~4s to ~100–400ms
- **Best case:** When there are few pro users (high selectivity on users) and the planner chooses a nested loop index scan on events per user
- **Worst case:** If the 30-day page_view event volume is very high per user, the events index still reads many rows per user, but still far fewer than a full table scan
- **The LIMIT 100 helps:** PostgreSQL can stop early once 100 rows are produced if the ORDER BY aligns with index ordering

## 6. How to Verify the Improvement

```sql
-- Run with ANALYZE to get actual timing (note: runs the query for real)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, u.name, ... -- your optimized query
;
```

Look for:
- `Index Scan` or `Index Only Scan` on events (not `Seq Scan`)
- `Bitmap Index Scan` on users using the partial index
- Actual rows much closer to estimated rows (bad estimates cause bad plans)
- `Buffers: shared hit=` — lower buffer reads means less I/O

**Safe production benchmark:** Run the query with `EXPLAIN ANALYZE` during low-traffic hours first. The partial index can be built `CONCURRENTLY` to avoid locking:
```sql
CREATE INDEX CONCURRENTLY idx_events_type_created_user ON events ...;
```

## 7. Further Optimizations (If Still Not Fast Enough)

- **Materialized view:** Pre-aggregate `event_count` and `last_event_at` per user nightly. The dashboard query becomes a simple lookup on a 12K-row view instead of aggregating 1.3M events.
- **Pagination with keyset:** Replace `OFFSET/LIMIT` with `WHERE last_event_at < :cursor` to avoid the full sort on each page.
- **Caching:** Cache the dashboard result in Redis with a 5-minute TTL — account managers don't need real-time data for this widget.

Tips for best results

1Always include the EXPLAIN output — query optimization without the query plan is guesswork. PostgreSQL: EXPLAIN ANALYZE; MySQL: EXPLAIN FORMAT=JSON.
2Include approximate row counts. An index that's perfect for 10K rows might be wrong for 100M rows — size changes the math.
3Specify the database version — query planner behavior and available index types differ significantly between versions.
4If you're hitting the limit of single-query optimization, describe the feature it powers — sometimes the right fix is a data model change, not a faster query.

Related prompts

Review and improve a database data model

advanced

Critically evaluate a database schema or data model — identifying normalization issues, missing constraints, indexing gaps, and design patterns that will cause problems at scale — with concrete improvements.

Engineeringdatabaseschemaarchitecture

Review code for performance issues

intermediate

Analyze a function, endpoint, or system component for performance bottlenecks — identifying N+1 queries, unnecessary computation, memory leaks, and algorithmic inefficiencies with specific fixes and benchmarking guidance.

Engineeringperformancecode-reviewdatabase

Refactor a function for readability and performance

intermediate

Analyze a function or module and produce a refactored version that improves readability, reduces complexity, and addresses performance issues — with a full explanation of every change.

Engineeringrefactoringcode-qualityengineering

Need help implementing this prompt in your workflow?

Book a call

You are a database performance expert. Analyze and optimize the SQL query below. **Database system:** {{database}} (e.g., PostgreSQL 15, MySQL 8, SQLite, BigQuery) **Table schemas (include column types):** ```sql {{table_schemas}} ``` **The slow query:** ```sql {{slow_query}} ``` **Query plan output (EXPLAIN / EXPLAIN ANALYZE):** {{query_plan}} **Approximate row counts:** {{row_counts}} (e.g., "users: 2M rows, orders: 15M rows") **Query frequency:** {{query_frequency}} (e.g., "runs on every page load for logged-in users, ~500/sec peak") **Current performance:** {{current_performance}} (e.g., "average 4.2 seconds, P99 12 seconds") **Target performance:** {{target_performance}} (e.g., "under 100ms", "fast enough for a real-time search box") Provide: ## 1. Performance Diagnosis - What is making this query slow? (full table scan, missing index, N+1, bad join order, etc.) - Identify the specific bottleneck in the query plan - Explain *why* the database is executing it this way ## 2. Optimized Query ```sql -- Optimized version with inline comments explaining each change [optimized query] ``` ## 3. Required Index Changes For each index to add or remove: ```sql -- What this index does and why it's needed CREATE INDEX ... ``` - Estimated write-overhead impact of new indexes - Any existing indexes that are now redundant and can be dropped ## 4. Change-by-Change Explanation For every meaningful change to the query or schema: - What changed - Why it's faster - Any trade-offs (e.g., index write cost, less readable query) ## 5. Expected Performance Impact - Estimated improvement (order of magnitude, not a precise number) - What workload conditions make this improvement most significant - What conditions could reduce the benefit ## 6. How to Verify the Improvement - The EXPLAIN/EXPLAIN ANALYZE command to run - What to look for in the new query plan (seq scan → index scan, rows estimate accuracy, etc.) - How to benchmark in production safely ## 7. Further Optimizations (If Still Not Fast Enough) If the query-level optimizations aren't enough, what architectural changes would help? - Caching strategy (Redis, materialized views, etc.) - Data model changes - Application-level changes (pagination, lazy loading, denormalization)

Variables

{{{{database}}}}Replace with your {{database}}

{{{{table_schemas}}}}Replace with your {{table schemas}}

{{{{slow_query}}}}Replace with your {{slow query}}

{{{{query_plan}}}}Replace with your {{query plan}}

{{{{row_counts}}}}Replace with your {{row counts}}

{{{{query_frequency}}}}Replace with your {{query frequency}}

{{{{current_performance}}}}Replace with your {{current performance}}

{{{{target_performance}}}}Replace with your {{target performance}}

Example

Input

database: PostgreSQL 15
table_schemas: |
  CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255),
    plan VARCHAR(50) DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ
  );

  CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT REFERENCES users(id),
    event_type VARCHAR(100) NOT NULL,
    properties JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
  );

  -- Existing indexes: users(email), events(user_id)
slow_query: |
  SELECT
    u.id,
    u.email,
    u.name,
    COUNT(e.id) AS event_count,
    MAX(e.created_at) AS last_event_at
  FROM users u
  LEFT JOIN events e ON e.user_id = u.id
  WHERE u.plan = 'pro'
    AND u.deleted_at IS NULL
    AND e.event_type = 'page_view'
    AND e.created_at > NOW() - INTERVAL '30 days'
  GROUP BY u.id, u.email, u.name
  ORDER BY last_event_at DESC NULLS LAST
  LIMIT 100;
query_plan: |
  Hash Left Join (cost=189234.55..412876.32 rows=847 width=72) (actual time=3821.442..4102.887 rows=100 loops=1)
    Hash Cond: (e.user_id = u.id)
    -> Seq Scan on events e (cost=0.00..198234.12 rows=1247832 width=24) (actual time=0.021..1876.543 rows=1284912 loops=1)
          Filter: ((event_type = 'page_view') AND (created_at > (now() - '30 days'::interval)))
          Rows Removed by Filter: 13715088
    -> Hash (cost=10234.55..10234.55 rows=12345 width=48) (actual time=234.123..234.123 rows=12456 loops=1)
          -> Seq Scan on users u (cost=0.00..10234.55 rows=12345 width=48)
                Filter: ((plan = 'pro') AND (deleted_at IS NULL))
                Rows Removed by Filter: 187544
row_counts: "users: 200K rows, events: 15M rows"
query_frequency: "Powers a dashboard widget, runs once per dashboard load per account manager, ~50/min during business hours"
current_performance: "Average 4.1 seconds, P99 over 8 seconds"
target_performance: "Under 500ms average"

Output

## 1. Performance Diagnosis

**Root cause: Two sequential scans on large tables — the query reads every row in both tables.**

## 2. Optimized Query

## 3. Required Index Changes

-- Covers the users filter: plan + deleted_at
CREATE INDEX idx_users_plan_active
ON users (plan)
WHERE deleted_at IS NULL; -- partial index: only non-deleted users
```

**Existing indexes:** `events(user_id)` is now superseded by the composite index for this query pattern. Keep it if other queries use it; otherwise it's safe to drop.

## 4. Change-by-Change Explanation

## 5. Expected Performance Impact

## 6. How to Verify the Improvement

```sql
-- Run with ANALYZE to get actual timing (note: runs the query for real)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, u.name, ... -- your optimized query
;
```

## 7. Further Optimizations (If Still Not Fast Enough)

Tips for best results

1Always include the EXPLAIN output — query optimization without the query plan is guesswork. PostgreSQL: EXPLAIN ANALYZE; MySQL: EXPLAIN FORMAT=JSON.

2Include approximate row counts. An index that's perfect for 10K rows might be wrong for 100M rows — size changes the math.

3Specify the database version — query planner behavior and available index types differ significantly between versions.

4If you're hitting the limit of single-query optimization, describe the feature it powers — sometimes the right fix is a data model change, not a faster query.