Write a capacity planning memo with growth assumptions
Intermediate · Claude Sonnet · IT & Security · SRE · capacity-planning, sre, infrastructure, cost, planning
Use case
Use this prompt at the start of a planning cycle, before a known traffic event (Black Friday, product launch), or when finance pushes back on cloud spend. The output names the assumptions, the math, and the explicit asks — so leadership can decide rather than negotiate.
The prompt
You are an SRE writing a capacity planning memo. Audience: engineering leadership and finance. Tone: numerate, opinionated, willing to name uncertainty.

Inputs:
- Service or system: {{system}}
- Time horizon: {{horizon}} (e.g., next 12 months)
- Current utilization metrics: {{current_metrics}}
- Growth assumptions: {{growth_assumptions}}
- Known events / launches in horizon: {{events}}
- Current cost and infra footprint: {{current_cost}}
- Constraints: {{constraints}} (region, vendor, contract)

Produce:
1. **TL;DR** (3–5 bullets): the ask, the cost, the headline risk
2. **Current state**: utilization across the dimensions that matter (CPU, memory, connections, IOPS, throughput, $), with current peak vs avg
3. **Demand model**: how baseline traffic translates to resource needs (e.g., 1 RPS = X CPU-ms, Y DB connections). Show the unit economics.
4. **Growth scenarios**: low / base / high. For each: assumed growth rate, derived peak demand, infra needed
5. **Headroom and breakpoints**: where the system breaks under each scenario; explicit "we hit the wall at X RPS"
6. **Recommended plan**: what to provision/scale, when, in what order. Pre-event vs steady-state.
7. **Cost**: monthly delta and one-time, broken out by component
8. **Risks and unknowns**: list the assumptions you're least confident in; what would change the recommendation
9. **Decision needed**: a clear ask of the reader: approval, budget, sequencing

Use tables. Show your arithmetic. Auditors and engineers should both find this credible.
Variables
- {{system}}: Replace with your system
- {{horizon}}: Replace with your horizon
- {{current_metrics}}: Replace with your current metrics
- {{growth_assumptions}}: Replace with your growth assumptions
- {{events}}: Replace with your events
- {{current_cost}}: Replace with your current cost
- {{constraints}}: Replace with your constraints
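If you fill the placeholders programmatically rather than by hand, a small substitution helper is enough. This is a minimal sketch, not part of the prompt itself: the function name `render_prompt` and the two-line sample template are hypothetical, and it assumes placeholders always take the `{{name}}` form used above.

```python
import re

# Matches {{name}} placeholders, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace every {{name}} in `template`; raise if a value is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Hypothetical two-line excerpt of the prompt, for illustration only.
template = (
    "- Service or system: {{system}}\n"
    "- Time horizon: {{horizon}} (e.g., next 12 months)"
)
filled = render_prompt(
    template,
    {"system": "search-api + OpenSearch cluster", "horizon": "12 months (FY26)"},
)
print(filled)
```

Failing on a missing value, rather than leaving `{{events}}` in the text, keeps an unfilled placeholder from reaching the model unnoticed.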
Example
Input
system: search-api + OpenSearch cluster (4-node m6g.2xlarge)
horizon: 12 months (FY26)
current_metrics: |
  - search-api: 800 RPS peak, 200 avg; 60% CPU peak; 18 pods
  - OpenSearch: 70% heap peak, 55% CPU peak, 1.2TB indexed of 2TB cluster capacity
  - Postgres connections: 220/500 peak
growth_assumptions: |
  - Product PM forecasts 40% YoY query growth (aggressive launch in Q3)
  - International expansion (EU launch) adds ~25% traffic in Q4
  - SKU catalog doubles by mid-year (impacts index size, not query volume)
events: |
  - EU launch Sept 2026
  - Black Friday Nov 2026
  - New mobile app launch Q2 (expected 15% lift in queries)
current_cost: |
  - search-api EKS: ~$3,200/mo
  - OpenSearch: ~$4,800/mo
  - data transfer + ancillary: ~$600/mo
  - total: ~$8,600/mo
constraints: |
  - Single region today (us-east-1); EU launch will require eu-west-1 footprint
  - 3-year reserved instances on OpenSearch nodes purchased Q4 2025
  - Finance wants total infra delta capped at +35% YoY
Output
# Capacity Plan: search-api + OpenSearch, FY26

Author: SRE. Reviewers: Eng leadership, Finance, Search PM. Date: 2026-05-07.

## TL;DR

- **Asking for**: $3,600/mo additional us-east-1 steady-state by year-end, $1,800 one-time pre-Black-Friday burst capacity, plus eu-west-1 footprint of $5,800/mo from Sept (vs $0 today)
- **Total FY26 ask**: ~$18,000/mo by year-end vs $8,600/mo today (+109% YoY); excluding the EU footprint, ~$12,200/mo (+42% YoY), which is above the 35% Finance cap; case for the overage below
- **Headline risk**: SKU catalog doubling will outgrow current OpenSearch storage by Q3 regardless of traffic growth; this is a non-negotiable scale-up
- **Plan**: scale OpenSearch to 6 nodes by August, add eu-west-1 cluster in September, +50% search-api replicas for Black Friday week
- **Decision needed**: approve OpenSearch scale-up by July 1 (4-week procurement lead time on reserved capacity)

## Current state

| Metric | Avg | Peak | Capacity | Headroom |
|---|---|---|---|---|
| search-api RPS | 200 | 800 | ~1,400 (replica-bound) | 43% |
| search-api CPU | 22% | 60% | 100% | 40% |
| OpenSearch CPU | 30% | 55% | 100% | 45% |
| OpenSearch heap | 50% | 70% | 85% (safe ceiling) | 15% |
| OpenSearch storage | n/a | 1.2 TB | 2.0 TB | 40% |
| Postgres connections | 120 | 220 | 500 | 56% |

Tightest current constraint: OpenSearch heap (15% headroom).

## Demand model

Empirical from production:

- 1 RPS query traffic ≈ 0.014 OpenSearch CPU-cores at current index size
- 1 RPS ≈ 0.27 Postgres connection-equivalent (transient)
- Index size scales linearly with SKU catalog
- Heap usage scales sublinearly with index size: ~0.6× factor in our use pattern
- Cross-region adds ~30ms latency that we'll absorb in budget; doesn't change CPU model

## Growth scenarios

Three scenarios for combined query growth (PM + mobile + EU):

| Scenario | YoY growth | Q4 peak RPS | OpenSearch CPU at peak | Headroom |
|---|---|---|---|---|
| Low (only PM) | 40% | 1,120 | 77% | 23% |
| Base (PM + mobile + half EU) | 60% | 1,280 | 88% | 12% (over safe) |
| High (PM + mobile + full EU + BF surge 1.6×) | 100% | 2,050 | 141% (failure) | negative |

Base case puts us above safe heap and CPU ceilings without scale-up. High case is a guaranteed outage at current capacity.

## Headroom and breakpoints

At current 4-node OpenSearch:

- **Heap wall**: 1.6 TB index size (catalog doubling, ~Q3); heap pressure begins regardless of traffic
- **CPU wall**: ~1,200 RPS sustained; base scenario hits this in Q4
- **search-api wall**: ~1,400 RPS at current 18-pod cap; raising replica count is cheap if EKS node group has room (it does, today)

Black Friday compounds: 1.6× surge over baseline puts even Low scenario near the wall.

## Recommended plan

| When | Action | Why | Cost delta |
|---|---|---|---|
| July | Scale OpenSearch from 4 to 6 nodes (same m6g.2xlarge) | Catalog doubling drives index past safe heap | +$2,400/mo |
| August | Raise search-api HPA max from 18 → 30 pods | Pre-position for EU launch | +$800/mo (only paid when scaled) |
| September | Stand up eu-west-1 OpenSearch (3 nodes) + search-api EKS footprint | EU launch, latency requirement | +$5,800/mo |
| Mid-November | Pre-Black-Friday burst: temporarily upsize OpenSearch nodes to m6g.4xlarge for 3 weeks | Surge headroom for Black Friday week | +$1,800 one-time |
| January | Reassess: likely keep eu-west-1 at steady state, drop BF burst | Steady state | (return to plan) |

Sequencing constraint: OpenSearch scale-up requires 4 weeks procurement lead time on reserved capacity, so approval is needed by July 1.

## Cost summary

| Item | Today | Year-end FY26 | Delta |
|---|---|---|---|
| search-api EKS (us-east) | $3,200 | $4,000 | +$800 |
| search-api EKS (eu-west) | $0 | $1,200 | +$1,200 |
| OpenSearch (us-east) | $4,800 | $7,200 | +$2,400 |
| OpenSearch (eu-west) | $0 | $4,600 | +$4,600 |
| Data transfer + ancillary | $600 | $1,000 | +$400 |
| **Steady-state total** | **$8,600** | **$18,000** | **+$9,400 (+109%)** |
| One-time BF burst | n/a | $1,800 | $1,800 |

Caveat: the +109% reflects EU footprint, which is a launch decision, not capacity drift. Excluding EU, US-only delta is +$3,600/mo (+42%). The 35% cap is technically breached; case for overage is "EU launch is approved separately and adds revenue, not just cost."

## Risks and unknowns

- **PM growth forecast confidence: medium-low.** 40% YoY assumes mobile launch hits its target; if mobile slips to 2027, base scenario looks like low scenario.
- **Catalog size assumption is firm**: already in flight; will happen regardless of traffic.
- **EU launch traffic mix uncertain**: assuming similar query patterns to US. If EU users have longer-tail queries, OpenSearch CPU per RPS could be 1.2–1.4×. Watching closely once first 1% of traffic ships.
- **Reserved capacity decision**: scaling OpenSearch nodes inside the existing 3-year reservation is fine; adding eu-west-1 is on-demand year 1 (committed pricing once stable).

## Decision needed

By **July 1, 2026**:

- Approve OpenSearch us-east-1 scale-up (+$2,400/mo)
- Approve eu-west-1 footprint contingent on EU launch go/no-go decision (+$5,800/mo)
- Approve Black Friday burst budget ($1,800 one-time)

By **September 1**:

- Reconfirm growth model with 60-day actuals from mobile launch and recalibrate
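The CPU figures in the growth-scenario table of the example output can be sanity-checked with a few lines of Python. This is a minimal sketch under one stated assumption: OpenSearch CPU scales linearly with RPS from the measured point in the input (55% CPU at 800 RPS peak), which reproduces the 77% and 88% figures for the low and base scenarios. The function names and the 85% CPU ceiling are hypothetical; the memo states 85% only as a heap ceiling.

```python
def cpu_at_rps(rps, measured_rps=800.0, measured_cpu_pct=55.0):
    """Project peak CPU % at `rps`, assuming CPU scales linearly with RPS."""
    return measured_cpu_pct * rps / measured_rps


def rps_at_ceiling(ceiling_pct, measured_rps=800.0, measured_cpu_pct=55.0):
    """Invert the linear model: the RPS at which CPU reaches `ceiling_pct`."""
    return measured_rps * ceiling_pct / measured_cpu_pct


# Q4 peak RPS per scenario, copied from the growth-scenario table.
scenarios = {"low": 1120, "base": 1280, "high": 2050}
projected = {name: round(cpu_at_rps(rps)) for name, rps in scenarios.items()}

# ASSUMPTION: an 85% safe CPU ceiling, chosen only for illustration.
wall_rps = rps_at_ceiling(85.0)

for name, pct in projected.items():
    print(f"{name}: {pct}% CPU at {scenarios[name]} RPS")
print(f"CPU wall at an assumed 85% ceiling: ~{round(wall_rps)} RPS")
```

A linear model is the simplest defensible choice here; if production data shows CPU per RPS rising with index size or query mix, replace `cpu_at_rps` with a fitted curve before relying on the breakpoint.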
Tips for best results
1. Show your unit economics. 'We need more capacity' gives the reader nothing concrete to evaluate; '0.014 OpenSearch CPU-cores per RPS' invites a real conversation.
2. Always present low/base/high scenarios. A single number gets argued; three options get a decision.
3. Name the breakpoints. 'We hit the wall at 1,200 RPS' is more useful than 'capacity is constrained.' Specifics force action.
4. Note your assumption confidence. 'Medium-low' on a growth forecast is honest and helps the reader weight your conclusion.
5. AI assistance is not a replacement for review by qualified professionals. Have a senior SRE and your finance partner review the unit-economics math and reservation/commitment math before submitting.
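Part of that review can be mechanical. The sketch below recomputes the totals in the example cost summary from its component rows and compares the US-only delta to Finance's 35% cap. The dollar figures are copied from the example output; the helper name `totals` and the variable names are hypothetical.

```python
# Monthly cost per component as (today, year-end FY26), from the cost summary.
costs = {
    "search-api EKS (us-east)": (3200, 4000),
    "search-api EKS (eu-west)": (0, 1200),
    "OpenSearch (us-east)": (4800, 7200),
    "OpenSearch (eu-west)": (0, 4600),
    "Data transfer + ancillary": (600, 1000),
}


def totals(items):
    """Return (today, year_end, delta) summed over the given components."""
    today = sum(now for now, _ in items.values())
    year_end = sum(later for _, later in items.values())
    return today, year_end, year_end - today


total_today, total_year_end, total_delta = totals(costs)

# US-only view: drop the eu-west rows, which belong to the launch decision.
us_only = {name: row for name, row in costs.items() if "eu-west" not in name}
_, us_year_end, us_delta = totals(us_only)

total_pct = round(100 * total_delta / total_today)
us_pct = round(100 * us_delta / total_today)
cap_pct = 35  # Finance cap from the input constraints

print(f"Total: ${total_today} -> ${total_year_end} (+${total_delta}, +{total_pct}%)")
print(f"US-only: +${us_delta}/mo (+{us_pct}%), cap is +{cap_pct}%")
print("US-only delta exceeds cap" if us_pct > cap_pct else "US-only delta within cap")
```

Running a check like this against the memo's own tables catches cases where the TL;DR and the cost summary drift apart.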