Draft a status page update during an active incident

intermediateClaude SonnetIT & SecuritySresaasincidentstatus-pagesrereliability

Use case

Use when an incident is underway and the on-call engineer needs to publish to the status page in under five minutes. Most status updates fail in one of two ways: too vague to be useful, or too specific and later contradicted. This prompt produces an update that holds up to a postmortem.

The prompt

You are a senior SRE who has run hundreds of incidents at companies with public status pages. Draft a status page update for an active incident.

Incident context:
- Service or surface affected: {{service}}
- Customer-visible symptom: {{symptom}} (what they experience, not the cause)
- Severity: {{severity}} (e.g., SEV1 full outage, SEV2 partial degradation)
- Time incident detected: {{detected_at}}
- Current incident phase: {{phase}} (Investigating, Identified, Monitoring, Resolved)
- What we know about the cause (if anything): {{known_cause}}
- What we have done so far: {{actions_taken}}
- Next planned action and ETA: {{next_action}}
- Time of next update commitment: {{next_update_time}}
- Affected regions or customer segments (if known): {{affected_scope}}

Write a status page update that:
- Opens with a one-line description of the customer-visible symptom, not the technical cause
- Names the affected service or surface using the same name customers see in their UI
- States the current phase explicitly using the standard status page vocabulary (Investigating / Identified / Monitoring / Resolved)
- Names the affected scope (regions, customer segments) if known, and explicitly says "scope is still being determined" if not
- Says what action is in progress, in plain language an engineering manager at a customer can read in 10 seconds
- Commits to a next update time that is realistic, not aspirational
- Does not name root cause unless it has been confirmed by an engineer with eyes on the system
- Does not apologize on behalf of the company beyond a brief acknowledgment — save the deeper apology for the postmortem
- Does not blame a vendor by name unless leadership has signed off

Output:
1. A one-line headline (under 80 characters) suitable for the status page incident title
2. The status update body (60 to 120 words)
3. A 1-line internal note for the on-call channel flagging anything in the update that needs leadership review before publishing

Variables

{{{{service}}}}Replace with your {{service}}
{{{{symptom}}}}Replace with your {{symptom}}
{{{{severity}}}}Replace with your {{severity}}
{{{{detected_at}}}}Replace with your {{detected at}}
{{{{phase}}}}Replace with your {{phase}}
{{{{known_cause}}}}Replace with your {{known cause}}
{{{{actions_taken}}}}Replace with your {{actions taken}}
{{{{next_action}}}}Replace with your {{next action}}
{{{{next_update_time}}}}Replace with your {{next update time}}
{{{{affected_scope}}}}Replace with your {{affected scope}}

Example

Input

service: Vectorline Query API (us-east-1)
symptom: elevated 5xx error rates and timeouts on /v1/query
severity: SEV2 partial degradation
detected_at: 14:08 UTC
phase: Identified
known_cause: a recent index warm-up job is saturating a shared cache, confirmed by SRE
actions_taken: rolled back the warm-up job, scaling cache replicas
next_action: monitor error rate, expect recovery within 20 minutes
next_update_time: 15:00 UTC
affected_scope: customers querying indexes in us-east-1; eu-west-2 unaffected

Output

Headline: Elevated errors on Query API in us-east-1

Update (Identified — 14:35 UTC):

Customers using the Query API in us-east-1 are seeing elevated 5xx errors and slow responses on /v1/query. eu-west-2 is not affected.

We have identified the cause as a recent internal job that was overloading a shared cache layer. We have rolled the job back and are scaling cache capacity now. Error rates are trending down.

We expect full recovery within 20 minutes. The next update will be posted by 15:00 UTC, sooner if the situation changes.

Thanks for your patience while we work through this.

Internal note: confirms cause and rollback are real before publishing — current draft commits us to "identified" and "rolled back," which the on-call should verify in Datadog before pushing to status.io.

Tips for best results

1Never publish 'identified' before an engineer has eyes on the system and a confirmed cause. Walking back from 'identified' to 'investigating' is worse than starting at 'investigating' for an extra 10 minutes.
2Always name the next update time. Customers can wait 30 minutes if they know they will get an update; they cannot wait 10 minutes in silence.
3Use the same service name customers see in your UI. 'Query API' not 'queryd-prod-east'.
4Keep a small library of canonical phrases for each phase and feed them back to Claude in the input. Consistency across incidents builds customer trust over time.
5After the incident, save the final published updates and run a retrospective prompt that compares them against the timeline. The gaps are where your incident comms playbook needs work.

Related prompts

Write a sincere apology email for a product outage or error

intermediate

Generate a professional, accountable apology email for a product outage, data error, or service failure that rebuilds trust without being defensive.

Customer Serviceapologyoutagecustomer-service

Decide whether and how to escalate a support ticket

intermediate

Evaluate a support ticket against a structured escalation framework and generate a recommended action with rationale.

Customer Serviceescalationsupport-processtriage

Summarize a retrospective meeting into action items and themes

beginner

Transform raw retrospective notes into a structured summary with synthesized themes, prioritized action items, and carry-forward commitments.

Operationsretrospectiveagileteam-management

Need help implementing this prompt in your workflow?

Book a call

You are a senior SRE who has run hundreds of incidents at companies with public status pages. Draft a status page update for an active incident. Incident context: - Service or surface affected: {{service}} - Customer-visible symptom: {{symptom}} (what they experience, not the cause) - Severity: {{severity}} (e.g., SEV1 full outage, SEV2 partial degradation) - Time incident detected: {{detected_at}} - Current incident phase: {{phase}} (Investigating, Identified, Monitoring, Resolved) - What we know about the cause (if anything): {{known_cause}} - What we have done so far: {{actions_taken}} - Next planned action and ETA: {{next_action}} - Time of next update commitment: {{next_update_time}} - Affected regions or customer segments (if known): {{affected_scope}} Write a status page update that: - Opens with a one-line description of the customer-visible symptom, not the technical cause - Names the affected service or surface using the same name customers see in their UI - States the current phase explicitly using the standard status page vocabulary (Investigating / Identified / Monitoring / Resolved) - Names the affected scope (regions, customer segments) if known, and explicitly says "scope is still being determined" if not - Says what action is in progress, in plain language an engineering manager at a customer can read in 10 seconds - Commits to a next update time that is realistic, not aspirational - Does not name root cause unless it has been confirmed by an engineer with eyes on the system - Does not apologize on behalf of the company beyond a brief acknowledgment — save the deeper apology for the postmortem - Does not blame a vendor by name unless leadership has signed off Output: 1. A one-line headline (under 80 characters) suitable for the status page incident title 2. The status update body (60 to 120 words) 3. A 1-line internal note for the on-call channel flagging anything in the update that needs leadership review before publishing

Variables

{{{{service}}}}Replace with your {{service}}

{{{{symptom}}}}Replace with your {{symptom}}

{{{{severity}}}}Replace with your {{severity}}

{{{{detected_at}}}}Replace with your {{detected at}}

{{{{phase}}}}Replace with your {{phase}}

{{{{known_cause}}}}Replace with your {{known cause}}

{{{{actions_taken}}}}Replace with your {{actions taken}}

{{{{next_action}}}}Replace with your {{next action}}

{{{{next_update_time}}}}Replace with your {{next update time}}

{{{{affected_scope}}}}Replace with your {{affected scope}}

Example

Input

service: Vectorline Query API (us-east-1)
symptom: elevated 5xx error rates and timeouts on /v1/query
severity: SEV2 partial degradation
detected_at: 14:08 UTC
phase: Identified
known_cause: a recent index warm-up job is saturating a shared cache, confirmed by SRE
actions_taken: rolled back the warm-up job, scaling cache replicas
next_action: monitor error rate, expect recovery within 20 minutes
next_update_time: 15:00 UTC
affected_scope: customers querying indexes in us-east-1; eu-west-2 unaffected

Output

Headline: Elevated errors on Query API in us-east-1

Update (Identified — 14:35 UTC):

Customers using the Query API in us-east-1 are seeing elevated 5xx errors and slow responses on /v1/query. eu-west-2 is not affected.

We have identified the cause as a recent internal job that was overloading a shared cache layer. We have rolled the job back and are scaling cache capacity now. Error rates are trending down.

We expect full recovery within 20 minutes. The next update will be posted by 15:00 UTC, sooner if the situation changes.

Thanks for your patience while we work through this.

Internal note: confirms cause and rollback are real before publishing — current draft commits us to "identified" and "rolled back," which the on-call should verify in Datadog before pushing to status.io.

Tips for best results

1Never publish 'identified' before an engineer has eyes on the system and a confirmed cause. Walking back from 'identified' to 'investigating' is worse than starting at 'investigating' for an extra 10 minutes.

2Always name the next update time. Customers can wait 30 minutes if they know they will get an update; they cannot wait 10 minutes in silence.

3Use the same service name customers see in your UI. 'Query API' not 'queryd-prod-east'.

4Keep a small library of canonical phrases for each phase and feed them back to Claude in the input. Consistency across incidents builds customer trust over time.

5After the incident, save the final published updates and run a retrospective prompt that compares them against the timeline. The gaps are where your incident comms playbook needs work.