How to Detect Argo Rollouts Rollbacks with Argo Events and Automatically Create Jira Incidents and Confluence Postmortems
It's two in the morning, and a canary deployment is quietly rolling back. Argo Rollouts dutifully reverts to the previous version, but nobody knows it happened. You find out the next morning from a belated Slack message, and the postmortem gets hastily written a week later. I repeated this cycle for quite a long time.
This article covers how to connect the Argo Rollouts Notifications Engine with Argo Events so that the moment a rollback is detected, a Jira incident is automatically opened and a Confluence postmortem page draft is created. If you have a staging environment, you can follow along and verify event reception within 30 minutes.
One prerequisite to mention upfront: you need a Kubernetes cluster with kubectl access and Argo Rollouts installed. We'll assume you already know the basics of Kubernetes — Secrets, ConfigMaps, namespaces — and that Argo Events and the Atlassian API are new territory.
Core Concepts
Three Components, One Flow
The backbone of this automation is two Argo projects. Argo Rollouts is a controller that manages Canary and Blue/Green deployments on Kubernetes and handles automatic rollbacks, while Argo Events is an event-driven automation framework that receives deployment state changes as events and chains them through to external API calls.
The entire flow at a glance:

Argo Rollouts detects rollback
  → Notifications Engine sends Webhook
  → Argo Events Webhook EventSource receives it
  → Delivered to Sensor via EventBus (NATS)
  → Sensor's HTTP Trigger calls Jira REST API → creates incident issue
  → Simultaneously calls Confluence REST API → initializes postmortem page

Argo Events component names can be confusing at first, so just remember these three:
| Component | Role | Analogy |
|---|---|---|
| EventSource | Definition that receives external events | Ears |
| EventBus | Internal channel that delivers events | Nerves |
| Sensor | Component that evaluates conditions and executes triggers | Brain + Hands |
EventBus and NATS: EventBus is the message bus between EventSource and Sensor. It runs on NATS internally, and declaring an EventBus resource causes Argo Events to spin up the NATS pods automatically, with no separate installation required. A single-node bus is not safe for production, so you must configure HA; this is covered in detail in the pros and cons section. Note that the nats.native block used in this article's EventBus manifest selects the NATS Streaming backend, while the JetStream backend is configured under a separate jetstream block.
CloudEvents: The event format standard (v1.0) that Argo Events adheres to. It formalizes fields like source, type, and data to simplify integration with other systems.
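To make the event shape concrete, here is a minimal Python sketch of what a webhook event might look like by the time a Sensor evaluates it, and how a dotted path such as body.phase resolves against its data. The envelope values (id, time, the sample body) are illustrative assumptions, and `resolve_path` and `matches` are hypothetical helpers written for this example, not part of Argo Events.

```python
from typing import Any

# Illustrative event: CloudEvents-style context plus the webhook's header/body as data.
# Every concrete value here is made up for the example.
sample_event = {
    "context": {
        "id": "example-id-0001",
        "source": "rollout-webhook-eventsource",
        "specversion": "1.0",
        "type": "webhook",
        "subject": "rollout",
        "time": "2026-05-26T02:00:00Z",
    },
    "data": {
        "header": {"Content-Type": ["application/json"]},
        "body": {
            "rolloutName": "my-service",
            "namespace": "production",
            "phase": "Degraded",
        },
    },
}


def resolve_path(data: dict, path: str) -> Any:
    """Walk a dotted path (e.g. 'body.phase') through nested dicts; None if missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(event: dict, path: str, allowed: list[str]) -> bool:
    """True when the value at `path` inside event['data'] is one of `allowed`."""
    return resolve_path(event["data"], path) in allowed
```

With this shape, `matches(sample_event, "body.phase", ["Degraded"])` passes, while a path that does not exist in the data simply fails to match.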
Argo Rollouts Notifications Engine
Since Rollouts v1.2.x, the Notifications Engine has stabilized and is suitable for production use. The three event types most commonly used for rollbacks are:
- on-rollout-degraded: When a Rollout transitions to Degraded state
- on-analysis-run-failed: When an AnalysisRun (metric analysis) fails
- on-rollout-aborted: When a rollout is aborted
AnalysisRun: An object that Argo Rollouts uses to validate quality during deployment via Prometheus metrics, Webhooks, etc. A failed AnalysisRun triggers an automatic rollback.
A single annotation on the Rollout object controls notification subscriptions, letting you selectively enable incident automation for specific services only.
metadata:
  annotations:
    # The last segment is the name of the webhook service (argo-events in this article)
    notifications.argoproj.io/subscribe.on-rollout-degraded.argo-events: ""

Two Strategies for Receiving Events
There are two ways to receive rollback events. It was confusing at first which one to use, but the deciding factor is simpler than it seems.
Method A — Rollouts Notifications Engine → Webhook EventSource
Rollouts sends a Webhook directly, and Argo Events receives it. If you're already running the Notifications Engine, or you want fine-grained control over notification conditions per Rollout object, A is a more natural extension.
Method B — Resource EventSource Watching Rollouts Directly
Argo Events watches Rollout resources directly through the Kubernetes API. If you want to get started quickly without configuring the Notifications Engine, or you want to monitor all Rollouts in a specific namespace in bulk, B has simpler setup. The tradeoff is that you must handle Degraded state filtering at the Sensor level yourself.
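The filtering that Method B pushes onto the Sensor boils down to one decision: react only when a Rollout has just entered Degraded, not on every update while it stays there. The sketch below expresses that decision in Python. It assumes the resource event carries the current object under `body` and the previous one under `oldBody`; treat that layout as an assumption to verify against the events your Argo Events version actually emits. Both function names are hypothetical.

```python
from typing import Optional


def phase_of(obj: Optional[dict]) -> Optional[str]:
    """Return status.phase of a Rollout-like dict, or None if it is absent."""
    if not obj:
        return None
    return (obj.get("status") or {}).get("phase")


def is_degraded_transition(event_data: dict) -> bool:
    """True only when the Rollout is Degraded now and was not Degraded before."""
    new_phase = phase_of(event_data.get("body"))
    old_phase = phase_of(event_data.get("oldBody"))
    return new_phase == "Degraded" and old_phase != "Degraded"
```

Comparing old and new state, rather than checking the new state alone, is what keeps a Rollout that stays Degraded across several updates from opening several incidents.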
Pros and Cons Analysis
Before diving into practical examples, here's information to help you decide whether to adopt this stack.
Advantages
| Item | Details |
|---|---|
| Kubernetes native | Both Argo Events and Rollouts are CNCF projects. Declarative management with YAML, naturally integrates with GitOps |
| Diverse event sources | Beyond Webhooks, you can handle 20+ sources including Kafka, S3, and Pub/Sub on a single platform |
| Automatic context preservation | Image tag, namespace, and timestamp at the time of rollback are automatically recorded in Jira/Confluence |
| Low operational overhead | Workflow changes can be tracked and version-controlled as code, naturally integrating with GitOps pipelines |
| Fine-grained filtering | Label selectors and field selectors let you selectively detect only the Rollouts you want |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Learning curve | 4-layer structure of EventSource → EventBus → Sensor → Trigger; initial setup is complex | Better to progressively expand based on samples from the official example repository |
| NATS JetStream dependency | Default is single node, so events can be lost during pod restarts. I experienced this myself at first | Setting nats.native.replicas: 3 or higher in the EventBus resource is much safer |
| Jira API rate limits | Jira Cloud has a limit of 300 requests per minute per user | Set retryStrategy on the Sensor, and consider adding Argo Workflows as a queuing layer for large-scale simultaneous rollbacks |
| Preventing duplicate incidents | Multiple update events can fire for a single rollback | Use filters.data on the Sensor to handle only exact state transitions, and add logic to search for duplicate issues in Jira before creating |
| Postmortem quality limits | Auto-initialized documents only provide structure | Root cause analysis must be filled in by the responsible team — the goal of this automation is to "secure a starting point" |
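The duplicate check mentioned in the table can be as small as one search before creating an issue. The Python sketch below only builds the JQL and the encoded query string; which search endpoint to send it to, and the 30-minute window, are assumptions to adapt to your Jira setup. Both function names are hypothetical.

```python
from urllib.parse import urlencode


def build_duplicate_jql(
    project: str, issue_type: str, rollout_name: str, window_minutes: int = 30
) -> str:
    """Build a JQL query for recent incidents whose summary mentions the rollout."""
    # Escape backslashes and double quotes so the name stays inside the quoted string.
    safe_name = rollout_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'project = "{project}" AND issuetype = "{issue_type}" '
        f'AND summary ~ "{safe_name}" AND created >= -{window_minutes}m'
    )


def build_search_query(jql: str, max_results: int = 1) -> str:
    """URL-encode the JQL and paging options as query parameters."""
    return urlencode({"jql": jql, "maxResults": max_results, "fields": "key"})
```

If the search returns at least one issue, the caller can skip creation or add a comment to the existing incident instead.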
Most Common Mistakes in Practice
- Leaving EventBus on a single node: Running a single-node EventBus in production can cause you to lose events entirely during pod restarts. Setting nats.native.replicas: 3 or higher is essential.
- Writing Jira/Confluence API credentials directly in YAML: Putting tokens in a ConfigMap or Sensor YAML means they end up in your GitOps repository as-is. Before this leads to rotating credentials for the entire team, use secretKeyRef references or External Secrets Operator instead.
- Applying notifications to all Rollouts in bulk: Turning every rollback in dev and staging environments into an incident creates too much noise. Use Rollout annotations to selectively enable automation for production namespaces only.
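On the credentials point above: with Atlassian Cloud API tokens, the Authorization header is typically HTTP Basic auth over the account email and the token. The Python sketch below shows how such a header value could be produced once, so that the result is stored in a secret manager rather than in Git. The function name and the sample values are hypothetical.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> str:
    """Return an HTTP Authorization header value using Basic auth."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Base64 is an encoding, not encryption: anyone who can read the resulting string can recover the token, which is exactly why it must never land in a ConfigMap or a repository.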
Practical Implementation
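Apply the YAML in this order: EventBus → EventSource → Sensor. Following this order avoids dependency errors during deployment. The steps below are grouped by topic rather than by apply order, so apply the EventBus manifest from Step 2 before the EventSources from Step 1.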
Step 1: Configure Rollout Notifications
Configure Argo Rollouts to send rollback events as Webhooks. Including the required Jira fields (jiraProject, jiraIssueType) in the payload upfront makes the Sensor side much cleaner.
# argo-rollouts-notification-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-notification-cm
  namespace: argo-rollouts
data:
  service.webhook.argo-events: |
    # Service name follows <EventSource name>-eventsource-svc;
    # confirm it with: kubectl get svc -n argo-events
    url: http://rollout-webhook-eventsource-eventsource-svc.argo-events.svc.cluster.local:12000/rollout
    headers:
      - name: Content-Type
        value: application/json
    retryMax: 3
    retryWaitMin: 5s
    retryWaitMax: 30s
  template.rollback-alert: |
    webhook:
      argo-events:
        method: POST
        body: |
          {
            "rolloutName": "{{.rollout.metadata.name}}",
            "namespace": "{{.rollout.metadata.namespace}}",
            "phase": "{{.rollout.status.phase}}",
            "message": "{{.rollout.status.message}}",
            "timestamp": "{{now | date \"2006-01-02T15:04:05Z07:00\"}}",
            "jiraProject": "OPS",
            "jiraIssueType": "Incident"
          }
  trigger.on-rollout-degraded: |
    - send: [rollback-alert]

If you chose Method A, you also need to deploy the EventSource that will receive this Webhook.
# rollout-webhook-eventsource.yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: rollout-webhook-eventsource
  namespace: argo-events
spec:
  service:
    ports:
      - port: 12000
        targetPort: 12000
  webhook:
    rollout:
      port: "12000"
      endpoint: /rollout
      method: POST

If you chose Method B, deploy an EventSource that watches the Rollout resource directly. Degraded state filtering is handled by the Sensor, so here we only receive UPDATE events.

# rollout-resource-eventsource.yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: rollout-watcher
  namespace: argo-events
spec:
  resource:
    rollout-degraded:
      group: argoproj.io
      version: v1alpha1
      resource: rollouts
      namespace: production
      # Resource EventSources accept ADD, UPDATE, and DELETE
      eventTypes:
        - UPDATE

Step 2: Configure EventBus
Deploy the EventBus that will deliver events between EventSource and Sensor. It must be deployed before the Sensor.
# eventbus.yaml
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
  namespace: argo-events
spec:
  nats:
    native:
      replicas: 3
      auth: token

Step 3: Wire Up the Sensor (Create Jira Incident + Confluence Postmortem Simultaneously)
The Sensor receives the Webhook and executes two HTTP Triggers simultaneously. The key part is the status.phase == "Degraded" condition filter that prevents unnecessary duplicate executions.
The Jira Cloud REST API v3 POST /issue requires fields.project.key and fields.issuetype.name as mandatory values. Omitting either causes a 400 error. Since we included them in the payload in Step 1, here we just need to cleanly map them.
ADF (Atlassian Document Format): In Jira Cloud REST API v3, the description field requires a JSON tree structure, not a simple string. The minimum structure is { "type": "doc", "version": 1, "content": [{ "type": "paragraph", "content": [{ "type": "text", "text": "..." }] }] }. The Sensor payload injects text into the innermost text node of this structure. Mapping that leaf alone is not enough on its own: the surrounding type and version nodes must also be present in the request body for Jira to accept it.
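As a reference for what the finished request body should look like, the following Python sketch assembles a create-issue body with the two mandatory fields and a minimal ADF description. It is an illustration of the target JSON shape under the requirements described above, not the way the Sensor builds it; `adf_paragraph` and `build_issue_body` are hypothetical helpers.

```python
def adf_paragraph(text: str) -> dict:
    """Wrap plain text in the minimal ADF document: doc -> paragraph -> text."""
    return {
        "type": "doc",
        "version": 1,
        "content": [
            {"type": "paragraph", "content": [{"type": "text", "text": text}]}
        ],
    }


def build_issue_body(
    project_key: str, issue_type: str, summary: str, description: str
) -> dict:
    """Assemble a create-issue body; fail early if a mandatory field is empty."""
    if not project_key or not issue_type:
        raise ValueError("project key and issue type name are both required")
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": issue_type},
            "summary": summary,
            "description": adf_paragraph(description),
        }
    }
```

Comparing the JSON your Sensor actually sends (for example, captured at a temporary endpoint) against this shape is a quick way to spot a missing type or version node before Jira rejects the request.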
# rollback-sensor.yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: rollback-incident-sensor
  namespace: argo-events
spec:
  dependencies:
    - name: rollout-dep
      eventSourceName: rollout-webhook-eventsource # rollout-watcher for Method B
      eventName: rollout # rollout-degraded for Method B
      filters:
        data:
          - path: body.phase # body.status.phase for Method B
            type: string
            value:
              - "Degraded"
  triggers:
    # Trigger 1: Create Jira incident
    - template:
        name: create-jira-incident
        http:
          url: "https://$(JIRA_DOMAIN)/rest/api/3/issue"
          method: POST
          # headers is a plain name-to-value map
          headers:
            Content-Type: application/json
          # secureHeaders pulls header values from Secrets; the Secret must hold
          # the full header value, e.g. "Basic <base64 of email:token>"
          secureHeaders:
            - name: Authorization
              valueFrom:
                secretKeyRef:
                  name: atlassian-credentials
                  key: basic-auth
          # payload: copies event data into paths of the request body
          # (dest is relative to the body, so it carries no "body." prefix)
          payload:
            - src:
                dependencyName: rollout-dep
                dataKey: body.jiraProject
              dest: fields.project.key
            - src:
                dependencyName: rollout-dep
                dataKey: body.jiraIssueType
              dest: fields.issuetype.name
            - src:
                dependencyName: rollout-dep
                # dataTemplate renders a Go template against the event data (.Input),
                # which lets us prefix the rollout name in a single step
                dataTemplate: "[INCIDENT] Rollback detected: {{ .Input.body.rolloutName }}"
              dest: fields.summary
            - src:
                dependencyName: rollout-dep
                dataKey: body.namespace
              # array indexes in dest use dot notation (content.0, not content[0])
              dest: fields.description.content.0.content.0.text
      retryStrategy:
        steps: 3
        duration: 10s
    # Trigger 2: Initialize Confluence postmortem page
    - template:
        name: create-confluence-postmortem
        http:
          url: "https://$(CONFLUENCE_DOMAIN)/wiki/rest/api/content"
          method: POST
          headers:
            Content-Type: application/json
          secureHeaders:
            - name: Authorization
              valueFrom:
                secretKeyRef:
                  name: atlassian-credentials
                  key: basic-auth
          payload:
            - src:
                dependencyName: rollout-dep
                dataTemplate: "Postmortem - Rollback: {{ .Input.body.rolloutName }}"
              dest: title
      retryStrategy:
        steps: 3
        duration: 10s

Rather than hardcoding the domains, keep $(JIRA_DOMAIN) and $(CONFLUENCE_DOMAIN) as placeholders and source their values from a ConfigMap. The Sensor spec shown here does not declare how those placeholders get resolved, so have your deployment tooling (for example Kustomize or Helm) substitute them before the manifest is applied.
# atlassian-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: atlassian-config
  namespace: argo-events
data:
  JIRA_DOMAIN: "your-team.atlassian.net"
  CONFLUENCE_DOMAIN: "your-team.atlassian.net"

Step 4: Compose the Confluence Postmortem Page Body
When using POST /wiki/rest/api/content from the Confluence REST API, the page body must be composed in storage format. Managing the template as a ConfigMap outside the Sensor — with rollback context auto-filled — makes maintenance easier.
{
  "type": "page",
  "title": "Postmortem - Rollback: my-service [2026-05-26]",
  "space": { "key": "INCIDENTS" },
  "ancestors": [{ "id": "123456" }],
  "body": {
    "storage": {
      "value": "<h2>Incident Summary</h2><table><tr><th>Field</th><th>Details</th></tr><tr><td>Service</td><td>{{rolloutName}}</td></tr><tr><td>Namespace</td><td>{{namespace}}</td></tr><tr><td>Time of Occurrence</td><td>{{timestamp}}</td></tr><tr><td>Related Jira</td><td>{{jiraIssueKey}}</td></tr></table><h2>Timeline</h2><p>(To be filled in)</p><h2>Root Cause</h2><p>(To be filled in)</p><h2>Action Items to Prevent Recurrence</h2><p>(To be filled in)</p>",
      "representation": "storage"
    }
  }
}

How to look up ancestors.id: This value differs per team, so you need to look it up yourself. Query the parent page's ID with GET /wiki/rest/api/content?title=<parent-page-name>&spaceKey=INCIDENTS, then store that value in a ConfigMap for reuse.
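Filling the template's placeholders is a plain string substitution, with one detail that is easy to miss: storage format is XHTML-like markup, so values coming from the event should be escaped before they are inserted. The Python sketch below shows one way to do that. The shortened template, the `$name` placeholder style, and `build_page` are assumptions made for this example, not the article's actual template mechanism.

```python
from html import escape
from string import Template

# Hypothetical, shortened storage-format template; $-placeholders are filled below.
STORAGE_TEMPLATE = Template(
    "<h2>Incident Summary</h2>"
    "<table><tr><th>Field</th><th>Details</th></tr>"
    "<tr><td>Service</td><td>$rolloutName</td></tr>"
    "<tr><td>Namespace</td><td>$namespace</td></tr>"
    "<tr><td>Time of Occurrence</td><td>$timestamp</td></tr></table>"
    "<h2>Root Cause</h2><p>(To be filled in)</p>"
)


def build_page(context: dict, space_key: str, parent_id: str) -> dict:
    """Build a Confluence page body with escaped rollback context filled in."""
    # Escape every value so characters like & or < cannot break the markup.
    safe = {key: escape(str(value)) for key, value in context.items()}
    return {
        "type": "page",
        "title": f"Postmortem - Rollback: {context['rolloutName']}",
        "space": {"key": space_key},
        "ancestors": [{"id": parent_id}],
        "body": {
            "storage": {
                "value": STORAGE_TEMPLATE.substitute(safe),
                "representation": "storage",
            }
        },
    }
```

`Template.substitute` raises `KeyError` when a placeholder has no value, which surfaces a missing field at build time instead of producing a page with a literal placeholder in it.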
Step 5: Secret Management — Integrating External Secrets Operator
To be honest, the first time I set this up I put a Base64-encoded token directly in a ConfigMap, which ended up in the GitOps repo and caused the entire team to rotate credentials. Since then, I always use External Secrets Operator (ESO). The pattern of integrating with AWS Secrets Manager or Vault is widely used in practice.
# atlassian-external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: atlassian-credentials
  namespace: argo-events
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: atlassian-credentials
  data:
    - secretKey: basic-auth
      remoteRef:
        key: prod/atlassian
        property: basic-auth-token

Closing Thoughts
Having incident documentation automatically open the moment a rollback occurs means the first step of response is already taken, even if nobody is awake. The substance of the postmortem still has to be written by people, but simply having a starting point created automatically makes a noticeable difference in incident response speed and documentation completeness.
Three steps you can start right now:
- Check Argo Events installation status: Run kubectl get pods -n argo-events to verify that the EventSource Controller, Sensor Controller, and EventBus are all in Running state. For a fresh install, you can start with kubectl apply -f https://github.com/argoproj/argo-events/releases/download/v1.9.2/install.yaml. Checking the releases page to pin the latest version tag is important for creating a reproducible environment. If you're already running Argo Events, you can skip straight to the next step.
- Connect the Resource EventSource in your staging environment: Apply the rollout-watcher EventSource from Step 1 to your staging namespace, manually put an actual Rollout into Degraded state, and use kubectl logs to verify events are being received correctly.
- Test the Jira Webhook: Connect the Sensor HTTP Trigger's url to a temporary endpoint like https://webhook.site instead of the actual Jira API first. This lets you safely validate the payload structure before switching to the real API.
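For the staging checks above, it can help to fire a known-good payload at the Webhook EventSource yourself instead of waiting for a real rollback. The Python sketch below builds such a request without sending it. The sample payload mirrors the fields from the Step 1 template, while the function name and any URL you pass in (for example a port-forwarded EventSource address) are assumptions for illustration.

```python
import json
import urllib.request


def build_test_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a dry-run endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sample payload mirroring the fields the rollback webhook template emits.
SAMPLE_PAYLOAD = {
    "rolloutName": "my-service",
    "namespace": "staging",
    "phase": "Degraded",
    "jiraProject": "OPS",
    "jiraIssueType": "Incident",
}
```

To actually send it, pass the request to `urllib.request.urlopen(request, timeout=10)` and then check the Sensor logs and the temporary endpoint for the resulting call.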