DEV BAK - TECH BLOG

How to Detect Argo Rollouts Rollbacks with Argo Events and Automatically Create Jira Incidents and Confluence Postmortems

It's two in the morning, and a canary deployment is quietly rolling back. Argo Rollouts dutifully reverts to the previous version, but nobody knows it happened. You find out the next morning from a belated Slack message, and the postmortem gets hastily written a week later. I repeated this cycle for quite a long time.

This article covers how to connect the Argo Rollouts Notifications Engine with Argo Events so that the moment a rollback is detected, a Jira incident is automatically opened and a Confluence postmortem page draft is created. If you have a staging environment, you can follow along and verify event reception within 30 minutes.

One prerequisite to mention upfront: you need a Kubernetes cluster with kubectl access and Argo Rollouts installed. We'll assume you already know the basics of Kubernetes — Secrets, ConfigMaps, namespaces — and that Argo Events and the Atlassian API are new territory.


Core Concepts

Three Components, One Flow

The backbone of this automation is two Argo projects. Argo Rollouts is a controller that manages Canary and Blue/Green deployments on Kubernetes and handles automatic rollbacks, while Argo Events is an event-driven automation framework that receives deployment state changes as events and chains them through to external API calls.

The entire flow at a glance:

Argo Rollouts detects rollback
  → Notifications Engine sends Webhook
  → Argo Events Webhook EventSource receives it
  → Delivered to Sensor via EventBus (NATS JetStream)
  → Sensor's HTTP Trigger calls Jira REST API → creates incident issue
  → Simultaneously calls Confluence REST API → initializes postmortem page

Argo Events component names can be confusing at first — just remember these three:

Component | Role | Analogy
EventSource | Definition that receives external events | Ears
EventBus | Internal channel that delivers events | Nerves
Sensor | Component that evaluates conditions and executes triggers | Brain + Hands

EventBus and NATS JetStream: EventBus is the message bus between EventSource and Sensor. It uses NATS JetStream internally, and declaring an EventBus resource causes Argo Events to spin up the NATS pods automatically, so no separate installation is required. A single-node bus is not safe for production, so configure HA there; the pros and cons section covers this in detail.

CloudEvents: The event format standard (v1.0) that Argo Events adheres to. It formalizes fields like source, type, and data to simplify integration with other systems.

Argo Rollouts Notifications Engine

Since Rollouts v1.2.x, the Notifications Engine has stabilized and is suitable for production use. The three event types most commonly used for rollbacks are:

  • on-rollout-degraded: When a Rollout transitions to Degraded state
  • on-analysis-run-failed: When an AnalysisRun (metric analysis) fails
  • on-rollout-aborted: When a rollout is aborted

AnalysisRun: An object that Argo Rollouts uses to validate quality during deployment via Prometheus metrics, Webhooks, etc. A failed AnalysisRun triggers an automatic rollback.

A single annotation on the Rollout object controls notification subscriptions, letting you selectively enable incident automation for specific services only.

yaml
metadata:
  annotations:
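    # Key format: subscribe.<trigger>.<service>; the service here is the webhook
    # registered as service.webhook.argo-events in the notification ConfigMap
    notifications.argoproj.io/subscribe.on-rollout-degraded.argo-events: ""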
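
To make the selective opt-in concrete, here is a sketch of a production Rollout that subscribes to all three rollback-related triggers. The Rollout name `my-service` and the `production` namespace are placeholders, and the last segment of each annotation key is assumed to be the webhook service name (`argo-events`) registered in the notification ConfigMap. Each trigger also has to be defined in that ConfigMap before a subscription to it has any effect.

```yaml
# Sketch only: metadata for a Rollout that opts in to incident automation.
# Key format: notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service          # hypothetical
  namespace: production     # hypothetical
  annotations:
    notifications.argoproj.io/subscribe.on-rollout-degraded.argo-events: ""
    notifications.argoproj.io/subscribe.on-analysis-run-failed.argo-events: ""
    notifications.argoproj.io/subscribe.on-rollout-aborted.argo-events: ""
# spec omitted: the subscription lives entirely in the annotations
```

Rollouts without these annotations send nothing, which keeps dev and staging quiet. Subscribing to several triggers can produce more than one webhook per rollback, so pair this with the duplicate-incident handling discussed later.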

Two Strategies for Receiving Events

There are two ways to receive rollback events. It was confusing at first which one to use, but the deciding factor is simpler than it seems.

Method A — Rollouts Notifications Engine → Webhook EventSource

Rollouts sends a Webhook directly, and Argo Events receives it. If you're already running the Notifications Engine, or you want fine-grained control over notification conditions per Rollout object, Method A is the more natural fit.

Method B — Resource EventSource Watching Rollouts Directly

Argo Events watches Rollout resources directly through the Kubernetes API. If you want to get started quickly without configuring the Notifications Engine, or you want to monitor all Rollouts in a specific namespace in bulk, Method B is simpler to set up. The tradeoff is that you must handle Degraded state filtering at the Sensor level yourself.


Pros and Cons Analysis

Before diving into practical examples, here's information to help you decide whether to adopt this stack.

Advantages

Item | Details
Kubernetes native | Argo Events and Argo Rollouts are both part of the CNCF Argo project. Everything is managed declaratively in YAML and fits naturally into GitOps
Diverse event sources | Beyond Webhooks, you can handle 20+ sources including Kafka, S3, and Pub/Sub on a single platform
Automatic context preservation | The Rollout name, namespace, phase, and timestamp at the time of rollback are automatically recorded in Jira/Confluence
Low operational overhead | Workflow changes are tracked and version-controlled as code alongside the rest of your GitOps pipeline
Fine-grained filtering | Label selectors and field selectors let you selectively detect only the Rollouts you want

Disadvantages and Caveats

Item | Details | Mitigation
Learning curve | 4-layer structure of EventSource → EventBus → Sensor → Trigger; initial setup is complex | Start from the samples in the official example repository and expand progressively
NATS JetStream dependency | A single-node EventBus can lose events during pod restarts. I experienced this myself at first | Setting nats.native.replicas: 3 or higher in the EventBus resource is much safer
Jira API rate limits | Jira Cloud throttles API calls; plan for a ceiling of roughly 300 requests per minute per user and verify the current limit in Atlassian's documentation | Set retryStrategy on the Sensor, and consider adding Argo Workflows as a queuing layer for large-scale simultaneous rollbacks
Preventing duplicate incidents | Multiple update events can fire for a single rollback | Use filters.data on the Sensor to handle only exact state transitions, and add logic to search for duplicate issues in Jira before creating
Postmortem quality limits | Auto-initialized documents only provide structure | Root cause analysis must be filled in by the responsible team; the goal of this automation is to "secure a starting point"
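
The retry and rate-limit mitigations from the table can be expressed on the trigger itself. The fragment below is a sketch: `retryStrategy` with `factor` and `jitter` gives a growing, randomized backoff, and `rateLimit` caps how often the trigger fires. The numbers are illustrative assumptions rather than tuned values, and the URL host is a placeholder.

```yaml
# Sketch: one Sensor trigger with backoff and a firing cap (fragment of spec).
triggers:
  - template:
      name: create-jira-incident
      http:
        url: "https://your-team.atlassian.net/rest/api/3/issue"  # placeholder host
        method: POST
    retryStrategy:
      steps: 3        # give up after 3 attempts
      duration: 10s   # initial wait between attempts
      factor: 2       # multiply the wait after each attempt
      jitter: 0.2     # randomize each wait slightly
    rateLimit:
      unit: Minute
      requestsPerUnit: 30   # assumed cap, chosen to stay well under the Jira limit
```

Note that `retryStrategy` and `rateLimit` sit next to `template`, not inside it. Neither setting deduplicates incidents; they only smooth out bursts.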

Most Common Mistakes in Practice

  1. Leaving EventBus on a single node: Running a single-replica EventBus in production can cause you to lose events entirely during pod restarts. Setting nats.native.replicas: 3 or higher is essential.
  2. Writing Jira/Confluence API credentials directly in YAML: Putting tokens in a ConfigMap or Sensor YAML means they end up in your GitOps repository as-is. Before this leads to rotating credentials for the entire team, use secretKeyRef references or External Secrets Operator instead.
  3. Applying notifications to all Rollouts in bulk: Turning every rollback in dev and staging environments into an incident creates too much noise. Use Rollout annotations to selectively enable automation for production namespaces only.

Practical Implementation

Apply the YAML in this order: EventBus → EventSource → Sensor, regardless of the order in which the steps below introduce them. Following this order avoids dependency errors during deployment.

Step 1: Configure Rollout Notifications

Configure Argo Rollouts to send rollback events as Webhooks. Including the required Jira fields (jiraProject, jiraIssueType) in the payload upfront makes the Sensor side much cleaner.

yaml
# argo-rollouts-notification-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-notification-configmap  # the ConfigMap name the Rollouts controller reads
  namespace: argo-rollouts
data:
  service.webhook.argo-events: |
    # Argo Events names the Service <eventsource-name>-eventsource-svc
    url: http://rollout-webhook-eventsource-eventsource-svc.argo-events.svc.cluster.local:12000/rollout
    headers:
      - name: Content-Type
        value: application/json
    retryMax: 3
    retryWaitMin: 5s
    retryWaitMax: 30s
 
  template.rollback-alert: |
    webhook:
      argo-events:
        method: POST
        body: |
          {
            "rolloutName": "{{.rollout.metadata.name}}",
            "namespace": "{{.rollout.metadata.namespace}}",
            "phase": "{{.rollout.status.phase}}",
            "message": "{{.rollout.status.message}}",
            "timestamp": "{{now | date "2006-01-02T15:04:05Z07:00"}}",
            "jiraProject": "OPS",
            "jiraIssueType": "Incident"
          }
 
  trigger.on-rollout-degraded: |
    - send: [rollback-alert]

If you chose Method A, you also need to deploy the EventSource that will receive this Webhook.

yaml
# rollout-webhook-eventsource.yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: rollout-webhook-eventsource
  namespace: argo-events
spec:
  service:
    ports:
      - port: 12000
        targetPort: 12000
  webhook:
    rollout:
      port: "12000"
      endpoint: /rollout
      method: POST

If you chose Method B, deploy an EventSource that watches the Rollout resource directly. Degraded state filtering is handled by the Sensor, so here we only subscribe to UPDATE events (Argo Events names resource event types ADD, UPDATE, and DELETE).

yaml
# rollout-resource-eventsource.yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: rollout-watcher
  namespace: argo-events
spec:
  resource:
    rollout-degraded:
      group: argoproj.io
      version: v1alpha1
      resource: rollouts
      namespace: production
      eventTypes:
        - UPDATE
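
Because the Resource EventSource forwards the whole Rollout object, the Sensor has to pick out the Degraded state itself. A minimal sketch of that dependency follows, assuming the event carries the Rollout under `body` so that the phase sits at `body.status.phase`. Note that `eventName` is the key under `spec.resource` (`rollout-degraded`), not the EventSource name.

```yaml
# Sketch: Sensor dependency for Method B (fragment of spec)
dependencies:
  - name: rollout-dep
    eventSourceName: rollout-watcher    # metadata.name of the EventSource
    eventName: rollout-degraded         # key under spec.resource
    filters:
      data:
        - path: body.status.phase       # assumed location of the Rollout phase
          type: string
          value:
            - "Degraded"
```

With this method the event has no `jiraProject` or `rolloutName` fields, so the trigger would read `body.metadata.name` and `body.metadata.namespace` instead and supply the Jira project and issue type as fixed values.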

Step 2: Configure EventBus

Deploy the EventBus that will deliver events between EventSource and Sensor. It must be deployed before the Sensor.

yaml
# eventbus.yaml
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
  namespace: argo-events
spec:
  nats:
    native:
      replicas: 3
      auth: token
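
One caveat worth checking against your Argo Events version: to my understanding, `spec.nats.native` selects the older NATS Streaming backend, while JetStream is configured under `spec.jetstream`. If JetStream is what you want, the equivalent manifest would look roughly like this sketch. The `version` value must be one your Argo Events installation supports; `latest` appears here only as a placeholder.

```yaml
# eventbus-jetstream.yaml (sketch, not part of the original setup)
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
  namespace: argo-events
spec:
  jetstream:
    version: latest   # placeholder: pin a specific supported version in production
    replicas: 3       # three replicas for HA
```

Only one EventBus named `default` can exist per namespace, so apply either this manifest or the one above, not both.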

Step 3: Wire Up the Sensor — Create Jira Incident + Confluence Postmortem Simultaneously

The Sensor receives the Webhook and executes two HTTP Triggers simultaneously. The key part is the status.phase == "Degraded" condition filter that prevents unnecessary duplicate executions.

The Jira Cloud REST API v3 POST /issue requires fields.project.key and fields.issuetype.name as mandatory values. Omitting either causes a 400 error. Since we included them in the payload in Step 1, here we just need to cleanly map them.

ADF (Atlassian Document Format): In Jira Cloud REST API v3, the description field requires a JSON tree structure, not a simple string. The minimum structure is { "type": "doc", "version": 1, "content": [{ "type": "paragraph", "content": [{ "type": "text", "text": "..." }] }] }. The only dynamic part is the text leaf at description.content[0].content[0].text, which an Argo Events payload can fill in; the surrounding type and version fields are static and must also be present in the request body.

yaml
# rollback-sensor.yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: rollback-incident-sensor
  namespace: argo-events
spec:
  dependencies:
    - name: rollout-dep
      eventSourceName: rollout-webhook-eventsource  # Method B: rollout-watcher
      eventName: rollout                            # Method B: rollout-degraded
      filters:
        data:
          - path: body.phase                        # Method B: body.status.phase
            type: string
            value:
              - "Degraded"
  triggers:
    # Trigger 1: Create Jira incident
    - template:
        name: create-jira-incident
        http:
          url: "https://$(JIRA_DOMAIN)/rest/api/3/issue"
          method: POST
          # Plain headers are a simple name-to-value map
          headers:
            Content-Type: application/json
          # Headers whose values come from a Secret go under secureHeaders
          secureHeaders:
            - name: Authorization
              valueFrom:
                secretKeyRef:
                  name: atlassian-credentials
                  key: basic-auth
          # payload: builds the JSON request body from event data;
          # dest is a path relative to the root of that body
          payload:
            - src:
                dependencyName: rollout-dep
                dataKey: body.jiraProject
              dest: fields.project.key
            - src:
                dependencyName: rollout-dep
                dataKey: body.jiraIssueType
              dest: fields.issuetype.name
            # dataTemplate renders a Go template against the event,
            # which is how the summary gets its fixed prefix
            - src:
                dependencyName: rollout-dep
                dataTemplate: "[INCIDENT] Rollback detected: {{ .Input.body.rolloutName }}"
              dest: fields.summary
            # Array indices in dest use dot notation (content.0), not brackets.
            # This sets only the text leaf; the static ADF fields (type, version)
            # still need to be added before Jira accepts the description.
            - src:
                dependencyName: rollout-dep
                dataKey: body.namespace
              dest: fields.description.content.0.content.0.text
      # retryStrategy belongs to the trigger, next to template
      retryStrategy:
        steps: 3
        duration: 10s

    # Trigger 2: Initialize Confluence postmortem page
    - template:
        name: create-confluence-postmortem
        http:
          url: "https://$(CONFLUENCE_DOMAIN)/wiki/rest/api/content"
          method: POST
          headers:
            Content-Type: application/json
          secureHeaders:
            - name: Authorization
              valueFrom:
                secretKeyRef:
                  name: atlassian-credentials
                  key: basic-auth
          payload:
            - src:
                dependencyName: rollout-dep
                dataTemplate: "Postmortem — Rollback: {{ .Input.body.rolloutName }}"
              dest: title
      retryStrategy:
        steps: 3
        duration: 10s

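$(JIRA_DOMAIN) and $(CONFLUENCE_DOMAIN) are placeholders: rather than hardcoding the hosts, keep them in a ConfigMap and substitute them into the Sensor when the manifests are rendered. As far as I can tell, the Sensor does not expand $(...) in url by itself, so confirm that your templating or GitOps tooling performs the substitution.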

yaml
# atlassian-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: atlassian-config
  namespace: argo-events
data:
  JIRA_DOMAIN: "your-team.atlassian.net"
  CONFLUENCE_DOMAIN: "your-team.atlassian.net"
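
How the placeholder gets replaced depends on your rendering tool. As one possibility, assuming the manifests are built with Kustomize, a `replacements` entry can copy the ConfigMap value into the host segment of the Sensor URL. This is a sketch under that assumption; the file names are the ones used in this article, and the index `0` refers to the first trigger in the Sensor.

```yaml
# kustomization.yaml (sketch, assumes Kustomize renders these manifests)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - atlassian-config.yaml
  - rollback-sensor.yaml
replacements:
  - source:
      kind: ConfigMap
      name: atlassian-config
      fieldPath: data.JIRA_DOMAIN
    targets:
      - select:
          kind: Sensor
          name: rollback-incident-sensor
        fieldPaths:
          - spec.triggers.0.template.http.url
        options:
          delimiter: "/"
          index: 2   # "https://<host>/..." split on "/" puts the host at index 2
```

A second entry with `data.CONFLUENCE_DOMAIN` and `spec.triggers.1.template.http.url` would cover the Confluence trigger. Helm values or any other templating step would serve the same purpose.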

Step 4: Compose the Confluence Postmortem Page Body

When using POST /wiki/rest/api/content from the Confluence REST API, the page body must be composed in storage format. Managing the template as a ConfigMap outside the Sensor — with rollback context auto-filled — makes maintenance easier.

json
{
  "type": "page",
  "title": "Postmortem — Rollback: my-service [2026-05-26]",
  "space": { "key": "INCIDENTS" },
  "ancestors": [{ "id": "123456" }],
  "body": {
    "storage": {
      "value": "<h2>Incident Summary</h2><table><tr><th>Field</th><th>Details</th></tr><tr><td>Service</td><td>{{rolloutName}}</td></tr><tr><td>Namespace</td><td>{{namespace}}</td></tr><tr><td>Time of Occurrence</td><td>{{timestamp}}</td></tr><tr><td>Related Jira</td><td>{{jiraIssueKey}}</td></tr></table><h2>Timeline</h2><p>(To be filled in)</p><h2>Root Cause</h2><p>(To be filled in)</p><h2>Action Items to Prevent Recurrence</h2><p>(To be filled in)</p>",
      "representation": "storage"
    }
  }
}
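
To connect this JSON to the Sensor, each field has to become a payload entry on the Confluence trigger. The sketch below shows one way to do that with `dataTemplate`, assuming a template without any actions can be used to emit a constant string. The space key and parent page ID are the example values from the JSON above, and a plain hyphen stands in for the dash in the title.

```yaml
# Sketch: http.payload for the create-confluence-postmortem trigger.
# dest paths are relative to the root of the JSON request body.
payload:
  - src:
      dependencyName: rollout-dep
      dataTemplate: "page"               # constant value
    dest: type
  - src:
      dependencyName: rollout-dep
      dataTemplate: "INCIDENTS"          # space key from the example above
    dest: space.key
  - src:
      dependencyName: rollout-dep
      dataTemplate: "123456"             # parent page ID from the example above
    dest: ancestors.0.id
  - src:
      dependencyName: rollout-dep
      dataTemplate: "Postmortem - Rollback: {{ .Input.body.rolloutName }} [{{ .Input.body.timestamp }}]"
    dest: title
  - src:
      dependencyName: rollout-dep
      dataTemplate: "storage"
    dest: body.storage.representation
  - src:
      dependencyName: rollout-dep
      dataTemplate: >-
        <h2>Incident Summary</h2>
        <table>
        <tr><th>Field</th><th>Details</th></tr>
        <tr><td>Service</td><td>{{ .Input.body.rolloutName }}</td></tr>
        <tr><td>Namespace</td><td>{{ .Input.body.namespace }}</td></tr>
        <tr><td>Time of Occurrence</td><td>{{ .Input.body.timestamp }}</td></tr>
        </table>
        <h2>Timeline</h2><p>(To be filled in)</p>
        <h2>Root Cause</h2><p>(To be filled in)</p>
        <h2>Action Items to Prevent Recurrence</h2><p>(To be filled in)</p>
    dest: body.storage.value
```

The Related Jira row is left out on purpose: both triggers fire from the same event, so the Confluence call has no access to the issue key that Jira returns. Linking the two would need a follow-up step, for example a workflow that runs the calls in sequence.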

How to look up ancestors.id: This value differs per team, so you need to look it up yourself. Query the parent page's ID with GET /wiki/rest/api/content?title=<parent-page-name>&spaceKey=INCIDENTS, then store that value in a ConfigMap for reuse.

Step 5: Secret Management — Integrating External Secrets Operator

To be honest, the first time I set this up I put a Base64-encoded token directly in a ConfigMap, which ended up in the GitOps repo and caused the entire team to rotate credentials. Since then, I always use External Secrets Operator (ESO). The pattern of integrating with AWS Secrets Manager or Vault is widely used in practice.

yaml
# atlassian-external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: atlassian-credentials
  namespace: argo-events
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: atlassian-credentials
  data:
    - secretKey: basic-auth
      remoteRef:
        key: prod/atlassian
        property: basic-auth-token
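
The ExternalSecret above refers to a SecretStore named `aws-secrets-manager` that is not shown. A minimal sketch of such a store follows, assuming AWS Secrets Manager with IAM roles for service accounts; the region and the ServiceAccount name are hypothetical and need to match your environment.

```yaml
# aws-secret-store.yaml (sketch; region and ServiceAccount are assumptions)
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: argo-events
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                 # assumption
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa   # hypothetical ServiceAccount with an IAM role
```

Since the Sensor passes the secret value straight into the Authorization header, the `basic-auth-token` property in Secrets Manager should hold the complete header value: the word Basic, a space, then the Base64 encoding of `email:api-token`.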

Closing Thoughts

Having incident documentation open automatically the moment a rollback occurs means the first step of the response is already taken, even if nobody is awake. People still have to fill in the substance of the postmortem, but an automatically secured starting point alone makes a noticeable difference in incident response speed and documentation completeness.

Three steps you can start right now:

  1. Check Argo Events installation status: Run kubectl get pods -n argo-events to verify that the EventSource Controller, Sensor Controller, and EventBus are all in Running state. For a fresh install, you can start with kubectl apply -f https://github.com/argoproj/argo-events/releases/download/v1.9.2/install.yaml. Checking the releases page to pin the latest version tag is important for creating a reproducible environment. If you're already running Argo, you can skip straight to step 2.
  2. Connect the Resource EventSource in your staging environment: Apply the rollout-watcher EventSource from Step 1 to your staging namespace, manually put an actual Rollout into Degraded state, and use kubectl logs to verify that events are being received correctly.
  3. Test the Jira Webhook: Connect the Sensor HTTP Trigger's url to a temporary endpoint like https://webhook.site instead of the actual Jira API first — this lets you safely validate the payload structure before switching to the real API.
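
For the staging check in step 2, a throwaway Sensor with a log trigger is a convenient way to see events without calling any external API. The sketch below assumes the `rollout-watcher` EventSource from Step 1; the Sensor name is hypothetical.

```yaml
# debug-sensor.yaml (sketch for staging verification only)
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: rollout-debug-sensor   # hypothetical
  namespace: argo-events
spec:
  dependencies:
    - name: rollout-dep
      eventSourceName: rollout-watcher
      eventName: rollout-degraded
  triggers:
    - template:
        name: log-event
        log:
          intervalSeconds: 10   # print at most one event per interval
```

Received events then show up in the Sensor pod's logs. Delete this Sensor once the real one is in place so the same events are not processed twice.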
