Argo Rollouts BlueGreen Deployment Strategy — How It Differs from Canary, and When to Choose It

Whenever I think through deployment strategies, I always pause for a moment at "should I go with canary or BlueGreen?" At first, I vaguely assumed canary was safer — but then I tried pushing a DB schema change through a canary release and ended up in a pretty rough situation. Old Pods started throwing errors as they tried to read the new schema, and rolling back took 40 minutes. During that time, the service error rate spiked to 12%. After that day, I deeply understood: "This should have been BlueGreen."

In this post we configure the BlueGreen strategy hands-on with Argo Rollouts and look at how it differs from canary in practice. Rather than just copying YAML, we'll examine why BlueGreen's instant cutover makes a decisive difference in certain situations, and what criteria to use when choosing between the two strategies.

Before reading this post: This is most useful for those already familiar with Kubernetes Services, Deployments, and ReplicaSets. It will be especially helpful if you're facing a release that includes a DB migration or an API breaking change.

Core Concepts

What "Instant Cutover" in BlueGreen Actually Means

The concept behind BlueGreen deployment is simple. You run the current production environment (Blue) and the new version environment (Green) simultaneously, then switch all traffic to Green at once when it's ready. The key is that this switch does not happen partially.

Argo Rollouts implements this in Kubernetes quite elegantly. The cutover is a single API update that changes the selector of the activeService (a regular Kubernetes Service) to the new ReplicaSet's pod-template hash. That update is one atomic write through the API server (persisted in etcd), so there is no state where the switch is "half applied and half not." However, each node's kube-proxy still has to update its iptables/ipvs rules, which can take hundreds of milliseconds, or even a few seconds on large clusters. In practice this rarely causes issues, but it is more accurate to think of the switch as "effectively instantaneous" rather than "perfectly simultaneous."

Atomic-like Switch: A cutover method where traffic is not simultaneously distributed between the old and new versions — meaning there is no window where both versions are concurrently serving production traffic. This is BlueGreen's defining characteristic.

Comparing with canary makes this difference even clearer.

	BlueGreen	Canary
Traffic cutover method	Instantaneous (effectively atomic) switch	Gradual percentage shift
Period of concurrent production traffic	None	Both versions coexist throughout the rollout
Rollback method	Re-point the service pointer	Scale weight back to 0%
Infrastructure cost	Requires 2x resources	Minimizes additional resources
Suitable for	Breaking changes, large-scale releases	Gradual feature validation, high-frequency deployments

The Lifecycle of an Argo Rollouts BlueGreen

Once a Rollout begins, it proceeds internally in the following order:

1. Green ReplicaSet is created → The previewService is switched to point to Green. At this point, production traffic is still handled by Blue.
2. prePromotionAnalysis runs (optional) → Automatically validates the state of Green based on metrics from a provider such as Prometheus or Datadog.
3. Promotion → If autoPromotionEnabled is false, the Rollout pauses here until someone promotes it. On promotion, the activeService switches to Green. This is the moment of the instantaneous traffic cutover.
4. postPromotionAnalysis runs (optional) → Performs smoke tests or additional validation after the cutover.
5. Blue ReplicaSet is scaled down → The old version's Pods are removed once scaleDownDelaySeconds has elapsed.
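
As a rough sketch, the lifecycle above maps onto the blueGreen strategy fields like this. The Service and template names (my-app-active, my-app-preview, success-rate, smoke-test) are placeholders that match the examples later in this post, and the fragment shows only the strategy section, not a complete Rollout.

```yaml
# Strategy fragment only; names are placeholders, values are illustrative.
strategy:
  blueGreen:
    previewService: my-app-preview   # Green created: this Service is re-pointed at Green
    prePromotionAnalysis:            # pre-promotion gate: runs while Blue still serves production
      templates:
      - templateName: success-rate
      args:
      - name: service-name
        value: my-app-preview
    autoPromotionEnabled: false      # promotion: false means wait for `kubectl argo rollouts promote`
    activeService: my-app-active     # promotion: this Service is re-pointed at Green
    postPromotionAnalysis:           # post-promotion check: runs right after the switch
      templates:
      - templateName: smoke-test
    scaleDownDelaySeconds: 300       # cleanup: how long Blue stays up after promotion
```

Reading the fields in this order makes it easier to see which knob controls which stage when a rollout stalls.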

Practical Application

Example 1: Basic BlueGreen Rollout Configuration

To use BlueGreen with Argo Rollouts, you first need two Kubernetes Services: an activeService and a previewService. The two manifests below are identical apart from their names, and both start with the same app: my-app selector, which might make you wonder "why bother?" The answer is that Argo Rollouts injects an additional ReplicaSet hash label (rollouts-pod-template-hash) into each Service's selector as the deployment progresses, and that label is what distinguishes Blue from Green.

yaml

# active-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
# preview-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
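
For illustration, this is roughly what the active Service looks like once the controller has taken it over. The hash value below is made up. The real one is derived from the pod template and will differ in your cluster.

```yaml
# my-app-active as seen with `kubectl get svc my-app-active -o yaml` (trimmed sketch)
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app
    rollouts-pod-template-hash: 6f9c8d7b5c   # injected by Argo Rollouts; hypothetical value
  ports:
    - port: 80
      targetPort: 8080
```

During an update, my-app-preview carries the hash of the Green ReplicaSet while my-app-active keeps the hash of Blue. Promotion simply rewrites the hash on the active Service.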

yaml

# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:v2
  strategy:
    blueGreen:
      activeService: my-app-active
      previewService: my-app-preview
      autoPromotionEnabled: false       # manual promotion
      previewReplicaCount: 1            # cost savings: Green runs at 1 replica for validation, then scales up on promotion
      scaleDownDelaySeconds: 300        # in production, be more generous than 30 seconds
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: my-app-preview
      postPromotionAnalysis:
        templates:
        - templateName: smoke-test

Field	Role
`activeService`	Service that receives production traffic (required)
`previewService`	Service for accessing Green (new version) — optional, used for QA validation
`autoPromotionEnabled: false`	Manual promotion — approve explicitly after validation
`previewReplicaCount: 1`	Runs Green with a single replica until promotion, so validation adds 1 extra pod instead of 3
`scaleDownDelaySeconds: 300`	Keeps Blue running for 5 minutes after promotion (provides rollback window)
`prePromotionAnalysis`	Automatic metrics gate before promotion
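
The rollout.yaml above references a smoke-test template in postPromotionAnalysis without defining it. One possible shape, sketched here under assumptions, is a Job-based metric that sends a single request to the active Service. The /health path, the curl image, and the idea that one HTTP check is sufficient are all assumptions for illustration, not part of the original setup.

```yaml
# smoke-test-template.yaml (hypothetical file name)
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
spec:
  metrics:
  - name: smoke-test
    provider:
      job:
        spec:
          backoffLimit: 0            # do not retry; a failed check should fail the metric
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke
                image: curlimages/curl   # assumption: any image that ships curl; pin a tag in practice
                # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
                command: [curl, --fail, --silent, --show-error, http://my-app-active/health]
```

If the Job fails, the measurement fails, and the Rollout should abort and point the active Service back at the previous ReplicaSet, provided Blue is still within its scaleDownDelaySeconds window.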

To manually promote after deployment, use the following command:

bash

kubectl argo rollouts promote my-app

If something goes wrong, you can immediately revert to the previous state:

bash

kubectl argo rollouts undo my-app

Example 2: Deploying a DB Breaking Change — A Situation That Requires BlueGreen, Not Canary

Now that you understand the basic configuration, it's time to look at the scenario where BlueGreen truly shines. The most typical real-world case is when you need to change a DB schema without backward compatibility. With canary, you get a window where both the old version (expecting the old schema) and the new version (using the new schema) are looking at the same DB simultaneously. I experienced data collision errors in this exact situation, and since then I always use BlueGreen for releases like this.

With BlueGreen, only the old version handles production traffic until the cutover, so you can safely design a flow where you manually promote after confirming the DB migration is complete.

yaml

# rollout-db-migration.yaml (strategy section)
strategy:
  blueGreen:
    activeService: api-active
    previewService: api-preview
    autoPromotionEnabled: false   # manual promotion after confirming migration
    scaleDownDelaySeconds: 600    # keep Blue for 10 minutes to allow rollback window

The actual deployment flow looks like this. The kubectl argo rollouts set image command from the Argo Rollouts kubectl plugin is a convenient way to update the Rollout image imperatively:

bash

# 1. Deploy new image (Green environment is created, traffic still goes to Blue)
kubectl argo rollouts set image api app=api:v2

# 2. Verify Green status via preview service
#    (api-preview resolves only inside the cluster; run this from a Pod or use a port-forward)
curl http://api-preview/health

# 3. After confirming DB migration is complete, manually promote
kubectl argo rollouts promote api

# 4. If issues arise, immediately roll back
kubectl argo rollouts undo api

Expand-Contract Pattern: To safely handle breaking schema changes, two stages are required. First, deploy an intermediate version that supports both the old and new columns (Expand), then deploy a second time to remove the old columns (Contract). Specifically: ① Deploy a version that adds the new column while keeping the old column → ② After confirming the code only uses the new column, deploy a version that removes the old column. Because BlueGreen clearly separates each stage, it is the deployment strategy that best fits the Contract phase of this pattern.

Example 3: Prometheus-Based Automatic Promotion Gate

Once you're comfortable with manual promotion, you can attach an AnalysisTemplate that automatically decides whether promotion may proceed based on metrics. In this example, the success rate is measured 5 times at 30-second intervals. Because the first measurement is taken as soon as the analysis starts, the whole run takes about two minutes. The numbers are a compromise: short enough to give fast feedback, yet long enough to average out transient spikes. When using this for the first time, it's safer to start with a lenient successCondition and then observe actual metric patterns before tightening the threshold.

yaml

# analysis-template.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 30s
    count: 5
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{
            service="{{ args.service-name }}",
            status!~"5.."
          }[2m]))
          /
          sum(rate(http_requests_total{
            service="{{ args.service-name }}"
          }[2m]))

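When this template is connected to the Rollout's prePromotionAnalysis, promotion can proceed only when Green's 5xx error rate is at most 5%. With autoPromotionEnabled: true the switch then happens automatically; with autoPromotionEnabled: false, as in Example 1, the Rollout pauses after a successful analysis and waits for a manual promote. If the condition is not met, the Rollout is automatically aborted and Blue continues to handle production traffic. Keep in mind that Green receives no production traffic before promotion, so this query only returns data if something, such as a load test or synthetic check, sends requests through the preview service.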
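
One way to make the gate more robust is sketched below, assuming the same http_requests_total metric and Prometheus address as in the template above. It adds failureLimit so a single bad sample does not fail the run, and guards against an empty result so that a preview service with no traffic yields a failed measurement rather than an evaluation error. Treat the exact thresholds as assumptions to tune against your own metrics.

```yaml
# Variant of the metrics section of analysis-template.yaml (sketch)
metrics:
- name: success-rate
  interval: 30s
  count: 5
  failureLimit: 1   # tolerate one failed measurement; the default is 0
  # An empty result means Prometheus returned no series, for example
  # because nothing sent requests to Green during the window.
  successCondition: len(result) > 0 && result[0] >= 0.95
  provider:
    prometheus:
      address: http://prometheus:9090
      query: |
        sum(rate(http_requests_total{
          service="{{ args.service-name }}",
          status!~"5.."
        }[2m]))
        /
        sum(rate(http_requests_total{
          service="{{ args.service-name }}"
        }[2m]))
```

Whether an empty result should count as a failure or a pass depends on the service. Treating it as a failure is the conservative choice for a promotion gate.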

Pros and Cons Analysis

Advantages

Item	Details
Instant cutover	No intermediate window where both versions serve production traffic simultaneously — eliminates edge cases from version coexistence at the source
Instant rollback	While Blue is still running (within scaleDownDelaySeconds), rollback completes within seconds by simply re-pointing the service pointer
Isolated validation environment	`previewService` allows fully testing the new version without any production traffic
Clear operational state	Always either Blue or Green — minimizes operational complexity

Disadvantages and Caveats

Item	Details	Mitigation
2x infrastructure cost	Two ReplicaSets run side by side until Blue is scaled down	Use `previewReplicaCount` to keep Green at minimum replicas, then scale up at promotion time
No real-traffic validation	The new version doesn't receive real user load before cutover, limiting predictions of production behavior	Run load tests separately against the preview environment, or validate some features with canary first
Session continuity issues	Sticky sessions may be broken after cutover	Establish a session reissuance strategy before cutover, or move to a stateless design
scaleDownDelay misconfiguration	If too short, Blue may already be scaled down by the time you attempt a rollback	In production, use 300 seconds or more instead of the default 30 seconds

previewReplicaCount: You can specify the replica count for the Green environment separately via spec.strategy.blueGreen.previewReplicaCount. If cost is a concern, run Green at 1 replica for validation and let it scale up to the full replica count at promotion time. With replicas: 3, that means one extra pod during validation instead of three. The full 2x footprint then exists only from promotion until Blue is scaled down.
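
To make the arithmetic concrete, here is the pod count at each stage for the Example 1 settings. The comments only describe the expected counts. The numbers assume replicas: 3 and no autoscaler changing them.

```yaml
# Fragment of rollout.yaml with pod counts annotated (sketch)
spec:
  replicas: 3                    # steady state: 3 Blue pods
  strategy:
    blueGreen:
      activeService: my-app-active
      previewService: my-app-preview
      previewReplicaCount: 1     # during validation: 3 Blue + 1 Green = 4 pods (instead of 6)
      scaleDownDelaySeconds: 300
      # at promotion Green is scaled up to 3, so 3 Blue + 3 Green = 6 pods
      # run side by side until the 300-second delay expires, then 3 Green remain
```

The saving therefore applies to the validation phase, which is usually the longest part of a manually promoted rollout.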

The Most Common Mistakes in Practice

Leaving scaleDownDelaySeconds at the default value (30 seconds) — I actually made this mistake in production once. I discovered a problem right after promotion and tried to roll back, but Blue had already been deleted after just 30 seconds. That was a very long night. In production, I recommend keeping it at a minimum of 5 minutes (300 seconds).
Setting autoPromotionEnabled: false but forgetting to include the promotion step in the CI/CD pipeline — This results in Green being created but traffic never switching over, leaving two sets of ReplicaSets running indefinitely. It's recommended to explicitly include the kubectl argo rollouts promote call in your pipeline.
Skipping the Expand-Contract pattern when using BlueGreen with DB migrations — Even though BlueGreen guarantees an instantaneous cutover, you still need to separately design for whether a rollback to the previous version will conflict with the new schema. Without schema design that includes the rollback path, the fast rollback guarantee that BlueGreen provides becomes meaningless.

Closing Thoughts

BlueGreen is the strategy you choose when there's a constraint that two versions must never serve production traffic simultaneously, while canary is the strategy you choose when you want to validate gradually with real users.

The two strategies are not competitors — they are tools you select based on the situation. In practice, more and more teams are combining BlueGreen for stability-critical services and canary for services that require high-frequency deployments and feature validation.

Three steps you can take right now:

Install Argo Rollouts and configure the BlueGreen environment — Install Argo Rollouts on a local kind cluster (refer to the official documentation's installation guide), then apply the active-service.yaml and rollout.yaml above to see the basic BlueGreen flow in action.
Monitor Rollout status with the kubectl plugin — Use kubectl argo rollouts get rollout my-app --watch to watch in real time how the stages of Green creation → promotion → Blue removal progress. Running the promote, undo, and abort commands yourself is a natural way to get comfortable with the rollback flow.
Connect an AnalysisTemplate — If you already have Prometheus, you can apply the success-rate template from Example 3 as-is to attach an automatic promotion gate. Start with a lenient successCondition and observe actual metric values before adjusting the threshold — this is the safer approach.

References

BlueGreen Deployment Strategy — Argo Rollouts official documentation — Reference for all BlueGreen configuration values
Blue/green deployment strategy with Argo Rollouts — Red Hat Developer — Practical explanation focused on real-world application examples
How to Automate Blue-Green & Canary Deployments with Argo Rollouts — Akuity — CI/CD pipeline automation patterns
Blue/green Versus Canary Deployments: 6 Differences And How To Choose — Octopus Deploy — A comprehensive guide comparing the differences between the two strategies
Chapter 1. Using Argo Rollouts for progressive deployment delivery — Red Hat OpenShift GitOps 1.11 — Reference for enterprise environment application
GitOps in 2025: From Old-School Updates to the Modern Way — CNCF — Latest GitOps trends and where Argo Rollouts fits
Blue-green vs canary deployments: safer API and DB changes — AppMaster — Strategy selection criteria for API and DB change scenarios
Progressive Delivery on Kubernetes: From Blue-Green to GitOps-Powered Rollouts — Medium — A walkthrough from BlueGreen to GitOps integration


DEV BAK tech blog (기술블로그) · DevOps


