ArgoCD + Kubernetes GitOps Zero-Downtime Deployment in Practice — From Rolling Updates to Canary, How to Safely Change Production with a Single Git Commit
Have you ever been afraid of deployments? When I first started operating Kubernetes, I'd post "Deploying 🙏" in Slack every deployment day, stare holes through the monitoring dashboard, and break into a cold sweat. But after adopting ArgoCD, deployments became just another "ordinary Git PR merge." Seeing production changes reflected within 5 minutes of merging a PR became routine, and after experiencing automatic rollbacks in the middle of the night without a single alert, late-night overtime naturally faded away.
This post covers everything from the concept of GitOps to Rolling Update, Blue-Green, and Canary deployment strategies using ArgoCD, complete with real-world code. Rather than a simple "here's what exists" overview, I'll share configurations you can apply immediately in production, along with honest accounts of the traps we commonly fell into. By the end of this post, you'll have a concrete picture of how to design a zero-downtime deployment pipeline with ArgoCD on your own.
Here's the flow: we'll first clarify how GitOps differs from traditional CI/CD, then understand ArgoCD's core components. Next, we'll walk through three deployment strategies — Rolling Update, Blue-Green, and Canary — with actual YAML code, sharing the pitfalls our team genuinely encountered along the way. We'll wrap up with a step-by-step guide you can start using right now.
What Is GitOps — "Git Is the Truth"
GitOps is an operational paradigm where the entire state of infrastructure and application deployments is declaratively recorded in a Git repository and continuously synchronized with the actual system. If traditional CI/CD pipelines work by "executing commands to change state," GitOps works by "declaring the desired state and letting the system figure out how to match it."
The core principle of GitOps: the Git repository is the Single Source of Truth. The moment you run `kubectl apply` directly against the cluster, this principle is broken.
ArgoCD is the tool that implements this principle on top of Kubernetes. As a CNCF graduated project (the official maturity certification in the cloud-native ecosystem), it is one of the most widely adopted CD tools in the Kubernetes ecosystem. In April 2025, ArgoCD v3.0 was released — the first major version since 2021 — bringing a host of enterprise-grade features including per-resource RBAC and API server load reduction.
ArgoCD Architecture — Understanding the Core Components
When you first encounter ArgoCD, the terminology can feel unfamiliar, but in practice, grasping a few key concepts is enough to see the whole picture.
| Component | Role |
|---|---|
| Application CRD | The core resource that defines "which path of which Git repo" to deploy to "which namespace of which cluster" |
| App of Apps Pattern | A parent Application that manages other Applications — suitable for managing a small number of services hierarchically |
| ApplicationSet | Automates multi-cluster and multi-environment deployments — handles dozens of environments from a single template |
| Sync Waves & Hooks | Controls deployment order — ensures things like DB migrations run first, app deployment second |
App of Apps vs ApplicationSet: The two patterns look similar, but there are criteria for choosing. App of Apps is suitable when the number of services is small and the structure is fixed. You manually write Application YAML files to build a hierarchical structure, which is intuitive. ApplicationSet, on the other hand, shines as environments (dev/staging/prod) or clusters grow. Adding a new cluster only requires adding one line of YAML. Our team started with App of Apps, then migrated to ApplicationSet once we exceeded three clusters.
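To make the contrast concrete, here is a minimal ApplicationSet sketch using the list generator. This is an illustrative example, not our actual configuration — the cluster names, URLs, and repo path are placeholders:

```yaml
# Hypothetical example — adding a new cluster is just one more list element
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: dev
        url: https://dev-cluster.example.com
      - cluster: prod
        url: https://prod-cluster.example.com
  template:
    metadata:
      name: 'my-app-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/my-app-config
        targetRevision: HEAD
        path: 'k8s/overlays/{{cluster}}'
      destination:
        server: '{{url}}'
        namespace: my-app
```

ArgoCD stamps out one Application per list element, so dev and prod stay structurally identical and only the parameterized fields differ.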
The most fundamental Application resource looks like this:
```yaml
# k8s/argocd/apps/my-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/my-app-config
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # Resources deleted from Git are also removed from the cluster
      selfHeal: true  # If cluster state diverges from Git, automatically restore
```

What `selfHeal: true` means: even if someone accidentally edits directly with `kubectl edit`, ArgoCD detects it and automatically restores the Git state. This is the Self-Healing characteristic of GitOps. You'll occasionally find people confused by "why does my change keep reverting?" — this setting is why.
Deployment Strategy Overview — Rolling, Blue-Green, Canary
Honestly, at first the differences between the three strategies were confusing. Summarized in a single line each:
| Strategy | Core Idea | When to Use |
|---|---|---|
| Rolling Update | Replace pods one by one sequentially | Most situations, when resources are limited |
| Blue-Green | Prepare a new environment and switch traffic all at once | When immediate rollback is needed or DB schema changes are involved |
| Canary | Send only a portion of traffic to the new version first | When the risk is high, when you want to validate with real traffic |
Rolling Update is built into Kubernetes, but Blue-Green and Canary require a separate tool called Argo Rollouts. Many people have installed only ArgoCD and wondered "why isn't my canary deployment working?" without knowing this.
Practical Application
Rolling Update in Practice — With a Production Checklist
Rolling Update looks simple, but missing a single setting can cause traffic interruptions mid-deployment. I remember deploying without a readinessProbe the first time and getting flooded with 500 errors the moment new pods came up. Here's the Deployment configuration for a safe Rolling Update:
```yaml
# k8s/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "2"  # Runs after DB migration (wave 1)
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # Always maintain 3 pods during deployment
      maxSurge: 1        # Allow up to 4 pods running simultaneously
  template:
    metadata:
      labels:
        app: my-app      # Must match the selector above and the PDB below
    spec:
      terminationGracePeriodSeconds: 60  # Wait for in-flight requests to complete
      containers:
      - name: my-app
        image: my-app:v2.0
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
        readinessProbe:  # Without this, traffic flows into unhealthy pods
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
---
# k8s/base/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # With replicas: 3, only 1 pod can be terminated at a time
  selector:
    matchLabels:
      app: my-app
```

To clarify the relationship between `minAvailable: 2` and `replicas: 3`: it means "at least 2 of the 3 pods must always be alive." Ultimately only 1 pod can be terminated at a time, ensuring service availability during node drains or rolling updates.
| Setting | Problem If Missing |
|---|---|
| `readinessProbe` | Traffic flows in before the app is ready → errors |
| `terminationGracePeriodSeconds` | Pod forcibly terminated while handling requests → connection drops |
| `resources.requests` | Scheduler fails to place pods due to insufficient node resources |
| `PodDisruptionBudget` | All pods can be terminated simultaneously during node drain |
For cases where order matters, such as DB migrations, you can use Sync Waves:
```yaml
# k8s/jobs/db-migration.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/sync-wave: "1"  # Runs before Deployment (wave 2)
    argocd.argoproj.io/hook: PreSync
spec:
  template:
    spec:
      restartPolicy: Never  # Must be specified explicitly for Jobs
      containers:
      - name: migrate
        image: my-app:v2.0
        command: ["python", "manage.py", "migrate"]
```

Blue-Green with Argo Rollouts — When You Need Immediate Rollback
Blue-Green is an approach where you run the "current blue environment" and the "new green environment" simultaneously, then switch traffic all at once after verification. The biggest advantage is the ability to roll back instantly — our team chose this strategy for a deployment involving DB schema changes. Resources are temporarily doubled, but since the old version automatically scales down after scaleDownDelaySeconds, it's not a permanent doubling.
First, you need two Services to receive traffic:
```yaml
# k8s/services/my-app-services.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active   # Real user traffic goes here
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview  # For validating the new version
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```

Then define the Rollout resource:
```yaml
# k8s/rollouts/my-app-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    blueGreen:
      activeService: my-app-active    # Service currently receiving traffic
      previewService: my-app-preview  # Preview service for the new version
      autoPromotionEnabled: false     # Switch only after manual approval
      scaleDownDelaySeconds: 300      # Keep old version for 5 minutes after switch (for rollback)
  template:
    metadata:
      labels:
        app: my-app  # Must match the selector above
    spec:
      containers:
      - name: my-app
        image: my-app:v2.0
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
```
periodSeconds: 5After deploying the new version, you can validate it against the my-app-preview service, then switch with the following command when ready. The Argo Rollouts CLI can be installed using the kubectl plugin method described in the official documentation:
```bash
# Manual promotion with Argo Rollouts CLI
kubectl argo rollouts promote my-app -n production

# When rollback is needed
kubectl argo rollouts abort my-app -n production
```

Our team ran with `autoPromotionEnabled: false` for the first six months. We had the QA team validate directly against the preview service before switching traffic, and that habit helped us preemptively prevent several potential incidents.
Canary + Prometheus Automated Analysis — Validating with Real Traffic
Canary is an approach of "sending a little ahead first for real-user validation." By integrating Argo Rollouts' AnalysisTemplate with Prometheus, you can configure automatic rollback when the error rate exceeds a threshold.
Canary also requires two Services:
```yaml
# k8s/services/my-app-canary-services.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-stable   # Traffic for the existing stable version
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-canary   # Canary traffic for the new version
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```

Next, an AnalysisTemplate defines the success criteria that will be evaluated against Prometheus:

```yaml
# k8s/analysis/success-rate.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 2m
    # result[0]: the first scalar value from the Prometheus query result (success rate 0.0~1.0)
    successCondition: result[0] >= 0.95  # Must pass with a 95%+ success rate
    # failureLimit: 3 → rollback after 3 cumulative failures (total, not consecutive)
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{app="my-app",status!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{app="my-app"}[5m]))
```

Finally, the Rollout references the template in its canary steps:

```yaml
# k8s/rollouts/my-app-canary-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      canaryService: my-app-canary
      stableService: my-app-stable
      steps:
      - setWeight: 10           # 10% traffic → new version
      - pause: {duration: 5m}   # Observe for 5 minutes
      - setWeight: 30           # Expand to 30%
      - analysis:
          templates:
          - templateName: success-rate  # Automatic error rate check
      - setWeight: 60
      - pause: {duration: 10m}
      - setWeight: 100          # Full cutover if no issues
```

The role of AnalysisTemplate: during a canary deployment, it runs Prometheus queries to automatically evaluate metrics like error rate and latency. If thresholds are exceeded, it rolls back automatically without human intervention. I actually experienced an automatic rollback during a nighttime deployment with no alerts at all — that's when I truly felt the value of this feature. I only found out what had happened overnight when I came in the next morning and reviewed the rollback history.
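One refinement worth knowing: Argo Rollouts AnalysisTemplates accept arguments, so a single template can serve every service instead of hard-coding `app="my-app"` into the query. A sketch under that assumption (the `service-name` argument name is illustrative):

```yaml
# Parameterized variant of the success-rate template above
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name  # Supplied by each Rollout's analysis step
  metrics:
  - name: success-rate
    interval: 2m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{app="{{args.service-name}}",status!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{app="{{args.service-name}}"}[5m]))
```

Each Rollout then passes its own value under `analysis.args` (e.g. `name: service-name, value: my-app`), which keeps the analysis logic in one place as the number of services grows.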
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Self-Healing | Automatically detects and restores when cluster state diverges from Git — even manual edits are reverted |
| Audit Trail | Every deployment is recorded as a Git commit — perfect tracking of "who deployed what and when" |
| Instant Rollback | A single revert to a previous commit automatically restores the cluster to its previous state |
| Visual UI | Resource topology, sync status, and health status all visible at a glance |
| RBAC + SSO | Fine-grained access control and integration with in-house SSO (SAML, OIDC) |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Initial Adoption Cost | Installation, Git repo structure design, and team training typically takes over a week | Start with a single service and adopt incrementally |
| Kubernetes Only | Does not support VM or serverless environments | Run separate CD tools in parallel for non-K8s environments |
| Advanced Strategies Not Built-in | Blue-Green and Canary require separate Argo Rollouts installation | Set up Argo Rollouts as part of the bundle |
| Secret Management | Cannot store secrets directly in Git | Adopt External Secrets Operator or Sealed Secrets |
| No Multi-Environment Promotion | No built-in automatic promotion from dev → staging → prod | Supplement with the Kargo tool |
| UI Lag at Scale | Rendering slows down with thousands of apps or more | Use ApplicationSet for distributed management |
External Secrets Operator (ESO): An operator that synchronizes external secret stores — such as AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault — with Kubernetes Secrets. Only the secret reference path is stored in Git; the actual values are injected at runtime from the external store.
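As a rough sketch of how that looks in a manifest — the store name, secret path, and key names here are placeholders, and this assumes ESO and a `ClusterSecretStore` for AWS Secrets Manager are already set up:

```yaml
# Hypothetical example — only the reference lives in Git, never the secret value
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-app-secrets
  namespace: production
spec:
  refreshInterval: 1h           # Re-sync from the external store hourly
  secretStoreRef:
    name: aws-secrets-manager   # A pre-configured ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: my-app-secrets        # The Kubernetes Secret ESO will create
  data:
  - secretKey: DATABASE_URL     # Key in the resulting Kubernetes Secret
    remoteRef:
      key: prod/my-app          # Path in AWS Secrets Manager
      property: database_url
```

Because this manifest contains no sensitive values, it can be committed and synced by ArgoCD like any other resource.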
The Most Common Mistakes in Practice
- Not setting Readiness Probes — traffic flows in the moment a container comes up, causing initial request failures. It's best to always configure this alongside `initialDelaySeconds`. In my very first deployment, this mistake caused errors to pour out for tens of seconds.
- Overusing `autoPromotionEnabled: true` — automatically switching without validation in Blue-Green defeats the purpose of Blue-Green. Our team ran manual approval only for the first six months, and only applied automation to specific services after the team process had stabilized.
- Not accounting for sync order in App of Apps — when managing multiple Applications hierarchically without configuring Sync Waves, resources with dependencies fall into a `Sync Failed` state due to ordering conflicts. The classic case is a resource that uses a CRD being applied before the CRD itself is deployed. It's safest to explicitly manage order by applying `argocd.argoproj.io/sync-wave` annotations to Application CRDs as well.
Closing Thoughts
I still remember those days of typing "Deploying 🙏" and breaking into a cold sweat. After adopting ArgoCD, deployments became just another routine process, no different from a code review. ArgoCD is the tool that best embodies the GitOps principle of "treating deployments like code," and when configured alongside Argo Rollouts, you can complete a production-grade zero-downtime deployment pipeline covering everything from Rolling Updates to metric-based Canary deployments.
Three steps you can start right now:
1. Install ArgoCD and connect your first Application: install ArgoCD with the commands below, then upload the Kubernetes manifests for one of your currently running services to a Git repository and create an Application CRD. Running `kubectl port-forward svc/argocd-server -n argocd 8080:443` to see the UI and visually observe the sync status is a huge help for building understanding.

   ```bash
   kubectl create namespace argocd
   kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
   ```

2. Install Argo Rollouts and apply a Canary strategy: migrate your existing Deployment to a Rollout resource and use the Canary example code above to configure the staged traffic shift of 10% → 30% → 60% → 100%. Starting without an AnalysisTemplate and using only manual pauses is perfectly fine at first.

   ```bash
   kubectl create namespace argo-rollouts
   kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
   ```

3. Establish a secret management system: install External Secrets Operator and integrate it with AWS Secrets Manager or whichever secret store your team uses. Once this is complete, you'll have a truly "Git is all you need to look at" GitOps environment.
Next post: Multi-stage GitOps with Kargo — Designing an automatic promotion pipeline from dev to prod
References
- Argo CD Official Documentation | argo-cd.readthedocs.io
- Argo Rollouts Official Documentation | argo-rollouts.readthedocs.io
- CNCF — Argo CD v3 Announcement | cncf.io
- Implementing Zero-Downtime Deployments with Argo CD | OpsMx
- Zero-Downtime Rollbacks in Kubernetes with ArgoCD | DEV Community
- Automating Blue-Green and Canary Deployments with Argo Rollouts | Akuity
- Multi-cluster GitOps with ArgoCD | Com2uS ON Tech Blog
- Maximizing DevOps Efficiency with GitOps | Kakao Cloud
- Argo CD Anti-Patterns | Codefresh
- Production-Grade GitOps with Argo CD | Medium
- Implementing GitOps and Canary Deployment with Argo Project and Istio | Tetrate
- Kargo Official Site