Building a GitOps Pipeline to Automate Goldilocks VPA Recommendations with Argo CD Pull Request Generator
Anyone who has operated a Kubernetes cluster has likely encountered this situation: everything seems to be running fine, but the monthly cloud bill feels suspiciously high. CPU requests are set at 5x actual usage, memory at 3x, and that compounds across dozens of services. At first I thought, "Isn't it safer to over-provision?" — but it turns out this directly impacts the cost of Karpenter, the node autoprovisioner. Because Karpenter determines node size based on the sum of pod requests, inflated requests cause it to provision nodes far larger than necessary. After auditing dozens of services, over half showed more than 20% over-allocation.
So I introduced Goldilocks to start collecting VPA recommendations — and ran into another problem. The recommendations looked good, but having a human manually translate them into Helm values.yaml updates every time simply wasn't practical at scale. With dozens of services, that's not even semi-automated. What was really needed was an end-to-end pipeline: read recommendations, automatically open a Git PR, and have Argo CD apply the changes to the cluster after merge.
Following this guide will completely eliminate the manual values.yaml updates you've been doing every week. We'll walk through the full pipeline step by step — extracting Goldilocks VPA recommendations via a CronJob, auto-generating GitHub PRs, and setting up preview environments with the Argo CD ApplicationSet Pull Request Generator. This guide is aimed at teams already using Kubernetes, Helm, and Argo CD.
Core Concepts
What Goldilocks Does: Using VPA "Safely"
Enabling VPA naively causes pods to restart the moment a recommendation is generated — a fairly dangerous behavior in production. Goldilocks works around this with updateMode: "Off". It creates VPA objects but never actually applies them to pods, instead accumulating recommendations in .status.recommendation.
There's one prerequisite I missed initially and spent a while debugging ("why aren't recommendations showing up?"): the VPA Recommender requires Metrics Server (or Prometheus Adapter) to be installed. It's worth checking that kubectl top pods works correctly in your cluster first.
```yaml
# Example VPA object auto-created by Goldilocks
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-api-server
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api-server
  updatePolicy:
    updateMode: "Off"  # Key: recommendations only, no restarts
status:
  recommendation:
    containerRecommendations:
      - containerName: app
        target:
          cpu: "250m"
          memory: "512Mi"
        lowerBound:
          cpu: "100m"
          memory: "256Mi"
        upperBound:
          cpu: "500m"
          memory: "1Gi"
```

Recommendations come in three flavors, `target`, `lowerBound`, and `upperBound`, and which one you use is a surprisingly important decision. `target` is based on average usage, while `upperBound` reflects peak traffic. For services with heavy batch workloads or frequent traffic spikes, it's safer to apply a `target * 1.2` safety margin or refer to `upperBound`.
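To make the margin arithmetic concrete, here's a minimal Python sketch (the helper names are hypothetical, not part of Goldilocks) of one reasonable policy: apply `target * 1.2`, but never exceed `upperBound`.

```python
def parse_cpu(v: str) -> float:
    """Convert a Kubernetes CPU quantity to millicores: '250m' -> 250, '1' -> 1000."""
    return float(v[:-1]) if v.endswith("m") else float(v) * 1000


def recommended_request(target: str, upper_bound: str, margin: float = 1.2) -> str:
    """Apply a safety margin to the VPA target, capped at the VPA upperBound."""
    capped = min(parse_cpu(target) * margin, parse_cpu(upper_bound))
    return f"{int(round(capped))}m"


print(recommended_request("250m", "500m"))  # 300m (250 * 1.2, under the cap)
print(recommended_request("450m", "500m"))  # 500m (540m would exceed upperBound)
```

Capping at `upperBound` is a deliberate choice: values above it exceed what the VPA has observed as a plausible peak, so anything larger is pure waste.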
Usage is straightforward: add a single label to the namespace you want to monitor, and the Goldilocks Controller will detect the Deployments in that namespace and automatically create VPA objects.
```bash
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
```

> VPA recommendation convergence time: VPA needs at least 7–14 days of real traffic data to produce reliable recommendations. Pulling recommendations immediately after a deployment will yield meaningless numbers.
End-to-End Flow of the GitOps PR Pipeline
The core idea is simple. Instead of a human copying recommendations, a script polls VPA objects, updates Helm values.yaml, and submits a PR. Argo CD then syncs to the cluster once the PR is merged.
```
Goldilocks Controller
    ↓ (watches namespaces → creates VPA objects)
VPA Recommender
    ↓ (accumulates recommendations based on actual usage patterns)
Kubernetes CronJob
    ↓ (extracts recommendations via kubectl/API)
Automated GitHub PR
    ↓ (modifies values.yaml + submits PR)
Argo CD ApplicationSet
    ↓ (PR Generator → auto-deploys preview environment)
Code Review & Merge
    ↓
Argo CD Sync → applied to cluster
    ↓
Prometheus + Grafana (cost/performance monitoring)
```

Here's a summary of each component's role:
| Component | Role |
|---|---|
| Goldilocks Controller | Watches namespaces → auto-creates VPA objects |
| VPA Recommender | Analyzes actual usage patterns → accumulates .status.recommendation |
| CronJob + Script | Extracts VPA recommendations → updates values.yaml → creates PR |
| Argo CD ApplicationSet | Auto-deploys preview environments via PR Generator |
| Argo CD | Syncs merged PRs to the cluster |
Practical Implementation
Step 1: Automated PR Creation via CronJob
This is the lowest-barrier pattern to get started. Run a CronJob inside the cluster that reads VPA recommendations weekly and creates PRs. All you need is kubectl, jq, yq, and the gh CLI — no third-party commercial tooling required.
First, register your GitHub Token as a Secret. The token needs contents: write and pull-requests: write permissions; a fine-grained PAT is recommended.
```bash
kubectl create secret generic github-token \
  --from-literal=token=<YOUR_FINE_GRAINED_PAT> \
  -n goldilocks
```

Next, configure a ServiceAccount and RBAC so the CronJob has explicit permission to read VPA objects from the API server.
```yaml
# vpa-reader-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vpa-reader
  namespace: goldilocks
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vpa-reader
rules:
  - apiGroups: ["autoscaling.k8s.io"]
    resources: ["verticalpodautoscalers"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vpa-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vpa-reader
subjects:
  - kind: ServiceAccount
    name: vpa-reader
    namespace: goldilocks
```

The CronJob itself is the critical piece. It runs every Monday at 9 AM, extracts recommendations, and invokes the PR creation script. Since bitnami/kubectl:latest does not include jq, yq, or gh by default, you should build a custom image containing all three tools. The example below uses my-registry/pr-bot:1.0.0 as a placeholder image name.
```yaml
# goldilocks-pr-bot-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: goldilocks-pr-bot
  namespace: goldilocks
spec:
  schedule: "0 9 * * 1"  # Every Monday at 9 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vpa-reader
          containers:
            - name: pr-bot
              image: my-registry/pr-bot:1.0.0  # Custom image with kubectl + jq + yq + gh
              env:
                - name: GITHUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: github-token
                      key: token
                - name: DELTA_THRESHOLD
                  value: "0.2"  # Only create PR if difference exceeds 20%
              command:
                - /bin/sh
                - -c
                - |
                  # 1. Extract VPA recommendations (excluding sidecar containers)
                  kubectl get vpa -A -o json | jq -r '
                    .items[] |
                    .metadata.namespace + "/" + .metadata.name + " " +
                    (.status.recommendation.containerRecommendations[]? |
                      select(.containerName != "istio-proxy") |
                      select(.containerName != "linkerd-proxy") |
                      .containerName + " cpu=" + .target.cpu +
                      " mem=" + .target.memory)
                  ' > /tmp/recommendations.txt
                  # 2. Invoke PR creation script
                  /scripts/create-pr.sh /tmp/recommendations.txt
          restartPolicy: OnFailure
```

The most important part of the PR creation script is delta filtering. Honestly, I left this out initially and ended up with dozens of PRs flooding in every week. You need to skip updates where the change is less than 20% from the current value to keep PR noise under control.
Another easy mistake is handling the git clone directory. If a previous run left the directory behind, the clone will fail on the next execution. The script below handles this explicitly with rm -rf, and uses set -euo pipefail throughout to fail immediately on any error.
```bash
#!/bin/bash
# create-pr.sh
set -euo pipefail

BRANCH="chore/resource-update-$(date +%Y%m%d)"
REPO_DIR="/workspace/k8s-manifests"
THRESHOLD=${DELTA_THRESHOLD:-0.2}

# Git configuration
git config --global user.email "bot@example.com"
git config --global user.name "Goldilocks Bot"

# Clean up any directory left from a previous run
rm -rf "$REPO_DIR"
git clone "https://x-access-token:${GITHUB_TOKEN}@github.com/my-org/k8s-manifests" "$REPO_DIR" \
  || { echo "ERROR: git clone failed"; exit 1; }
cd "$REPO_DIR"
git checkout -b "$BRANCH"

CHANGED=false
while IFS=' ' read -r ns_name container cpu_rec mem_rec; do
  NAMESPACE=$(echo "$ns_name" | cut -d/ -f1)
  APP=$(echo "$ns_name" | cut -d/ -f2)
  VALUES_FILE="helm/${APP}/values.yaml"
  [ ! -f "$VALUES_FILE" ] && continue

  # Compare current value against recommendation (delta filtering)
  CURRENT_CPU=$(yq e ".resources.requests.cpu" "$VALUES_FILE") \
    || { echo "WARN: yq parse failed for $VALUES_FILE, skipping"; continue; }
  # yq prints "null" when the key is missing -- nothing to compare against, skip
  [ "$CURRENT_CPU" = "null" ] && continue
  CPU_VAL=${cpu_rec#cpu=}

  # Only update if the change exceeds the threshold (applying target * 1.2 safety margin)
  if python3 -c "
import sys

def parse_cpu(v):
    if v.endswith('m'):
        return float(v[:-1])
    return float(v) * 1000

cur = parse_cpu('$CURRENT_CPU')
rec = parse_cpu('$CPU_VAL') * 1.2  # 20% safety margin
diff = abs(cur - rec) / cur if cur > 0 else 1
sys.exit(0 if diff >= $THRESHOLD else 1)
"; then
    MEM_VAL=${mem_rec#mem=}
    # Update with safety margin applied
    SAFE_CPU=$(python3 -c "
def parse_cpu(v):
    if v.endswith('m'):
        return float(v[:-1])
    return float(v) * 1000

print(str(int(parse_cpu('$CPU_VAL') * 1.2)) + 'm')
")
    yq e ".resources.requests.cpu = \"${SAFE_CPU}\"" -i "$VALUES_FILE"
    yq e ".resources.requests.memory = \"${MEM_VAL}\"" -i "$VALUES_FILE"
    CHANGED=true
    echo "Updated: $APP (cpu: $CURRENT_CPU → $SAFE_CPU)"
  fi
done < "$1"

# Only create PR if there are changes
if [ "$CHANGED" = true ]; then
  git add -A
  git commit -m "chore: update resource requests from Goldilocks recommendations $(date +%Y-%m-%d)"
  git push origin "$BRANCH" \
    || { echo "ERROR: git push failed"; exit 1; }
  gh pr create \
    --title "chore: Goldilocks resource right-sizing $(date +%Y-%m-%d)" \
    --body-file /scripts/pr-template.md \
    --label "resource-optimization" \
    --label "automated"
fi
```

Preparing a PR body template in advance makes life much easier for reviewers. Here's a suggested format for `/scripts/pr-template.md`:
```markdown
## Goldilocks Resource Right-sizing

This PR was auto-generated based on VPA recommendations.

### Changes

| Service | Before CPU | After CPU | Before Memory | After Memory |
|---------|-----------|-----------|--------------|-------------|
| (populated by script) | | | | |

### Estimated Cost Savings

- Estimated monthly savings: $XXX (including Karpenter node size reduction)

### Verification Checklist

- [ ] Preview environment deployed successfully
- [ ] No CrashLoopBackOff after pod restarts
- [ ] Key metrics (response time, error rate) within normal range
```

| Code Point | Description |
|---|---|
| `select(.containerName != "istio-proxy")` | Filters out sidecar container recommendations |
| `DELTA_THRESHOLD=0.2` | No PR for changes under 20% (noise prevention) |
| `target * 1.2` safety margin | Adds spike headroom on top of average-based recommendations |
| `set -euo pipefail` | Fail immediately on error; prevent execution in a broken state |
| `rm -rf "$REPO_DIR"` | Prevents clone directory conflicts on CronJob re-runs |
| `$CHANGED = true` check | Prevents empty PR creation when nothing changed |
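The `DELTA_THRESHOLD` decision can also be isolated as a small, testable function. This is a sketch mirroring the inline `python3 -c` logic from the script (the function names are mine, not the script's):

```python
def parse_cpu(v: str) -> float:
    """Kubernetes CPU quantity to millicores: '250m' -> 250, '0.5' -> 500."""
    return float(v[:-1]) if v.endswith("m") else float(v) * 1000


def should_update(current: str, recommended: str,
                  threshold: float = 0.2, margin: float = 1.2) -> bool:
    """True when the margin-adjusted recommendation differs from the
    current request by at least the threshold (default 20%)."""
    cur = parse_cpu(current)
    rec = parse_cpu(recommended) * margin
    diff = abs(cur - rec) / cur if cur > 0 else 1.0
    return diff >= threshold


print(should_update("1000m", "250m"))  # True: 300m is 70% below the current request
print(should_update("300m", "250m"))   # False: margin-adjusted rec equals current
```

Pulling the logic out like this also makes it trivial to unit-test your threshold choice before wiring it into the CronJob.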
Step 2: Adding Preview Environments with Argo CD ApplicationSet
Auto-generating PRs is great, but without a way to verify "is this change actually safe?", there's always some anxiety. The most alarming case I personally encountered was a PR with under-provisioned recommendations getting merged. Using the ApplicationSet Pull Request Generator, you can automatically deploy changes to a preview namespace the moment a PR is opened, allowing validation before merge.
One important caveat: using {{branch}} directly as a namespace name causes two problems. First, branch names containing slashes (/) are not valid Kubernetes namespace names. Second, they may exceed the 63-character length limit. The pattern below sanitizes the branch name to avoid both issues. Note that the sanitizing filters (replace, trunc) are sprig template functions, which only work when goTemplate: true is set on the ApplicationSet spec; alternatively, the Pull Request Generator exposes a pre-sanitized branch_slug parameter you can use instead.
```yaml
# resource-update-preview-appset.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: resource-update-previews
  namespace: argocd
spec:
  goTemplate: true  # Required for the sprig functions (replace, trunc) used below
  generators:
    - pullRequest:
        github:
          owner: my-org
          repo: k8s-manifests
          labels:
            - resource-optimization  # Filter only Goldilocks PRs
          tokenRef:
            secretName: github-token
            key: token
        requeueAfterSeconds: 300  # Re-check PR status every 5 minutes
  template:
    metadata:
      # Remove slashes + truncate to 50 chars for a valid namespace name
      name: 'preview-{{ .branch | replace "/" "-" | trunc 50 }}'
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/k8s-manifests
        targetRevision: '{{ .head_sha }}'
        path: helm/my-app
        helm:
          valueFiles:
            - values.yaml
      destination:
        server: https://kubernetes.default.svc
        namespace: 'preview-{{ .branch | replace "/" "-" | trunc 50 }}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true
```

> Pull Request Generator: Argo CD detects PRs with the specified label and automatically creates Application objects. `requeueAfterSeconds: 300` sets the interval for re-checking PR status and detecting newly opened PRs. When a PR is closed, the corresponding Application is automatically deleted.
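The length budget behind that sanitization is easy to verify: an 8-character `preview-` prefix plus a branch slug truncated to 50 characters always stays within the 63-character namespace limit. Here's a plain-Python mirror of the template logic (illustrative only, not part of Argo CD):

```python
def preview_namespace(branch: str, prefix: str = "preview-", max_branch: int = 50) -> str:
    """Mirror the ApplicationSet template: replace '/' with '-', truncate the
    branch to 50 chars, prepend the prefix; 8 + 50 = 58, under the 63-char limit."""
    name = prefix + branch.replace("/", "-")[:max_branch]
    assert len(name) <= 63
    return name


print(preview_namespace("chore/resource-update-20240101"))
# preview-chore-resource-update-20240101
```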
This ApplicationSet is used alongside your existing Argo CD project configuration. Rather than project: default, it's better to specify a project that matches your team's RBAC policies. Isolating preview namespace access in a dedicated AppProject keeps it from interfering with your production environment.
With this setup, PR reviewers can confirm that "the app starts up correctly with these new recommendations" directly in the preview environment before merging. After merge, Argo CD automatically syncs to the production namespace, completing the full loop.
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Audit trail | Every resource change is recorded as a Git commit, making it traceable who changed what and why |
| Safe review process | VPA proposes recommendations as PRs rather than restarting pods directly, allowing team review before applying |
| Incremental rollout | Use namespace labels to scope targets; apply progressively to services you're confident about |
| Cost visibility | Include estimated cost savings in PR descriptions to quantify business value |
| Easy rollback | Instantly revert to previous resource settings with a single git revert |
Drawbacks and Caveats
The most jarring issue I personally ran into was HPA conflicts. Running CPU-based HPA and VPA simultaneously can create situations where they attempt to scale in opposite directions. The table below summarizes the key pitfalls and how to address them.
| Item | Details | Mitigation |
|---|---|---|
| Recommendation convergence time | Recommendations are inaccurate immediately after a new deployment | Filter PR targets to workloads at least 14 days past their last deployment |
| Traffic spikes | Risk of under-provisioning for batch jobs or event-driven traffic | Apply target * 1.2 safety margin; consider upperBound-based settings |
| PR noise | Can generate dozens to hundreds of PRs per week | Delta threshold filtering (±20%) to generate PRs only for meaningful changes |
| HPA conflicts | CPU scaling conflicts when HPA and VPA are used simultaneously | Apply VPA to memory only, or exclude CPU via resourcePolicy |
| Pod restarts | Pod restarts occur when Argo CD auto-syncs | Protect availability with ServerSideApply=true + PodDisruptionBudget |
| Multi-container pods | Risk of sidecar recommendations contaminating main container settings | Filter by container name (select(.containerName != "istio-proxy")) |
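As a concrete form of the `resourcePolicy` mitigation from the table, a VPA can be restricted to memory so that a CPU-based HPA retains sole control of CPU. This sketch shows a manually managed VPA (a Goldilocks-created object would need the equivalent configuration applied to it):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-api-server-memory-only
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api-server
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]  # leave CPU entirely to the HPA
```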
> QualityOfService (QoS): Kubernetes assigns `Guaranteed` when `requests == limits` for both CPU and memory, `Burstable` when requests are set but differ from limits, and `BestEffort` when neither requests nor limits are set. Changing resource settings can therefore shift the QoS class. For stability-critical services, it's recommended to update limits alongside requests to maintain the `Guaranteed` class.
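Because the QoS class is determined mechanically from requests and limits, the rule is easy to encode. This is a simplified single-container sketch (the real kubelet rule evaluates every container in the pod; the function name is mine):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod."""
    if not requests and not limits:
        return "BestEffort"
    # When requests are omitted but limits are set, Kubernetes defaults requests to limits.
    effective_requests = requests or limits
    if limits and effective_requests == limits and set(limits) == {"cpu", "memory"}:
        return "Guaranteed"
    return "Burstable"


print(qos_class({"cpu": "250m", "memory": "512Mi"},
                {"cpu": "250m", "memory": "512Mi"}))  # Guaranteed
print(qos_class({"cpu": "250m"}, {"cpu": "500m"}))    # Burstable
print(qos_class({}, {}))                              # BestEffort
```

A resource PR that lowers requests without touching limits silently demotes a `Guaranteed` service to `Burstable`, which is exactly the shift the note above warns about.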
Most Common Mistakes in Practice
- Attaching PR automation before VPA data has converged: If you spin up the CronJob alongside the Goldilocks installation, PRs will be generated from recommendations based on only a few days of data, which are essentially meaningless. Let data accumulate for at least two weeks before enabling automation.
- Running without delta filtering: It seems fine at first, but as the number of services grows, you end up with dozens of "reduced CPU by 1m" PRs every week. Once PR fatigue sets in, the team starts ignoring them entirely, and the automation loses its value.
- Enabling Argo CD AutoSync without PodDisruptionBudgets: The moment a resource PR is merged, Argo CD may restart all pods simultaneously. If this happens during a high-traffic period, it can result in a brief service outage.
Closing Thoughts
One thing that changes after introducing this pipeline is how the team thinks about resource settings: they shift from "something you set once and are afraid to touch" to "something you review weekly via PR." A culture naturally emerges where the cluster proposes sensible values, humans apply final judgment, and changes land safely with a full Git history.
Here are three steps you can start with right now:
- Install Goldilocks and start collecting data: After installing with the commands below, add a label to just one of your most expensive namespaces and let it collect data for two weeks.

  ```bash
  helm repo add fairwinds-stable https://charts.fairwinds.com/stable
  helm install goldilocks fairwinds-stable/goldilocks \
    -n goldilocks --create-namespace
  kubectl label namespace <ns> goldilocks.fairwinds.com/enabled=true
  ```

- Review recommendations and establish filtering criteria: Use `kubectl get vpa -A -o json | jq '.items[].status.recommendation.containerRecommendations'` to inspect the recommendations directly, compare them with your current `values.yaml` settings, and decide on a delta threshold (20–30%) that fits your team.
- Deploy the CronJob and run a pilot: Apply the CronJob YAML above, but initially configure it to create Draft PRs only by adding the `--draft` flag to `gh pr create`. After observing for 2–3 weeks, you'll have a clear sense of the noise level and recommendation quality.
Coming Up Next
We'll cover how to use Argo CD ApplicationSet's Matrix Generator and Cluster Generator to progressively roll out resource optimization PRs across environments (dev/staging/prod) in a multi-cluster setup. (Link will be updated after publication)
References
- GitHub - FairwindsOps/goldilocks
- Goldilocks Official Documentation | Fairwinds
- Kubernetes right-sizing with metrics-driven GitOps automation | AWS Blog
- Goldilocks: Fairwinds Insights Integration Documentation
- Fairwinds Insights Automated Fix PR Release Notes
- Argo CD Pull Request Generator Official Documentation
- Automate CI/CD on pull requests with Argo CD ApplicationSets | Red Hat Developer
- Mastering Argo CD Image Updater with Helm | CNCF Blog
- GitHub - wI2L/kubectl-vpa-recommendation
- Grafana VPA Recommendations Dashboard (ID: 14588)
- What is Goldilocks? | CNCF Blog
- Kubernetes Resource Optimization & Best Practices with Goldilocks | Security Boulevard