Pattern Guide: Reducing EKS Spot Costs by 56% with OpenCost + Karpenter
When operating an EKS cluster, you eventually hit a moment where you think: "Scaling is working fine, we're using Spot instances… so why are the bills still this high?" I initially thought that just adding Karpenter would solve everything. But when you actually dig in, you often find idle nodes sitting around unattended, or On-Demand pricing quietly accumulating in unexpected places.
This post covers a pattern for using OpenCost to pinpoint wasteful spending, combined with tuning Karpenter's Consolidation policy to meaningfully increase Spot instance utilization. The ZeonEdge team's reduction from $50K to $22K per month (56%), and Tinybird's 20% cut in total AWS costs while actually scaling up, both followed exactly this approach. Links to each case study are in the references section at the bottom.
Target audience: Written for DevOps/infrastructure engineers with foundational knowledge of Kubernetes and EKS. A working familiarity with EC2, Spot instances, and `kubectl` commands is assumed throughout.
Core Concepts
The Feedback Loop OpenCost and Karpenter Create Together
When I first integrated OpenCost, the most striking thing was seeing costs broken down by namespace. Not just the total AWS bill, but something like: "This namespace spent $340 over the past 7 days and has 18% CPU efficiency." As a CNCF sandbox project, it supports AWS, GCP, and Azure without vendor lock-in, and because it operates as a Prometheus exporter, it slots naturally into an existing monitoring stack.
OpenCost — A vendor-neutral open-source tool that allocates Kubernetes costs down to the Pod level. It automatically recognizes `karpenter.sh/nodepool` labels and can aggregate per-NodePool hourly costs via PromQL.
On the Karpenter side, unlike the legacy Cluster Autoscaler (CA), it calls cloud APIs directly without predefined node groups — provisioning the optimal instance for a Pending Pod's requirements immediately. The v1.0 GA in late 2024 stabilized the NodePool and EC2NodeClass APIs, and the critical SpotToSpotConsolidation feature is now officially available as a Feature Gate.
Spot Instances — AWS's way of offering spare compute capacity at up to 90% off On-Demand pricing. The trade-off: AWS can reclaim that capacity with 2 minutes' notice when it's needed elsewhere.
Spot-to-Spot Consolidation (SpotToSpotConsolidation) — Automatically replaces currently running Spot instances with cheaper Spot types. Only activates when 15 or more instance types are specified. Many teams unknowingly run with this disabled because they aren't aware of that requirement.
Honestly, using Karpenter alone isn't bad. But without OpenCost, it's hard to know precisely where your cluster is burning money. Conversely, OpenCost alone can surface waste without automatically fixing it. Together, the two tools complete the following loop:
| Step | Tool | Role |
|---|---|---|
| ① Visibility | OpenCost | Identify wasteful spending by namespace and NodePool |
| ② Policy Tuning | Karpenter NodePool | Expand instance diversity, configure consolidation policy |
| ③ Auto-Optimization | Karpenter Consolidation | Auto-replace with cheaper Spot, delete idle nodes |
| ④ Measure Impact | OpenCost + Grafana | Track savings, enter next tuning cycle |
As you read through the three practical examples below, it helps to keep in mind which step of this loop each one corresponds to.
Practical Application
Before you start: Spot instances require interruption handling as a prerequisite. If you haven't configured Karpenter to receive interruption events via SQS + EventBridge and drain Pods proactively, a sudden instance disappearance will impact your services. The setup is well documented in the AWS Spot Instances with Karpenter official blog post — review it before applying the examples below.
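As a sketch of the cluster-side half of that setup, assuming the SQS queue and EventBridge rules already exist as described in the AWS blog post (the queue name `karpenter-interruptions` here is a placeholder), Karpenter is pointed at the queue via its Helm values:

```yaml
# karpenter values.yaml — enable interruption handling
# The queue name is a placeholder; it must match the SQS queue that your
# EventBridge rules publish Spot interruption warnings to.
settings:
  interruptionQueue: karpenter-interruptions
```

With this in place, Karpenter drains Pods as soon as the 2-minute interruption warning arrives, rather than waiting for the node to vanish.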
② Policy Tuning — Configuring a Spot-to-Spot Consolidation NodePool
This example corresponds to step ② of the loop: NodePool policy tuning. The first thing to do in practice is to open up your instance types to 15 or more. I initially only listed 3–4 types, thinking "won't more types be harder to manage?" — and later discovered that Spot-to-Spot consolidation was completely disabled, leaving On-Demand nodes lingering indefinitely.
To enable the SpotToSpotConsolidation Feature Gate, you need to add a flag to the Karpenter controller. If you're installing via Helm, add the following to values.yaml:
```yaml
# karpenter values.yaml
controller:
  extraArgs:
    - --feature-gates=SpotToSpotConsolidation=true
```

With the Feature Gate configured, the NodePool can be structured like this:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m        # too short causes unnecessary restarts
    budgets:
      - nodes: "20%"            # replace at most 20% of nodes at a time
  template:
    metadata:
      labels:
        workload-type: general  # label for OpenCost cost classification
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Karpenter picks Spot first by cost; falls back to On-Demand when unavailable
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.xlarge
            - m5a.xlarge
            - m5d.xlarge
            - m6i.xlarge
            - m6a.xlarge
            - m6in.xlarge
            - m7i.xlarge
            - m7a.xlarge
            - c5.2xlarge
            - c5a.2xlarge
            - c6i.2xlarge
            - c6a.2xlarge
            - r5.xlarge
            - r6i.xlarge
            - r6a.xlarge
          # 15+ types — required for SpotToSpotConsolidation
```

You might wonder: if both spot and on-demand are listed under capacity-type, does Karpenter pick randomly? In practice, Karpenter queries the current Spot price for each instance type in real time and selects the cheapest combination — so as long as Spot inventory is available, Spot will be chosen automatically.
Here's a summary of the intent behind each setting:
| Config Key | Value | Intent |
|---|---|---|
| `consolidationPolicy` | `WhenEmptyOrUnderutilized` | Target both empty nodes and underutilized nodes for replacement |
| `consolidateAfter` | `5m` | 1m is too aggressive — 5m recommended for bursty workloads |
| `budgets.nodes` | `20%` | Prevent service impact from replacing too many nodes at once |
| `capacity-type` | `spot, on-demand` | Karpenter's cost-based algorithm prioritizes Spot |
① Visibility — Detecting Wasteful Namespaces with OpenCost
Now, back to step ① of the loop: finding where money is actually leaking. Even with a well-configured Karpenter setup, if CPU/Memory Requests are set significantly higher than actual usage, Karpenter treats those nodes as "full" and excludes them from consolidation. The OpenCost API can surface this.
To call the API locally, you'll first need to set up port forwarding:
```shell
kubectl port-forward -n opencost svc/opencost 9090:9090
```

With port forwarding active, the following command produces a cost efficiency report by namespace:
```shell
# Query per-namespace cost efficiency via OpenCost API (last 7 days)
# -G sends the --data-urlencode parameters as a GET query string
curl -G "http://localhost:9090/model/allocation" \
  --data-urlencode 'window=7d' \
  --data-urlencode 'aggregate=namespace' \
  --data-urlencode 'accumulate=true' | \
  jq '.data[0] | to_entries |
    map({
      namespace: .key,
      cpuEfficiency: .value.cpuEfficiency,
      memEfficiency: .value.memEfficiency,
      totalCost: .value.totalCost
    }) |
    sort_by(.totalCost) | reverse | .[:10]'
```

For those unfamiliar with the jq pipeline, the output looks like this:
```json
[
  {
    "namespace": "data-pipeline",
    "cpuEfficiency": 0.18,
    "memEfficiency": 0.42,
    "totalCost": 340.21
  },
  {
    "namespace": "api-server",
    "cpuEfficiency": 0.71,
    "memEfficiency": 0.68,
    "totalCost": 112.05
  }
]
```

Any namespace with `cpuEfficiency` below 0.3 (under 30%) is a candidate for VPA right-sizing. Reducing Requests allows Karpenter to re-consolidate onto smaller Spot nodes. This is exactly the flow ZeonEdge used to go from $50K to $22K.
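The same `jq` pipeline can also filter the report down to right-sizing candidates directly. A small self-contained sketch (the sample JSON mirrors the output shown above, and the 0.3 threshold is the one just discussed):

```shell
# Filter an OpenCost efficiency report (JSON array on stdin) down to
# namespaces whose CPU efficiency is below the 30% right-sizing threshold.
echo '[
  {"namespace": "data-pipeline", "cpuEfficiency": 0.18, "memEfficiency": 0.42, "totalCost": 340.21},
  {"namespace": "api-server",    "cpuEfficiency": 0.71, "memEfficiency": 0.68, "totalCost": 112.05}
]' | jq -r 'map(select(.cpuEfficiency < 0.3)) | .[].namespace'
# prints: data-pipeline
```

In a real cluster you would pipe the `curl -G` output from above into this filter instead of the `echo` sample.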
To get VPA right-sizing recommendations, it's worth installing VPA in Recommendation mode first — it surfaces suggestions without actually changing any Requests, making it a safe starting point.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: data-pipeline-vpa
  namespace: data-pipeline
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-pipeline
  updatePolicy:
    updateMode: "Off"  # Recommendation mode — collects suggestions without applying changes
```

Run `kubectl describe vpa data-pipeline-vpa -n data-pipeline` to check the recommended Request values. If they differ significantly from what's currently set, reduce them gradually.
NodePool hourly costs can also be aggregated with PromQL. Metric names can vary depending on how OpenCost is deployed — the exact list is in the OpenCost Prometheus exporter official docs. Metrics like `container_cpu_allocation` are available in most deployment environments.
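One possible query, assuming OpenCost's `node_total_hourly_cost` metric is scraped and that kube-state-metrics is configured (via its `--metric-labels-allowlist` flag) to expose the `karpenter.sh/nodepool` node label — verify both assumptions against your own deployment:

```promql
# Hourly cost summed per Karpenter NodePool, joined via the node name
sum by (label_karpenter_sh_nodepool) (
  node_total_hourly_cost
  * on (node) group_left(label_karpenter_sh_nodepool)
  kube_node_labels{label_karpenter_sh_nodepool != ""}
)
```

Graphing this over time in Grafana makes it easy to see consolidation shrinking a pool's hourly spend.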
③ Auto-Optimization — Separating NodePools by Workload Characteristics
This is step ③ of the loop: maximizing automated cost optimization. Running batch workloads (CI/CD, media encoding, etc.) and front-end APIs on the same NodePool is a common inefficiency in practice. Consolidation policies can interrupt API servers, and batch jobs — which are relatively tolerant of Spot interruptions — end up running on expensive On-Demand nodes.
When I first applied this separation, there was internal pushback about "Spot being unstable." It turned out that batch and API workloads were sharing the same pool, so batch interruptions were bleeding into API latency. Separating them made the API stability issues disappear.
```yaml
# Batch-only Spot NodePool — aggressive consolidation allowed
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m   # short interval is fine for batch
    budgets:
      - nodes: "30%"
  template:
    metadata:
      labels:
        workload-type: batch
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: workload-type
          value: batch
          effect: NoSchedule  # only batch Pods schedule onto these nodes
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]  # Spot only for batch
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - c5.2xlarge
            - c5a.2xlarge
            - c6i.2xlarge
            - c6a.2xlarge
            - m5.2xlarge
            - m6i.2xlarge
            - m6a.2xlarge
            - m7i.2xlarge
            - r5.xlarge
            - r6i.xlarge
            - r6a.xlarge
            - r7i.xlarge
            - c7i.2xlarge
            - c7a.2xlarge
            - m7a.2xlarge
          # maintain 15+ types
---
# API server On-Demand NodePool — stability first
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: ondemand-api
spec:
  disruption:
    consolidationPolicy: WhenEmpty  # delete only when empty
    consolidateAfter: 30m
    budgets:
      - nodes: "10%"  # more conservative for stability
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```

Viewing costs in OpenCost segmented by `label_workload_type` makes it immediately clear how much the Spot batch NodePool is saving — which naturally flows into step ④, measuring impact.
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Cost savings potential | Spot is up to 90% cheaper than On-Demand. Real-world cases show 20–56% total savings |
| Full automation | The consolidation loop continuously finds and replaces with cheaper Spot — no manual intervention required |
| Granular visibility | Per-NodePool, per-namespace, per-label cost tracking enables precise identification of wasteful spending |
| Open-source stack | No licensing costs for either tool. OpenCost is a CNCF project with no vendor lock-in |
| Existing stack integration | Built on Prometheus + Grafana — integrates without additional monitoring infrastructure |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Spot interruption risk | AWS can reclaim with 2 minutes' notice | SQS + EventBridge interruption handling is mandatory |
| Fewer than 15 instance types | Spot-to-Spot consolidation is disabled | Specify 15+ instance types in NodePool |
| OpenCost price accuracy | Falls back to On-Demand pricing if unconfigured, underreporting savings | Requires S3 data feed + IAM permission setup |
| Unsuitable for stateful workloads | DBs, distributed storage risk data loss on Spot interruption | PDB + tolerations design, separate On-Demand NodePool |
| Excessive rescheduling | Too short a `consolidateAfter` causes unnecessary Pod restarts | 5m or longer recommended for bursty workloads |
| Cost data latency | Depends on AWS pricing feed refresh cycle — not fully real-time | Account for a few minutes to tens of minutes of delay when interpreting dashboards |
PodDisruptionBudget (PDB) — A Kubernetes resource that limits the minimum/maximum number of Pods that can be disrupted simultaneously. It's the key mechanism for maintaining service availability during Spot consolidation.
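As a minimal sketch (the name, namespace, and selector label are illustrative), a PDB that keeps most API replicas available while consolidation drains nodes could look like:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb        # hypothetical name
  namespace: api-server
spec:
  minAvailable: 80%           # keep at least 80% of matching Pods up during voluntary disruptions
  selector:
    matchLabels:
      app: api-server         # assumed Pod label
```

Karpenter respects PDBs during consolidation, so this directly throttles how fast nodes running these Pods can be replaced.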
Here are the mistakes I see most often in practice:
Most Common Mistakes in Production
- Specifying only 3–5 instance types — Spot-to-Spot consolidation silently disables itself, and this often doesn't show up clearly in logs. After applying a NodePool, always check the status field with `kubectl get nodepool spot-general -o yaml`.
- Skipping the OpenCost Spot pricing feed configuration — Without it, costs fall back to On-Demand pricing, making dashboard savings appear as $0 or far less than actual. AWS IAM permissions are easy to miss; the required permission list is documented in the OpenCost AWS configuration guide.
- Applying Spot to stateful workloads — Running stateful workloads like databases or Kafka on Spot NodePools risks data loss on interruption. Always pin them to an On-Demand NodePool using `nodeSelector` or `affinity`.
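A minimal sketch of that pinning (the Deployment name and image are illustrative), using the `karpenter.sh/capacity-type` label that Karpenter sets on every node it provisions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-primary      # hypothetical stateful workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-primary
  template:
    metadata:
      labels:
        app: postgres-primary
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand  # never schedule onto Spot nodes
      containers:
        - name: postgres
          image: postgres:16
```

The same nodeSelector works for StatefulSets; for softer placement preferences, swap it for a `nodeAffinity` rule instead.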
Closing Thoughts
The visibility → analysis → optimization loop — using OpenCost to pinpoint waste and tuning Karpenter's Spot-to-Spot consolidation policy — is the core pattern for reducing EKS costs. Rather than aiming for a perfect configuration from the start, it's worth beginning lightly with the steps below:
4 Steps You Can Start Right Now
- Install OpenCost — Install with `helm install opencost opencost/opencost -n opencost --create-namespace` and access the UI via port forwarding.
- Identify wasteful namespaces — Use the `curl + jq` example above to find namespaces with `cpuEfficiency` below 30% and identify VPA right-sizing candidates.
- Expand NodePool instance types — Use the AWS Spot Instance Advisor to select instances with under 5% interruption rate, expand to 15+ types, and enable the `SpotToSpotConsolidation` Feature Gate.
- Connect a Grafana dashboard — Add the `opencost-mixin` + `kubernetes-autoscaling-mixin` dashboards to monitor per-NodePool Spot savings and idle node ratios in a single view.
After one full cycle of this loop, "where our cluster was losing money" becomes visible in concrete numbers. At that point, the next optimization cycle runs significantly faster — and that compounding speed is the real advantage of this pattern.
Next post: How to further improve Karpenter consolidation efficiency by automatically right-sizing CPU/Memory Requests with Goldilocks + VPA
References
- Using Amazon EC2 Spot Instances with Karpenter | AWS Containers Blog
- Applying Spot-to-Spot consolidation best practices with Karpenter | AWS Compute Blog
- Karpenter NodePool Official Docs
- Amazon EKS Karpenter Best Practices | AWS Official Docs
- OpenCost Official Site
- OpenCost AWS Configuration Guide
- OpenCost Prometheus Exporter Integration
- Cut AWS costs by 20% while scaling with EKS, Karpenter and Spot Instances | Tinybird
- Kubernetes Cost Optimization: From $50K to $22K/Month | ZeonEdge
- Karpenter Monitoring: Spot Savings and Node Pool Cost Breakdown | hodovi.cc
- Monitoring EKS costs with OpenCost and AWS Managed Prometheus/Grafana | automat-it
- Spot-to-Spot Consolidation in Karpenter: Best Practices | nOps
- OpenCost GitHub Repository