Kubernetes Cost Optimization in Practice — From Namespace-Level Cost Tracking with OpenCost & Kubecost to HPA/VPA Tuning
If you've ever stared blankly at a cloud bill wondering "where did all of this come from?", you're not alone. When I first started operating a Kubernetes cluster, I saw the number on our AWS invoice and tried to break down costs by team — only to hit a wall with the node-based billing model. In an environment where dozens of teams share the same pool of CPU and memory on common nodes, figuring out "how much did the payments team spend?" turns out to be a surprisingly complex problem.
By the end of this post, you'll be able to install OpenCost in under 10 minutes and pull a per-team cost report today. Using this approach, I cut our staging namespace costs by 30% in a single month — and 20–30% savings stories like this are far more common in the industry than you might think. This post walks through how to track per-namespace costs with OpenCost and Kubecost, improve your HPA/VPA autoscaler configuration, and drive real cost savings — with practical, hands-on examples.
The content is primarily aimed at teams running shared clusters, but the principles apply at any cost scale as ratios. If the example dollar figures seem large, just read them as "the proportions for our cluster."
Core Concepts
Why Kubernetes Cost Allocation Is Hard
One of the ironies of cloud-native environments is that the more efficiently you share resources, the harder cost tracking becomes. Back when one service ran on one EC2 instance, cost attribution was straightforward. But Kubernetes has dozens of pods sharing a single node. AWS bills at the node level, so separating the cost of each team's workloads running on that node requires a dedicated cost allocation layer.
Two concepts come up frequently here:
Showback vs Chargeback: Showback means "showing" teams their costs without actual billing. Chargeback means actually charging those costs against a team's budget. Most organizations start with Showback and move to Chargeback as the culture matures.
OpenCost vs Kubecost — Which Should You Use?
Honestly, these two tools are less competitors and more the two ends of a spectrum. Kubecost is a commercial product that evolved from the OpenCost spec, differing mainly in how many enterprise features are layered on top. That said, Kubecost runs its own separate data collection layer, so it's more accurate to say they both started from a common spec and evolved in their own directions, rather than Kubecost using OpenCost as its core engine.
| Item | OpenCost | Kubecost |
|---|---|---|
| License | Open source (Apache 2.0) | Commercial (based on OpenCost spec) |
| Price | Free | From $449/month (Business) |
| Multi-cluster | Limited | Fully supported |
| RI/Spot discount reflection | Not supported | AWS/GCP/Azure billing integration |
| RBAC / SSO | Not supported | Supported |
| Savings recommendations | None | Available (includes right-sizing) |
| Budget alerts | None | Available |
For a startup with five or fewer engineers, OpenCost + Grafana is more than enough. For an organization with dozens of teams sharing a cluster and looking to introduce FinOps culture, Kubecost's enterprise features really shine.
FinOps: Short for Financial Operations. A cultural and organizational practice that manages cloud costs not just through monitoring, but through team accountability frameworks and policies.
HPA and VPA — The Core Levers of Cost Optimization
If cost tracking is about measuring "how much you're spending now," autoscalers are tools for improving "how efficiently you're spending it." A situation I encounter often in practice: when you look at overall cluster CPU usage, actual utilization relative to requests often sits somewhere between 20–45%. In other words, a lot is reserved but not much is actually used.
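One way to check this ratio in your own cluster is a quick PromQL query against metrics that cAdvisor and kube-state-metrics already expose (this assumes both are running, e.g. via kube-prometheus-stack; label names can vary slightly between versions, so verify against your setup):

```promql
# Cluster-wide ratio of actual CPU usage to CPU requests
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu"})
```

If this comes out around 0.2–0.45, you are in the typical over-provisioned range described above.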
- HPA (Horizontal Pod Autoscaler): Scales the number of pod replicas up or down based on CPU or custom metrics (RPS, queue length, etc.). Its strength is fast reaction to traffic changes.
- VPA (Vertical Pod Autoscaler): Adjusts the CPU/memory requests and limits of individual pods to match actual usage patterns. Particularly suited for reducing over-provisioned resources.
Requests vs Limits: Requests are the amount of resources guaranteed to a pod at scheduling time; Limits are the maximum a pod can use. In most cases, costs are calculated based on Requests, not Limits. However, this can vary by cloud provider and billing model, so it's worth checking your provider's documentation. Reducing Requests to match actual usage is the core of cost optimization.
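In manifest terms, the distinction looks like this (a minimal illustrative example; the image name and the specific figures are placeholders, not recommendations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-api
spec:
  containers:
    - name: api
      image: example/api:1.0   # illustrative image name
      resources:
        requests:              # guaranteed at scheduling time; typically what cost is calculated from
          cpu: "250m"
          memory: "256Mi"
        limits:                # hard ceiling; exceeding the memory limit gets the container OOM-killed
          cpu: "500m"
          memory: "512Mi"
```

If actual usage hovers around 100m CPU, the 250m request is the number worth shrinking, not the limit.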
Practical Application
Now that we've covered the concepts, let's put them into practice. The four examples below are ordered by complexity. Examples 1–3 can be applied immediately by most teams, while Example 4 is an advanced pattern that elevates cost control to the policy level.
Example 1: Installing OpenCost and Querying Per-Namespace Costs
With Helm, installation itself is fairly straightforward. First, Prometheus must be installed in your cluster so OpenCost can collect metrics. If you don't have Prometheus, you can get started quickly with the kube-prometheus-stack Helm chart. Once Prometheus is ready, you can see your first cost data from OpenCost in under 10 minutes.
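If you are starting from zero on the Prometheus side, a minimal kube-prometheus-stack install might look like the following (repo URL and chart name as published by the prometheus-community project; defaults are fine for a trial, but tune values for production):

```shell
# Add the prometheus-community Helm repository and install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```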
# Add the OpenCost Helm repository
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
# Install into the opencost namespace
helm install opencost opencost/opencost \
--namespace opencost --create-namespace \
--set opencost.exporter.defaultClusterId=my-cluster
# Install the kubectl cost plugin
# kubectl krew is a kubectl plugin manager — https://krew.sigs.k8s.io/docs/user-guide/setup/install/
kubectl krew install cost
# Query per-namespace costs for the last 7 days
kubectl cost namespace --window 7d

Below is an example of what this command might return. The figures are based on a large enterprise cluster, but the patterns hold at any scale when viewed proportionally.
| Namespace | Monthly Cost (Projected) | CPU Cost | Memory Cost |
|---|---|---|---|
| analytics | $22,000 | $14,000 | $8,000 |
| payments | $18,000 | $12,000 | $6,000 |
| staging | $10,000 | $6,500 | $3,500 |
I was confused at first too — seeing the staging namespace come in at $10,000 raised some eyebrows. When I investigated, it turned out to be running at full capacity through nights and weekends. That single report kicked off a team conversation. After applying nighttime scale-down via a CronJob, costs dropped by nearly 30% in a month.
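For reference, a nighttime scale-down can be sketched as a CronJob that runs kubectl scale on a schedule. This is a sketch under assumptions: the bitnami/kubectl image and the scale-down ServiceAccount are stand-ins, and that ServiceAccount needs RBAC permission to scale deployments in the namespace:

```yaml
# Scale all deployments in the staging namespace to zero at 21:00 UTC on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-night-scaledown
  namespace: staging
spec:
  schedule: "0 21 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scale-down      # assumed SA with RBAC to scale deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest   # assumed image providing kubectl
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n staging
```

A matching morning CronJob scales things back up. Note that scaling --all to zero discards the original replica counts, so in practice you would record the desired count somewhere (for example in an annotation) before scaling down.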
Example 2: Setting Up a Cost Dashboard with Prometheus + Grafana
OpenCost exports Prometheus metrics on port :9003 by default. Add the following configuration to your prometheus.yml:
# prometheus.yml scrape configuration
scrape_configs:
  - job_name: 'opencost'
    scrape_interval: 1m
    static_configs:
      # Format: servicename.namespace.svc, 9003 is the default OpenCost metrics exporter port
      - targets: ['opencost.opencost.svc:9003']

This tells Prometheus to scrape OpenCost metrics every minute. Then, importing dashboard ID 22208 in Grafana gives you immediate visualization of costs by cluster, namespace, and pod — no custom queries needed, making this the fastest path to a cost dashboard.
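If you prefer building your own panels, OpenCost's exporter metrics can also be queried directly. For example, a rough projected monthly cluster cost (assuming the `node_total_hourly_cost` metric documented for the OpenCost Prometheus exporter, and roughly 730 hours in a month):

```promql
# Projected monthly cluster cost from OpenCost node pricing metrics
sum(node_total_hourly_cost) * 730
```

Double-check the metric name against the exporter's /metrics output in your installed version before wiring it into a dashboard.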
Example 3: Using VPA Recommendation Mode + Separate HPA to Avoid Conflicts
Running HPA and VPA simultaneously is fine, but if both autoscalers are tuning the same metric (CPU or memory) at the same time, a feedback loop can form that degrades both cost and stability. This actually happened to our team: HPA would scale up pods to reduce CPU utilization, then VPA would increase the CPU allocation per pod, which would trigger HPA again — causing oscillation.
The pattern that has become established in the industry is to clearly separate their roles: HPA handles CPU-based horizontal scaling, and VPA handles memory sizing recommendations.
# VPA: Memory resource optimization (recommendation mode — changes applied manually)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Provides recommendations only, no automatic application
  resourcePolicy:
    containerPolicies:
      - containerName: api
        mode: "Auto"  # Keep recommendations enabled; "Off" would disable VPA for this container entirely
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"

A subtlety worth calling out: `updateMode: "Off"` at the policy level stops VPA from applying changes while still producing recommendations, whereas `mode: "Off"` inside `containerPolicies` disables VPA for that container entirely, recommendations included. For recommendation-only operation, set `updateMode: "Off"` and leave the container-level mode at `"Auto"` (the default). Mixing the two up when first rolling out VPA is a common source of "why is my VPA not recommending anything?" confusion.

# HPA: CPU-based horizontal replica scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

| Item | Setting | Reason |
|---|---|---|
| VPA updateMode | `Off` | Review recommendations manually before applying |
| VPA containerPolicies mode | `Auto` | Keep recommendations enabled (`Off` disables VPA per container) |
| HPA metric | CPU | Memory reacts slowly and can cause pod restarts |
| HPA minReplicas | 2 | Prevent single point of failure |
| VPA maxAllowed | cpu: 2, memory: 2Gi | Prevent unexpectedly excessive resource allocation |
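Once a VPA in recommendation mode has collected data, you can read its suggestions straight from the object's status (the field path follows the VPA API's `status.recommendation` structure; jsonpath shown for brevity):

```shell
# Human-readable summary of VPA recommendations for the payments API server
kubectl describe vpa api-server-vpa -n payments

# Or pull just the recommendation block as JSON
kubectl get vpa api-server-vpa -n payments \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```

Compare the `target` values in the output against your current requests; the gap between them is your right-sizing opportunity.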
Example 4: Blocking Resource Creation on Namespace Budget Overrun with Kyverno (Advanced)
Going one step further from viewing cost data, it's also possible to enforce it as policy. Below is an example policy that integrates OpenCost with Kyverno to block new pod creation when a specific namespace exceeds its budget threshold.
Important: The YAML below is pseudo-code intended to illustrate the concept.
`{{ opencost.namespace_cost }}` is not a valid variable reference syntax in actual Kyverno. In a real implementation, you would use Kyverno's External Data feature along with JMESPath expressions to call the OpenCost API. For a working implementation, see the Nirmata guide.
# Conceptual example — this is not working code
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-overspending-ns
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-namespace-cost
      match:
        resources:
          kinds: ["Pod"]
      validate:
        message: "Namespace budget exceeded. Contact FinOps team."
        deny:
          conditions:
            - key: "{{ opencost.namespace_cost }}"  # Replace with external data source in real implementation
              operator: GreaterThan
              value: "5000"

Kyverno: A Kubernetes-native policy engine. Policies are written in YAML and operate as an admission controller, allowing you to block or audit policy violations in real time during resource creation and modification.
Pros and Cons
The most painful drawback I encountered running this in production was OpenCost's lack of discount reflection. We were getting a 40% discount on Reserved Instances, but the OpenCost dashboard was showing on-demand pricing — at first I was thrown off wondering "why is it showing so expensive?" I've broken this down in more detail in the table below.
Advantages
| Item | Details |
|---|---|
| Immediate visibility | Real-time cost visibility at the namespace, pod, and label level |
| Waste detection | Identifies hidden waste like full-capacity staging environments and over-provisioned requests |
| FinOps culture foundation | Easy to introduce team accountability, from Showback to Chargeback |
| Autoscaler improvement | Immediate cost savings when optimizing requests based on VPA recommendations |
| Policy-based control | Can automatically block budget overruns via Kyverno integration (advanced) |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| OpenCost discount gap | RI/Spot discounts not reflected — costs shown higher than actual | Use Kubecost or AWS Cost Explorer alongside |
| Prometheus dependency | Prometheus required for OpenCost's default operation | Evaluate the Collector Datasource (beta) lightweight option |
| VPA restart issue | `Auto` mode restarts pods to change `requests` | Start with `updateMode: Off` and transition gradually |
| Kubecost pricing | $449/month+ is a burden; pricing policy opaque after IBM acquisition | OpenCost Free tier is sufficient for small teams |
| HPA memory metric risk | Memory-based HPA can trigger pod restart loops | Replace with CPU or custom metrics (RPS, queue length) |
Reserved Instance (RI): Reserving an instance with a 1- or 3-year commitment from a cloud provider can yield discounts of up to 60–70% compared to on-demand pricing. Because OpenCost does not automatically reflect these discounts, there can be a discrepancy with your actual billed amount.
Top 3 Mistakes to Avoid
- Activating HPA and VPA on the same metric simultaneously — If both autoscalers are tuning CPU at the same time, a feedback loop causes pod counts and resource allocations to oscillate unpredictably. Strongly recommend separating metric responsibilities.
- Omitting minAllowed / maxAllowed in VPA — Running VPA without boundary values can lead to unexpected OOM kills or, conversely, excessive CPU allocation. It's best to define a safe range from the start.
- Installing cost tracking tools without setting up alerts — No matter how beautiful the dashboard, it's useless if no one's watching it. Connecting Kubecost budget alerts or Grafana threshold alerts to a Slack channel lets you catch anomalies quickly without weekly reviews.
Closing Thoughts
Cost optimization starts with making spending visible, and real savings come from tuning HPA/VPA. Without any major architectural changes, achieving 20–30% cluster cost reductions simply by installing OpenCost and reviewing VPA recommendations is genuinely common. My own 30% savings in the staging namespace is a firsthand example.
Three steps you can take right now:
- Install OpenCost and check your first report: install with a single line — `helm install opencost opencost/opencost --namespace opencost --create-namespace` — then run `kubectl cost namespace --window 7d` to see per-namespace costs for the past 7 days. There will almost certainly be at least one namespace higher than you expected.
- Apply VPA recommendation mode to your highest-cost workloads: attaching a VPA with `updateMode: Off` to your most expensive workloads lets you safely review recommendations. After collecting data for a week or two, if the recommended `requests` values are significantly lower than your current settings, that's your immediate savings opportunity.
- Build a shared cost dashboard for your team: import Grafana dashboard ID `22208` and create a shared cost board for your team. Once the numbers are visible, team culture starts to shift. The first step in FinOps is always visibility.
Next post: Combining the OpenCost cost data covered in this post with Karpenter — we'll look at patterns for increasing Spot Instance utilization on AWS EKS and achieving additional cost savings at the node level.
References
- OpenCost Official Site | opencost.io
- OpenCost GitHub | github.com/opencost
- OpenCost 2025 Annual Review | opencost.io
- OpenCost Prometheus Exporter Configuration | opencost.io
- kubectl cost Plugin Documentation | opencost.io
- Grafana OpenCost Dashboard (ID: 22208) | grafana.com
- Kubecost vs OpenCost Comparison | CloudZero
- Kubecost vs OpenCost Comparison | Apptio
- Kubernetes Autoscaling HPA/VPA Guide | Sedai
- HPA vs VPA: 2025 Selection Guide | ScaleOps
- Kubernetes Cost Optimization Strategies 2025 | Sealos
- OpenCost + Prometheus + Grafana Practical Guide | hodovi.cc
- Policy-Based Cost Management with OpenCost and Kyverno | Nirmata
- Cloud Cost Governance: Kubecost, OpenCost, Infracost | Open Source For You
- EKS + OpenCost + AWS Managed Prometheus/Grafana Setup Case Study | Automat-it