Kubernetes Cost Optimization in Practice — From Namespace-Level Cost Tracking with OpenCost & Kubecost to HPA/VPA Tuning
If you've ever stared blankly at a cloud bill wondering "where did all of this come from?", you're not alone. When I first started operating a Kubernetes cluster, I saw the number on our AWS invoice and tried to break down costs by team — only to hit a wall with the node-based billing model. In an environment where dozens of teams share the same pool of CPU and memory on common nodes, figuring out "how much did the payments team spend?" turns out to be a surprisingly complex problem.
By the end of this post, you'll be able to install OpenCost in under 10 minutes and pull a per-team cost report today. Using this approach, I cut our staging namespace costs by 30% in a single month — and 20–30% savings stories like this are far more common in the industry than you might think. This post walks through how to track per-namespace costs with OpenCost and Kubecost, improve your HPA/VPA autoscaler configuration, and drive real cost savings — with practical, hands-on examples.
The content is primarily aimed at teams running shared clusters, but the principles apply at any cost scale as ratios. If the example dollar figures seem large, just read them as "the proportions for our cluster."
Core Concepts
Why Kubernetes Cost Allocation Is Hard
One of the ironies of cloud-native environments is that the more efficiently you share resources, the harder cost tracking becomes. Back when one service ran on one EC2 instance, cost attribution was straightforward. But Kubernetes has dozens of pods sharing a single node. AWS bills at the node level, so separating the cost of each team's workloads running on that node requires a dedicated cost allocation layer.
Two concepts come up frequently here:
Showback vs Chargeback: Showback means "showing" teams their costs without actual billing. Chargeback means actually charging those costs against a team's budget. Most organizations start with Showback and move to Chargeback as the culture matures.
OpenCost vs Kubecost — Which Should You Use?
Honestly, these two tools are less competitors and more the two ends of a spectrum. Kubecost is a commercial product that evolved from the OpenCost spec, differing mainly in how many enterprise features are layered on top. That said, Kubecost runs its own separate data collection layer, so it's more accurate to say they both started from a common spec and evolved in their own directions, rather than Kubecost using OpenCost as its core engine.
| Item | OpenCost | Kubecost |
|---|---|---|
| License | Open source (Apache 2.0) | Commercial (based on OpenCost spec) |
| Price | Free | From $449/month (Business) |
| Multi-cluster | Limited | Fully supported |
| RI/Spot discount reflection | Not supported | AWS/GCP/Azure billing integration |
| RBAC / SSO | Not supported | Supported |
| Savings recommendations | None | Available (includes right-sizing) |
| Budget alerts | None | Available |
For a startup with five or fewer engineers, OpenCost + Grafana is more than enough. For an organization with dozens of teams sharing a cluster and looking to introduce FinOps culture, Kubecost's enterprise features really shine.
FinOps: Short for Financial Operations. A cultural and organizational practice that manages cloud costs not just through monitoring, but through team accountability frameworks and policies.
HPA and VPA — The Core Levers of Cost Optimization
If cost tracking is about measuring "how much you're spending now," autoscalers are tools for improving "how efficiently you're spending it." A situation I encounter often in practice: when you look at overall cluster CPU usage, actual utilization relative to requests often sits somewhere between 20–45%. In other words, a lot is reserved but not much is actually used.
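One way to check this ratio in your own cluster is a quick PromQL query against metrics that cAdvisor and kube-state-metrics already expose (this assumes both are running, e.g. via kube-prometheus-stack; label names can vary slightly between versions, so verify against your setup):

```promql
# Cluster-wide ratio of actual CPU usage to CPU requests
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu"})
```

If this comes out around 0.2–0.45, you are in the typical over-provisioned range described above.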
- HPA (Horizontal Pod Autoscaler): Scales the number of pod replicas up or down based on CPU or custom metrics (RPS, queue length, etc.). Its strength is fast reaction to traffic changes.
- VPA (Vertical Pod Autoscaler): Adjusts the CPU/memory requests and limits of individual pods to match actual usage patterns. Particularly suited for reducing over-provisioned resources.
Requests vs Limits: Requests are the amount of resources guaranteed to a pod at scheduling time; Limits are the maximum a pod can use. In most cases, costs are calculated based on Requests, not Limits. However, this can vary by cloud provider and billing model, so it's worth checking your provider's documentation. Reducing Requests to match actual usage is the core of cost optimization.
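In manifest terms, the distinction looks like this (a minimal illustrative example; the image name and the specific figures are placeholders, not recommendations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-api
spec:
  containers:
    - name: api
      image: example/api:1.0   # illustrative image name
      resources:
        requests:              # guaranteed at scheduling time; typically what cost is calculated from
          cpu: "250m"
          memory: "256Mi"
        limits:                # hard ceiling; exceeding the memory limit gets the container OOM-killed
          cpu: "500m"
          memory: "512Mi"
```

If actual usage hovers around 100m CPU, the 250m request is the number worth shrinking, not the limit.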
Practical Application
Now that we've covered the concepts, let's put them into practice. The four examples below are ordered by complexity. Examples 1–3 can be applied immediately by most teams, while Example 4 is an advanced pattern that elevates cost control to the policy level.
Example 1: Installing OpenCost and Querying Per-Namespace Costs
With Helm, installation itself is fairly straightforward. First, Prometheus must be installed in your cluster so OpenCost can collect metrics. If you don't have Prometheus, you can get started quickly with the kube-prometheus-stack Helm chart. Once Prometheus is ready, you can see your first cost data from OpenCost in under 10 minutes.
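If you are starting from zero on the Prometheus side, a minimal kube-prometheus-stack install might look like the following (repo URL and chart name as published by the prometheus-community project; defaults are fine for a trial, but tune values for production):

```shell
# Add the prometheus-community Helm repository and install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```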
# Add the OpenCost Helm repository
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
# Install into the opencost namespace
helm install opencost opencost/opencost \
--namespace opencost --create-namespace \
--set opencost.exporter.defaultClusterId=my-cluster
# Install the kubectl cost plugin
# kubectl krew is a kubectl plugin manager — https://krew.sigs.k8s.io/docs/user-guide/setup/install/
kubectl krew install cost
# Query per-namespace costs for the last 7 days
kubectl cost namespace --window 7d

Below is an example of what this command might return. The figures are based on a large enterprise cluster, but the patterns hold at any scale when viewed proportionally.
| Namespace | Monthly Cost (Projected) | CPU Cost | Memory Cost |
|---|---|---|---|
| analytics | $22,000 | $14,000 | $8,000 |
| payments | $18,000 | $12,000 | $6,000 |
| staging | $10,000 | $6,500 | $3,500 |
I was confused at first too — seeing the staging namespace come in at $10,000 raised some eyebrows. When I investigated, it turned out to be running at full capacity through nights and weekends. That single report kicked off a team conversation. After applying nighttime scale-down via a CronJob, costs dropped by nearly 30% in a month.
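For reference, a nighttime scale-down can be sketched as a CronJob that runs kubectl scale on a schedule. This is a sketch under assumptions: the bitnami/kubectl image and the scale-down ServiceAccount are stand-ins, and that ServiceAccount needs RBAC permission to scale deployments in the namespace:

```yaml
# Scale all deployments in the staging namespace to zero at 21:00 UTC on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-night-scaledown
  namespace: staging
spec:
  schedule: "0 21 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scale-down      # assumed SA with RBAC to scale deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest   # assumed image providing kubectl
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n staging
```

A matching morning CronJob scales things back up. Note that scaling --all to zero discards the original replica counts, so in practice you would record the desired count somewhere (for example in an annotation) before scaling down.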
Example 2: Setting Up a Cost Dashboard with Prometheus + Grafana
OpenCost exports Prometheus metrics on port :9003 by default. Add the following configuration to your prometheus.yml:
# prometheus.yml scrape configuration
scrape_configs:
  - job_name: 'opencost'
    scrape_interval: 1m
    static_configs:
      # Format: servicename.namespace.svc, 9003 is the default OpenCost metrics exporter port
      - targets: ['opencost.opencost.svc:9003']

This tells Prometheus to scrape OpenCost metrics every minute. Then, importing dashboard ID 22208 in Grafana gives you immediate visualization of costs by cluster, namespace, and pod — no custom queries needed, making this the fastest path to a cost dashboard.
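If you prefer building your own panels, OpenCost's exporter metrics can also be queried directly. For example, a rough projected monthly cluster cost (assuming the `node_total_hourly_cost` metric documented for the OpenCost Prometheus exporter, and roughly 730 hours in a month):

```promql
# Projected monthly cluster cost from OpenCost node pricing metrics
sum(node_total_hourly_cost) * 730
```

Double-check the metric name against the exporter's /metrics output in your installed version before wiring it into a dashboard.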
Example 3: Using VPA Recommendation Mode + Separate HPA to Avoid Conflicts
Running HPA and VPA simultaneously is fine, but if both autoscalers are tuning the same metric (CPU or memory) at the same time, a feedback loop can form that degrades both cost and stability. This actually happened to our team: HPA would scale up pods to reduce CPU utilization, then VPA would increase the CPU allocation per pod, which would trigger HPA again — causing oscillation.
The pattern that has become established in the industry is to clearly separate their roles: HPA handles CPU-based horizontal scaling, and VPA handles memory sizing recommendations.
# VPA: Memory resource optimization (recommendation mode — changes applied manually)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Provides recommendations only, no automatic application
  resourcePolicy:
    containerPolicies:
      - containerName: api
        mode: "Auto"  # Keep recommendations enabled; "Off" would disable VPA for this container entirely
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"

A subtlety worth calling out: `updateMode: "Off"` at the policy level stops VPA from applying changes while still producing recommendations, whereas `mode: "Off"` inside `containerPolicies` disables VPA for that container entirely, recommendations included. For recommendation-only operation, set `updateMode: "Off"` and leave the container-level mode at `"Auto"` (the default). Mixing the two up when first rolling out VPA is a common source of "why is my VPA not recommending anything?" confusion.

# HPA: CPU-based horizontal replica scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

| Item | Setting | Reason |
|---|---|---|
| VPA updateMode | `Off` | Review recommendations manually before applying |
| VPA containerPolicies mode | `Auto` | Keep recommendations enabled (`Off` disables VPA per container) |
| HPA metric | CPU | Memory reacts slowly and can cause pod restarts |
| HPA minReplicas | 2 | Prevent single point of failure |
| VPA maxAllowed | cpu: 2, memory: 2Gi | Prevent unexpectedly excessive resource allocation |
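Once a VPA in recommendation mode has collected data, you can read its suggestions straight from the object's status (the field path follows the VPA API's `status.recommendation` structure; jsonpath shown for brevity):

```shell
# Human-readable summary of VPA recommendations for the payments API server
kubectl describe vpa api-server-vpa -n payments

# Or pull just the recommendation block as JSON
kubectl get vpa api-server-vpa -n payments \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```

Compare the `target` values in the output against your current requests; the gap between them is your right-sizing opportunity.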
Example 4: Blocking Resource Creation on Namespace Budget Overrun with Kyverno (Advanced)
Going one step further from viewing cost data, it's also possible to enforce it as policy. Below is an example policy that integrates OpenCost with Kyverno to block new pod creation when a specific namespace exceeds its budget threshold.
Important: The YAML below is pseudo-code intended to illustrate the concept.
`{{ opencost.namespace_cost }}` is not a valid variable reference syntax in actual Kyverno. In a real implementation, you would use Kyverno's External Data feature along with JMESPath expressions to call the OpenCost API. For a working implementation, see the Nirmata guide.
# Conceptual example — this is not working code
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-overspending-ns
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-namespace-cost
      match:
        resources:
          kinds: ["Pod"]
      validate:
        message: "Namespace budget exceeded. Contact FinOps team."
        deny:
          conditions:
            - key: "{{ opencost.namespace_cost }}"  # Replace with external data source in real implementation
              operator: GreaterThan
              value: "5000"

Kyverno: A Kubernetes-native policy engine. Policies are written in YAML and operate as an admission controller, allowing you to block or audit policy violations in real time during resource creation and modification.
Pros and Cons
The most painful drawback I encountered running this in production was OpenCost's lack of discount reflection. We were getting a 40% discount on Reserved Instances, but the OpenCost dashboard was showing on-demand pricing — at first I was thrown off wondering "why is it showing so expensive?" I've broken this down in more detail in the table below.
Advantages
| Item | Details |
|---|---|
| Immediate visibility | Real-time cost visibility at the namespace, pod, and label level |
| Waste detection | Identifies hidden waste like full-capacity staging environments and over-provisioned requests |
| FinOps culture foundation | Easy to introduce team accountability, from Showback to Chargeback |
| Autoscaler improvement | Immediate cost savings when optimizing requests based on VPA recommendations |
| Policy-based control | Can automatically block budget overruns via Kyverno integration (advanced) |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| OpenCost discount gap | RI/Spot discounts not reflected — costs shown higher than actual | Use Kubecost or AWS Cost Explorer alongside |
| Prometheus dependency | Prometheus required for OpenCost's default operation | Evaluate the Collector Datasource (beta) lightweight option |
| VPA restart issue | `Auto` mode restarts pods to change `requests` | Start with `updateMode: Off` and transition gradually |
| Kubecost pricing | $449/month+ is a burden; pricing policy opaque after IBM acquisition | OpenCost Free tier is sufficient for small teams |
| HPA memory metric risk | Memory-based HPA can trigger pod restart loops | Replace with CPU or custom metrics (RPS, queue length) |
Reserved Instance (RI): Reserving an instance with a 1- or 3-year commitment from a cloud provider can yield discounts of up to 60–70% compared to on-demand pricing. Because OpenCost does not automatically reflect these discounts, there can be a discrepancy with your actual billed amount.
Top 3 Mistakes to Avoid
- Activating HPA and VPA on the same metric simultaneously — If both autoscalers are tuning CPU at the same time, a feedback loop causes pod counts and resource allocations to oscillate unpredictably. Strongly recommend separating metric responsibilities.
- Omitting minAllowed / maxAllowed in VPA — Running VPA without boundary values can lead to unexpected OOM kills or, conversely, excessive CPU allocation. It's best to define a safe range from the start.
- Installing cost tracking tools without setting up alerts — No matter how beautiful the dashboard, it's useless if no one's watching it. Connecting Kubecost budget alerts or Grafana threshold alerts to a Slack channel lets you catch anomalies quickly without weekly reviews.
Closing Thoughts
Cost optimization starts with making spending visible, and real savings come from tuning HPA/VPA. Without any major architectural changes, achieving 20–30% cluster cost reductions simply by installing OpenCost and reviewing VPA recommendations is genuinely common. My own 30% savings in the staging namespace is a firsthand example.
Three steps you can take right now:
- Install OpenCost and check your first report: install with a single line — `helm install opencost opencost/opencost --namespace opencost --create-namespace` — then run `kubectl cost namespace --window 7d` to see per-namespace costs for the past 7 days. There will almost certainly be at least one namespace higher than you expected.
- Apply VPA recommendation mode to your highest-cost workloads: attaching a VPA with `updateMode: Off` to your most expensive workloads lets you safely review recommendations. After collecting data for a week or two, if the recommended `requests` values are significantly lower than your current settings, that's your immediate savings opportunity.
- Build a shared cost dashboard for your team: import Grafana dashboard ID `22208` and create a shared cost board for your team. Once the numbers are visible, team culture starts to shift. The first step in FinOps is always visibility.
Next post: Combining the OpenCost cost data covered in this post with Karpenter — we'll look at patterns for increasing Spot Instance utilization on AWS EKS and achieving additional cost savings at the node level.
References
- OpenCost Official Site | opencost.io
- OpenCost GitHub | github.com/opencost
- OpenCost 2025 Annual Review | opencost.io
- OpenCost Prometheus Exporter Configuration | opencost.io
- kubectl cost Plugin Documentation | opencost.io
- Grafana OpenCost Dashboard (ID: 22208) | grafana.com
- Kubecost vs OpenCost Comparison | CloudZero
- Kubecost vs OpenCost Comparison | Apptio
- Kubernetes Autoscaling HPA/VPA Guide | Sedai
- HPA vs VPA: 2025 Selection Guide | ScaleOps
- Kubernetes Cost Optimization Strategies 2025 | Sealos
- OpenCost + Prometheus + Grafana Practical Guide | hodovi.cc
- Policy-Based Cost Management with OpenCost and Kyverno | Nirmata
- Cloud Cost Governance: Kubecost, OpenCost, Infracost | Open Source For You
- EKS + OpenCost + AWS Managed Prometheus/Grafana Setup Case Study | Automat-it