Tail Sampling + KEDA: A 2-Tier OTel Architecture That Never Misses a Trace During Traffic Spikes
The most nerve-wracking moment in a distributed system is exactly when traffic explodes. When a flash sale kicks off or a viral event hits, your observability pipeline must handle the most data while being the most likely to collapse. If you rely on head sampling — which randomly drops traces the moment a request starts — the requests that actually caused problems vanish without a trace.
This article covers how to combine OpenTelemetry Tail Sampling with KEDA's event-driven autoscaling into a two-stage pipeline (2-tier architecture) that precisely preserves meaningful traces even during traffic spikes while automatically scaling your infrastructure. It includes the context around Kedify's otel-add-on (introduced in 2024) that makes this combination far easier to implement in production, along with YAML examples and operational tips you can apply directly to a real Kubernetes environment.
By the end, you'll understand why Tail Sampling's statefulness blocks naive horizontal scaling — and how a 2-tier Collector architecture combined with KEDA's push-based metrics elegantly solves that constraint.
Core Concepts
What Is Tail Sampling
In distributed tracing, sampling is the process of deciding "which requests' traces should we store?" The simplest approach, head sampling, makes a random keep-or-drop decision the moment a request begins. It's easy to implement, but has a fatal flaw: because the decision is made before you know whether the request actually errored or was slow, the most important traces can be discarded.
Tail sampling inverts this approach. It waits until all spans that make up a trace have been collected, then evaluates the entire trace before deciding whether to keep it. You can base the decision on whether there was an error, whether latency exceeded a threshold, or whether specific attributes are present — reducing noise while reliably preserving valuable traces.
Glossary: A span represents a single unit of work; multiple spans form a trace. For example, "HTTP request → DB query → cache lookup" are each a span, and the bundle of those three spans is the trace.
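To make the decision flow concrete, here is a minimal Python sketch of a tail-sampling decision over a complete trace. This is an illustration of the concept, not the Collector's actual implementation; the span dictionary shape and field names are assumptions for the example, but the policy order (errors first, then latency, then a probabilistic fallback) mirrors the policies configured later in this article.

```python
import random

def sample_decision(trace, latency_threshold_ms=2000, keep_fraction=0.10):
    """Evaluate a *complete* trace: errors -> latency -> probabilistic fallback.
    `trace` is a list of span dicts (shape assumed for this sketch)."""
    # Always keep traces that contain an errored span
    if any(span.get("status") == "ERROR" for span in trace):
        return True
    # Keep traces whose end-to-end duration exceeds the threshold
    duration_ms = max(s["end_ms"] for s in trace) - min(s["start_ms"] for s in trace)
    if duration_ms >= latency_threshold_ms:
        return True
    # Sample a fixed fraction of everything else
    return random.random() < keep_fraction

trace = [
    {"name": "HTTP GET /checkout", "start_ms": 0,  "end_ms": 2500, "status": "OK"},
    {"name": "SELECT orders",      "start_ms": 10, "end_ms": 2400, "status": "OK"},
]
print(sample_decision(trace))  # True: total duration 2500 ms exceeds the 2000 ms threshold
```

Note that the decision requires the whole trace as input — which is exactly the statefulness constraint discussed next.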
The Core Constraint of Tail Sampling: Statefulness
As powerful as tail sampling is, there is one operationally important constraint. All spans belonging to the same TraceID must arrive at the same Collector instance. Making a decision requires the complete trace to be in one place.
This is what prevents naive horizontal scaling. If you simply run multiple Collector instances, spans from the same trace get distributed across different instances, and no single instance ever has a complete trace.
# ❌ Wrong setup: spans scatter randomly
App → [Round-Robin LB] → Collector-0, Collector-1, Collector-2
(spans from the same TraceID end up split apart)
# ✅ Correct setup: consistent routing by TraceID
App → [TraceID Consistent-Hash LB] → Collector-0 (owns TraceID-A)
→ Collector-1 (owns TraceID-B)
                                   → Collector-2 (owns TraceID-C)

The loadbalancing exporter determines the routing target using a consistent hash algorithm keyed on TraceID. This ensures spans for a given TraceID always go to the same instance, even as the Collector cluster size changes. However, when scaling a StatefulSet in or out, the DNS list changes and a brief rebalancing occurs. During this window, some spans may be delivered to a different instance; setting a generous `decision_wait` minimizes the impact.
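To see why consistent hashing preserves TraceID affinity, here is a toy Python sketch of a hash ring with virtual nodes. It is a simplified model, not the loadbalancing exporter's actual code, but it demonstrates the two properties that matter here: the same TraceID always routes to the same backend, and adding a backend remaps only a fraction of traces (the rebalancing window mentioned above).

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash of a string key
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Toy consistent-hash ring; each backend owns several virtual nodes."""
    def __init__(self, backends, vnodes=64):
        self.ring = sorted((_hash(f"{b}#{i}"), b)
                           for b in backends for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def route(self, trace_id: str) -> str:
        # The first virtual node clockwise from the TraceID's hash owns the trace
        idx = bisect.bisect(self.keys, _hash(trace_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["otel-tail-sampling-0", "otel-tail-sampling-1", "otel-tail-sampling-2"])
# Every span carrying the same TraceID resolves to the same backend
assert ring.route("trace-abc") == ring.route("trace-abc")
```

With plain round-robin, by contrast, every span would land on a different instance regardless of its TraceID.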
KEDA: Extending Event-Driven Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) overcomes the limitation of Kubernetes' built-in HPA (Horizontal Pod Autoscaler), which only understands CPU and memory metrics. KEDA supports 70+ external event sources as triggers — Kafka queue depth, Prometheus metrics, OpenTelemetry metrics, and more — and also supports Scale-to-Zero, reducing replicas all the way to zero when there is no traffic.
Push vs Pull scaling: The traditional Prometheus approach has KEDA pull metrics every 15–30 seconds before acting. With OTLP Push, the Collector sends metrics directly to KEDA, enabling scaling reactions within seconds.
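Regardless of how the metric arrives, the replica count KEDA hands to the HPA follows a ceiling-division of the reported metric by the per-replica target, clamped to the configured bounds. A simplified sketch of that arithmetic (the ScaledObject examples later in this article rely on it):

```python
import math

def desired_replicas(metric_value: float, target_value: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified KEDA/HPA-style replica calculation:
    ceil(current metric / per-replica target), clamped to [min, max]."""
    return max(min_replicas, min(max_replicas,
                                 math.ceil(metric_value / target_value)))

# 95,000 spans/sec observed, target 10,000 spans/sec per replica
print(desired_replicas(95_000, 10_000, min_replicas=2, max_replicas=20))  # 10
```

The practical consequence: `targetValue` is the per-instance capacity you are willing to assume, so it should come from load testing a single Collector, not from guesswork.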
KEDA v2.12 Built-in OTel Integration vs Kedify otel-add-on: What's the Difference
The names are similar and easy to confuse, but the two features have completely different purposes.
| | KEDA v2.12 Built-in OTel Integration | Kedify otel-add-on |
|---|---|---|
| Direction | KEDA → OTel Collector (exports KEDA's internal metrics via OTel) | OTel Collector → KEDA (uses Collector metrics as scale triggers) |
| Purpose | Improves observability of KEDA itself | Autoscales Collectors based on their throughput |
| Status | Experimental, built into KEDA v2.12+ | Open source (Apache 2.0), provided by Kedify, requires separate installation |
| Use case | When you want to trace KEDA's own scaling behavior | When you want to scale OTel Collectors based on metrics |
The pattern implemented in this article uses the Kedify otel-add-on approach. Kedify is a company specializing in the KEDA ecosystem, and otel-add-on is an Apache 2.0 open-source project. It acts as a bridge: receiving OTel Collector metrics via OTLP, aggregating them with PromQL-like queries, and delivering the result to KEDA over gRPC.
2-Tier Architecture: Solving the Constraint by Design
The answer that satisfies both the statefulness requirement and the autoscaling requirement simultaneously is a 2-tier structure with clearly separated responsibilities.
[Application Pods]
│ OTLP/gRPC
▼
┌─────────────────────────────────────────────┐
│ Tier 1: Gateway Collector (Deployment) │
│ - loadbalancing exporter (TraceID hash) │
│ - Stateless → freely scalable with KEDA │
└───────────────────┬─────────────────────────┘
│ Headless Service DNS
▼
┌─────────────────────────────────────────────┐
│ Tier 2: Tail Sampling Collector (StatefulSet)│
│ - tailsamplingprocessor │
│ - memory_limiter processor │
│ - Stable DNS guarantees TraceID affinity │
└───────────────────┬─────────────────────────┘
│
▼
[Jaeger / Grafana Tempo / Datadog]

Tier 1 (Gateway) hashes TraceIDs with the loadbalancing exporter and consistently delivers spans to a specific Tier 2 instance. Because it is stateless, it can be scaled freely with KEDA or HPA.
Tier 2 (Tail Sampling Backend) is composed of a StatefulSet and Headless Service. Each Pod has a stable DNS name — otel-tail-sampling-0, otel-tail-sampling-1, etc. — so Tier 1 can always route to the same instance.
Glossary: A Headless Service is a Kubernetes service configured with `clusterIP: None` that exposes each Pod's DNS record individually. It is essential whenever each Pod in a StatefulSet needs a unique, stable address.
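The DNS names involved follow a fixed Kubernetes pattern: `<pod-name>.<service-name>.<namespace>.svc.cluster.local`. A small sketch that generates the names the Tier 2 StatefulSet's Pods will be reachable at (names taken from the manifests later in this article):

```python
def statefulset_pod_dns(statefulset: str, service: str,
                        namespace: str, replicas: int) -> list[str]:
    """Per-Pod DNS names a headless Service exposes for a StatefulSet,
    following the <pod>.<service>.<namespace>.svc.cluster.local pattern."""
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
            for i in range(replicas)]

backends = statefulset_pod_dns("otel-tail-sampling",
                               "otel-tail-sampling-headless",
                               "observability", replicas=3)
print(backends[0])
# otel-tail-sampling-0.otel-tail-sampling-headless.observability.svc.cluster.local
```

These are exactly the addresses the Tier 1 loadbalancing exporter discovers through its DNS resolver and feeds into the consistent-hash ring.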
Practical Application
Prerequisites: To apply the examples below, your Kubernetes cluster must already have the `observability` namespace, the KEDA operator, and the Kedify otel-add-on installed. The ConfigMap and StatefulSet YAMLs must always be applied together (`kubectl apply -f`) for correct operation.
Step 1: Configure the Gateway Collector
The heart of the Tier 1 Collector is the `loadbalancing` exporter. It consistently hashes the TraceID of incoming spans and routes them to a fixed instance in the Tier 2 StatefulSet.
# otel-gateway-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-gateway-config
namespace: observability
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
exporters:
loadbalancing:
protocol:
otlp:
tls:
insecure: true
resolver:
dns:
# Headless Service DNS of the Tier 2 StatefulSet
hostname: otel-tail-sampling-headless.observability.svc.cluster.local
port: 4317
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter]
          exporters: [loadbalancing]

| Setting | Role |
|---|---|
| `loadbalancing.resolver.dns` | Dynamically discovers the Tier 2 Pod list via the Headless Service |
| `memory_limiter` | Sets a memory ceiling to prevent OOM during traffic spikes |
| `tls.insecure: true` | Skips TLS for intra-cluster traffic (configure separately for production) |
Step 2: Configure the Tail Sampling StatefulSet
Apply the following two files (otel-tail-sampling-statefulset.yaml and otel-tail-sampling-config.yaml) together.
# otel-tail-sampling-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: otel-tail-sampling
namespace: observability
spec:
serviceName: otel-tail-sampling-headless # Links to the Headless Service
replicas: 3
selector:
matchLabels:
app: otel-tail-sampling
template:
metadata:
labels:
app: otel-tail-sampling
spec:
containers:
- name: collector
# Pin a specific version instead of latest to maintain reproducible builds.
# Latest releases: https://github.com/open-telemetry/opentelemetry-collector-contrib/releases
image: otel/opentelemetry-collector-contrib:v0.120.0
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1"
volumeMounts:
- name: config
mountPath: /conf
volumes:
- name: config
configMap:
name: otel-tail-sampling-config
---
# Headless Service: gives each Pod its own DNS entry
apiVersion: v1
kind: Service
metadata:
name: otel-tail-sampling-headless
namespace: observability
spec:
clusterIP: None # The key Headless setting
selector:
app: otel-tail-sampling
ports:
- name: otlp-grpc
port: 4317
targetPort: 4317 # Specified explicitly to avoid ambiguity
---
# PodDisruptionBudget: guarantees minimum instances during rolling updates
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: otel-tail-sampling-pdb
namespace: observability
spec:
minAvailable: 2
selector:
matchLabels:
      app: otel-tail-sampling

The sampling policy ConfigMap for the Tier 2 Collector:
# otel-tail-sampling-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-tail-sampling-config
namespace: observability
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
memory_limiter:
check_interval: 1s
limit_mib: 800
spike_limit_mib: 200
tail_sampling:
# decision_wait (30s) × expected_new_traces_per_sec (1,000/s)
# = ~30,000 traces waiting for a decision at peak
# num_traces (100,000) is set to 3× that estimate for headroom.
# Also adjust memory limits to (avg trace size × num_traces).
decision_wait: 30s
num_traces: 100000
expected_new_traces_per_sec: 1000
policies:
# Always preserve traces that contain errors
- name: error-policy
type: status_code
status_code: {status_codes: [ERROR]}
# Preserve slow requests that took more than 2 seconds
- name: slow-traces-policy
type: latency
latency: {threshold_ms: 2000}
# Sample only 10% of everything else
- name: probabilistic-policy
type: probabilistic
probabilistic: {sampling_percentage: 10}
exporters:
otlp:
endpoint: jaeger-collector.observability.svc:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, tail_sampling]
          exporters: [otlp]

| Policy | Condition | Retention |
|---|---|---|
| `error-policy` | HTTP 5xx, gRPC Error status | 100% |
| `slow-traces-policy` | Response time exceeds 2 seconds | 100% |
| `probabilistic-policy` | Normal requests that don't match the above | 10% |

`decision_wait` and memory: `decision_wait` (default 30s) × `expected_new_traces_per_sec` (1,000/s) = ~30,000 traces waiting for a decision at any given time. `num_traces` should be set comfortably higher than this, and memory `limits` should be tuned alongside it based on (average trace size × `num_traces`). For services with fast traffic patterns, reducing `decision_wait` to 10–15 seconds lowers memory footprint.
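The sizing arithmetic from the config comments can be written down as a small back-of-the-envelope calculator. The average trace size (`avg_trace_kib`) is an assumed input you would measure in your own environment; the formula itself just restates the relationships above:

```python
def tail_sampling_sizing(decision_wait_s: int, new_traces_per_sec: int,
                         avg_trace_kib: float, headroom: float = 3.0):
    """Back-of-the-envelope sizing for the tail_sampling processor.
    Returns (traces in flight, suggested num_traces, rough buffer MiB)."""
    in_flight = decision_wait_s * new_traces_per_sec   # traces awaiting a decision
    num_traces = int(in_flight * headroom)             # configured buffer with headroom
    buffer_mib = num_traces * avg_trace_kib / 1024     # rough memory for that buffer
    return in_flight, num_traces, buffer_mib

# The article's numbers, assuming ~8 KiB per trace (a placeholder figure)
in_flight, num_traces, buffer_mib = tail_sampling_sizing(30, 1000, avg_trace_kib=8)
print(in_flight, num_traces, round(buffer_mib))  # 30000 90000 703
```

With these placeholder inputs the buffer alone approaches the StatefulSet's 1 GiB memory limit, which is why the `memory_limiter` thresholds and `decision_wait` must be tuned together.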
Step 3: Wire Up KEDA ScaledObject + Kedify otel-add-on
The following ScaledObjects autoscale the Tier 1 Gateway Collector based on received span count and the Tier 2 Tail Sampling Collector based on buffered trace count.
# keda-gateway-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: otel-gateway-scaler
namespace: observability
spec:
scaleTargetRef:
name: otel-gateway-collector # Name of the Tier 1 Deployment
minReplicaCount: 2 # Avoid cold start: keep at least 2
maxReplicaCount: 20
pollingInterval: 5 # Check metrics every 5 seconds
cooldownPeriod: 60 # Wait 60 seconds before scaling in
triggers:
- type: external
metadata:
scalerAddress: kedify-otel-add-on.observability.svc:4318
# rate() computes the 1-minute average instantaneous receive rate (spans/sec).
# KEDA ceiling-divides (returned value / targetValue) to determine replica count.
metricQuery: |
sum(rate(otelcol_receiver_accepted_spans{receiver="otlp"}[1m]))
targetValue: "10000" # Target spans/sec per instance
---
# Tier 2 Tail Sampling: scale based on number of traces in flight
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: otel-tail-sampling-scaler
namespace: observability
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: StatefulSet
name: otel-tail-sampling
minReplicaCount: 2
maxReplicaCount: 10
triggers:
- type: external
metadata:
scalerAddress: kedify-otel-add-on.observability.svc:4318
# Use the count metric rather than histogram buckets (_bucket)
# to get a single meaningful aggregate.
# Verify whether this metric is a cumulative counter or gauge in your environment.
metricQuery: |
sum(otelcol_processor_tail_sampling_num_traces_sampled)
      targetValue: "80000" # Cluster-wide trace processing threshold

Kedify otel-add-on data flow:
OTel Collector (pushes metrics via OTLP)
│ port 4317
▼
┌─────────────────────────────┐
│ kedify-otel-add-on │
│ - Internal TSDB (ring buf) │
│ - PromQL-like aggregation │
└──────────────┬──────────────┘
│ gRPC External Scaler (port 4318)
▼
KEDA Operator
│
▼
Replica count determined and applied

Pros and Cons
Advantages
| Item | Details |
|---|---|
| Precise sampling | Preserves only meaningful traces based on errors, latency, and attributes — reduces storage costs |
| Sub-second scaling reaction | OTLP Push eliminates polling delay; reaction speed improves 10× or more vs pull-based approaches |
| Cost optimization | Replica count can be minimized during low load (controlled via minReplicaCount) |
| Flexible triggers | Fine-grained criteria using the Collector's own metrics (queue size, received spans, memory usage) |
| Standards-based | OpenTelemetry + KEDA combination avoids vendor lock-in; swap Jaeger, Tempo, Datadog freely |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Tier 2 StatefulSet scaling constraints | Cannot apply HPA freely; risk of span loss within the trace window (30s) during scaling | Set PodDisruptionBudget (minAvailable: 2); shorten decision_wait to 10–15s to minimize impact |
| Memory spikes | Increased buffering during traffic bursts raises OOM risk | memory_limiter is mandatory; use the bytes_limiting policy to cap large traces |
| Cold start latency | After Scale-to-Zero, first request processing takes seconds to tens of seconds; spans can be lost during that window | Never use minReplicaCount: 0 in production. Keep at least 2 Gateway and 2 Tail Sampling instances |
| DNS sync delay | After a StatefulSet change, the loadbalancing exporter takes a few seconds to re-resolve DNS | Kubernetes DNS default TTL is 30s. Tune the Gateway's dns_refresh_delay to 10–15s and verify dnsPolicy: ClusterFirst |
| KEDA OTel integration experimental status | KEDA's built-in OTel metric export feature is Experimental | Use Kedify otel-add-on (stable) as the alternative; run thorough load tests before going to production |
Terminology note: `decision_wait` is the maximum time the Tail Sampling Processor waits for all spans of a trace to arrive before making a sampling decision. The default is 30 seconds; services with fast traffic patterns can lower it to 10–15 seconds to reduce memory footprint.
The Most Common Mistakes in Production
1. Using a plain Deployment for the Tier 2 Collector: If you use a Deployment instead of a StatefulSet, stable DNS is not preserved when Pods are replaced, causing traces to scatter. Always use the StatefulSet + Headless Service combination.
2. Running production with `minReplicaCount: 0`: Scale-to-Zero is attractive from a cost perspective, but when traffic arrives while the Collector is completely off, spans are irrecoverably lost during the cold start. If the Tier 2 Tail Sampling layer is down, spans that arrive in the meantime never get a sampling decision at all. Keep at least 2 instances each in the Tier 1 Gateway and Tier 2 Tail Sampling layers.
3. Operating without `memory_limiter`: If a traffic spike causes the Collector to buffer traces until it OOMs and restarts, all traces accumulated at that moment are lost. Configuring `memory_limiter` is not optional; it is mandatory.
Closing Thoughts
Combining Tail Sampling with KEDA is more than a technology pairing. It is a pattern that solves two problems in a single architecture: "when and which data is valuable?" and "how do we elastically operate the infrastructure to process that data?" Through the 2-tier structure, KEDA ScaledObject, and Kedify otel-add-on combination covered in this article, you can build a pipeline that never misses an important trace even during traffic spikes.
Three steps you can take right now:
1. Practice the 2-tier structure locally: Spin up a local Kubernetes cluster with `kind` or `minikube`, then apply the manifests from the KubeCon EU 2024 Sampling Tutorial. A few `kubectl apply -f` commands are all it takes to see the 2-tier pipeline working firsthand.
2. Once Step 1 is working, try adding an OTel metrics trigger to a KEDA ScaledObject you're running in production. The Helm chart from the Kedify otel-add-on GitHub repository lets you deploy the bridge component in under 10 minutes.
3. Incrementally refine your sampling policies to match your service's characteristics: Start with just two policies, errors (`status_code: ERROR`) and slow requests (`latency`), then observe actual trace retention rates and storage costs before gradually tuning the `probabilistic` policy's sampling percentage. This approach keeps operational burden low.
Next article: Building a pipeline that auto-generates RED metrics (Rate, Errors, Duration) from traces using the OpenTelemetry Collector's `spanmetrics` connector and connects them to a Grafana dashboard.
References
Official Documentation
- Scaling the Collector | OpenTelemetry
- Sampling Concepts | OpenTelemetry
- tailsamplingprocessor README | opentelemetry-collector-contrib
- OpenTelemetry Collector Integration (Experimental) | KEDA
- ScaledObject Specification | KEDA
- otel-add-on | Kedify GitHub
Hands-on Guides
- OpenTelemetry Kubernetes Tracing Tutorial - Sampling | KubeCon EU 2024
- Scaling KEDA with OTel Collector Metrics | Kedify Blog
- OpenTelemetry Scaler Documentation | Kedify
- KEDA + OTel Custom Metrics Autoscaling Setup | oneuptime Blog
- Kubernetes Autoscaling with KEDA + Amazon Managed Prometheus | AWS Blog