Tail Sampling + KEDA: A 2-Tier OTel Architecture That Never Misses a Trace During Traffic Spikes
The most nerve-wracking moment in a distributed system is exactly when traffic explodes. When a flash sale kicks off or a viral event hits, your observability pipeline must handle the most data while being the most likely to collapse. If you rely on head sampling — which randomly drops traces the moment a request starts — the requests that actually caused problems vanish without a trace.
This article covers how to combine OpenTelemetry Tail Sampling with KEDA's event-driven autoscaling into a two-stage pipeline (2-tier architecture) that precisely preserves meaningful traces even during traffic spikes while automatically scaling your infrastructure. It includes the context around Kedify's otel-add-on (introduced in 2024) that makes this combination far easier to implement in production, along with YAML examples and operational tips you can apply directly to a real Kubernetes environment.
By the end, you'll understand why Tail Sampling's statefulness blocks naive horizontal scaling — and how a 2-tier Collector architecture combined with KEDA's push-based metrics elegantly solves that constraint.
Core Concepts
What Is Tail Sampling
In distributed tracing, sampling is the process of deciding "which requests' traces should we store?" The simplest approach, head sampling, makes a random keep-or-drop decision the moment a request begins. It's easy to implement, but has a fatal flaw: because the decision is made before you know whether the request actually errored or was slow, the most important traces can be discarded.
Tail sampling inverts this approach. It waits until all spans that make up a trace have been collected, then evaluates the entire trace before deciding whether to keep it. You can base the decision on whether there was an error, whether latency exceeded a threshold, or whether specific attributes are present — reducing noise while reliably preserving valuable traces.
Glossary: A span represents a single unit of work; multiple spans form a trace. For example, "HTTP request → DB query → cache lookup" are each a span, and the bundle of those three spans is the trace.
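To make the decision flow concrete, here is a minimal Python sketch of a tail-sampling decision over a complete trace. This is an illustration of the concept, not the Collector's actual implementation; the span dictionary shape and field names are assumptions for the example, but the policy order (errors first, then latency, then a probabilistic fallback) mirrors the policies configured later in this article.

```python
import random

def sample_decision(trace, latency_threshold_ms=2000, keep_fraction=0.10):
    """Evaluate a *complete* trace: errors -> latency -> probabilistic fallback.
    `trace` is a list of span dicts (shape assumed for this sketch)."""
    # Always keep traces that contain an errored span
    if any(span.get("status") == "ERROR" for span in trace):
        return True
    # Keep traces whose end-to-end duration exceeds the threshold
    duration_ms = max(s["end_ms"] for s in trace) - min(s["start_ms"] for s in trace)
    if duration_ms >= latency_threshold_ms:
        return True
    # Sample a fixed fraction of everything else
    return random.random() < keep_fraction

trace = [
    {"name": "HTTP GET /checkout", "start_ms": 0,  "end_ms": 2500, "status": "OK"},
    {"name": "SELECT orders",      "start_ms": 10, "end_ms": 2400, "status": "OK"},
]
print(sample_decision(trace))  # True: total duration 2500 ms exceeds the 2000 ms threshold
```

Note that the decision requires the whole trace as input — which is exactly the statefulness constraint discussed next.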
The Core Constraint of Tail Sampling: Statefulness
As powerful as tail sampling is, there is one operationally important constraint. All spans belonging to the same TraceID must arrive at the same Collector instance. Making a decision requires the complete trace to be in one place.
This is what prevents naive horizontal scaling. If you simply run multiple Collector instances, spans from the same trace get distributed across different instances, and no single instance ever has a complete trace.
# ❌ Wrong setup: spans scatter randomly
App → [Round-Robin LB] → Collector-0, Collector-1, Collector-2
(spans from the same TraceID end up split apart)
# ✅ Correct setup: consistent routing by TraceID
App → [TraceID Consistent-Hash LB] → Collector-0 (owns TraceID-A)
→ Collector-1 (owns TraceID-B)
                                   → Collector-2 (owns TraceID-C)

The loadbalancing exporter determines the routing target using a consistent hash algorithm keyed on TraceID. This ensures spans for a given TraceID always go to the same instance, even as the Collector cluster size changes. However, when scaling a StatefulSet in or out, the DNS list changes and a brief rebalancing occurs. During this window, some spans may be delivered to a different instance; setting a generous `decision_wait` minimizes the impact.
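To see why consistent hashing preserves TraceID affinity, here is a toy Python sketch of a hash ring with virtual nodes. It is a simplified model, not the loadbalancing exporter's actual code, but it demonstrates the two properties that matter here: the same TraceID always routes to the same backend, and adding a backend remaps only a fraction of traces (the rebalancing window mentioned above).

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash of a string key
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Toy consistent-hash ring; each backend owns several virtual nodes."""
    def __init__(self, backends, vnodes=64):
        self.ring = sorted((_hash(f"{b}#{i}"), b)
                           for b in backends for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def route(self, trace_id: str) -> str:
        # The first virtual node clockwise from the TraceID's hash owns the trace
        idx = bisect.bisect(self.keys, _hash(trace_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["otel-tail-sampling-0", "otel-tail-sampling-1", "otel-tail-sampling-2"])
# Every span carrying the same TraceID resolves to the same backend
assert ring.route("trace-abc") == ring.route("trace-abc")
```

With plain round-robin, by contrast, every span would land on a different instance regardless of its TraceID.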
KEDA: Extending Event-Driven Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) overcomes the limitation of Kubernetes' built-in HPA (Horizontal Pod Autoscaler), which only understands CPU and memory metrics. KEDA supports 70+ external event sources as triggers — Kafka queue depth, Prometheus metrics, OpenTelemetry metrics, and more — and also supports Scale-to-Zero, reducing replicas all the way to zero when there is no traffic.
Push vs Pull scaling: The traditional Prometheus approach has KEDA pull metrics every 15–30 seconds before acting. With OTLP Push, the Collector sends metrics directly to KEDA, enabling scaling reactions within seconds.
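Regardless of how the metric arrives, the replica count KEDA hands to the HPA follows a ceiling-division of the reported metric by the per-replica target, clamped to the configured bounds. A simplified sketch of that arithmetic (the ScaledObject examples later in this article rely on it):

```python
import math

def desired_replicas(metric_value: float, target_value: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified KEDA/HPA-style replica calculation:
    ceil(current metric / per-replica target), clamped to [min, max]."""
    return max(min_replicas, min(max_replicas,
                                 math.ceil(metric_value / target_value)))

# 95,000 spans/sec observed, target 10,000 spans/sec per replica
print(desired_replicas(95_000, 10_000, min_replicas=2, max_replicas=20))  # 10
```

The practical consequence: `targetValue` is the per-instance capacity you are willing to assume, so it should come from load testing a single Collector, not from guesswork.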
KEDA v2.12 Built-in OTel Integration vs Kedify otel-add-on: What's the Difference
The names are similar and easy to confuse, but the two features have completely different purposes.
| | KEDA v2.12 Built-in OTel Integration | Kedify otel-add-on |
|---|---|---|
| Direction | KEDA → OTel Collector (exports KEDA's internal metrics via OTel) | OTel Collector → KEDA (uses Collector metrics as scale triggers) |
| Purpose | Improves observability of KEDA itself | Autoscales Collectors based on their throughput |
| Status | Experimental, built into KEDA v2.12+ | Open source (Apache 2.0), provided by Kedify, requires separate installation |
| Use case | When you want to trace KEDA's own scaling behavior | When you want to scale OTel Collectors based on metrics |
The pattern implemented in this article uses the Kedify otel-add-on approach. Kedify is a company specializing in the KEDA ecosystem, and otel-add-on is an Apache 2.0 open-source project. It acts as a bridge: receiving OTel Collector metrics via OTLP, aggregating them with PromQL-like queries, and delivering the result to KEDA over gRPC.
2-Tier Architecture: Solving the Constraint by Design
The answer that satisfies both the statefulness requirement and the autoscaling requirement simultaneously is a 2-tier structure with clearly separated responsibilities.
[Application Pods]
│ OTLP/gRPC
▼
┌─────────────────────────────────────────────┐
│ Tier 1: Gateway Collector (Deployment) │
│ - loadbalancing exporter (TraceID hash) │
│ - Stateless → freely scalable with KEDA │
└───────────────────┬─────────────────────────┘
│ Headless Service DNS
▼
┌─────────────────────────────────────────────┐
│ Tier 2: Tail Sampling Collector (StatefulSet)│
│ - tailsamplingprocessor │
│ - memory_limiter processor │
│ - Stable DNS guarantees TraceID affinity │
└───────────────────┬─────────────────────────┘
│
▼
[Jaeger / Grafana Tempo / Datadog]

Tier 1 (Gateway) hashes TraceIDs with the loadbalancing exporter and consistently delivers spans to a specific Tier 2 instance. Because it is stateless, it can be scaled freely with KEDA or HPA.
Tier 2 (Tail Sampling Backend) is composed of a StatefulSet and Headless Service. Each Pod has a stable DNS name — otel-tail-sampling-0, otel-tail-sampling-1, etc. — so Tier 1 can always route to the same instance.
Glossary: A Headless Service is a Kubernetes service configured with `clusterIP: None` that exposes each Pod's DNS record individually. It is essential whenever each Pod in a StatefulSet needs a unique, stable address.
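The DNS names involved follow a fixed Kubernetes pattern: `<pod-name>.<service-name>.<namespace>.svc.cluster.local`. A small sketch that generates the names the Tier 2 StatefulSet's Pods will be reachable at (names taken from the manifests later in this article):

```python
def statefulset_pod_dns(statefulset: str, service: str,
                        namespace: str, replicas: int) -> list[str]:
    """Per-Pod DNS names a headless Service exposes for a StatefulSet,
    following the <pod>.<service>.<namespace>.svc.cluster.local pattern."""
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
            for i in range(replicas)]

backends = statefulset_pod_dns("otel-tail-sampling",
                               "otel-tail-sampling-headless",
                               "observability", replicas=3)
print(backends[0])
# otel-tail-sampling-0.otel-tail-sampling-headless.observability.svc.cluster.local
```

These are exactly the addresses the Tier 1 loadbalancing exporter discovers through its DNS resolver and feeds into the consistent-hash ring.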
Practical Application
Prerequisites: To apply the examples below, your Kubernetes cluster must already have the `observability` namespace, the KEDA operator, and the Kedify otel-add-on installed. The ConfigMap and StatefulSet YAMLs must always be applied together (`kubectl apply -f`) for correct operation.
Step 1: Configure the Gateway Collector
The heart of the Tier 1 Collector is the `loadbalancing` exporter. It consistently hashes the TraceID of incoming spans and routes them to a fixed instance in the Tier 2 StatefulSet.
# otel-gateway-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-gateway-config
namespace: observability
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
exporters:
loadbalancing:
protocol:
otlp:
tls:
insecure: true
resolver:
dns:
# Headless Service DNS of the Tier 2 StatefulSet
hostname: otel-tail-sampling-headless.observability.svc.cluster.local
port: 4317
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter]
          exporters: [loadbalancing]

| Setting | Role |
|---|---|
| `loadbalancing.resolver.dns` | Dynamically discovers the Tier 2 Pod list via the Headless Service |
| `memory_limiter` | Sets a memory ceiling to prevent OOM during traffic spikes |
| `tls.insecure: true` | Skips TLS for intra-cluster traffic (configure separately for production) |
Step 2: Configure the Tail Sampling StatefulSet
Apply the following two files (otel-tail-sampling-statefulset.yaml and otel-tail-sampling-config.yaml) together.
# otel-tail-sampling-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: otel-tail-sampling
namespace: observability
spec:
serviceName: otel-tail-sampling-headless # Links to the Headless Service
replicas: 3
selector:
matchLabels:
app: otel-tail-sampling
template:
metadata:
labels:
app: otel-tail-sampling
spec:
containers:
- name: collector
# Pin a specific version instead of latest to maintain reproducible builds.
# Latest releases: https://github.com/open-telemetry/opentelemetry-collector-contrib/releases
image: otel/opentelemetry-collector-contrib:v0.120.0
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1"
volumeMounts:
- name: config
mountPath: /conf
volumes:
- name: config
configMap:
name: otel-tail-sampling-config
---
# Headless Service: gives each Pod its own DNS entry
apiVersion: v1
kind: Service
metadata:
name: otel-tail-sampling-headless
namespace: observability
spec:
clusterIP: None # The key Headless setting
selector:
app: otel-tail-sampling
ports:
- name: otlp-grpc
port: 4317
targetPort: 4317 # Specified explicitly to avoid ambiguity
---
# PodDisruptionBudget: guarantees minimum instances during rolling updates
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: otel-tail-sampling-pdb
namespace: observability
spec:
minAvailable: 2
selector:
matchLabels:
      app: otel-tail-sampling

The sampling policy ConfigMap for the Tier 2 Collector:
# otel-tail-sampling-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-tail-sampling-config
namespace: observability
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
memory_limiter:
check_interval: 1s
limit_mib: 800
spike_limit_mib: 200
tail_sampling:
# decision_wait (30s) × expected_new_traces_per_sec (1,000/s)
# = ~30,000 traces waiting for a decision at peak
# num_traces (100,000) is set to 3× that estimate for headroom.
# Also adjust memory limits to (avg trace size × num_traces).
decision_wait: 30s
num_traces: 100000
expected_new_traces_per_sec: 1000
policies:
# Always preserve traces that contain errors
- name: error-policy
type: status_code
status_code: {status_codes: [ERROR]}
# Preserve slow requests that took more than 2 seconds
- name: slow-traces-policy
type: latency
latency: {threshold_ms: 2000}
# Sample only 10% of everything else
- name: probabilistic-policy
type: probabilistic
probabilistic: {sampling_percentage: 10}
exporters:
otlp:
endpoint: jaeger-collector.observability.svc:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, tail_sampling]
          exporters: [otlp]

| Policy | Condition | Retention |
|---|---|---|
| `error-policy` | HTTP 5xx, gRPC Error status | 100% |
| `slow-traces-policy` | Response time exceeds 2 seconds | 100% |
| `probabilistic-policy` | Normal requests that don't match the above | 10% |

`decision_wait` and memory: `decision_wait` (default 30s) × `expected_new_traces_per_sec` (1,000/s) = ~30,000 traces waiting for a decision at any given time. `num_traces` should be set comfortably higher than this, and memory `limits` should be tuned alongside it based on (average trace size × `num_traces`). For services with fast traffic patterns, reducing `decision_wait` to 10–15 seconds lowers memory footprint.
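The sizing arithmetic from the config comments can be written down as a small back-of-the-envelope calculator. The average trace size (`avg_trace_kib`) is an assumed input you would measure in your own environment; the formula itself just restates the relationships above:

```python
def tail_sampling_sizing(decision_wait_s: int, new_traces_per_sec: int,
                         avg_trace_kib: float, headroom: float = 3.0):
    """Back-of-the-envelope sizing for the tail_sampling processor.
    Returns (traces in flight, suggested num_traces, rough buffer MiB)."""
    in_flight = decision_wait_s * new_traces_per_sec   # traces awaiting a decision
    num_traces = int(in_flight * headroom)             # configured buffer with headroom
    buffer_mib = num_traces * avg_trace_kib / 1024     # rough memory for that buffer
    return in_flight, num_traces, buffer_mib

# The article's numbers, assuming ~8 KiB per trace (a placeholder figure)
in_flight, num_traces, buffer_mib = tail_sampling_sizing(30, 1000, avg_trace_kib=8)
print(in_flight, num_traces, round(buffer_mib))  # 30000 90000 703
```

With these placeholder inputs the buffer alone approaches the StatefulSet's 1 GiB memory limit, which is why the `memory_limiter` thresholds and `decision_wait` must be tuned together.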
Step 3: Wire Up KEDA ScaledObject + Kedify otel-add-on
The following ScaledObjects autoscale the Tier 1 Gateway Collector based on received span count and the Tier 2 Tail Sampling Collector based on buffered trace count.
# keda-gateway-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: otel-gateway-scaler
namespace: observability
spec:
scaleTargetRef:
name: otel-gateway-collector # Name of the Tier 1 Deployment
minReplicaCount: 2 # Avoid cold start: keep at least 2
maxReplicaCount: 20
pollingInterval: 5 # Check metrics every 5 seconds
cooldownPeriod: 60 # Wait 60 seconds before scaling in
triggers:
- type: external
metadata:
scalerAddress: kedify-otel-add-on.observability.svc:4318
# rate() computes the 1-minute average instantaneous receive rate (spans/sec).
# KEDA ceiling-divides (returned value / targetValue) to determine replica count.
metricQuery: |
sum(rate(otelcol_receiver_accepted_spans{receiver="otlp"}[1m]))
targetValue: "10000" # Target spans/sec per instance
---
# Tier 2 Tail Sampling: scale based on number of traces in flight
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: otel-tail-sampling-scaler
namespace: observability
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: StatefulSet
name: otel-tail-sampling
minReplicaCount: 2
maxReplicaCount: 10
triggers:
- type: external
metadata:
scalerAddress: kedify-otel-add-on.observability.svc:4318
# Use the count metric rather than histogram buckets (_bucket)
# to get a single meaningful aggregate.
# Verify whether this metric is a cumulative counter or gauge in your environment.
metricQuery: |
sum(otelcol_processor_tail_sampling_num_traces_sampled)
      targetValue: "80000" # Cluster-wide trace processing threshold

Kedify otel-add-on data flow:
OTel Collector (pushes metrics via OTLP)
│ port 4317
▼
┌─────────────────────────────┐
│ kedify-otel-add-on │
│ - Internal TSDB (ring buf) │
│ - PromQL-like aggregation │
└──────────────┬──────────────┘
│ gRPC External Scaler (port 4318)
▼
KEDA Operator
│
▼
Replica count determined and applied

Pros and Cons
Advantages
| Item | Details |
|---|---|
| Precise sampling | Preserves only meaningful traces based on errors, latency, and attributes — reduces storage costs |
| Sub-second scaling reaction | OTLP Push eliminates polling delay; reaction speed improves 10× or more vs pull-based approaches |
| Cost optimization | Replica count can be minimized during low load (controlled via minReplicaCount) |
| Flexible triggers | Fine-grained criteria using the Collector's own metrics (queue size, received spans, memory usage) |
| Standards-based | OpenTelemetry + KEDA combination avoids vendor lock-in; swap Jaeger, Tempo, Datadog freely |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Tier 2 StatefulSet scaling constraints | Cannot apply HPA freely; risk of span loss within the trace window (30s) during scaling | Set PodDisruptionBudget (minAvailable: 2); shorten decision_wait to 10–15s to minimize impact |
| Memory spikes | Increased buffering during traffic bursts raises OOM risk | memory_limiter is mandatory; use the bytes_limiting policy to cap large traces |
| Cold start latency | After Scale-to-Zero, first request processing takes seconds to tens of seconds; spans can be lost during that window | Never use minReplicaCount: 0 in production. Keep at least 2 Gateway and 2 Tail Sampling instances |
| DNS sync delay | After a StatefulSet change, the loadbalancing exporter takes a few seconds to re-resolve DNS | Kubernetes DNS default TTL is 30s. Tune the Gateway's dns_refresh_delay to 10–15s and verify dnsPolicy: ClusterFirst |
| KEDA OTel integration experimental status | KEDA's built-in OTel metric export feature is Experimental | Use Kedify otel-add-on (stable) as the alternative; run thorough load tests before going to production |
Terminology note: `decision_wait` is the maximum time the Tail Sampling Processor waits for all spans of a trace to arrive before making a sampling decision. The default is 30 seconds; services with fast traffic patterns can lower it to 10–15 seconds to reduce memory footprint.
The Most Common Mistakes in Production
1. Using a plain Deployment for the Tier 2 Collector: If you use a Deployment instead of a StatefulSet, stable DNS is not preserved when Pods are replaced, causing traces to scatter. Always use the StatefulSet + Headless Service combination.
2. Running production with `minReplicaCount: 0`: Scale-to-Zero is attractive from a cost perspective, but when traffic arrives while the Collector is completely off, spans are irrecoverably lost during the cold start. If the Tier 2 Tail Sampling layer is down, spans that arrive in the meantime never get a sampling decision at all. Keep at least 2 instances each in the Tier 1 Gateway and Tier 2 Tail Sampling layers.
3. Operating without `memory_limiter`: If a traffic spike causes the Collector to buffer traces until it OOMs and restarts, all traces accumulated at that moment are lost. Configuring `memory_limiter` is not optional; it is mandatory.
Closing Thoughts
Combining Tail Sampling with KEDA is more than a technology pairing. It is a pattern that solves two problems in a single architecture: "when and which data is valuable?" and "how do we elastically operate the infrastructure to process that data?" Through the 2-tier structure, KEDA ScaledObject, and Kedify otel-add-on combination covered in this article, you can build a pipeline that never misses an important trace even during traffic spikes.
Three steps you can take right now:
1. Practice the 2-tier structure locally: Spin up a local Kubernetes cluster with `kind` or `minikube`, then apply the manifests from the KubeCon EU 2024 Sampling Tutorial. A few `kubectl apply -f` commands are all it takes to see the 2-tier pipeline working firsthand.
2. Once Step 1 is working, try adding an OTel metrics trigger to a KEDA ScaledObject you're running in production. The Helm chart from the Kedify otel-add-on GitHub repository lets you deploy the bridge component in under 10 minutes.
3. Incrementally refine your sampling policies to match your service's characteristics: Start with just two policies, errors (`status_code: ERROR`) and slow requests (`latency`), then observe actual trace retention rates and storage costs before gradually tuning the `probabilistic` policy's sampling percentage. This approach keeps operational burden low.
Next article: Building a pipeline that auto-generates RED metrics (Rate, Errors, Duration) from traces using the OpenTelemetry Collector's `spanmetrics` connector and connects them to a Grafana dashboard.
References
Official Documentation
- Scaling the Collector | OpenTelemetry
- Sampling Concepts | OpenTelemetry
- tailsamplingprocessor README | opentelemetry-collector-contrib
- OpenTelemetry Collector Integration (Experimental) | KEDA
- ScaledObject Specification | KEDA
- otel-add-on | Kedify GitHub
Hands-on Guides
- OpenTelemetry Kubernetes Tracing Tutorial - Sampling | KubeCon EU 2024
- Scaling KEDA with OTel Collector Metrics | Kedify Blog
- OpenTelemetry Scaler Documentation | Kedify
- KEDA + OTel Custom Metrics Autoscaling Setup | oneuptime Blog
- Kubernetes Autoscaling with KEDA + Amazon Managed Prometheus | AWS Blog