100% Error Span Collection, Up to 95% Cost Reduction — Grafana Alloy + OpenTelemetry Tail-Based Sampling Practical Guide
When operating distributed systems, you always face the same dilemma. Storing every trace is prohibitively expensive, but random sampling risks losing the error traces you actually need. This goes beyond a simple cost issue — it's a core architectural decision that directly impacts your ability to respond to incidents.
Grafana Tempo's tail-based sampling fundamentally solves this dilemma. Because it decides whether to store a trace after it has been fully collected, conditional filtering like "preserve 100% of traces containing errors, keep only 5% of normal traces" becomes possible. With the right policy design, you can retain every single error trace while reducing total storage usage by up to 95%.
This post assumes you already have a Grafana Alloy + Tempo pipeline in place. For OTel SDK instrumentation or initial Alloy installation, refer to the Grafana Alloy official documentation and Grafana Tempo official documentation. If you already have a pipeline and want to refine your sampling strategy, this guide is for you — it covers everything from configuring the otelcol.processor.tail_sampling component and high-traffic memory tuning to a pre-production operational checklist.
Core Concepts
Head-Based vs Tail-Based Sampling: What's the Difference?
There are two main approaches to trace sampling.
| Approach | Decision Point | Error Span Preservation | Implementation Complexity |
|---|---|---|---|
| Head-based sampling | At trace start | Not guaranteed (decided before errors occur) | Low |
| Tail-based sampling | After trace completion | 100% guaranteed | Medium |
Head-based sampling probabilistically decides whether to store a trace when it begins. It's simple to implement, but if an error occurs later, the trace is already lost if a drop decision was made. Tail-based sampling waits until all spans are collected, allowing storage decisions to be made with full knowledge of the trace's contents.
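The difference can be sketched with a toy simulation (illustrative only; real samplers operate on spans and trace state, not dicts):

```python
import random

random.seed(42)

def head_based(traces, rate=0.05):
    # Decision made at trace start: errors are not known yet
    return [t for t in traces if random.random() < rate]

def tail_based(traces, rate=0.05):
    # Decision made after completion: errors are always visible
    return [t for t in traces if t["error"] or random.random() < rate]

# 10,000 traces, 1% of which contain an error span
traces = [{"id": i, "error": i % 100 == 0} for i in range(10_000)]

head_errors = sum(t["error"] for t in head_based(traces))
tail_errors = sum(t["error"] for t in tail_based(traces))

assert tail_errors == 100  # tail sampling keeps every error trace
assert head_errors < 100   # head sampling loses most of them
```

Head-based sampling keeps roughly 5% of the error traces on average; tail-based sampling keeps all of them while still retaining only ~5% of normal traffic.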
Terminology — Span: A unit of work in a distributed trace (e.g., an HTTP request, a DB query). Multiple spans combine to form a single Trace.
Grafana Alloy + Tempo Architecture Overview
```
Application (OTel SDK)
  └─▶ Grafana Alloy
        ├─ otelcol.processor.tail_sampling
        │    ├─ Policy 1: status_code = ERROR → 100% preserved
        │    ├─ Policy 2: latency > 2000ms   → 100% preserved
        │    └─ Policy 3: probabilistic 5%   → keep remaining 5%
        └─▶ Grafana Tempo (S3 / GCS / MinIO)
              └─▶ Grafana Dashboard (TraceQL)
```

Grafana Alloy is the official successor to the former Grafana Agent Flow mode, and the otelcol.processor.tail_sampling component is the standard way to configure tail sampling. Tempo uses object storage (S3, GCS, MinIO, etc.) as its backend, dramatically reducing index storage costs compared to Jaeger or Zipkin.
Three Core Policy Types
| Policy Type | Role | Key Parameters |
|---|---|---|
| `status_code` | Filter by OTel status code | `status_codes = ["ERROR"]` |
| `latency` | Filter by total trace latency | `threshold_ms = 2000` |
| `probabilistic` | Random sampling by probability | `sampling_percentage = 5` |
Key Rule — The `probabilistic` policy must always be placed last. If placed earlier, error traces may be dropped by the probabilistic evaluation. The official Grafana documentation explicitly requires this ordering.
decision_wait and Handling Incomplete Traces
The fundamental constraint of tail sampling is that evaluation doesn't begin until the last span of a trace arrives. Traces that remain incomplete after decision_wait has elapsed are evaluated against policies using only the spans collected so far — if they don't match any retention policy, they are dropped. Therefore, it's important to set decision_wait well above the expected completion time of the longest trace in your service.
num_traces is the maximum number of traces held in memory during decision_wait. If this value is too low, traces that haven't finished evaluation yet will be dropped prematurely.
Calculation Formula — `num_traces ≥ traces per second × decision_wait (seconds)`. Example: 500 traces/sec with `decision_wait = "10s"` → minimum `num_traces` of 5,000.
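As a quick sanity check, the sizing rule can be wrapped in a few lines of Python (the helper name and the optional safety factor are my own additions):

```python
import math

def min_num_traces(traces_per_sec, decision_wait_s, safety=1.0):
    """num_traces must cover all traces held during the decision window:
    traces/sec x decision_wait, optionally multiplied by a headroom factor."""
    return math.ceil(traces_per_sec * decision_wait_s * safety)

# The example from the text: 500 traces/sec, decision_wait = "10s"
assert min_num_traces(500, 10) == 5000

# With 2x headroom for traffic spikes
assert min_num_traces(500, 10, safety=2.0) == 10000
```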
Now let's combine these three policies to build real configurations.
Practical Application
Example 1: 100% Errors + Low-Rate Sampling for Normal Traces (Basic Pattern)
The most common configuration. Errors and slow traces are fully preserved; only 5% of remaining normal traces are kept.
```alloy
// Grafana Alloy — config.alloy
// A separate OTLP exporter declaration referenced in the output block is required:
// otelcol.exporter.otlp "tempo" {
//   client { endpoint = "tempo:4317" }
// }

otelcol.processor.tail_sampling "default" {
  decision_wait               = "10s"
  num_traces                  = 10000
  expected_new_traces_per_sec = 1000

  // Policy 1: Preserve 100% of traces containing error spans
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  // Policy 2: Preserve all slow traces exceeding 2 seconds
  policy {
    name = "keep-slow-traces"
    type = "latency"
    latency {
      threshold_ms = 2000
    }
  }

  // Policy 3: Random 5% sampling of the rest — must be last
  policy {
    name = "probabilistic-sample"
    type = "probabilistic"
    probabilistic {
      sampling_percentage = 5
    }
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
```

| Parameter | Description | How to Set |
|---|---|---|
| `decision_wait` | Wait time until span collection is complete | At least the expected completion time of the longest trace in the service |
| `num_traces` | Maximum number of traces held in memory | At least the expected number of concurrently active traces (see formula in Core Concepts) |
| `expected_new_traces_per_sec` | Expected number of new traces per second | Measure based on actual traffic |
Example 2: Fully Exclude Health Checks + 100% Error Collection
Traces from health check endpoints like /health and /readyz are mostly noise. Dropping them first reduces storage costs further.
```alloy
// Grafana Alloy — config.alloy (health check exclusion pattern)
otelcol.processor.tail_sampling "default" {
  decision_wait               = "10s"
  num_traces                  = 10000
  expected_new_traces_per_sec = 1000

  // Policy 1: Only evaluate traces that don't match health check paths
  policy {
    name = "drop-healthcheck"
    type = "string_attribute"
    string_attribute {
      key          = "http.target"
      values       = ["/health", "/readyz", "/livez"]
      invert_match = true
    }
  }

  // Policy 2: Preserve 100% of error traces
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  // Policy 3: Sample 10% of the rest — must be last
  policy {
    name = "baseline"
    type = "probabilistic"
    probabilistic {
      sampling_percentage = 10
    }
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
```
`invert_match = true` — Marks traces that do *not* match the specified values as candidates for retention. A trace that does match `/health` etc. receives an inverted "not sampled" decision, which takes precedence over the other policies, so health check traces are dropped even if they would otherwise match a retention policy. Note that this is an inversion of the match condition, not a standalone drop command.
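To make the decision flow concrete, here is a toy Python model of the Example 2 policy chain. It illustrates the intended outcome, not the processor's actual decision code:

```python
import random

HEALTH_PATHS = {"/health", "/readyz", "/livez"}

def keep_trace(http_target, status, baseline=0.10):
    """Toy model of the Example 2 policy chain (illustrative only)."""
    # Policy 1 (invert_match): a match on a health check path vetoes
    # retention, even if the trace also contains errors
    if http_target in HEALTH_PATHS:
        return False
    # Policy 2: error traces are always kept
    if status == "ERROR":
        return True
    # Policy 3: 10% baseline sampling for everything else
    return random.random() < baseline

assert keep_trace("/health", "ERROR") is False    # vetoed despite the error
assert keep_trace("/api/orders", "ERROR") is True
```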
Example 3: Rate Limiting with Composite Policy for High-Traffic Environments
In environments with tens of thousands of traces per second, the composite policy can cap the maximum number of spans per second while still prioritizing errors. If your traffic is below a few thousand traces per second, Example 1 is sufficient and this configuration may not be necessary.
The example below uses the OpenTelemetry Collector contrib YAML format. Because the composite configuration requires nested sub-policies alongside rate_allocation, it is more practical to express in Collector YAML and run as a stage connected to the Alloy pipeline.
```yaml
# OpenTelemetry Collector — config.yaml (composite policy)
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: composite-policy
        type: composite
        composite:
          max_total_spans_per_second: 10000  # upper limit on spans per second
          policy_order: [error-policy, latency-policy, probabilistic-policy]
          # Every policy referenced in policy_order must be defined here
          composite_sub_policy:
            - name: error-policy
              type: status_code
              status_code:
                status_codes: [ERROR]
            - name: latency-policy
              type: latency
              latency:
                threshold_ms: 500
            - name: probabilistic-policy
              type: probabilistic
              probabilistic:
                sampling_percentage: 3
          rate_allocation:
            - policy: error-policy
              percent: 60  # prioritize 60% of allowed throughput for error traces
            - policy: latency-policy
              percent: 30
            - policy: probabilistic-policy
              percent: 10
```

The role of `rate_allocation` — Once `max_total_spans_per_second` places a ceiling on total throughput, `rate_allocation` determines how that capacity is divided among the sub-policies. Allocating 60% to `error-policy` ensures error traces are collected first even when a traffic spike hits the throughput limit. A standalone `status_code = ERROR` policy (as in Example 1) always preserves matching traces, but inside a throughput-capped composite it is the rate allocation that guarantees error traces keep their priority. Note that the sub-policies referenced in `policy_order` must be defined inside `composite_sub_policy`, which is why `error-policy` is nested here rather than declared at the top level.
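The arithmetic behind the allocation above is straightforward; a quick check in Python:

```python
cap = 10_000  # max_total_spans_per_second
allocation = {"error-policy": 60, "latency-policy": 30, "probabilistic-policy": 10}

# Per-policy span budgets once the cap is in effect
budgets = {name: cap * pct // 100 for name, pct in allocation.items()}

assert budgets == {
    "error-policy": 6000,        # errors get first claim on throughput
    "latency-policy": 3000,
    "probabilistic-policy": 1000,
}
assert sum(budgets.values()) == cap  # percentages must total 100
```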
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Complete error capture | Since decisions are made after trace completion, 100% of traces containing error spans can be preserved |
| Cost optimization | Low-rate sampling of normal traces reduces storage costs by up to 95% |
| Business value retention | Selectively preserves meaningful traces (errors, abnormal latency) compared to simple probabilistic sampling |
| Synergy with Tempo | Object storage backend allows cost-effective long-term retention of sampled traces |
| Adaptive Traces integration | Custom policy management available via managed UI in Grafana Cloud (GA in 2025) |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Memory pressure | All spans held in memory during `decision_wait` → OOM risk under high traffic | Set Alloy memory limit generously based on `num_traces` × average span size |
| Single-instance constraint | Spans from the same trace must be routed to the same Alloy instance | Configure trace ID-based routing with `otelcol.exporter.loadbalancing` |
| Policy ordering errors | If `probabilistic` comes first, error traces may be dropped | Always maintain `status_code` → `latency` → `probabilistic` order |
| `decision_wait` tuning | Too short causes decisions on incomplete traces; too long increases memory usage | Set based on the expected completion time of the longest trace |
Terminology supplement — `otelcol.exporter.loadbalancing`: An exporter that guarantees spans with the same trace ID are always delivered to the same instance when multiple Alloy instances are present. Essential for scaling out tail sampling.
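A minimal sketch of such a two-tier setup, assuming a Kubernetes headless service named `alloy-sampling` that resolves to the sampling-tier Alloy pods (the hostname, namespace, and insecure in-cluster TLS are illustrative assumptions):

```alloy
// Tier 1 Alloy: receives OTLP and shards traces across the sampling tier
otelcol.exporter.loadbalancing "default" {
  // routing_key "traceID" sends all spans of one trace to the same backend
  routing_key = "traceID"

  resolver {
    dns {
      hostname = "alloy-sampling.monitoring.svc.cluster.local"
      port     = "4317"
    }
  }

  protocol {
    otlp {
      client {
        tls {
          insecure = true // assumption: plaintext traffic inside the cluster
        }
      }
    }
  }
}
```

Each tier 2 Alloy instance then runs the `otelcol.processor.tail_sampling` configuration from Example 1, seeing complete traces because of the trace ID routing.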
Most Common Mistakes in Practice
- Placing the `probabilistic` policy first — Error traces get dropped by the probabilistic evaluation. It must be moved to the last position.
- Setting `num_traces` too low — If set below the expected number of concurrently active traces, traces that haven't finished evaluation yet will be dropped prematurely.
- Running multiple Alloy instances without load balancing — If spans from the same trace are distributed across different instances, tail sampling decisions become incomplete. A `loadbalancing` exporter must be placed in front of the Alloy instances.
Closing Thoughts
Tail-based sampling is currently the most practical trace strategy for simultaneously achieving two goals: missing zero errors while minimizing costs.
Three steps you can take right now:
1. Measure your current traffic.
   - Determine your traces per second and average trace completion time.
   - Example: 500 traces/sec, completion time 8s → `num_traces = 5000`, `decision_wait = "10s"`
2. Apply the basic pattern from Example 1 to your staging environment.
   - Arrange policies in the order `status_code` (ERROR) → `latency` (2000ms) → `probabilistic` (5%).
   - Use Alloy's `tail_sampling_count_traces_sampled` metric to verify actual retention and drop ratios.
3. Review the checklist below before applying to production.
Pre-Production Operational Checklist
✅ Policy order confirmed: status_code → latency → probabilistic
✅ num_traces ≥ traces per second × decision_wait (seconds)
✅ decision_wait ≥ expected completion time of the longest trace in the service
✅ Alloy memory limit set to at least num_traces × average span size
✅ If running 2+ Alloy instances, otelcol.exporter.loadbalancing is configured
✅ Retention rate verified in staging using the tail_sampling_count_traces_sampled metric
✅ Confirm probabilistic policy is positioned last

Next post: How to detect error patterns in sampled traces using Grafana Tempo TraceQL and connect them to alerting
References
- Sampling | Grafana Tempo documentation
- Add tail sampling policies and strategies | Grafana Tempo documentation
- Enable tail-based sampling | Grafana Tempo documentation
- otelcol.processor.tail_sampling | Grafana Alloy documentation
- Introduction to Adaptive Traces | Grafana Cloud documentation
- Maximize data value and cut costs: Adaptive Telemetry for metrics, logs, traces, and profiles in Grafana Cloud | Grafana Labs
- Managing observability costs at scale | Grafana Labs Blog
- Tail Sampling Processor | OpenTelemetry Collector Contrib (GitHub)
- Sampling | OpenTelemetry
- Reduce Grafana Cloud Traces costs | Grafana Cloud documentation