100% Error Span Collection, Up to 95% Cost Reduction — Grafana Alloy + OpenTelemetry Tail-Based Sampling Practical Guide
When operating distributed systems, you always face the same dilemma. Storing every trace is prohibitively expensive, but random sampling risks losing the error traces you actually need. This goes beyond a simple cost issue — it's a core architectural decision that directly impacts your ability to respond to incidents.
Grafana Tempo's tail-based sampling fundamentally solves this dilemma. Because it decides whether to store a trace after it has been fully collected, conditional filtering like "preserve 100% of traces containing errors, keep only 5% of normal traces" becomes possible. With the right policy design, you can retain every single error trace while reducing total storage usage by up to 95%.
This post assumes you already have a Grafana Alloy + Tempo pipeline in place. For OTel SDK instrumentation or initial Alloy installation, refer to the Grafana Alloy official documentation and Grafana Tempo official documentation. If you already have a pipeline and want to refine your sampling strategy, this guide is for you — it covers everything from configuring the otelcol.processor.tail_sampling component and high-traffic memory tuning to a pre-production operational checklist.
Core Concepts
Head-Based vs Tail-Based Sampling: What's the Difference?
There are two main approaches to trace sampling.
| Approach | Decision Point | Error Span Preservation | Implementation Complexity |
|---|---|---|---|
| Head-based sampling | At trace start | Not guaranteed (decided before errors occur) | Low |
| Tail-based sampling | After trace completion | 100% guaranteed | Medium |
Head-based sampling probabilistically decides whether to store a trace when it begins. It's simple to implement, but if an error occurs later, the trace is already lost if a drop decision was made. Tail-based sampling waits until all spans are collected, allowing storage decisions to be made with full knowledge of the trace's contents.
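The difference can be sketched with a toy simulation (illustrative only; real samplers operate on spans and trace state, not dicts):

```python
import random

random.seed(42)

def head_based(traces, rate=0.05):
    # Decision made at trace start: errors are not known yet
    return [t for t in traces if random.random() < rate]

def tail_based(traces, rate=0.05):
    # Decision made after completion: errors are always visible
    return [t for t in traces if t["error"] or random.random() < rate]

# 10,000 traces, 1% of which contain an error span
traces = [{"id": i, "error": i % 100 == 0} for i in range(10_000)]

head_errors = sum(t["error"] for t in head_based(traces))
tail_errors = sum(t["error"] for t in tail_based(traces))

assert tail_errors == 100  # tail sampling keeps every error trace
assert head_errors < 100   # head sampling loses most of them
```

Head-based sampling keeps roughly 5% of the error traces on average; tail-based sampling keeps all of them while still retaining only ~5% of normal traffic.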
Terminology — Span: A unit of work in a distributed trace (e.g., an HTTP request, a DB query). Multiple spans combine to form a single Trace.
Grafana Alloy + Tempo Architecture Overview
```
Application (OTel SDK)
  └─▶ Grafana Alloy
        ├─ otelcol.processor.tail_sampling
        │    ├─ Policy 1: status_code = ERROR → 100% preserved
        │    ├─ Policy 2: latency > 2000ms   → 100% preserved
        │    └─ Policy 3: probabilistic 5%   → keep remaining 5%
        └─▶ Grafana Tempo (S3 / GCS / MinIO)
              └─▶ Grafana Dashboard (TraceQL)
```

Grafana Alloy is the official successor to the former Grafana Agent Flow mode, and the otelcol.processor.tail_sampling component is the standard way to configure tail sampling. Tempo uses object storage (S3, GCS, MinIO, etc.) as its backend, dramatically reducing index storage costs compared to Jaeger or Zipkin.
Three Core Policy Types
| Policy Type | Role | Key Parameters |
|---|---|---|
| `status_code` | Filter by OTel status code | `status_codes = ["ERROR"]` |
| `latency` | Filter by total trace latency | `threshold_ms = 2000` |
| `probabilistic` | Random sampling by probability | `sampling_percentage = 5` |
Key Rule — The `probabilistic` policy must always be placed last. If placed earlier, error traces may be dropped by the probabilistic evaluation. The official Grafana documentation explicitly requires this ordering.
decision_wait and Handling Incomplete Traces
The fundamental constraint of tail sampling is that evaluation doesn't begin until the last span of a trace arrives. Traces that remain incomplete after decision_wait has elapsed are evaluated against policies using only the spans collected so far — if they don't match any retention policy, they are dropped. Therefore, it's important to set decision_wait well above the expected completion time of the longest trace in your service.
num_traces is the maximum number of traces held in memory during decision_wait. If this value is too low, traces that haven't finished evaluation yet will be dropped prematurely.
Calculation Formula — `num_traces ≥ traces per second × decision_wait (seconds)`. Example: 500 traces/sec with `decision_wait = "10s"` → minimum `num_traces` of 5,000.
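As a quick sanity check, the sizing rule can be wrapped in a few lines of Python (the helper name and the optional safety factor are my own additions):

```python
import math

def min_num_traces(traces_per_sec, decision_wait_s, safety=1.0):
    """num_traces must cover all traces held during the decision window:
    traces/sec x decision_wait, optionally multiplied by a headroom factor."""
    return math.ceil(traces_per_sec * decision_wait_s * safety)

# The example from the text: 500 traces/sec, decision_wait = "10s"
assert min_num_traces(500, 10) == 5000

# With 2x headroom for traffic spikes
assert min_num_traces(500, 10, safety=2.0) == 10000
```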
Now let's combine these three policies to build real configurations.
Practical Application
Example 1: 100% Errors + Low-Rate Sampling for Normal Traces (Basic Pattern)
The most common configuration. Errors and slow traces are fully preserved; only 5% of remaining normal traces are kept.
```alloy
// Grafana Alloy — config.alloy
// A separate OTLP exporter declaration referenced in the output block is required:
// otelcol.exporter.otlp "tempo" {
//   client { endpoint = "tempo:4317" }
// }

otelcol.processor.tail_sampling "default" {
  decision_wait               = "10s"
  num_traces                  = 10000
  expected_new_traces_per_sec = 1000

  // Policy 1: Preserve 100% of traces containing error spans
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  // Policy 2: Preserve all slow traces exceeding 2 seconds
  policy {
    name = "keep-slow-traces"
    type = "latency"
    latency {
      threshold_ms = 2000
    }
  }

  // Policy 3: Random 5% sampling of the rest — must be last
  policy {
    name = "probabilistic-sample"
    type = "probabilistic"
    probabilistic {
      sampling_percentage = 5
    }
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
```

| Parameter | Description | How to Set |
|---|---|---|
| `decision_wait` | Wait time until span collection is complete | At least the expected completion time of the longest trace in the service |
| `num_traces` | Maximum number of traces held in memory | At least the expected number of concurrently active traces (see formula in Core Concepts) |
| `expected_new_traces_per_sec` | Expected number of new traces per second | Measure based on actual traffic |
Example 2: Fully Exclude Health Checks + 100% Error Collection
Traces from health check endpoints like /health and /readyz are mostly noise. Dropping them first reduces storage costs further.
```alloy
// Grafana Alloy — config.alloy (health check exclusion pattern)
otelcol.processor.tail_sampling "default" {
  decision_wait               = "10s"
  num_traces                  = 10000
  expected_new_traces_per_sec = 1000

  // Policy 1: Only evaluate traces that don't match health check paths
  policy {
    name = "drop-healthcheck"
    type = "string_attribute"
    string_attribute {
      key          = "http.target"
      values       = ["/health", "/readyz", "/livez"]
      invert_match = true
    }
  }

  // Policy 2: Preserve 100% of error traces
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  // Policy 3: Sample 10% of the rest — must be last
  policy {
    name = "baseline"
    type = "probabilistic"
    probabilistic {
      sampling_percentage = 10
    }
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
```
`invert_match = true` — Marks traces that do *not* match the specified values as candidates for retention. A trace that does match `/health` etc. receives an inverted "not sampled" decision, which takes precedence over the other policies, so health check traces are dropped even if they would otherwise match a retention policy. Note that this is an inversion of the match condition, not a standalone drop command.
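To make the decision flow concrete, here is a toy Python model of the Example 2 policy chain. It illustrates the intended outcome, not the processor's actual decision code:

```python
import random

HEALTH_PATHS = {"/health", "/readyz", "/livez"}

def keep_trace(http_target, status, baseline=0.10):
    """Toy model of the Example 2 policy chain (illustrative only)."""
    # Policy 1 (invert_match): a match on a health check path vetoes
    # retention, even if the trace also contains errors
    if http_target in HEALTH_PATHS:
        return False
    # Policy 2: error traces are always kept
    if status == "ERROR":
        return True
    # Policy 3: 10% baseline sampling for everything else
    return random.random() < baseline

assert keep_trace("/health", "ERROR") is False    # vetoed despite the error
assert keep_trace("/api/orders", "ERROR") is True
```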
Example 3: Rate Limiting with Composite Policy for High-Traffic Environments
In environments with tens of thousands of traces per second, the composite policy can cap the maximum number of spans per second while still prioritizing errors. If your traffic is below a few thousand traces per second, Example 1 is sufficient and this configuration may not be necessary.
The example below uses the OpenTelemetry Collector contrib YAML format. Because the composite configuration requires nested sub-policies alongside rate_allocation, it is more practical to express in Collector YAML and run as a stage connected to the Alloy pipeline.
```yaml
# OpenTelemetry Collector — config.yaml (composite policy)
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: composite-policy
        type: composite
        composite:
          max_total_spans_per_second: 10000  # upper limit on spans per second
          policy_order: [error-policy, latency-policy, probabilistic-policy]
          # Every policy referenced in policy_order must be defined here
          composite_sub_policy:
            - name: error-policy
              type: status_code
              status_code:
                status_codes: [ERROR]
            - name: latency-policy
              type: latency
              latency:
                threshold_ms: 500
            - name: probabilistic-policy
              type: probabilistic
              probabilistic:
                sampling_percentage: 3
          rate_allocation:
            - policy: error-policy
              percent: 60  # prioritize 60% of allowed throughput for error traces
            - policy: latency-policy
              percent: 30
            - policy: probabilistic-policy
              percent: 10
```

The role of `rate_allocation` — Once `max_total_spans_per_second` places a ceiling on total throughput, `rate_allocation` determines how that capacity is divided among the sub-policies. Allocating 60% to `error-policy` ensures error traces are collected first even when a traffic spike hits the throughput limit. A standalone `status_code = ERROR` policy (as in Example 1) always preserves matching traces, but inside a throughput-capped composite it is the rate allocation that guarantees error traces keep their priority. Note that the sub-policies referenced in `policy_order` must be defined inside `composite_sub_policy`, which is why `error-policy` is nested here rather than declared at the top level.
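The arithmetic behind the allocation above is straightforward; a quick check in Python:

```python
cap = 10_000  # max_total_spans_per_second
allocation = {"error-policy": 60, "latency-policy": 30, "probabilistic-policy": 10}

# Per-policy span budgets once the cap is in effect
budgets = {name: cap * pct // 100 for name, pct in allocation.items()}

assert budgets == {
    "error-policy": 6000,        # errors get first claim on throughput
    "latency-policy": 3000,
    "probabilistic-policy": 1000,
}
assert sum(budgets.values()) == cap  # percentages must total 100
```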
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Complete error capture | Since decisions are made after trace completion, 100% of traces containing error spans can be preserved |
| Cost optimization | Low-rate sampling of normal traces reduces storage costs by up to 95% |
| Business value retention | Selectively preserves meaningful traces (errors, abnormal latency) compared to simple probabilistic sampling |
| Synergy with Tempo | Object storage backend allows cost-effective long-term retention of sampled traces |
| Adaptive Traces integration | Custom policy management available via managed UI in Grafana Cloud (GA in 2025) |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Memory pressure | All spans held in memory during `decision_wait` → OOM risk under high traffic | Set Alloy memory limit generously based on `num_traces` × average span size |
| Single-instance constraint | Spans from the same trace must be routed to the same Alloy instance | Configure trace ID-based routing with `otelcol.exporter.loadbalancing` |
| Policy ordering errors | If `probabilistic` comes first, error traces may be dropped | Always maintain `status_code` → `latency` → `probabilistic` order |
| `decision_wait` tuning | Too short causes decisions on incomplete traces; too long increases memory usage | Set based on the expected completion time of the longest trace |
Terminology supplement — `otelcol.exporter.loadbalancing`: An exporter that guarantees spans with the same trace ID are always delivered to the same instance when multiple Alloy instances are present. Essential for scaling out tail sampling.
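A minimal sketch of such a two-tier setup, assuming a Kubernetes headless service named `alloy-sampling` that resolves to the sampling-tier Alloy pods (the hostname, namespace, and insecure in-cluster TLS are illustrative assumptions):

```alloy
// Tier 1 Alloy: receives OTLP and shards traces across the sampling tier
otelcol.exporter.loadbalancing "default" {
  // routing_key "traceID" sends all spans of one trace to the same backend
  routing_key = "traceID"

  resolver {
    dns {
      hostname = "alloy-sampling.monitoring.svc.cluster.local"
      port     = "4317"
    }
  }

  protocol {
    otlp {
      client {
        tls {
          insecure = true // assumption: plaintext traffic inside the cluster
        }
      }
    }
  }
}
```

Each tier 2 Alloy instance then runs the `otelcol.processor.tail_sampling` configuration from Example 1, seeing complete traces because of the trace ID routing.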
Most Common Mistakes in Practice
- Placing the `probabilistic` policy first — Error traces get dropped by the probabilistic evaluation. It must be moved to the last position.
- Setting `num_traces` too low — If set below the expected number of concurrently active traces, traces that haven't finished evaluation yet will be dropped prematurely.
- Running multiple Alloy instances without load balancing — If spans from the same trace are distributed across different instances, tail sampling decisions become incomplete. A `loadbalancing` exporter must be placed in front of the Alloy instances.
Closing Thoughts
Tail-based sampling is currently the most practical trace strategy for simultaneously achieving two goals: missing zero errors while minimizing costs.
Three steps you can take right now:
1. Measure your current traffic.
   - Determine your traces per second and average trace completion time.
   - Example: 500 traces/sec, completion time 8s → `num_traces = 5000`, `decision_wait = "10s"`
2. Apply the basic pattern from Example 1 to your staging environment.
   - Arrange policies in the order `status_code` (ERROR) → `latency` (2000ms) → `probabilistic` (5%).
   - Use Alloy's `tail_sampling_count_traces_sampled` metric to verify actual retention and drop ratios.
3. Review the checklist below before applying to production.
Pre-Production Operational Checklist
✅ Policy order confirmed: status_code → latency → probabilistic
✅ num_traces ≥ traces per second × decision_wait (seconds)
✅ decision_wait ≥ expected completion time of the longest trace in the service
✅ Alloy memory limit set to at least num_traces × average span size
✅ If running 2+ Alloy instances, otelcol.exporter.loadbalancing is configured
✅ Retention rate verified in staging using the tail_sampling_count_traces_sampled metric
✅ Confirm probabilistic policy is positioned last

Next post: How to detect error patterns in sampled traces using Grafana Tempo TraceQL and connect them to alerting
References
- Sampling | Grafana Tempo documentation
- Add tail sampling policies and strategies | Grafana Tempo documentation
- Enable tail-based sampling | Grafana Tempo documentation
- otelcol.processor.tail_sampling | Grafana Alloy documentation
- Introduction to Adaptive Traces | Grafana Cloud documentation
- Maximize data value and cut costs: Adaptive Telemetry for metrics, logs, traces, and profiles in Grafana Cloud | Grafana Labs
- Managing observability costs at scale | Grafana Labs Blog
- Tail Sampling Processor | OpenTelemetry Collector Contrib (GitHub)
- Sampling | OpenTelemetry
- Reduce Grafana Cloud Traces costs | Grafana Cloud documentation