OTel spanmetrics Connector: How to Auto-Generate RED Metrics from Traces Without Code Changes and Connect to Grafana
In a microservices environment, quickly assessing service health requires immediate answers to three questions: "How many requests are coming in right now?", "How many errors are occurring?", and "How long are responses taking?" These three are the RED metrics (Rate, Errors, Duration). The common assumption is that you need to instrument your application code separately to get these — but with OpenTelemetry Collector's spanmetrics connector, that's completely unnecessary.
The spanmetrics connector analyzes trace span data and automatically generates Prometheus-compatible RED metrics. You can build the entire pipeline through Collector configuration alone, without modifying a single line of application code. If you haven't adopted OTel yet, understanding the pipeline structure from this article means you can apply it immediately upon adoption.
After reading this article, you'll be able to understand how the spanmetrics connector works and build the complete pipeline yourself — writing real Collector YAML configuration and setting up Rate, Errors, and Duration panels in Grafana. We'll walk through each step: connector internals, full YAML configuration, PromQL queries, Alloy integration, and operational considerations.
Prerequisites for this article: This is written for backend and infrastructure engineers who are already running an OTel Collector and are familiar with basic pipeline configuration (OTLP Receiver, Batch Processor, Prometheus scraping).
Core Concepts
What Are RED Metrics
RED metrics are three golden signals for monitoring service health.
| Metric | Meaning | Indicator generated by spanmetrics |
|---|---|---|
| Rate | Requests processed per second | calls_total counter |
| Errors | Proportion of requests in an error state | calls_total{status_code="STATUS_CODE_ERROR"} |
| Duration | Distribution of request processing time | duration_bucket (histogram) |
Each span in a trace already carries information such as service name, operation name, status code, and span kind. The spanmetrics connector aggregates this information and converts it into metrics, eliminating the need for separate instrumentation.
A namespace prefix may be prepended to metric names. If you specify a `namespace` setting, `calls_total` becomes something like `traces_spanmetrics_calls_total`. Before copying PromQL queries, it's recommended to first verify the actual metric names at the `:8889/metrics` endpoint.
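For example, a `namespace` can be set directly on the connector (the value shown here is illustrative — pick your own prefix):

```yaml
connectors:
  spanmetrics:
    # Dots are sanitized to underscores by the Prometheus exporter,
    # so this yields metrics like traces_spanmetrics_calls_total
    namespace: traces.spanmetrics
```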
The Role of the Connector Component
spanmetrics is a Connector component, not a Processor. This distinction matters for architectural design.
What is a Connector: A component within the OTel Collector that links two independent pipelines. It simultaneously acts as an Exporter in the traces pipeline and a Receiver in the metrics pipeline. A Processor only transforms data within a single pipeline, but a Connector can pass data across pipeline boundaries.
The pipeline data flow is as follows:
```
OTLP Receiver
      ↓
Batch Processor
      ↓
spanmetrics Connector ──→ (metrics pipeline) → Prometheus Exporter → Grafana
      ↓
Tempo / Jaeger Exporter (traces pipeline continues)
```

Trace data is forwarded as-is to Tempo or Jaeger, while the spanmetrics connector simultaneously extracts metrics and exports them to Prometheus. It's an architecture that produces two signals from a single span flow.
Key Changes from 2024–2026: spanmetricsprocessor → spanmetricsconnector
Previously, the same functionality was provided as a Processor component named spanmetricsprocessor. It has since fully transitioned to spanmetricsconnector, which ships in the official Contrib distribution (the configuration in this article was validated on v0.147.0). If you're still running the legacy Processor-based configuration, use the examples in this article to migrate.
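For comparison, the old Processor-based registration looked roughly like this (a sketch of the deprecated `spanmetricsprocessor` style from memory — field names like `metrics_exporter` applied to the removed component, so verify against your Collector version before migrating):

```yaml
# BEFORE — deprecated spanmetricsprocessor (removed from Contrib)
processors:
  spanmetrics:
    metrics_exporter: prometheus   # named the metrics exporter directly
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]   # ran inside the traces pipeline
      exporters: [otlp/tempo]

# AFTER — spanmetricsconnector bridges two pipelines
connectors:
  spanmetrics: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics, otlp/tempo]   # connector as traces exporter
    metrics:
      receivers: [spanmetrics]               # connector as metrics receiver
      exporters: [prometheus]
```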
With these concepts in mind, let's build an actual Collector configuration file.
Practical Application
Example 1: Complete Basic Pipeline YAML Configuration
This is the most basic spanmetrics pipeline configuration. It receives traces from order-service, generates RED metrics, and exports them to Prometheus. (Configuration validated on v0.147.0)
```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]
    dimensions:
      - name: http.method
      - name: http.route
      - name: http.status_code
      - name: service.version
    exemplars:
      enabled: true
    metrics_flush_interval: 15s
    metrics_expiration: 5m
    aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp/tempo:
    endpoint: tempo:4317   # Replace with a Jaeger exporter if using Jaeger
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics, otlp/tempo]   # Connector appears as an Exporter
    metrics:
      receivers: [spanmetrics]               # Connector appears as a Receiver
      processors: [batch]
      exporters: [prometheus]
```

| Configuration Item | Role |
|---|---|
| `histogram.explicit.buckets` | Specifies latency bucket ranges for P50/P95/P99 calculations |
| `dimensions` | Span attributes to add to metrics (converted to labels) |
| `exemplars.enabled: true` | Embeds trace IDs in metric data points to enable drill-down in Grafana |
| `metrics_flush_interval: 15s` | How often aggregated metrics are exported (recommended: no more than the Prometheus scrape_interval) |
| `metrics_expiration: 5m` | Expires a time series after 5 minutes with no new spans, saving memory |
| `aggregation_temporality` | Cumulative mode for Prometheus scraping compatibility |
The relationship between `metrics_flush_interval` and `scrape_interval`: if `metrics_flush_interval` is longer than Prometheus's `scrape_interval` (default 15s), metrics may not yet be reflected at scrape time, resulting in empty results. It is recommended to set `metrics_flush_interval` to be less than or equal to the Prometheus `scrape_interval`.
What is an Exemplar: A feature that attaches metadata such as `trace_id` to Prometheus metric data points. In Grafana, clicking on a latency spike interval takes you directly to the actual trace from that moment, connecting the "detect metric anomaly → analyze trace root cause" workflow in a single click.
Additional configuration for Exemplar collection: `exemplars.enabled: true` alone is not sufficient. Prometheus requires the `--enable-feature=exemplar-storage` flag at startup, and `OpenMetricsText1.0.0` must be added to `scrape_protocols` in `scrape_configs` so that Exemplars are actually stored.
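Put together, the Prometheus side of this setup might look like the following sketch (the job name and target are placeholders — adjust to your deployment, and remember to also start Prometheus with `--enable-feature=exemplar-storage`):

```yaml
# prometheus.yml — scraping the Collector's spanmetrics endpoint
scrape_configs:
  - job_name: otel-collector-spanmetrics
    scrape_interval: 15s                        # >= metrics_flush_interval on the Collector
    scrape_protocols: [OpenMetricsText1.0.0]    # OpenMetrics format, required for exemplars
    static_configs:
      - targets: ["otel-collector:8889"]
```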
Example 2: Grafana Dashboard PromQL Queries
Once the Collector exposes Prometheus metrics, you can build RED panels in a Grafana dashboard with the following queries.
```promql
# Rate — requests per second for order-service (server spans only)
rate(calls_total{
  service_name="order-service",
  span_kind="SPAN_KIND_SERVER"
}[1m])

# Errors — error rate as a percentage of total requests for order-service
rate(calls_total{service_name="order-service",status_code="STATUS_CODE_ERROR"}[1m])
  /
rate(calls_total{service_name="order-service"}[1m])
  * 100

# Duration — P99 latency
histogram_quantile(
  0.99,
  rate(duration_bucket{service_name="order-service"}[5m])
)

# P50 latency (for comparison)
histogram_quantile(
  0.50,
  rate(duration_bucket{service_name="order-service"}[5m])
)
```

To use `histogram_quantile`, the `duration_bucket` metric must exist. If the bucket configuration doesn't cover your service's actual response time range, P99 values may appear as `+Inf`, so it's recommended to tune the bucket range to match your service's characteristics.
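One quick way to check whether your buckets are wide enough is to compare the `+Inf` bucket against the largest finite one. This sketch assumes the metric names used above and a millisecond unit, so the 5s bucket appears as `le="5000"` — verify the actual `le` values at the `:8889/metrics` endpoint first:

```promql
# requests/sec slower than the largest finite bucket — if this is
# consistently > 0, high quantiles such as P99 can collapse to +Inf
sum(rate(duration_bucket{service_name="order-service", le="+Inf"}[5m]))
-
sum(rate(duration_bucket{service_name="order-service", le="5000"}[5m]))
```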
Example 3: Integrated Configuration Using Grafana Alloy
Grafana Alloy is the successor to Grafana Agent and includes a built-in otelcol.connector.spanmetrics component. You can handle collection, transformation, and forwarding with a single Alloy binary without managing a separate OTel Collector.
```alloy
// Alloy configuration file (config.alloy)
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    traces = [otelcol.connector.spanmetrics.default.input]
  }
}

otelcol.connector.spanmetrics "default" {
  histogram {
    explicit {
      buckets = ["5ms", "10ms", "25ms", "50ms", "100ms", "250ms", "500ms", "1s", "2s", "5s"]
    }
  }

  // Same 4 dimensions as the YAML example (note the block name is `dimension`, singular)
  dimension { name = "http.method" }
  dimension { name = "http.route" }
  dimension { name = "http.status_code" }
  dimension { name = "service.version" }

  exemplars { enabled = true }

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.prometheus "default" {
  // Assumes a prometheus.remote_write "mimir" component is defined elsewhere
  forward_to = [prometheus.remote_write.mimir.receiver]
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}
```

The Alloy approach is particularly well suited to the pattern of automatically generating per-service SLO dashboards in microservices environments. Grafana Cloud users may also want to consider the Tempo Metrics Generator option — refer to the table below for differences between the two approaches.
| Approach | Where metrics are generated | Best suited environment |
|---|---|---|
| spanmetrics connector | Collector / Alloy level | Existing OTel pipeline integration, self-hosted |
| Tempo Metrics Generator | Inside the Tempo server | Grafana Cloud, preference for simple setup |
Pros and Cons Analysis
Advantages
| Item | Description |
|---|---|
| Eliminates dual instrumentation | Automatically generates RED metrics from traces with no metric instrumentation in application code |
| Data consistency | Metrics and traces are derived from the same span, so there are no numerical discrepancies |
| Exemplar integration | Click on a latency spike in Grafana → one-click drill-down to the actual trace in Tempo |
| Flexible dimension configuration | Add HTTP attributes and custom tags as dimensions to generate granular metrics |
| Standard Prometheus compatibility | Immediate integration with existing Grafana / AlertManager ecosystem |
Disadvantages and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Cardinality explosion | Time series count grows rapidly with the number of dimension attribute value combinations. OOM risk if combinations exceed 10,000 | Exclude high-cardinality values like user_id, request_id from dimensions |
| Increased memory usage | Aggregation state is kept in memory | Mitigate with aggregation_cardinality_limit and metrics_expiration settings |
| Collector SPOF | The spanmetrics connector Collector can become a single point of failure | Apply HA configuration or Gateway pattern |
| Data loss on restart | Some data may be lost during Collector restarts while Cumulative aggregation is in progress | Consider Delta temporality (verify Prometheus compatibility) |
| Buckets must be predefined | Explicit Histogram requires specifying bucket ranges in advance | Switch to Exponential Histogram or tune buckets to match service characteristics |
What is Cardinality: In time series databases, this refers to the number of unique label combinations. For example, adding `user_id` as a dimension creates a time series per user, which can cause rapid memory growth in both Prometheus and the Collector.
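As a safety net, recent connector versions let you cap the number of series kept in memory. The sketch below assumes the `aggregation_cardinality_limit` option described in the connector README — check that it exists in your Collector version before relying on it:

```yaml
connectors:
  spanmetrics:
    # Cap the number of unique metric streams held in memory;
    # streams beyond the limit are folded into an overflow series (0 = no limit)
    aggregation_cardinality_limit: 10000
    # Drop series that receive no new spans for 5 minutes
    metrics_expiration: 5m
```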
Exponential Histogram: A histogram approach that dynamically represents the distribution without pre-specifying bucket boundaries. It is more suitable than the Explicit approach when a service's response time range is uncertain or changes frequently.
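If you'd rather not hand-tune buckets, the connector can emit exponential histograms instead (a minimal sketch — `max_size` bounds the bucket count; verify the exact field names in the connector README for your version). Note that the classic Prometheus text exposition cannot represent exponential histograms, so you'll need native-histogram support (e.g., `--enable-feature=native-histograms`) or an OTLP-capable backend:

```yaml
connectors:
  spanmetrics:
    histogram:
      exponential:
        max_size: 160   # maximum number of buckets per positive/negative range
```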
Most Common Mistakes in Practice
- Adding `user_id`, `trace_id`, or `request_id` to dimensions — This can cause cardinality explosion leading to OOM in both the Collector and Prometheus. It is strongly recommended that dimensions only include attributes with a bounded set of values (e.g., HTTP method, status code, route).
- Forgetting to register the Connector in both places in `service.pipelines` — spanmetrics must be listed in both the `exporters` of the `traces` pipeline and the `receivers` of the `metrics` pipeline. Registering it on only one side will cause the Collector to return an error on startup.
- Not fully enabling Exemplar collection in Prometheus — Even with `exemplars.enabled: true`, you still need both the `--enable-feature=exemplar-storage` flag when starting Prometheus and `OpenMetricsText1.0.0` added to `scrape_protocols` in `scrape_configs`. If either setting is missing, trace drill-down in Grafana will not work.
Closing Thoughts
The spanmetrics connector is the most practical way to build RED metrics and a Grafana dashboard without any application code changes, by leveraging traces you're already collecting.
Here are three steps you can take right now:

1. Check your OTel Collector Contrib image. The `otel/opentelemetry-collector-contrib:0.147.0` image includes the spanmetrics connector. If you're already using the Contrib image, you can skip straight to step 2.
2. Copy the basic YAML configuration from this article and apply it to your Collector. Add the `connectors` and `service.pipelines` sections and restart the Collector — you should see the `calls_total` and `duration_bucket` metrics appear at the `:8889/metrics` endpoint.
3. Add a Prometheus data source to Grafana and create RED panels with the three PromQL queries. Placing the Rate, Errors, and Duration panels side by side in a single dashboard row completes a basic SLO dashboard that gives you an at-a-glance view of service health.
Next article: Adding Grafana Tempo's Service Graph on top of the RED metrics pipeline built in this article lets you see the dependency map between microservices and per-call RED metrics on a single screen simultaneously. The next article will cover how to integrate that.