OTel spanmetrics Connector: How to Auto-Generate RED Metrics from Traces Without Code Changes and Connect to Grafana
In a microservices environment, quickly assessing service health requires immediate answers to three questions: "How many requests are coming in right now?", "How many errors are occurring?", and "How long are responses taking?" These three are the RED metrics (Rate, Errors, Duration). The common assumption is that you need to instrument your application code separately to get these — but with OpenTelemetry Collector's spanmetrics connector, that's completely unnecessary.
The spanmetrics connector analyzes trace span data and automatically generates Prometheus-compatible RED metrics. You can build the entire pipeline through Collector configuration alone, without modifying a single line of application code. If you haven't adopted OTel yet, understanding the pipeline structure from this article means you can apply it immediately upon adoption.
After reading this article, you'll be able to understand how the spanmetrics connector works and build the complete pipeline yourself — writing real Collector YAML configuration and setting up Rate, Errors, and Duration panels in Grafana. We'll walk through each step: connector internals, full YAML configuration, PromQL queries, Alloy integration, and operational considerations.
Prerequisites for this article: This is written for backend and infrastructure engineers who are already running an OTel Collector and are familiar with basic pipeline configuration (OTLP Receiver, Batch Processor, Prometheus scraping).
Core Concepts
What Are RED Metrics
RED metrics are three golden signals for monitoring service health.
| Metric | Meaning | Indicator generated by spanmetrics |
|---|---|---|
| Rate | Requests processed per second | calls_total counter |
| Errors | Proportion of requests in an error state | calls_total{status_code="STATUS_CODE_ERROR"} |
| Duration | Distribution of request processing time | duration_bucket (histogram) |
Each span in a trace already carries information such as service name, operation name, status code, and span kind. The spanmetrics connector aggregates this information and converts it into metrics, eliminating the need for separate instrumentation.
A namespace prefix may be prepended to metric names. If you specify a `namespace` setting, `calls_total` becomes something like `traces_spanmetrics_calls_total`. Before copying PromQL queries, it's recommended to first verify the actual metric names at the `:8889/metrics` endpoint.
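For example, a `namespace` can be set directly on the connector (the value shown here is illustrative — pick your own prefix):

```yaml
connectors:
  spanmetrics:
    # Dots are sanitized to underscores by the Prometheus exporter,
    # so this yields metrics like traces_spanmetrics_calls_total
    namespace: traces.spanmetrics
```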
The Role of the Connector Component
spanmetrics is a Connector component, not a Processor. This distinction matters for architectural design.
What is a Connector: A component within the OTel Collector that links two independent pipelines. It simultaneously acts as an Exporter in the traces pipeline and a Receiver in the metrics pipeline. A Processor only transforms data within a single pipeline, but a Connector can pass data across pipeline boundaries.
The pipeline data flow is as follows:
```
OTLP Receiver
      ↓
Batch Processor
      ↓
spanmetrics Connector ──→ (metrics pipeline) → Prometheus Exporter → Grafana
      ↓
Tempo / Jaeger Exporter (traces pipeline continues)
```

Trace data is forwarded as-is to Tempo or Jaeger, while the spanmetrics connector simultaneously extracts metrics and exports them to Prometheus. It's an architecture that produces two signals from a single span flow.
Key Changes from 2024–2026: spanmetricsprocessor → spanmetricsconnector
Previously, the same functionality was provided as a Processor component named spanmetricsprocessor. It has since fully transitioned to spanmetricsconnector, which ships in the official Contrib distribution (the configuration in this article was validated on v0.147.0). If you're still running the legacy Processor-based configuration, use the examples in this article to migrate.
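For comparison, the old Processor-based registration looked roughly like this (a sketch of the deprecated `spanmetricsprocessor` style from memory — field names like `metrics_exporter` applied to the removed component, so verify against your Collector version before migrating):

```yaml
# BEFORE — deprecated spanmetricsprocessor (removed from Contrib)
processors:
  spanmetrics:
    metrics_exporter: prometheus   # named the metrics exporter directly
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]   # ran inside the traces pipeline
      exporters: [otlp/tempo]

# AFTER — spanmetricsconnector bridges two pipelines
connectors:
  spanmetrics: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics, otlp/tempo]   # connector as traces exporter
    metrics:
      receivers: [spanmetrics]               # connector as metrics receiver
      exporters: [prometheus]
```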
With these concepts in mind, let's build an actual Collector configuration file.
Practical Application
Example 1: Complete Basic Pipeline YAML Configuration
This is the most basic spanmetrics pipeline configuration. It receives traces from order-service, generates RED metrics, and exports them to Prometheus. (Configuration validated on v0.147.0)
```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]
    dimensions:
      - name: http.method
      - name: http.route
      - name: http.status_code
      - name: service.version
    exemplars:
      enabled: true
    metrics_flush_interval: 15s
    metrics_expiration: 5m
    aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp/tempo:
    endpoint: tempo:4317   # Replace with a Jaeger exporter if using Jaeger
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics, otlp/tempo]   # Connector appears as an Exporter
    metrics:
      receivers: [spanmetrics]               # Connector appears as a Receiver
      processors: [batch]
      exporters: [prometheus]
```

| Configuration Item | Role |
|---|---|
| `histogram.explicit.buckets` | Specifies latency bucket ranges for P50/P95/P99 calculations |
| `dimensions` | Span attributes to add to metrics (converted to labels) |
| `exemplars.enabled: true` | Embeds trace IDs in metric data points to enable drill-down in Grafana |
| `metrics_flush_interval: 15s` | How often aggregated metrics are exported (recommended: no more than the Prometheus scrape_interval) |
| `metrics_expiration: 5m` | Expires a time series after 5 minutes with no new spans, saving memory |
| `aggregation_temporality` | Cumulative mode for Prometheus scraping compatibility |
The relationship between `metrics_flush_interval` and `scrape_interval`: if `metrics_flush_interval` is longer than Prometheus's `scrape_interval` (default 15s), metrics may not yet be reflected at scrape time, resulting in empty results. It is recommended to set `metrics_flush_interval` to be less than or equal to the Prometheus `scrape_interval`.
What is an Exemplar: A feature that attaches metadata such as `trace_id` to Prometheus metric data points. In Grafana, clicking on a latency spike interval takes you directly to the actual trace from that moment, connecting the "detect metric anomaly → analyze trace root cause" workflow in a single click.
Additional configuration for Exemplar collection: `exemplars.enabled: true` alone is not sufficient. Prometheus requires the `--enable-feature=exemplar-storage` flag at startup, and `OpenMetricsText1.0.0` must be added to `scrape_protocols` in `scrape_configs` so that Exemplars are actually stored.
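Put together, the Prometheus side of this setup might look like the following sketch (the job name and target are placeholders — adjust to your deployment, and remember to also start Prometheus with `--enable-feature=exemplar-storage`):

```yaml
# prometheus.yml — scraping the Collector's spanmetrics endpoint
scrape_configs:
  - job_name: otel-collector-spanmetrics
    scrape_interval: 15s                        # >= metrics_flush_interval on the Collector
    scrape_protocols: [OpenMetricsText1.0.0]    # OpenMetrics format, required for exemplars
    static_configs:
      - targets: ["otel-collector:8889"]
```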
Example 2: Grafana Dashboard PromQL Queries
Once the Collector exposes Prometheus metrics, you can build RED panels in a Grafana dashboard with the following queries.
```promql
# Rate — requests per second for order-service (server spans only)
rate(calls_total{
  service_name="order-service",
  span_kind="SPAN_KIND_SERVER"
}[1m])

# Errors — error rate as a percentage of total requests for order-service
rate(calls_total{service_name="order-service",status_code="STATUS_CODE_ERROR"}[1m])
  /
rate(calls_total{service_name="order-service"}[1m])
  * 100

# Duration — P99 latency
histogram_quantile(
  0.99,
  rate(duration_bucket{service_name="order-service"}[5m])
)

# P50 latency (for comparison)
histogram_quantile(
  0.50,
  rate(duration_bucket{service_name="order-service"}[5m])
)
```

To use `histogram_quantile`, the `duration_bucket` metric must exist. If the bucket configuration doesn't cover your service's actual response time range, P99 values may appear as `+Inf`, so it's recommended to tune the bucket range to match your service's characteristics.
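One quick way to check whether your buckets are wide enough is to compare the `+Inf` bucket against the largest finite one. This sketch assumes the metric names used above and a millisecond unit, so the 5s bucket appears as `le="5000"` — verify the actual `le` values at the `:8889/metrics` endpoint first:

```promql
# requests/sec slower than the largest finite bucket — if this is
# consistently > 0, high quantiles such as P99 can collapse to +Inf
sum(rate(duration_bucket{service_name="order-service", le="+Inf"}[5m]))
-
sum(rate(duration_bucket{service_name="order-service", le="5000"}[5m]))
```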
Example 3: Integrated Configuration Using Grafana Alloy
Grafana Alloy is the successor to Grafana Agent and includes a built-in otelcol.connector.spanmetrics component. You can handle collection, transformation, and forwarding with a single Alloy binary without managing a separate OTel Collector.
```alloy
// Alloy configuration file (config.alloy)
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    traces = [otelcol.connector.spanmetrics.default.input]
  }
}

otelcol.connector.spanmetrics "default" {
  histogram {
    explicit {
      buckets = ["5ms", "10ms", "25ms", "50ms", "100ms", "250ms", "500ms", "1s", "2s", "5s"]
    }
  }

  // Same 4 dimensions as the YAML example (note the block name is `dimension`, singular)
  dimension { name = "http.method" }
  dimension { name = "http.route" }
  dimension { name = "http.status_code" }
  dimension { name = "service.version" }

  exemplars { enabled = true }

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.prometheus "default" {
  // Assumes a prometheus.remote_write "mimir" component is defined elsewhere
  forward_to = [prometheus.remote_write.mimir.receiver]
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}
```

The Alloy approach is particularly well suited to the pattern of automatically generating per-service SLO dashboards in microservices environments. Grafana Cloud users may also want to consider the Tempo Metrics Generator option — refer to the table below for differences between the two approaches.
| Approach | Where metrics are generated | Best suited environment |
|---|---|---|
| spanmetrics connector | Collector / Alloy level | Existing OTel pipeline integration, self-hosted |
| Tempo Metrics Generator | Inside the Tempo server | Grafana Cloud, preference for simple setup |
Pros and Cons Analysis
Advantages
| Item | Description |
|---|---|
| Eliminates dual instrumentation | Automatically generates RED metrics from traces with no metric instrumentation in application code |
| Data consistency | Metrics and traces are derived from the same span, so there are no numerical discrepancies |
| Exemplar integration | Click on a latency spike in Grafana → one-click drill-down to the actual trace in Tempo |
| Flexible dimension configuration | Add HTTP attributes and custom tags as dimensions to generate granular metrics |
| Standard Prometheus compatibility | Immediate integration with existing Grafana / AlertManager ecosystem |
Disadvantages and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Cardinality explosion | Time series count grows rapidly with the number of dimension attribute value combinations. OOM risk if combinations exceed 10,000 | Exclude high-cardinality values like user_id, request_id from dimensions |
| Increased memory usage | Aggregation state is kept in memory | Mitigate with aggregation_cardinality_limit and metrics_expiration settings |
| Collector SPOF | The spanmetrics connector Collector can become a single point of failure | Apply HA configuration or Gateway pattern |
| Data loss on restart | Some data may be lost during Collector restarts while Cumulative aggregation is in progress | Consider Delta temporality (verify Prometheus compatibility) |
| Buckets must be predefined | Explicit Histogram requires specifying bucket ranges in advance | Switch to Exponential Histogram or tune buckets to match service characteristics |
What is Cardinality: In time series databases, this refers to the number of unique label combinations. For example, adding `user_id` as a dimension creates a time series per user, which can cause rapid memory growth in both Prometheus and the Collector.
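As a safety net, recent connector versions let you cap the number of series kept in memory. The sketch below assumes the `aggregation_cardinality_limit` option described in the connector README — check that it exists in your Collector version before relying on it:

```yaml
connectors:
  spanmetrics:
    # Cap the number of unique metric streams held in memory;
    # streams beyond the limit are folded into an overflow series (0 = no limit)
    aggregation_cardinality_limit: 10000
    # Drop series that receive no new spans for 5 minutes
    metrics_expiration: 5m
```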
Exponential Histogram: A histogram approach that dynamically represents the distribution without pre-specifying bucket boundaries. It is more suitable than the Explicit approach when a service's response time range is uncertain or changes frequently.
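If you'd rather not hand-tune buckets, the connector can emit exponential histograms instead (a minimal sketch — `max_size` bounds the bucket count; verify the exact field names in the connector README for your version). Note that the classic Prometheus text exposition cannot represent exponential histograms, so you'll need native-histogram support (e.g., `--enable-feature=native-histograms`) or an OTLP-capable backend:

```yaml
connectors:
  spanmetrics:
    histogram:
      exponential:
        max_size: 160   # maximum number of buckets per positive/negative range
```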
Most Common Mistakes in Practice
- Adding `user_id`, `trace_id`, or `request_id` to dimensions — This can cause cardinality explosion leading to OOM in both the Collector and Prometheus. It is strongly recommended that dimensions only include attributes with a bounded set of values (e.g., HTTP method, status code, route).
- Forgetting to register the Connector in both places in `service.pipelines` — spanmetrics must be listed in both the `exporters` of the `traces` pipeline and the `receivers` of the `metrics` pipeline. Registering it on only one side will cause the Collector to return an error on startup.
- Not fully enabling Exemplar collection in Prometheus — Even with `exemplars.enabled: true`, you still need both the `--enable-feature=exemplar-storage` flag when starting Prometheus and `OpenMetricsText1.0.0` added to `scrape_protocols` in `scrape_configs`. If either setting is missing, trace drill-down in Grafana will not work.
Closing Thoughts
The spanmetrics connector is the most practical way to build RED metrics and a Grafana dashboard without any application code changes, by leveraging traces you're already collecting.
Here are three steps you can take right now:

1. Check your OTel Collector Contrib image. The `otel/opentelemetry-collector-contrib:0.147.0` image includes the spanmetrics connector. If you're already using the Contrib image, you can skip straight to step 2.
2. Copy the basic YAML configuration from this article and apply it to your Collector. Add the `connectors` and `service.pipelines` sections and restart the Collector — you should see the `calls_total` and `duration_bucket` metrics appear at the `:8889/metrics` endpoint.
3. Add a Prometheus data source to Grafana and create RED panels with the three PromQL queries. Placing the Rate, Errors, and Duration panels side by side in a single dashboard row completes a basic SLO dashboard that gives you an at-a-glance view of service health.
Next article: Adding Grafana Tempo's Service Graph on top of the RED metrics pipeline built in this article lets you see the dependency map between microservices and per-call RED metrics on a single screen simultaneously. The next article will cover how to integrate that.