The Complete Guide to MCP Server Observability: From Prometheus Metrics and Distributed Tracing to Anomaly Detection

A single complex agent workflow can generate tens to hundreds of tool calls within one user session. Each call adds latency, can fail, and may produce a security-relevant event. If this flow stays a black box, latency spikes and abnormal call patterns go unnoticed until they turn into outages or security incidents. Because MCP is a relatively new protocol built on JSON-RPC 2.0, existing APM tools alone rarely provide enough visibility.

This article walks through an MCP observability pipeline in four stages: OpenTelemetry MCP semantic conventions, Prometheus metric instrumentation, distributed trace context propagation, and SIEM anomaly detection. By the end, you should be able to configure a pipeline that adds structured logs, metrics, and distributed traces to an MCP server and detects abnormal tool call patterns in near real time.

Before you begin, a note on prerequisites. You should be able to follow most of the examples with basic Python or TypeScript knowledge and some Docker experience. OTel Collector YAML and PromQL are explained as they appear in each section, so prior experience with them is not required. The primary audience is backend or MLOps engineers who are deploying MCP to production or evaluating it.

Key Concepts

MCP and the 3 Key Elements of Observability

MCP (Model Context Protocol) is a standard communication protocol between AI agents and tools designed by Anthropic and released as open source in 2024. It runs on top of JSON-RPC 2.0 and enables AI agents (Clients) to communicate in a standardized manner with MCP servers (Servers) that provide external tools or data sources.

📌 What is a Tool Call? It is the unit operation in which an agent asks an MCP server to execute a tool. On the wire, it is a JSON-RPC request whose method is tools/call.
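To make the message shape concrete, here is a minimal sketch of building a tools/call request with only the standard library. The helper name build_tool_call and the counter-based id scheme are assumptions for illustration and are not part of any MCP SDK.

```python
import itertools
import json
from typing import Any

# JSON-RPC only requires that an id lets the caller match a response to its
# request; a simple counter is assumed here for illustration.
_request_ids = itertools.count(1)


def build_tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call("get_weather", {"city": "Seoul"})
wire_message = json.dumps(request)  # the serialized form sent over the transport
print(wire_message)
```

Every later stage of the pipeline (span names, metric labels, log fields) is derived from the method and params.name fields of this message.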

In an MCP environment, observability consists of three axes.

Element	Role	Representative Tool
Structured Logs	Logs of Tool Call Requests/Responses and Error Causes	Loki, Datadog Logs
Metrics	Time series aggregation of latency, calls, and error rates	Prometheus, Grafana Mimir
Distributed Traces	Linking Agent Inference → Tool Execution Flow	Jaeger, Grafana Tempo

The overall pipeline structure integrating the three elements is as follows.

MCP Client (Agent)
    │ tools/call + _meta.traceparent (W3C Trace Context)
    ▼
MCP Server (Python/FastAPI)
    │ OTLP gRPC (traces, metrics, logs)       │ structured JSON logs
    ▼                                         ▼
OTel Collector                           Datadog Cloud SIEM
    │                                     (anomaly detection rules)
    ├──▶ Prometheus (metrics)
    │         │
    │         └──▶ Grafana (dashboards, alerts)
    │
    ├──▶ Grafana Tempo (distributed tracing)
    │
    └──▶ Loki (logs)

OpenTelemetry MCP Semantic Convention

OpenTelemetry defines MCP-specific span attributes and metrics as semantic conventions under the gen-ai/mcp/ namespace. Using the MCP-specific conventions instead of the generic RPC conventions is recommended.

⚠️ Stability Notice: Most OTel semantic conventions related to Gen AI are currently in the experimental state. We recommend pinning your OTel SDK version before production deployment and checking the release notes regularly for convention changes.

Span names use the following format.

{mcp.method.name} {target}
Example: tools/call weather_tool

Here are examples of key Span attributes.

Attribute Key	Meaning	Example Value
`mcp.method.name`	Called MCP method	`tools/call`
`mcp.tool.name`	Name of Executed Tool	`get_weather`
`mcp.session.id`	Session identifier (record as a trace attribute only, never as a metric label)	`sess_abc123`
`gen_ai.usage.input_tokens`	Number of input tokens	`412`
`error.type`	Error Classification	`timeout`
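As an illustration of the naming rule and attribute keys above, the following sketch derives a span name and an attribute dictionary from a raw JSON-RPC request. The function describe_span is a hypothetical helper, not an OpenTelemetry API; it only shows how the pieces fit together.

```python
from typing import Any, Optional


def describe_span(
    request: dict[str, Any], session_id: Optional[str] = None
) -> tuple[str, dict[str, str]]:
    """Derive a span name and attributes from an MCP JSON-RPC request."""
    method = request.get("method", "unknown")
    attributes = {"mcp.method.name": method}

    target = ""
    if method == "tools/call":
        # For tool calls the target is the tool name.
        target = request.get("params", {}).get("name", "")
        if target:
            attributes["mcp.tool.name"] = target
    if session_id:
        attributes["mcp.session.id"] = session_id

    # "{mcp.method.name} {target}", falling back to the bare method name.
    return f"{method} {target}".strip(), attributes


name, attrs = describe_span(
    {"method": "tools/call", "params": {"name": "get_weather"}},
    session_id="sess_abc123",
)
print(name, attrs)
```

The returned pair can be passed to a tracer's span-creation call as the span name and the attributes argument.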

Distributed Trace Context Propagation

Because MCP is based on JSON-RPC 2.0, trace context propagation has one important limitation.

📌 JSON-RPC 2.0 Brief Overview: In JSON-RPC 2.0, params is the standard field that holds method arguments. _meta is not part of JSON-RPC itself; the MCP specification reserves it as a place to attach additional metadata to a message.

A JSON-RPC message has no header section of its own, and transports such as stdio have no HTTP headers at all, so W3C Trace Context cannot be propagated natively through HTTP headers. Instead, traceparent and tracestate are injected into the params._meta property bag.

json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": { "city": "Seoul" },
    "_meta": {
      "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
      "tracestate": "vendor=myapp"
    }
  },
  "id": 1
}

On the server side, this value is parsed and restored to the OTel context.

python

from opentelemetry.propagate import extract
from opentelemetry.trace import get_tracer
from typing import Any, TypedDict
 
tracer = get_tracer("mcp-server")
 
 
class McpParams(TypedDict, total=False):
    name: str
    arguments: dict[str, Any]
    _meta: dict[str, str]
 
 
class McpRequest(TypedDict):
    jsonrpc: str
    method: str
    params: McpParams
    id: int | str
 
 
def execute_tool(params: McpParams) -> Any:
    """Replace with your actual tool execution logic."""
    raise NotImplementedError(f"Tool '{params.get('name')}' not implemented")
 
 
def handle_tool_call(request: McpRequest) -> Any:
    """Extract the W3C Trace Context from _meta and link it to an OTel span."""
    meta = request.get("params", {}).get("_meta", {})
    carrier = {
        "traceparent": meta.get("traceparent", ""),
        "tracestate": meta.get("tracestate", ""),
    }
    ctx = extract(carrier)
    tool_name = request["params"].get("name", "unknown")
 
    with tracer.start_as_current_span(
        f"tools/call {tool_name}",
        context=ctx,
        attributes={
            "mcp.method.name": "tools/call",
            "mcp.tool.name": tool_name,
        },
    ):
        return execute_tool(request["params"])
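Because _meta is supplied by the client, it can be useful to validate traceparent before trusting it. The sketch below checks the version-00 layout (version, trace id, parent id, flags) with the standard library only. The names TraceParent and parse_traceparent are assumptions for illustration, and the strict pattern deliberately rejects any future format that carries extra fields.

```python
import re
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None when the value is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid in W3C Trace Context.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest flag bit is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(parsed)
```

A handler could fall back to starting a new root span whenever this returns None, so that a malformed value never breaks the tool call itself.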

Practical Application

The four steps below can be performed in sequence or applied independently. A Python MCP server (Steps 1, 2, and 4) paired with a TypeScript agent (Step 3) is a common combination in practice: Python dominates server implementations that rely on ML/AI libraries, while TypeScript is widely used for the agent orchestration layer. If you use only Python, the TypeScript logic in Step 3 can be implemented the same way with the Python OTel SDK.

Step 1: Instrumenting Prometheus Metrics on a Python MCP Server

This is an example of collecting latency and error rates per tool by combining prometheus-fastapi-instrumentator with a FastAPI-based MCP server.

python

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator
from prometheus_client import Histogram, Counter
from typing import Any
from typing_extensions import TypedDict  # Pydantic v2 rejects typing.TypedDict on Python < 3.12
import time
 
app = FastAPI()
 
 
class ToolCallRequest(TypedDict):
    params: dict[str, Any]
 
 
# Custom MCP-specific metric definitions
# Note: use only low-cardinality labels (tool_name, status).
# Record fields with many unique values, such as user_id, as trace
# attributes or log fields rather than Prometheus labels.
tool_duration = Histogram(
    "mcp_tool_invocation_duration_seconds",
    "MCP tool call execution time",
    labelnames=["tool_name", "status"],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
)

tool_errors = Counter(
    "mcp_tool_errors_total",
    "Total number of MCP tool call errors",
    labelnames=["tool_name", "error_type"],
)

tool_calls = Counter(
    "mcp_tool_calls_total",
    "Total number of MCP tool calls",
    labelnames=["tool_name"],
)

# Automatically instrument the default FastAPI metrics and expose /metrics
Instrumentator().instrument(app).expose(app)


async def execute_tool(tool_name: str, arguments: dict[str, Any]) -> Any:
    """Replace with your actual tool execution logic."""
    raise NotImplementedError(f"Tool '{tool_name}' not implemented")


@app.post("/mcp/tools/call")
async def call_tool(request: ToolCallRequest) -> Any:
    tool_name = request["params"].get("name", "unknown")
    tool_calls.labels(tool_name=tool_name).inc()

    # perf_counter() is monotonic, so durations are unaffected by wall-clock adjustments.
    start = time.perf_counter()
    status = "success"
    try:
        return await execute_tool(tool_name, request["params"].get("arguments", {}))
    except Exception as e:
        status = "error"
        tool_errors.labels(tool_name=tool_name, error_type=type(e).__name__).inc()
        raise
    finally:
        # Record the duration exactly once, for both success and error paths.
        tool_duration.labels(tool_name=tool_name, status=status).observe(
            time.perf_counter() - start
        )

When Prometheus scrapes metrics exposed from the /metrics endpoint, you can query latency in Grafana using PromQL as shown below.

# p95 latency per tool
histogram_quantile(
  0.95,
  sum by (le, tool_name) (
    rate(mcp_tool_invocation_duration_seconds_bucket[5m])
  )
)

# Errors per second by tool and error type, averaged over the last minute
sum by (tool_name, error_type) (rate(mcp_tool_errors_total[1m]))

# Call volume trend per tool
sum by (tool_name) (rate(mcp_tool_calls_total[10m]))
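It helps to know how a percentile is estimated from bucket counts, because the result is only as precise as the bucket boundaries. The sketch below reimplements the idea in plain Python: find the bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration and not Prometheus code, and the sample counts are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls above the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for the bucket layout used in Step 1.
sample = [(0.01, 10), (0.05, 40), (0.1, 70), (0.5, 90), (1.0, 98), (5.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, sample)
print(round(p95, 4))
```

With these counts the 95th observation falls in the 0.5 to 1.0 second bucket, so the estimate can be anywhere in that range depending on the interpolation. If p95 matters for alerting, adding bucket boundaries around the latency you care about tightens the estimate.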

💡 What is Cardinality? In Prometheus, cardinality is the number of unique label combinations, and each combination is stored as its own time series. If you use a value that changes on every request, such as session_id, or a label with thousands of distinct values, such as user_id, the number of time series grows explosively and causes memory problems. If you need per-user analysis, record user_id as a trace attribute or log field, and restrict Prometheus labels to fields with a small, bounded set of values, such as tool_name and status.
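A quick way to reason about this is to multiply the number of distinct values per label. The sketch below does that arithmetic and also shows one way to keep the tool_name label bounded when the name comes from client input. KNOWN_TOOLS and bounded_tool_label are hypothetical names used only for illustration.

```python
from math import prod


def series_count(distinct_values_per_label: dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(distinct_values_per_label.values())


# Hypothetical allowlist of tools this server actually exposes.
KNOWN_TOOLS = frozenset({"get_weather", "query_database", "send_report"})


def bounded_tool_label(tool_name: str) -> str:
    """Map unknown or client-invented tool names to a single 'other' label value."""
    return tool_name if tool_name in KNOWN_TOOLS else "other"


low = series_count({"tool_name": 20, "status": 2})
high = series_count({"tool_name": 20, "status": 2, "user_id": 5000})
print(low, high)
```

For a histogram, each of these series is further multiplied by the number of buckets, which is why adding a single high-cardinality label can be so costly.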

Step 2: Configuring an Integrated Pipeline with OpenTelemetry Collector

By using OTel Collector as a central hub, you can collect metrics, traces, and logs generated by MCP servers into a single pipeline and route them to multiple backends.

yaml

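# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape the MCP server's Prometheus /metrics endpoint
  prometheus:
    config:
      scrape_configs:
        - job_name: "mcp-server"
          scrape_interval: 15s
          static_configs:
            - targets: ["mcp-server:8000"]

processors:
  # PII masking: remove sensitive values from tool arguments
  # replace_pattern() is OTTL (OpenTelemetry Transformation Language) syntax.
  # Each statement is a single-quoted YAML scalar so the backslashes reach OTTL unchanged.
  # Verify the regex escaping in your own environment before relying on it.
  transform/mask_pii:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - 'replace_pattern(attributes["mcp.tool.arguments"], "\"password\"\\s*:\\s*\"[^\"]+\"", "\"password\": \"[REDACTED]\"")'
    log_statements:
      - context: log
        statements:
          - 'replace_pattern(attributes["mcp.tool.arguments"], "\"password\"\\s*:\\s*\"[^\"]+\"", "\"password\": \"[REDACTED]\"")'

  # Batch telemetry to reduce export overhead
  batch:
    send_batch_size: 1000
    timeout: 10s

  # Memory limit; list this first in every pipeline so it can refuse data early
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  # Send traces to Grafana Tempo
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  # Send metrics via Prometheus Remote Write
  # (Prometheus must run with --web.enable-remote-write-receiver)
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"

  # Send logs to Loki. Recent Collector releases deprecate this exporter in favor of
  # Loki's native OTLP endpoint, so check what your Collector version supports.
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, transform/mask_pii, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, transform/mask_pii, batch]
      exporters: [loki]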
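Before deploying a masking rule, it is worth checking the regular expression against sample payloads. The sketch below applies the same pattern with Python's re module; it assumes the tool arguments are recorded as a serialized JSON string, and mask_password is a hypothetical helper rather than part of the Collector.

```python
import re

# Same pattern as the Collector rule: "password" : "<one or more non-quote chars>"
PASSWORD_FIELD = re.compile(r'"password"\s*:\s*"[^"]+"')


def mask_password(serialized_arguments: str) -> str:
    """Replace the value of every "password" field with a fixed marker."""
    return PASSWORD_FIELD.sub('"password": "[REDACTED]"', serialized_arguments)


masked = mask_password('{"user": "kim", "password": "hunter2"}')
print(masked)
```

Note the limits of a pattern like this: it does not match an empty password, it stops at the first quote inside a value that contains escaped quotes, and it only covers a field literally named password. Treat it as a last line of defense and prefer not recording sensitive arguments at all.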

Step 3: Connecting Agent Inference and Tool Calls with End-to-End Trace

This is an example of configuring a parent-child trace structure in TypeScript that connects the Agent Inference span to the MCP Tool Call span.

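📌 Node.js Version Note: The example below generates request ids with Node's crypto randomUUID() function, which was added in Node.js 14.17.0 and 15.6.0. Importing it from node:crypto works on all of those versions, whereas the global crypto object is only available without flags from Node.js 19. On older runtimes, install the uuid package and use import { v4 as uuidv4 } from 'uuid' instead.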

typescript

import { randomUUID } from "node:crypto";
import {
  trace,
  context,
  SpanStatusCode,
  defaultTextMapSetter,
} from "@opentelemetry/api";
import { W3CTraceContextPropagator } from "@opentelemetry/core";

const tracer = trace.getTracer("mcp-agent", "1.0.0");
const propagator = new W3CTraceContextPropagator();

interface ToolPlan {
  name: string;
  sessionId: string;
  args: Record<string, unknown>;
}

// Calls the /mcp/tools/call endpoint of the MCP server configured in Step 1.
async function callMcpServer(request: unknown): Promise<unknown> {
  // e.g. fetch("http://mcp-server:8000/mcp/tools/call", {
  //   method: "POST",
  //   headers: { "Content-Type": "application/json" },
  //   body: JSON.stringify(request),
  // })
  throw new Error("callMcpServer: replace with your actual HTTP request logic");
}

async function planToolCalls(userQuery: string): Promise<ToolPlan[]> {
  // Replace with your actual LLM inference logic
  return [];
}

async function runAgentWithObservability(userQuery: string): Promise<void> {
  // Start the agent inference span (root)
  return await tracer.startActiveSpan(
    "gen_ai.agent reasoning",
    {
      attributes: {
        "gen_ai.system": "anthropic",
        "gen_ai.request.model": "claude-sonnet-4-6",
        // The raw query may contain PII; mask or omit it in production.
        "user.query": userQuery,
      },
    },
    async (agentSpan) => {
      try {
        const toolsToCall = await planToolCalls(userQuery);

        for (const tool of toolsToCall) {
          // Create the tool call span as a child of the agent span
          await tracer.startActiveSpan(
            `tools/call ${tool.name}`,
            {
              attributes: {
                "mcp.method.name": "tools/call",
                "mcp.tool.name": tool.name,
                "mcp.session.id": tool.sessionId,
              },
            },
            async (toolSpan) => {
              try {
                // Inject traceparent into _meta (W3C Trace Context propagation).
                // The propagator's inject() requires an explicit setter.
                const carrier: Record<string, string> = {};
                propagator.inject(context.active(), carrier, defaultTextMapSetter);

                const mcpRequest = {
                  jsonrpc: "2.0",
                  method: "tools/call",
                  params: {
                    name: tool.name,
                    arguments: tool.args,
                    // Only the keys the propagator actually set are forwarded.
                    _meta: { ...carrier },
                  },
                  id: randomUUID(), // requires Node.js 14.17+
                };

                const result = await callMcpServer(mcpRequest);
                toolSpan.setStatus({ code: SpanStatusCode.OK });
                return result;
              } catch (err) {
                toolSpan.setStatus({
                  code: SpanStatusCode.ERROR,
                  message: String(err),
                });
                toolSpan.recordException(err as Error);
                throw err;
              } finally {
                toolSpan.end();
              }
            }
          );
        }

        agentSpan.setStatus({ code: SpanStatusCode.OK });
      } catch (err) {
        // Mark the root span as failed when any tool call fails.
        agentSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
        throw err;
      } finally {
        agentSpan.end();
      }
    }
  );
}

The trace generated with this structure is visualized in Grafana Tempo or Jaeger as a hierarchical structure as shown below.

gen_ai.agent reasoning (350ms)
├── tools/call get_weather (45ms)
├── tools/call query_database (180ms)
│   └── db.query SELECT ... (120ms)
└── tools/call send_report (80ms)
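Trace backends build this view from nothing more than each span's id and its parent id. The following sketch reconstructs the same hierarchy from a flat list of spans; the Span tuple, the ids, and render_tree are made up for illustration and are not a Tempo or Jaeger API.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: int


def render_tree(spans: list[Span]) -> list[str]:
    """Return one indented line per span, parents before their children."""
    children: dict[Optional[str], list[Span]] = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms}ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


spans = [
    Span("a", None, "gen_ai.agent reasoning", 350),
    Span("b", "a", "tools/call get_weather", 45),
    Span("c", "a", "tools/call query_database", 180),
    Span("d", "c", "db.query SELECT ...", 120),
    Span("e", "a", "tools/call send_report", 80),
]
tree = render_tree(spans)
print("\n".join(tree))
```

If the traceparent injected by the agent never reaches the server, the server-side spans have no matching parent id and show up as separate traces instead of children in this tree.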

Step 4: Detecting Tool Call Anomalies with Datadog Cloud SIEM

This is an example of collecting structured logs from an MCP server into Datadog and applying anomaly detection rules.

python

import json
import logging
from datetime import datetime, timezone
from typing import Any, Optional

logger = logging.getLogger("mcp.security")


def log_tool_call(
    tool_name: str,
    user_id: str,
    session_id: str,
    status: str,
    duration_ms: float,
    error: Optional[str] = None,
    http_status: Optional[int] = None,
) -> None:
    """Emit a structured log line that Datadog Cloud SIEM can parse."""
    if http_status is None:
        # Fall back to a coarse status when the caller does not pass the real one
        # (for example 401 for an authentication failure).
        http_status = 200 if status == "success" else 500

    log_entry: dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event.category": "mcp.tool_call",
        "mcp.tool.name": tool_name,
        "usr.id": user_id,  # Datadog standard user attribute
        "mcp.session.id": session_id,
        "http.status": http_status,
        "duration_ms": duration_ms,
        "status": status,
    }
    if error:
        log_entry["error.message"] = error
        log_entry["error.kind"] = "MCPToolError"

    logger.info(json.dumps(log_entry))
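For a log shipper to parse these entries, each record has to reach the output as one bare JSON line without extra prefixes from the logging format. A minimal sketch of such a logger setup with the standard library follows; configure_json_logger is a hypothetical helper, and an in-memory buffer stands in for stdout so the result can be inspected.

```python
import io
import json
import logging
from typing import TextIO


def configure_json_logger(name: str, stream: TextIO) -> logging.Logger:
    """Emit each record's message as-is, one line per record."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # use sys.stdout in a real service
security_logger = configure_json_logger("mcp.security", buffer)
security_logger.info(json.dumps({"event.category": "mcp.tool_call", "status": "success"}))

parsed = json.loads(buffer.getvalue())
print(parsed)
```

Without the explicit level and handler, a logger created with logging.getLogger() inherits the root logger's WARNING level and info-level entries are silently dropped.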

The rules below are pseudocode meant to explain the detection concepts you can configure in Datadog Cloud SIEM. The syntax differs from the actual Datadog detection rule query language, so refer to the official Datadog documentation and adapt the rules to your environment.

sql

# Detection rule 1: sudden spike in calls to a specific tool
# Alert when per-user tool calls exceed the baseline by more than 3σ
@event.category:mcp.tool_call
| anomaly(count, direction:above, threshold:3)
  by usr.id, mcp.tool.name
  over 5m

# Detection rule 2: tool call attempts following authentication failures
# Detect a tools/call within 5 minutes of a 401 in the same session
@event.category:mcp.tool_call
| sequence by mcp.session.id
  [within 5m] @http.status:401
  [within 5m] @event.category:mcp.tool_call

# Detection rule 3: high-frequency calls during unusual hours (overnight automation)
@event.category:mcp.tool_call
| @timestamp:[23:00 TO 06:00]
| count > 100 by usr.id over 10m
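To make the logic behind rules 1 and 2 concrete, here is a small plain-Python sketch of the same ideas: a z-score check against a per-user baseline, and a per-session check for a call that follows a 401 within a time window. The function names, the event tuple layout, and the sample numbers are assumptions for illustration; a SIEM maintains the baselines and windows for you.

```python
from statistics import mean, pstdev


def is_spike(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """True when the current count exceeds the baseline mean by more than threshold sigmas."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


def sessions_with_call_after_401(
    events: list[tuple[float, str, int]], window_s: float = 300
) -> set[str]:
    """events: (timestamp_s, session_id, http_status), ordered by time within each session."""
    last_401: dict[str, float] = {}
    flagged: set[str] = set()
    for timestamp, session_id, status in events:
        if status == 401:
            last_401[session_id] = timestamp
        elif session_id in last_401 and timestamp - last_401[session_id] <= window_s:
            flagged.add(session_id)
    return flagged


baseline = [10, 12, 11, 9, 10, 11, 12, 10]  # calls per 5-minute window for one user
print(is_spike(baseline, 40), is_spike(baseline, 12))

events = [(0, "a", 401), (100, "a", 200), (0, "b", 401), (400, "b", 200), (50, "c", 200)]
print(sessions_with_call_after_401(events))
```

A fixed 3σ threshold assumes the per-window counts are roughly stable; workloads with strong daily cycles usually need a baseline per time of day, which is the kind of seasonality handling a managed anomaly rule provides.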

Pros and Cons Analysis

Advantages

Item	Content
Standardized Measurement	OTel semantic conventions cover MCP, enabling instrumentation without vendor lock-in
Reuse of Existing Infrastructure	Connect existing monitoring stacks such as Prometheus, Grafana, and Datadog as they are
Rich Context	Record tool name, masked arguments, token usage, and latency in a single span
Security and Observability Integration	Handle SIEM anomaly detection and performance monitoring in the same pipeline
Agent Workflow Visibility	Track multi-step agent executions with end-to-end traces

Disadvantages and Precautions

Item	Content	Response Plan
Non-standard Context Propagation	JSON-RPC-based MCP does not natively carry W3C Trace Context	Inject `traceparent` into `params._meta`
Sensitive Arguments	Tool arguments may contain PII and secrets	Apply a masking policy with the OTel Collector's transform processor
Cardinality Explosion	The time series count surges when fields with many unique values are used as Prometheus labels	Use only labels with a bounded set of values, such as `tool_name` and `status`
Immature Ecosystem	The OTel MCP semantic conventions are in the `experimental` state and automatic instrumentation libraries are scarce	Pin SDK versions and track emerging libraries such as OpenLIT and `opentelemetry-instrumentation-mcp`
Latency Overhead	Generating a span for every tool call may add 10–30ms of overhead in high-frequency workloads	Tune batch processor settings and the sampling rate
Cost Attribution	Attributing costs by user or team requires a separate design for token tracking	Aggregate the `gen_ai.usage.input_tokens` attribute into a Prometheus Counter
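The sampling mitigation in the table can be illustrated with a deterministic, trace-id-based decision: every service that sees the same trace id reaches the same keep-or-drop verdict, so sampled traces stay complete. The sketch below is similar in spirit to the ratio-based samplers shipped with OTel SDKs, but it is a simplified illustration and not their implementation; should_sample is a hypothetical name.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its id fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < bound


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
print(should_sample(trace_id, 0.5), should_sample(trace_id, 0.7))
```

Because the decision depends only on the trace id, lowering the ratio reduces span volume roughly in proportion without splitting individual traces. Error traces are often worth keeping regardless of the ratio, which typically calls for tail-based sampling in the Collector instead.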

The Most Common Mistakes in Practice

Recording tool arguments directly in logs and spans: Passwords, API keys, and personal information end up exposed in telemetry data. Set rules in the OTel Collector's transform processor to mask sensitive fields.
Using high-cardinality values as Prometheus labels: Session IDs or user IDs with thousands of distinct values or more are not suitable as Prometheus labels. Record these identifiers as trace attributes or log fields, and restrict Prometheus labels to fields with a bounded set of values, such as tool_name and status.
Creating only tool call spans without an agent root span: If tool call spans are isolated, you cannot tell which user request triggered which tools or in what order. An effective structure opens the agent inference as the root span first and then attaches the subsequent tool calls as child spans.

In Conclusion

MCP observability goes beyond simple log collection: it is an integrated pipeline that connects the entire flow from agent inference to tool execution into a single trace and detects abnormal patterns in near real time. With all four stages in place, latency spikes, error bursts, and abnormal call patterns no longer stay hidden in a black box.

Here are three steps you can start on right away. Each step can be applied independently or stacked in sequence.

Add Prometheus Metrics: Install the package with pip install prometheus-fastapi-instrumentator, expose the /metrics endpoint, and add the mcp_tool_invocation_duration_seconds Histogram to the tool call handler. If Prometheus and Grafana are already running, this is all the integration requires.
OTel Collector Deployment and Trace Pipeline Configuration: Starting from the otel-collector-config.yaml in Step 2, connect Grafana Tempo or Jaeger as the trace backend and apply context propagation through params._meta. At this stage, the entire flow from agent inference to tool execution is connected into a single trace.
Applying Anomaly Detection Rules: If you use Datadog, configure rules in Cloud SIEM based on the detection concepts in Step 4. If you use an open-source stack, you can alert on abnormal tool call frequencies by combining Grafana Alerting with Prometheus alerting rules.

Once the observability pipeline is set up, the next task is to control the MCP requests themselves at the gateway layer. In the next post, we will look at how to deploy Kong AI Gateway 3.12 as an MCP proxy to handle OAuth 2.1 authentication, rate limiting, and team cost attribution in a single layer.


The Complete Guide to MCP Server Observability: From Prometheus Metrics and Distributed Trace to Anomaly Detection | DEV BAK - 기술블로그

The Complete Guide to MCP Server Observability: From Prometheus Metrics and Distributed Trace to Anomaly Detection

Key Concepts

MCP and the 3 Key Elements of Observability

In an MCP environment, observability consists of three axes.

Element	Role	Representative Tool
Structured Logs	Logs of Tool Call Requests/Responses and Error Causes	Loki, Datadog Logs
Metrics	Time series aggregation of latency, calls, and error rates	Prometheus, Grafana Mimir
Distributed Traces	Linking Agent Inference → Tool Execution Flow	Jaeger, Grafana Tempo

The overall pipeline structure integrating the three elements is as follows.

MCP Client (에이전트)
    │ tools/call + _meta.traceparent (W3C Trace Context)
    ▼
MCP Server (Python/FastAPI)
    │ OTLP gRPC (트레이스·메트릭·로그)      │ 구조화 JSON 로그
    ▼                                         ▼
OTel Collector                           Datadog Cloud SIEM
    │                                     (이상 탐지 룰)
    ├──▶ Prometheus (메트릭)
    │         │
    │         └──▶ Grafana (대시보드·알람)
    │
    ├──▶ Grafana Tempo (분산 추적)
    │
    └──▶ Loki (로그)

OpenTelemetry MCP Semantic Convention

Span names follow the following format.

{mcp.method.name} {target}
예: tools/call weather_tool

Here are examples of key Span attributes.

Attribute Key	Meaning	Example Value
`mcp.method.name`	Called MCP method	`tools/call`
`mcp.tool.name`	Name of Executed Tool	`get_weather`
`mcp.session.id`	Session identifier (used only for trace properties)	`sess_abc123`
`gen_ai.usage.input_tokens`	Number of input tokens	`412`
`error.type`	Error Classification	`timeout`

Distributed Trace Context Propagation

The fact that MCP is based on JSON-RPC 2.0 creates one important limitation.

Since native W3C Trace Context propagation via HTTP headers is not supported, we use a method of injecting traceparent and tracestate into the params._meta property bag.

json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": { "city": "Seoul" },
    "_meta": {
      "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
      "tracestate": "vendor=myapp"
    }
  },
  "id": 1
}

On the server side, this value is parsed and restored to the OTel context.

python

from opentelemetry.propagate import extract
from opentelemetry.trace import get_tracer
from typing import Any, TypedDict
 
tracer = get_tracer("mcp-server")
 
 
class McpParams(TypedDict, total=False):
    name: str
    arguments: dict[str, Any]
    _meta: dict[str, str]
 
 
class McpRequest(TypedDict):
    jsonrpc: str
    method: str
    params: McpParams
    id: int | str
 
 
def execute_tool(params: McpParams) -> Any:
    """실제 Tool 실행 로직으로 대체하세요."""
    raise NotImplementedError(f"Tool '{params.get('name')}' not implemented")
 
 
def handle_tool_call(request: McpRequest) -> Any:
    """_meta에서 W3C Trace Context를 추출해 OTel 스팬과 연결합니다."""
    meta = request.get("params", {}).get("_meta", {})
    carrier = {
        "traceparent": meta.get("traceparent", ""),
        "tracestate": meta.get("tracestate", ""),
    }
    ctx = extract(carrier)
    tool_name = request["params"].get("name", "unknown")
 
    with tracer.start_as_current_span(
        f"tools/call {tool_name}",
        context=ctx,
        attributes={
            "mcp.method.name": "tools/call",
            "mcp.tool.name": tool_name,
        },
    ):
        return execute_tool(request["params"])

Practical Application

Step 1: Instrumenting Prometheus Metrics on a Python MCP Server

This is an example of collecting latency and error rates per tool by combining prometheus-fastapi-instrumentator with a FastAPI-based MCP server.

python

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator
from prometheus_client import Histogram, Counter
from typing import Any, TypedDict
import time
 
app = FastAPI()
 
 
class ToolCallRequest(TypedDict):
    params: dict[str, Any]
 
 
# MCP 전용 커스텀 메트릭 정의
# 주의: 낮은 카디널리티 레이블(tool_name, status)만 사용합니다.
# user_id처럼 고유 값이 많은 필드는 Prometheus 레이블이 아닌
# 트레이스 속성이나 로그 필드로 기록하세요.
tool_duration = Histogram(
    "mcp_tool_invocation_duration_seconds",
    "MCP Tool Call 실행 시간",
    labelnames=["tool_name", "status"],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
)
 
tool_errors = Counter(
    "mcp_tool_errors_total",
    "MCP Tool Call 에러 총 횟수",
    labelnames=["tool_name", "error_type"],
)
 
tool_calls = Counter(
    "mcp_tool_calls_total",
    "MCP Tool Call 총 호출 수",
    labelnames=["tool_name"],
)
 
# FastAPI 기본 메트릭 자동 계측
Instrumentator().instrument(app).expose(app)
 
 
async def execute_tool(tool_name: str, arguments: dict[str, Any]) -> Any:
    """실제 Tool 실행 로직으로 대체하세요."""
    raise NotImplementedError(f"Tool '{tool_name}' not implemented")
 
 
@app.post("/mcp/tools/call")
async def call_tool(request: ToolCallRequest) -> Any:
    tool_name = request["params"].get("name", "unknown")
    tool_calls.labels(tool_name=tool_name).inc()
 
    start = time.time()
    try:
        result = await execute_tool(
            tool_name, request["params"].get("arguments", {})
        )
        duration = time.time() - start
        tool_duration.labels(tool_name=tool_name, status="success").observe(duration)
        return result
    except Exception as e:
        duration = time.time() - start
        tool_duration.labels(tool_name=tool_name, status="error").observe(duration)
        tool_errors.labels(tool_name=tool_name, error_type=type(e).__name__).inc()
        raise

When Prometheus scrapes metrics exposed from the /metrics endpoint, you can query latency in Grafana using PromQL as shown below.

# Tool별 p95 레이턴시
histogram_quantile(
  0.95,
  rate(mcp_tool_invocation_duration_seconds_bucket[5m])
) by (tool_name)
 
# 분당 에러율
rate(mcp_tool_errors_total[1m]) by (tool_name, error_type)
 
# Tool별 호출량 추이
rate(mcp_tool_calls_total[10m]) by (tool_name)

Step 2: Configuring an Integrated Pipeline with OpenTelemetry Collector

By using OTel Collector as a central hub, you can collect metrics, traces, and logs generated by MCP servers into a single pipeline and route them to multiple backends.

yaml

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # MCP 서버 Prometheus /metrics 스크랩
  prometheus:
    config:
      scrape_configs:
        - job_name: "mcp-server"
          scrape_interval: 15s
          static_configs:
            - targets: ["mcp-server:8000"]
 
processors:
  # PII 마스킹 — Tool 인자에서 민감 정보 제거
  # replace_pattern()은 OTTL(OpenTelemetry Transformation Language) 문법입니다.
  # 아래 정규식의 YAML 이스케이프 처리는 실제 환경에서 반드시 검증하세요.
  transform/mask_pii:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["mcp.tool.arguments"],
              "\"password\"\\s*:\\s*\"[^\"]+\"",
              "\"password\": \"[REDACTED]\"")
 
  # 배치 처리로 오버헤드 감소
  batch:
    send_batch_size: 1000
    timeout: 10s
 
  # 메모리 제한
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
 
exporters:
  # Grafana Tempo로 트레이스 전송
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
 
  # Prometheus Remote Write로 메트릭 전송
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"
 
  # Loki로 로그 전송
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
 
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/mask_pii, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch, memory_limiter]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [transform/mask_pii, batch]
      exporters: [loki]

Step 3: Connecting Agent Inference and Tool Calls with End-to-End Trace

This is an example of configuring a parent-child trace structure in TypeScript that connects the Agent Inference span to the MCP Tool Call span.

typescript

import { trace, context, SpanStatusCode } from "@opentelemetry/api";
import { W3CTraceContextPropagator } from "@opentelemetry/core";
 
const tracer = trace.getTracer("mcp-agent", "1.0.0");
const propagator = new W3CTraceContextPropagator();
 
interface ToolPlan {
  name: string;
  sessionId: string;
  args: Record<string, unknown>;
}
 
// Step 1에서 구성한 MCP 서버의 /mcp/tools/call 엔드포인트를 호출합니다.
async function callMcpServer(request: unknown): Promise<unknown> {
  // 예: fetch("http://mcp-server:8000/mcp/tools/call", { body: JSON.stringify(request) })
  throw new Error("callMcpServer: 실제 HTTP 요청 로직으로 대체하세요");
}
 
async function planToolCalls(userQuery: string): Promise<ToolPlan[]> {
  // 실제 LLM 추론 로직으로 대체하세요
  return [];
}
 
async function runAgentWithObservability(userQuery: string): Promise<void> {
  // 에이전트 추론 스팬 시작 (루트)
  return await tracer.startActiveSpan(
    "gen_ai.agent reasoning",
    {
      attributes: {
        "gen_ai.system": "anthropic",
        "gen_ai.request.model": "claude-sonnet-4-6",
        "user.query": userQuery,
      },
    },
    async (agentSpan) => {
      try {
        const toolsToCall = await planToolCalls(userQuery);
 
        for (const tool of toolsToCall) {
          // Tool Call 스팬을 에이전트 스팬의 자식으로 생성
          await tracer.startActiveSpan(
            `tools/call ${tool.name}`,
            {
              attributes: {
                "mcp.method.name": "tools/call",
                "mcp.tool.name": tool.name,
                "mcp.session.id": tool.sessionId,
              },
            },
            async (toolSpan) => {
              try {
                // _meta에 traceparent 주입 (W3C Trace Context 전파)
                const carrier: Record<string, string> = {};
                propagator.inject(context.active(), carrier);
 
                const mcpRequest = {
                  jsonrpc: "2.0",
                  method: "tools/call",
                  params: {
                    name: tool.name,
                    arguments: tool.args,
                    _meta: {
                      traceparent: carrier["traceparent"] ?? "",
                      tracestate: carrier["tracestate"] ?? "",
                    },
                  },
                  id: crypto.randomUUID(), // Node.js 15+ 필요
                };
 
                const result = await callMcpServer(mcpRequest);
                toolSpan.setStatus({ code: SpanStatusCode.OK });
                return result;
              } catch (err) {
                toolSpan.setStatus({
                  code: SpanStatusCode.ERROR,
                  message: String(err),
                });
                toolSpan.recordException(err as Error);
                throw err;
              } finally {
                toolSpan.end();
              }
            }
          );
        }
 
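        agentSpan.setStatus({ code: SpanStatusCode.OK });
      } catch (err) {
        // Mark the root span as failed so the trace shows which request broke
        agentSpan.setStatus({
          code: SpanStatusCode.ERROR,
          message: String(err),
        });
        agentSpan.recordException(err as Error);
        throw err;
      } finally {
        agentSpan.end();
      }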
    }
  );
}

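A trace generated with this structure appears in Grafana Tempo or Jaeger as a hierarchy like the one below. A nested span such as db.query comes from the MCP server side, which can attach it to the tool call span once it restores the propagated context from params._meta.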

gen_ai.agent reasoning (350ms)
├── tools/call get_weather (45ms)
├── tools/call query_database (180ms)
│   └── db.query SELECT ... (120ms)
└── tools/call send_report (80ms)
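
For server-side spans to nest under the tool call span, the server has to read the traceparent value that the client placed in params._meta. The following is a minimal, dependency-free sketch of that parsing step, intended only to show the W3C wire format. The names `ParentContext` and `extract_parent` are hypothetical, and in a real server the OpenTelemetry SDK's propagator would normally perform this extraction for you.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context layout: version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class ParentContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def extract_parent(params: dict) -> Optional[ParentContext]:
    """Return the caller's trace context from params._meta, or None if absent or invalid."""
    meta = params.get("_meta")
    value = meta.get("traceparent") if isinstance(meta, dict) else None
    if not isinstance(value, str):
        return None
    match = _TRACEPARENT_RE.fullmatch(value)
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent ids
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag
    return ParentContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))
```

Returning None for a missing or malformed header lets the server fall back to starting a new root span instead of failing the tool call.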

Step 4: Detecting Tool Call Anomalies with Datadog Cloud SIEM

This is an example of collecting structured logs from an MCP server into Datadog and applying anomaly detection rules.

python

import logging
import json
from datetime import datetime, timezone
from typing import Optional
 
logger = logging.getLogger("mcp.security")
 
 
def log_tool_call(
    tool_name: str,
    user_id: str,
    session_id: str,
    status: str,
    duration_ms: float,
    error: Optional[str] = None,
    http_status: Optional[int] = None,
) -> None:
    """Emit a structured log entry that Datadog SIEM can parse."""
    if http_status is None:
        # Coarse fallback when the caller has no real status code
        http_status = 200 if status == "success" else 500

    log_entry: dict = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event.category": "mcp.tool_call",
        "mcp.tool.name": tool_name,
        "usr.id": user_id,            # Datadog standard user attribute
        "mcp.session.id": session_id,
        "http.status": http_status,   # pass 401 explicitly so auth-failure rules can match
        "duration_ms": duration_ms,
        "status": status,
    }
    if error:
        log_entry["error.message"] = error
        log_entry["error.kind"] = "MCPToolError"

    logger.info(json.dumps(log_entry))
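
To fill the status and duration_ms fields consistently, timing and outcome can be captured in one place around the tool handler. The sketch below assumes a hypothetical helper named `call_with_logging` and a `log_fn` callable that accepts the same keyword arguments as log_tool_call above. It is one possible arrangement, not part of any MCP SDK.

```python
import time
from typing import Any, Callable


def call_with_logging(
    handler: Callable[[], Any],
    log_fn: Callable[..., None],
    *,
    tool_name: str,
    user_id: str,
    session_id: str,
) -> Any:
    """Run a tool handler, then report its outcome and duration through log_fn."""
    started = time.perf_counter()
    status, error = "success", None
    try:
        return handler()
    except Exception as exc:
        status, error = "error", str(exc)
        raise  # the caller still sees the original failure
    finally:
        # Runs on both paths, so every call produces exactly one log entry
        log_fn(
            tool_name=tool_name,
            user_id=user_id,
            session_id=session_id,
            status=status,
            duration_ms=(time.perf_counter() - started) * 1000.0,
            error=error,
        )
```

A call such as `call_with_logging(lambda: get_weather("Seoul"), log_tool_call, tool_name="get_weather", user_id=uid, session_id=sid)` would then emit one entry per invocation, where `get_weather`, `uid`, and `sid` stand in for your own handler and identifiers.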

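text

# Conceptual notation for the detection logic, not literal Datadog query syntax.
# Configure the equivalent rules in the Cloud SIEM rule editor.

# Detection rule 1: spike in call frequency for a specific tool
# Alert when per-user tool calls exceed the baseline by more than 3σ
@event.category:mcp.tool_call
| anomaly(count, direction:above, threshold:3)
  by usr.id, mcp.tool.name
  over 5m

# Detection rule 2: tool call attempts following repeated authentication failures
# Detect tools/call within 5 minutes of a 401 in the same session
@event.category:mcp.tool_call AND @http.status:401
| sequence by mcp.session.id
  [within 5m] @http.status:401
  [within 5m] @event.category:mcp.tool_call

# Detection rule 3: high-frequency calls at unusual hours (detects overnight automation)
@event.category:mcp.tool_call
| @timestamp:[23:00 TO 06:00]
| count > 100 by usr.id over 10m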
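
The 3σ threshold in rule 1 can be made concrete with a small calculation. The sketch below only illustrates the idea of comparing the current window against a baseline of earlier windows. The function name `is_spike` is hypothetical, and this is not a description of how Datadog's anomaly detection is implemented internally.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_spike(baseline_counts: Sequence[int], current_count: int, threshold: float = 3.0) -> bool:
    """True when current_count exceeds the baseline mean by more than `threshold` sigmas."""
    if len(baseline_counts) < 2:
        return False  # not enough history to estimate a deviation
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)
    if sigma == 0:
        # Flat baseline: a z-score is undefined, so treat any increase as a spike
        return current_count > mu
    return (current_count - mu) / sigma > threshold
```

Here `baseline_counts` would hold the per-user, per-tool call counts of previous 5-minute windows, and `current_count` the count for the window being evaluated.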

Pros and Cons Analysis

Advantages

Item	Content
Standardized measurement	OTel semantic conventions cover MCP, enabling instrumentation without vendor lock-in
Reuse of existing infrastructure	Connect existing monitoring stacks such as Prometheus, Grafana, and Datadog as they are
Rich context	Record the tool name, masked arguments, token usage, and latency in a single span
Security and observability integration	Handle SIEM anomaly detection and performance monitoring in the same pipeline
Agent workflow visibility	Track multi-step agent executions with end-to-end traces

Disadvantages and Precautions

Item	Content	Response Plan
Non-standard context propagation	JSON-RPC-based MCP does not natively carry W3C Trace Context	Inject `traceparent` into `params._meta`
Argument sensitivity	Tool arguments may contain PII and secrets	Apply a masking policy with the OTel Collector's transform processor
Cardinality explosion	The number of time series surges when fields with many unique values are used as Prometheus labels	Use only labels with a bounded set of values, such as `tool_name` and `status`
Immature ecosystem	The OTel MCP semantic conventions are in `experimental` status, and automatic instrumentation libraries are scarce	Pin the SDK version and regularly track emerging libraries such as OpenLIT and `opentelemetry-instrumentation-mcp`
Latency overhead	Generating a span for every tool call may add 10–30ms of overhead in high-frequency workloads	Mitigate by tuning the batch processor settings and the sampling rate
Cost attribution	Attributing costs by user or team requires a separate design for token tracking	Aggregate the `gen_ai.usage.input_tokens` attribute as a Prometheus Counter

The Most Common Mistakes in Practice

Recording tool arguments directly in logs and spans: Passwords, API keys, and personal information end up exposed in telemetry data. It is recommended to define rules in the OTel Collector's transform processor that mask sensitive fields.
Using high-cardinality values as Prometheus labels: Session IDs or user IDs with thousands of unique values or more are not suitable as Prometheus labels. Record these identifiers in trace attributes or log fields instead, and use as Prometheus labels only fields with a bounded set of values, such as tool_name and status.
Creating only tool call spans without an agent root span: If tool call spans are isolated, it is impossible to track which user request triggered the tools and in what order. An effective structure is to open agent inference as the root span first and then attach the subsequent tool calls as child spans.
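
For the first mistake in the list above, redaction can also happen inside the application, before arguments ever reach a span attribute or log field. The sketch below is an application-side complement to the Collector's transform processor, not a substitute for it. The `SENSITIVE_KEYS` deny-list and the `redact` function are hypothetical names chosen for illustration, and the key list should be adapted to your own tool schemas.

```python
from typing import Any

# Hypothetical deny-list; extend it to match your own tool schemas
SENSITIVE_KEYS = {"password", "api_key", "token", "secret", "authorization"}


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict fields replaced by a placeholder."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # scalars pass through unchanged
```

Because the function builds new containers instead of mutating its input, the original arguments can still be passed to the tool handler untouched while only the redacted copy is attached to telemetry.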

In Conclusion

Here are 3 steps you can start right now. Each step can be applied independently or stacked in sequence.

Add Prometheus metrics: Install prometheus-fastapi-instrumentator (pip install prometheus-fastapi-instrumentator) to expose the /metrics endpoint, and add the mcp_tool_invocation_duration_seconds Histogram to the tool call handler. If Prometheus and Grafana are already running, this is the quickest integration to complete.
Deploy the OTel Collector and configure the trace pipeline: Starting from the otel-collector-config.yaml in Step 2, connect Grafana Tempo or Jaeger as the trace backend and apply context propagation through params._meta. At this stage, the entire flow from agent inference to tool execution is connected into a single trace.
Apply anomaly detection rules: If you are using Datadog, you can configure rules in Cloud SIEM by referring to the detection concepts in Step 4. If you are using an open-source stack, you can alert on abnormal tool call frequencies by combining Grafana Alerting with PromQL-based anomaly expressions.

Next Post: We will cover how to deploy Kong AI Gateway 3.12 as a proxy in front of an MCP server to handle OAuth 2.1 authentication, rate limiting, and team cost attribution in a single layer.
