Deploying an MCP Server with Streamable HTTP and OAuth 2.1 — From Multi-User Environments to Azure AD Integration
We have entered an era where AI agents are moving beyond simple demos and being integrated into actual business systems. While stdio transport was sufficient for a local environment for an individual developer, the situation changes in enterprise environments where dozens of team members utilize Claude or Copilot simultaneously. A true MCP server that the entire team can use reliably must simultaneously solve these three problems: state management, authentication, and horizontal scaling.
After reading this article, you will be able to deploy an MCP server that dozens of team members can connect to simultaneously, with authentication integrated, starting today. We will walk through TypeScript and Python SDK code examples, a practical Azure AD integration, and common pitfalls encountered in the field. In this article, "production ready" means three things: stateless horizontal scaling, OAuth 2.1-based enterprise authentication, and compatibility with existing HTTP infrastructure.
Key Concepts
What is MCP and why use it instead of REST API?
MCP (Model Context Protocol) is an open protocol led by Anthropic that defines a standardized method of communication between AI models and external tools and data sources. The MCP server exposes tools, resources, and prompts that AI clients (such as Claude Desktop, VS Code Copilot, etc.) can call.
You can provide tools to AI agents using existing methods, such as directly calling REST APIs or creating platform-specific plugins. However, this approach has the problem that if Claude Desktop and VS Code Copilot want to use the same tools, different integration code must be written for each. MCP solves this integration problem all at once. Once you implement the MCP server, the same tools can be used by all AI clients that support MCP.
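To make "the same tools for every client" concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client sends to list and call tools. The `jsonrpc_request` helper is hypothetical (it is not part of any SDK), and the tool name `search_documents` is borrowed from the Python example later in this article.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # simple incrementing request IDs


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover what the server offers, then invoke one tool by name.
list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_documents", "arguments": {"query": "vacation policy"}},
)

print(list_tools)
print(call_tool)
```

Any MCP-capable client produces messages of this shape, which is why a single server implementation can serve all of them.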
Transport Layer: From SSE to Streamable HTTP
The transport layer determines how MCP messages actually travel. Previously, two types of transport were used.
- stdio: Optimized for communication between local processes. Suitable for personal developer environments.
- SSE (Server-Sent Events): Intended for remote deployment, but structurally complex: it required managing a dedicated server-to-client event stream alongside a separate client-to-server POST channel, and it tended to conflict with load balancers.
As of the March 26, 2025 specification, the SSE transport was officially deprecated and Streamable HTTP took its place.
SSE (Server-Sent Events): A technology where a server sends a unidirectional event stream to a client over HTTP. In MCP, this unidirectional limitation required a separate POST channel, and this dual-channel structure was the root cause of conflicts with load balancers.
How Streamable HTTP Transport Works
Streamable HTTP handles both POST and GET requests on a single HTTP endpoint (/mcp). When a client sends a request, the server may return a simple JSON response or switch to a streaming response in SSE format, depending on the situation.
Client                                MCP Server
|                                     |
|--- POST /mcp (initialize) --------->|
|<-- 200 OK (JSON or SSE) ------------|
|                                     |
|--- POST /mcp (tools/call) --------->|
|<-- 200 OK + SSE stream -------------|  ← for long-running work
|    data: {"progress": 50}           |
|    data: {"result": "done"}         |
|                                     |
|--- GET /mcp (listen) -------------->|  ← when server push is needed
|<-- SSE stream (server-initiated) ---|

The key features can be summarized as follows.
| Features | Description |
|---|---|
| Single Endpoint | All communication goes through one /mcp endpoint |
| Optional Stateless Operation | Choosing the stateless design is advantageous for horizontal scaling (see below) |
| Optional Streaming | Simple requests are answered with plain JSON; long-running tasks are streamed as SSE |
| Compatible with existing infrastructure | Directly integrates with standard HTTP load balancers, proxies, and API gateways |
| Resumable Stream | Stream can be resumed based on EventID after network disconnection |
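The "JSON or SSE" behavior in the table can be sketched from the client side: inspect the response's Content-Type and decode accordingly. This is a simplified illustration using only the standard library. `parse_sse` and `read_mcp_response` are hypothetical helper names, and the parser handles only the `id` and `data` fields.

```python
import json


def parse_sse(body: str) -> list:
    """Parse an SSE body into a list of {"id", "data"} events."""
    events, event_id, data_lines = [], None, []
    for line in body.splitlines() + [""]:  # the extra "" flushes the last event
        if line == "":
            if data_lines:
                events.append({"id": event_id, "data": "\n".join(data_lines)})
            event_id, data_lines = None, []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("id:"):
            event_id = line[3:].strip()
        elif line.startswith("data:"):
            value = line[5:]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return events


def read_mcp_response(content_type: str, body: str) -> list:
    """Return the JSON-RPC messages carried by a Streamable HTTP response."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return [json.loads(body)]
    if media_type == "text/event-stream":
        return [json.loads(event["data"]) for event in parse_sse(body)]
    raise ValueError(f"unexpected content type: {content_type}")


single = read_mcp_response("application/json", '{"jsonrpc":"2.0","id":1,"result":{}}')
stream_body = (
    'id: 1\ndata: {"jsonrpc":"2.0","method":"notifications/progress",'
    '"params":{"progressToken":"t1","progress":50}}\n\n'
    'id: 2\ndata: {"jsonrpc":"2.0","id":7,"result":{"ok":true}}\n\n'
)
streamed = read_mcp_response("text/event-stream; charset=utf-8", stream_body)
print(len(single), len(streamed))
```

A real client reads the stream incrementally instead of buffering the whole body, but the branching on Content-Type is the same.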
Stateless is optional. The Streamable HTTP protocol also supports stateful sessions. Setting sessionIdGenerator maintains the session, while setting it to undefined enables stateless mode. The code examples in this article are written based on stateless mode, which is advantageous for horizontal scaling.
Resumable Stream: Each SSE event includes the id field (EventID). When a disconnected client reconnects, it sends the last received EventID to the server with the Last-Event-ID header, and the server resends only subsequent events.
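The resume mechanism can be sketched as a small server-side replay buffer. This is an illustration of the idea, not how any particular SDK implements it: `ReplayBuffer` is a hypothetical class, integer event IDs are an assumption (real servers may use opaque strings), and a multi-instance deployment would need the buffer in shared storage.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Keeps the most recent events of one stream so a client can resume."""

    def __init__(self, max_events: int = 1000) -> None:
        self._events = deque(maxlen=max_events)  # oldest events are dropped first
        self._next_id = 1

    def append(self, data: str) -> int:
        """Store an event payload and return the ID assigned to it."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def events_after(self, last_event_id: Optional[str]) -> list:
        """Return buffered events newer than the Last-Event-ID header value."""
        if last_event_id is None:
            return list(self._events)
        last = int(last_event_id)
        return [(i, d) for i, d in self._events if i > last]


def to_sse(event_id: int, data: str) -> str:
    """Format one event as an SSE frame that carries its ID."""
    return f"id: {event_id}\ndata: {data}\n\n"


buffer = ReplayBuffer(max_events=3)
for payload in ("a", "b", "c", "d"):
    buffer.append(payload)

# The client reconnects with "Last-Event-ID: 2" and receives only what it missed.
missed = buffer.events_after("2")
print("".join(to_sse(i, d) for i, d in missed))
```

Because the buffer is bounded, a client that stays disconnected for too long can no longer be caught up; the buffer size is a trade-off between memory and how long a gap you want to tolerate.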
OAuth 2.1 Enterprise Authentication Architecture
Differences between OAuth 2.1 and 2.0: OAuth 2.1 (still an IETF draft) consolidates OAuth 2.0 and removes its security-vulnerable flows. The Implicit Flow and the Password Grant have been removed, and PKCE is mandatory for all clients. If you are working from an existing 2.0 implementation, the removal of those two flows and the PKCE requirement are the most significant changes.
OAuth 2.1 on an MCP server features a clear separation of roles. The MCP server acts solely as a Resource Server. In other words, it validates Bearer tokens and processes requests, but does not issue tokens directly. Token issuance is handled by external Authorization Servers such as Azure AD, Okta, and Auth0.
MCP Client                 Authorization Server       MCP Server
(Claude Desktop, etc.)     (Azure AD/Okta)            (my server)
|                          |                          |
|-- 1. /.well-known/oauth-protected-resource -------->|
|<-- 2. Authorization server location returned -------|
|                          |                          |
|-- 3. PKCE code request ->|                          |
|<- 4. Authorization code -|                          |
|-- 5. Token exchange ---->|                          |
|<-- 6. Access token ------|                          |
|                          |                          |
|-- 7. POST /mcp (Bearer token) --------------------->|
|                          |  8. Verify token via JWKS|
|<------------------------------ 9. Response ---------|

Three elements are key to the security of this flow.
PKCE (Proof Key for Code Exchange): Prevents authorization code interception attacks. The client generates a random value (code_verifier) and sends its hash (code_challenge) to the authorization server with the authorization request. During token exchange the client presents the original code_verifier, and the authorization server checks that it hashes to the challenge it received earlier. The MCP specification recommends the S256 hash method.
RFC 8707 Resource Indicators: When requesting a token, specify the target MCP server URI as the resource parameter. Since tokens issued in this way are valid only on that server, this prevents stolen tokens from being reused for other services.
Protected Resource Metadata: If the MCP server exposes the /.well-known/oauth-protected-resource endpoint, clients can automatically determine which authorization server to request a token from. Discovery works without manual configuration.
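The first two elements can be sketched on the client side with the standard library alone. The code derives an S256 code_challenge from a code_verifier and builds an authorization URL that carries the RFC 8707 resource parameter. The endpoint `https://auth.example.com/authorize`, the client ID, the redirect URI, and the `mcp.access` scope are placeholder values, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(code_verifier)), without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple:
    """Return a fresh (code_verifier, code_challenge) pair."""
    code_verifier = secrets.token_urlsafe(64)  # 86 chars, inside the 43-128 range
    return code_verifier, s256_challenge(code_verifier)


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scope: str, resource: str,
                            code_challenge: str, state: str) -> str:
    """Build the URL that starts the authorization code flow."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        "resource": resource,  # RFC 8707: the MCP server this token is meant for
    })
    return f"{authorize_endpoint}?{query}"


verifier, challenge = make_pkce_pair()
url = build_authorization_url(
    authorize_endpoint="https://auth.example.com/authorize",  # placeholder
    client_id="my-mcp-client",                                # placeholder
    redirect_uri="http://localhost:8765/callback",            # placeholder
    scope="mcp.access",
    resource="https://mcp.example.com",
    code_challenge=challenge,
    state=secrets.token_urlsafe(16),
)
print(url)
```

The client keeps `verifier` private and sends it only in the token exchange request, together with the same `resource` value.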
Practical Application
The code examples cover three scenarios. Please refer to the criteria below to determine which example suits your situation.
| Example | Suitable Situation |
|---|---|
| Example 1 (TypeScript) | Node.js-based team, leveraging existing Express infrastructure, when you want to implement OAuth verification directly at the code level |
| Example 2 (Python) | Python-based team, FastAPI/Starlette ecosystem, when rapid prototyping is needed |
| Example 3 (Fully Integrated with Azure) | When using an Azure environment and wanting to delegate authentication logic to infrastructure and focus solely on business logic |
Example 1: Building a Stateless Streamable HTTP Server with TypeScript SDK
Starting with TypeScript SDK v1.10.0, StreamableHTTPServerTransport is built-in. Enabling stateless mode with sessionIdGenerator: undefined prevents session state from being retained in server memory, allowing any instance behind a load balancer to handle requests.
import express from "express";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { createRemoteJWKSet, jwtVerify, type JWTPayload } from "jose";

const TENANT_ID = process.env.AZURE_TENANT_ID!;

// Create the JWKS client once at module initialization
// (creating it per request adds unnecessary overhead)
const JWKS = createRemoteJWKSet(
  new URL(`https://login.microsoftonline.com/${TENANT_ID}/discovery/v2.0/keys`)
);

// Extend express.Request so that req.user is type-safe
type AuthedRequest = express.Request & { user: JWTPayload };

const app = express();
app.use(express.json());

// OAuth 2.1 Protected Resource Metadata endpoint
app.get("/.well-known/oauth-protected-resource", (req, res) => {
  res.json({
    resource: "https://mcp.example.com",
    authorization_servers: [
      `https://login.microsoftonline.com/${TENANT_ID}/v2.0`,
    ],
    bearer_methods_supported: ["header"],
  });
});

// Bearer token verification middleware
async function verifyToken(
  req: express.Request,
  res: express.Response,
  next: express.NextFunction
) {
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith("Bearer ")) {
    res.status(401).json({ error: "unauthorized" });
    return;
  }
  const token = authHeader.slice(7);
  try {
    const { payload } = await jwtVerify(token, JWKS, {
      issuer: `https://login.microsoftonline.com/${TENANT_ID}/v2.0`,
      audience: "https://mcp.example.com", // RFC 8707: validate the aud claim
    });
    (req as AuthedRequest).user = payload;
    next();
  } catch {
    res.status(401).json({ error: "invalid_token" });
  }
}

// Stateless MCP endpoint: a fresh server and transport per request
app.all("/mcp", verifyToken, async (req, res) => {
  const server = new McpServer({ name: "enterprise-mcp", version: "1.0.0" });

  server.tool("get_user_info", "Return the current user's information", {}, async () => {
    const user = (req as AuthedRequest).user;
    return {
      content: [
        { type: "text", text: `User: ${user.name} (${user.email})` },
      ],
    };
  });

  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined, // stateless: no session ID
  });

  // Release the per-request server and transport once the response ends
  res.on("close", () => {
    transport.close();
    server.close();
  });

  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000, () => console.log("MCP server running: http://localhost:3000"));

| Code Point | Role |
|---|---|
| createRemoteJWKSet at the top of the file | Runs only once, when the module is loaded. If placed inside the middleware, a new JWKS client is created on every request and its key cache is lost |
| type AuthedRequest | Ensures type safety with an explicit type instead of (req as any) |
| process.env.AZURE_TENANT_ID | Environment variable instead of a hardcoded value. Set it in the .env file in the form AZURE_TENANT_ID=<value> |
| sessionIdGenerator: undefined | Enables stateless mode. Each request is processed independently |
| audience: "https://mcp.example.com" | RFC 8707 compliance: validates the token's target server through the aud claim |
Example 2: Building a Stateless Server with Python SDK
In the Python SDK, you can enable stateless mode with a single stateless_http=True option.
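import asyncio
import os
import time

import httpx
from jose import jwt, JWTError
from mcp.server.fastmcp import FastMCP
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

TENANT_ID = os.environ["AZURE_TENANT_ID"]

# JWKS cache (1-hour TTL).
# An asyncio.Lock guards the critical section so that a burst of
# concurrent requests does not trigger duplicate refreshes.
_jwks_cache: dict = {}
_jwks_lock = asyncio.Lock()


async def verify_token(token: str) -> dict:
    """Verify a Bearer token against the Azure AD JWKS."""
    cache_key = "azure_jwks"
    async with _jwks_lock:
        if cache_key not in _jwks_cache or time.time() > _jwks_cache[cache_key]["expires"]:
            async with httpx.AsyncClient() as client:
                resp = await client.get(
                    f"https://login.microsoftonline.com/{TENANT_ID}/discovery/v2.0/keys"
                )
                resp.raise_for_status()
            _jwks_cache[cache_key] = {
                "keys": resp.json(),
                "expires": time.time() + 3600,  # cache for 1 hour
            }
    try:
        payload = jwt.decode(
            token,
            _jwks_cache[cache_key]["keys"],
            algorithms=["RS256"],
            audience="https://mcp.example.com",  # RFC 8707 validation
            issuer=f"https://login.microsoftonline.com/{TENANT_ID}/v2.0",
        )
        return payload
    except JWTError as e:
        raise ValueError(f"Token verification failed: {e}") from e


# Define the server with FastMCP.
# stateless_http=True: no session state; json_response=True: return JSON when streaming is not needed.
# Assumption: in the 1.x Python SDK these are FastMCP constructor settings; check your SDK version.
mcp = FastMCP("enterprise-mcp", stateless_http=True, json_response=True)


@mcp.tool()
async def search_documents(query: str) -> str:
    """Enterprise document search tool."""
    # A real implementation would call the search service here
    return f"Search results for '{query}': 3 items"


@mcp.tool()
async def get_employee_info(employee_id: str) -> str:
    """Look up employee information (HR system integration)."""
    return f"Employee ID {employee_id}: Hong Gil-dong / Development Team"


# Convert to a stateless Streamable HTTP (ASGI) app
app = mcp.streamable_http_app()


class BearerAuthMiddleware(BaseHTTPMiddleware):
    """Reject requests without a valid Bearer token (otherwise verify_token is never called)."""

    async def dispatch(self, request, call_next):
        auth = request.headers.get("authorization", "")
        if not auth.startswith("Bearer "):
            return JSONResponse({"error": "unauthorized"}, status_code=401)
        try:
            request.state.user = await verify_token(auth[7:])
        except ValueError:
            return JSONResponse({"error": "invalid_token"}, status_code=401)
        return await call_next(request)


app.add_middleware(BearerAuthMiddleware)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Example 3: Azure AD + Azure Container Apps Fully Integrated Configuration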
The Microsoft ISE (Industry Solutions Engineering) reference pattern combines Azure Container Apps (ACA) Easy Auth with Managed Identity to minimize authentication logic at the code level. If you are not in an Azure environment, you can apply the same secretless pattern that Managed Identity provides by using AWS IAM Roles or GCP Workload Identity.
// azure-mcp-server/src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { DefaultAzureCredential } from "@azure/identity";
import { SecretClient } from "@azure/keyvault-secrets";
import { z } from "zod";

// Access Key Vault through Managed Identity (no client secret required)
const credential = new DefaultAzureCredential();
const kvClient = new SecretClient(
  "https://my-keyvault.vault.azure.net",
  credential
);

export function createMcpServer() {
  const server = new McpServer({
    name: "azure-enterprise-mcp",
    version: "1.0.0",
  });

  server.tool(
    "get_secret",
    "Retrieve a secret from Key Vault",
    // The SDK expects a Zod shape here rather than a JSON Schema object
    { secretName: z.string().describe("Secret name") },
    async ({ secretName }) => {
      // ACA Easy Auth has already verified the Bearer token,
      // so this handler can focus on business logic alone
      const secret = await kvClient.getSecret(secretName);
      return {
        content: [{ type: "text", text: `Value: ${secret.value}` }],
      };
    }
  );

  return server;
}

# Azure Container Apps deployment settings
# Part of the properties block of a Microsoft.App/containerApps resource
properties:
  configuration:
    ingress:
      external: true
      targetPort: 3000
  template:
    containers:
      - name: mcp-server
        image: myregistry.azurecr.io/enterprise-mcp:latest
        env:
          - name: AZURE_CLIENT_ID
            value: "[managed-identity-client-id]"  # use Managed Identity, no secret
          - name: AZURE_TENANT_ID
            secretRef: azure-tenant-id

| Component | Role |
|---|---|
| Azure AD App Registration (Server) | Defines the API scopes for the MCP server |
| Azure AD App Registration (Client) | Requests tokens via the PKCE flow |
| ACA Easy Auth | Verifies Bearer tokens automatically; no code-level authentication required |
| Managed Identity | Authenticates without secrets when accessing downstream Azure services |
| JWKS 1-Hour Caching | Balances low latency against timely response to key rotation |
Pros and Cons Analysis
Advantages
Streamable HTTP Transport
- You can use your existing HTTP infrastructure with a single /mcp endpoint.
- With a stateless design, you can freely scale instances up and down.
- Scale-to-zero operation is possible on Lambda, Azure Functions, and Cloudflare Workers.
- Streams can be resumed based on EventID, even in unstable network environments.
- It integrates with standard load balancers, API gateways, and reverse proxies without additional configuration.
OAuth 2.1 Enterprise Integration
- You can reuse existing enterprise IdPs such as Azure AD, Okta, and Auth0.
- Defends against token theft and reuse attacks with the combination of PKCE + Resource Indicators.
- Suitable for managing multiple MCP servers with a single authorization server and implementing RBAC.
- You can automate client configuration with the /.well-known endpoint.
Disadvantages and Precautions
| Item | Content | Response Plan |
|---|---|---|
| Stateful Session | Sessions must be externalized until you move to stateless mode | Store Mcp-Session-Id-based sessions in Redis or DynamoDB |
| Load Balancer Conflicts | Stateful sessions can conflict with round-robin load balancing | Move to fully stateless mode, or apply session affinity (for example NGINX Plus) as a temporary measure |
| Differences in SDK maturity | Support levels vary by language | TypeScript and Python are stable. Java (Spring AI) is also supported, but checking the ecosystem is recommended |
| Initial setup complexity | Requires 2 app registrations, scope design, and PKCE flow implementation | With Azure APIM, APIM can act as an OAuth proxy |
| Client Compatibility | Some clients do not support the resource parameter (RFC 8707) | It can be treated as optional for now; its importance is growing in the MCP roadmap |
| Token Expiration Handling | Refresh token renewal logic must be implemented yourself | Use SDK-level token renewal middleware, or design for short TTLs with automatic reissuance |
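The first two rows of the table can be reproduced with a toy simulation: two replicas behind a round-robin balancer, first with per-instance session memory and then with a shared store. Everything here is illustrative. `Instance` and `run` are hypothetical names, and a plain dict stands in for Redis or DynamoDB.

```python
from itertools import cycle


class Instance:
    """One MCP server replica; `sessions` is where it looks up session state."""

    def __init__(self, name: str, sessions: dict) -> None:
        self.name = name
        self.sessions = sessions

    def initialize(self, session_id: str) -> None:
        self.sessions[session_id] = {"created_by": self.name}

    def handle(self, session_id: str) -> str:
        if session_id not in self.sessions:
            return f"{self.name}: 404 unknown session"
        return f"{self.name}: 200 ok"


def run(instances: list) -> list:
    """Create a session on the first replica, then send 3 round-robin requests."""
    balancer = cycle(instances)
    next(balancer).initialize("sess-1")
    return [next(balancer).handle("sess-1") for _ in range(3)]


# Each replica keeps sessions in its own memory: every other request fails.
local = run([Instance("a", {}), Instance("b", {})])

# Both replicas read the same external store (stand-in for Redis/DynamoDB).
shared_store = {}
shared = run([Instance("a", shared_store), Instance("b", shared_store)])

print(local)
print(shared)
```

Stateless mode removes the problem entirely because there is no session to look up, which is why the examples in this article use it.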
Bearer Token: An access token delivered in the HTTP Authorization header in the format Bearer <token>. Since anyone possessing the token can use it, HTTPS enforcement and a short expiration time setting are essential.
JWKS (JSON Web Key Set): This is a set of public keys published by the authorization server. The MCP server uses this JWKS to verify the signature of the JWT token. While 1-hour caching is recommended for performance, it is also advisable to have logic in place to update it immediately during key rotation.
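One way to combine a 1-hour TTL with prompt reaction to key rotation is to refresh early when a token references a key ID (kid) that is not in the cache, with a minimum interval between such refreshes so that forged kids cannot trigger a fetch on every request. The sketch below is one possible design, not the approach of any specific library. `JwksCache` is a hypothetical class, and the fetcher and clock are injected so the demo runs without network access.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """TTL cache for a JWKS that also refreshes when an unknown kid appears."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 3600,
                 min_refresh_interval: float = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._min_interval = min_refresh_interval
        self._clock = clock
        self._keys = {}
        self._fetched_at = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {key["kid"]: key for key in jwks.get("keys", [])}
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> Optional[dict]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at > self._ttl:
            self._refresh()  # first use, or the TTL has expired
        elif kid not in self._keys and now - self._fetched_at >= self._min_interval:
            self._refresh()  # unknown kid: the issuer may have rotated its keys
        return self._keys.get(kid)


# Demo with a fake fetcher and a fake clock (assumed values, no network).
published = {"keys": [{"kid": "key-2024", "kty": "RSA"}]}
fetch_log = []
clock = {"now": 0.0}


def fake_fetch() -> dict:
    fetch_log.append(clock["now"])
    return published


cache = JwksCache(fake_fetch, clock=lambda: clock["now"])
first = cache.get_key("key-2024")                        # initial fetch
published["keys"] = [{"kid": "key-2025", "kty": "RSA"}]  # issuer rotates its key
clock["now"] = 30.0
too_soon = cache.get_key("key-2025")                     # throttled, no refetch
clock["now"] = 90.0
rotated = cache.get_key("key-2025")                      # early refresh finds it
print(first, too_soon, rotated, fetch_log)
```

In an async server the refresh would also need the kind of lock shown in the Python example, so that concurrent requests do not refetch in parallel.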
The Most Common Mistakes in Practice
- Skipping aud claim validation: If the audience of the JWT is not validated, tokens issued for other services may be accepted as valid by the MCP server. The audience: "https://mcp.example.com" check must be included.
- Mixing stateful sessions with a stateless design: If you set sessionIdGenerator while the load balancer is configured as round-robin, requests belonging to a session are routed to other instances and fail. Either go completely stateless (sessionIdGenerator: undefined) or guarantee session affinity.
- Fetching remote keys on every request without JWKS caching: Fetching the JWKS from the authorization server for every request causes latency to spike and may hit the rate limit. It is recommended to implement 1-hour TTL caching along with key rotation detection logic.
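The first mistake is easy to see in isolation. The sketch below checks the claims of an already signature-verified JWT payload. It is illustrative only and must never replace signature verification, which the jose libraries used earlier perform. `validate_claims` and `is_accepted` are hypothetical helpers, and the issuer URL is a placeholder.

```python
import time
from typing import Optional

ISSUER = "https://login.example.com/tenant/v2.0"  # placeholder issuer
AUDIENCE = "https://mcp.example.com"


def validate_claims(claims: dict, expected_issuer: str, expected_audience: str,
                    now: Optional[float] = None) -> None:
    """Raise ValueError unless iss, aud and exp are acceptable.

    Assumes the token signature has ALREADY been verified elsewhere.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:  # aud may be a string or a list
        raise ValueError("token was issued for a different audience")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise ValueError("token is expired or has no exp claim")


def is_accepted(claims: dict, now: float) -> bool:
    try:
        validate_claims(claims, ISSUER, AUDIENCE, now=now)
        return True
    except ValueError:
        return False


ours = {"iss": ISSUER, "aud": AUDIENCE, "exp": 2000}
for_other_api = {**ours, "aud": "https://billing.example.com"}

print(is_accepted(ours, now=1000))           # token minted for this server
print(is_accepted(for_other_api, now=1000))  # same issuer, different service
```

Without the audience check, the second token would pass: it has a valid issuer and has not expired, yet it was never meant for the MCP server.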
In Conclusion
By applying this architecture, you can simultaneously solve the three problems of authentication, deployment, and scalability using the standard methods of OAuth 2.1 and Streamable HTTP. As the importance of Resource Indicators (RFC 8707) is gradually increasing in the MCP roadmap and the standardization of a completely stateless session model is also being discussed, learning this pattern now can significantly reduce future migration costs.
3 Steps to Start Right Now:
- Try running the Streamable HTTP server locally. Install @modelcontextprotocol/sdk, express, and jose using your preferred package manager (npm, pnpm, and yarn are all fine), paste the TypeScript example above, and start the server with ts-node server.ts. Then send curl -X POST http://localhost:3000/mcp -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" -H "Authorization: Bearer <token>" -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl","version":"0.0.0"}}}'. If an initialize result is returned, the server is working correctly. Without a valid token the expected response is 401, which confirms that the verification middleware is active.
- Proceed with registering an Azure AD app. Create two app registrations (one for the server and one for the client) in the Azure Portal, and add a custom scope such as mcp.access to the server app. Once you populate the authorization_servers field of the /.well-known/oauth-protected-resource response with the Entra ID endpoint, the client can automatically discover the authorization server.
- Deploy to Docker or Azure Container Apps. Build a server configured for stateless mode into a container, and deploy it to two or more instances. If the correct response is returned even when the same request is processed by different instances, you have verified for yourself that horizontal scaling is working.
Next Post: From MCP Server Tool Call Logs and User Usage Tracking to Anomaly Detection — Building an Enterprise MCP Observability Pipeline with Prometheus and OpenTelemetry
Reference Materials
- MCP Official Specification - Transports (2025-03-26)
- MCP Official Specification - Authorization (Draft)
- MCP Blog - Exploring the Future of MCP Transports
- MCP Blog - The 2026 MCP Roadmap
- Why MCP Deprecated SSE and Went with Streamable HTTP | fka.dev
- Auth0 - Why MCP's Move Away from SSE Simplifies Security
- The New MCP Authorization Specification (OAuth 2.1 + RFC 8707) | dasroot.net
- Building a Secure MCP Server with OAuth 2.1 and Azure AD | Microsoft ISE Blog
- Securing MCP Servers in Production with Azure API Management | Medium
- Authentication and Authorization in MCP | Stack Overflow Blog
- MCP OAuth 2.1 Best Practices | osohq
- Stytch - MCP Authentication and Authorization Implementation Guide
- Aembit - MCP, OAuth 2.1, PKCE, and the Future of AI Authorization
- Cloudflare - Streamable HTTP MCP Servers (Python support)
- Cloudflare Agents Docs - MCP Authorization
- AWS Samples - Serverless MCP Servers | GitHub
- MCP TypeScript SDK | GitHub
- MCP Python SDK | GitHub
- Spring AI - Streamable HTTP MCP Server Starter
- RFC 8707 - Resource Indicators for OAuth 2.0
- Microsoft Learn - Configure MCP Server Authorization (Azure App Service)
- Stateless MCP Server Reference | GitHub (yigitkonur)