Applying OAuth 2.1 Authentication, Token Rate Limiting, and Team Cost Attribution to MCP Servers with Kong AI Gateway 3.12 Without Code Modification
If your team operates three MCP servers, you have probably written authentication logic three times already. OAuth token validation code is duplicated across the MCP servers that create GitHub issues, query the internal inventory database, and send Slack messages, and you may still be tallying by hand how many tokens each team has consumed. As the number of agents grows, the problem compounds faster than linearly, because every new agent-server pair is another place where authentication and usage tracking must stay consistent.
Kong AI Gateway is a platform that extends the traditional API gateway roles (authentication, rate limiting, and routing) to AI and MCP traffic. Kong Gateway 3.12, released in 2025, introduced a dedicated set of plugins for MCP traffic, so OAuth 2.1-based authentication, token-based rate limiting, and team-based cost attribution can be handled declaratively at the gateway layer without adding code to each MCP server. This article walks through how to deploy Kong AI Gateway in front of MCP servers to handle authentication, rate limiting, and cost tracking in a single layer, along with the pitfalls that commonly come up in practice. By the end, you should understand the full flow of applying OAuth 2.1 authentication and team quotas to existing MCP servers with Kong 3.12's declarative YAML configuration, and be able to adapt it to your own environment.
Prerequisites: If you are already familiar with MCP and OAuth 2.1 concepts, you may skip directly to the Practical Application section. If you are new to Kong, we recommend reading the Key Concepts section first.
Key Concepts
Role Relationship of Three Technologies
Before diving into the detailed explanation, let's first examine the role each of the three technologies discussed in this article plays.
┌──────────────────────────────────────────────────────────────┐
│                     Overall Architecture                     │
│                                                              │
│  [Authorization Server]   ←── issues tokens                  │
│   Keycloak / Auth0 / Okta                                    │
│          │ provides JWKS public keys                         │
│          ▼                                                   │
│  [Kong AI Gateway 3.12]   ←── validates tokens, rate limits  │
│   Resource Server role                                       │
│          │ forwards only validated requests                  │
│          ▼                                                   │
│  [MCP servers]            ←── execute the actual tools       │
│   GitHub MCP / internal API MCP, etc.                        │
└──────────────────────────────────────────────────────────────┘
MCP → the standard communication protocol between AI agents and tool servers
OAuth 2.1 → the authentication standard that verifies "who is accessing, and with which permissions"
Kong → handles both of the above at the gateway layer without code changes
MCP: Standard Protocol Connecting AI Agents and Tools
MCP (Model Context Protocol) is an open protocol released by Anthropic in 2024 that standardizes how AI agents (MCP Clients) communicate with external tools and data sources (MCP Servers). It uses JSON-RPC 2.0 as its message format and exposes file systems, databases, external APIs, and similar resources through a consistent "Tool" interface.
What is MCP? Just as USB-C connects various devices with a single connector, MCP acts as a "standard connector" between AI agents and tool servers. Even if different tools use different API formats, they can be called in the same way over MCP.
The MCP specification is maturing rapidly. Initially there was no authentication mechanism, but OAuth-based authorization was added in the March 2025 revision and refined in the June 2025 revision. The revision published on November 25, 2025 added asynchronous Tasks, strengthened the use of Resource Indicators, and confirmed OAuth 2.1 as the official authorization method.
There is one point to note. Early MCP revisions used SSE (Server-Sent Events) as the remote transport, but the latest specification deprecates the SSE-only transport in favor of Streamable HTTP. Since the examples in this article are based on SSE, check the transport settings separately in environments that follow the latest specification.
OAuth 2.1: Modernized Authentication Standard
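OAuth 2.1 is a specification that modernizes OAuth 2.0 by removing its insecure options. It is still an IETF draft (draft-ietf-oauth-v2-1) rather than a published RFC, and it consolidates the OAuth 2.0 security best practices documented in RFC 9700.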
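| Change | OAuth 2.0 | OAuth 2.1 |
|---|---|---|
| PKCE | Optional | Required |
| Implicit Grant | Allowed | Removed |
| Resource Indicators | Not defined in the core specification | Available through the RFC 8707 extension, which the MCP authorization specification requires |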
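What is PKCE (Proof Key for Code Exchange, RFC 7636)? It is a mechanism that prevents authorization code interception attacks. The client generates a random code_verifier and sends its hash, the code_challenge, with the authorization request. When it later exchanges the authorization code for a token, it sends the original code_verifier, and the authorization server checks that it hashes to the code_challenge it stored.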
In the MCP authentication structure, Kong does not play the role of issuing tokens.
What is a Resource Server? Kong serves as the Resource Server in the OAuth 2.1 architecture. The Authorization Server (Keycloak, Auth0, Okta, etc.) that actually issues tokens must be operated separately, while Kong performs only the role of verifying the validity of issued tokens and allowing or denying requests. It is important to confirm this beforehand, as the entire authentication flow will not function if only Kong is deployed without an Authorization Server.
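To make the Resource Server role concrete, the following sketch shows the kind of claim checks applied to a token after its signature has been verified. It is an assumption-laden illustration, not Kong's implementation: check_claims is a hypothetical helper, signature verification against the JWKS is deliberately out of scope, and the issuer and audience URLs are the placeholder values used in this article's examples.

```python
import time


def check_claims(claims, expected_issuer, expected_audience, now=None):
    """Validate iss, aud, and exp on an already signature-verified JWT payload.

    Returns a (ok, reason) tuple. Signature verification is NOT done here.
    """
    now = time.time() if now is None else now

    if claims.get("iss") != expected_issuer:
        return False, "issuer mismatch"

    # The aud claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        return False, "audience mismatch"

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False, "token expired"

    return True, "ok"


if __name__ == "__main__":
    payload = {
        "iss": "https://auth.example.com",
        "aud": "https://mcp.example.com",
        "sub": "agent-1",
        "exp": time.time() + 300,
    }
    print(check_claims(payload, "https://auth.example.com", "https://mcp.example.com"))
```

The audience check is what ties a token to one resource: a token issued for a different MCP endpoint fails here even if its signature is valid.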
MCP-Specific Plugins in Kong AI Gateway 3.12
Kong is an API Gateway platform that handles authentication, rate limiting, routing, and monitoring through a combination of plugins. In version 3.12, a set of plugins dedicated to MCP traffic was added.
| Plugin | Role | Tier |
|---|---|---|
| ai-mcp-proxy | MCP ↔ HTTP Bridge, Multi-Server Tool Aggregation | Enterprise |
| ai-mcp-oauth2 | OAuth 2.1 Token Validation, JWT → Virtual Consumer Mapping | Enterprise |
| ai-rate-limiting-advanced | Token-based Quota (per Consumer or per Team) | Enterprise |
| acl + MCP Tool ACL | MCP Tool Name-Level Granular Access Control | Enterprise |
| prometheus | MCP Tool Call Metrics Collection (MCP Extension in 3.12) | Free |
Open Source Limitations: Core plugins such as ai-mcp-oauth2 and ai-rate-limiting-advanced are only available in Kong Enterprise (paid) or Konnect. Since these plugins are not included in the open source version, it is recommended that you check the license terms before implementation.
Practical Application
The four examples below are not independent scenarios; each builds on the previous one. The overall picture is easiest to follow in order: configure the basic proxy in Example 1, add team-based cost attribution in Example 2, convert an existing REST API into an MCP Tool in Example 3, and add tool-level granular control in Example 4.
Example 1: Deploy Kong as a single proxy in front of the MCP server (Passthrough mode)
This is the most common deployment pattern. The existing MCP server is left as is, and Kong handles authentication, rate limiting, and monitoring in front of it.
AI agent (MCP Client)
    │  (carries an OAuth 2.1 Bearer Token)
    ▼
Kong AI Gateway 3.12
    ├─ ai-mcp-oauth2              ← validates the JWT, maps sub → virtual Consumer
    ├─ ai-rate-limiting-advanced  ← applies per-team token quotas
    ├─ ai-mcp-proxy               ← forwards MCP traffic to the upstream as-is
    └─ prometheus                 ← collects Tool call metrics
    │
    ▼
MCP servers (GitHub MCP, internal API MCP, etc.)
Below is an example of a declarative YAML configuration managed with Kong's decK CLI.
# Key names follow this article's example; confirm them against each plugin's reference for your Kong version.
_format_version: "3.0"
services:
  - name: mcp-github
    url: https://mcp.github.internal
    routes:
      - name: mcp-github-route
        paths:
          - /mcp/github
    plugins:
      - name: ai-mcp-oauth2
        config:
          issuer: https://auth.example.com      # Authorization Server URL
          credential_claim: sub                 # JWT sub claim → virtual Consumer ID
          audience: https://mcp.example.com     # Resource Indicator (RFC 8707)
      - name: ai-rate-limiting-advanced
        config:
          limit_by: consumer
          policy: local                         # ⚠️ suitable only for single-instance environments
          tokens_count_strategy: total_tokens   # sum of prompt + completion tokens
          limits:
            - limit: 100000
              window_size: 3600                 # quota of 100,000 tokens per hour
      - name: ai-mcp-proxy
        config:
          mode: passthrough-listener            # forwards MCP traffic as-is
      - name: prometheus
        config:
          per_consumer: true                    # collects metrics separately per Consumer
Caution on policy: local: The policy: local setting in the example above tracks quota usage separately on each Kong instance. In a distributed environment running N Kong instances, each instance counts independently, so a consumer can use up to N times the intended limit. For a multi-instance environment, it is recommended to switch to policy: redis and run a Redis cluster alongside Kong.
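The N-times effect is easy to reproduce with a small simulation. The sketch below is a simplified model, not Kong's counter implementation: it assumes round-robin load balancing, a fixed token cost per request, and a single window, and the function name is hypothetical.

```python
def admitted_tokens(instances, limit, requests, tokens_per_request, shared):
    """Count tokens admitted in one window under round-robin load balancing.

    shared=False models per-instance counters (like a local policy);
    shared=True models one counter shared by every instance (like Redis).
    """
    counters = [0] if shared else [0] * instances
    admitted = 0
    for i in range(requests):
        idx = 0 if shared else i % instances  # which counter this request hits
        if counters[idx] + tokens_per_request <= limit:
            counters[idx] += tokens_per_request
            admitted += tokens_per_request
    return admitted


if __name__ == "__main__":
    kwargs = dict(instances=3, limit=100_000, requests=400, tokens_per_request=1_000)
    print("local counters :", admitted_tokens(shared=False, **kwargs))
    print("shared counter :", admitted_tokens(shared=True, **kwargs))
```

With three instances and a 100,000-token limit, the per-instance model admits 300,000 tokens in the window while the shared counter stops at 100,000.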
To apply this YAML to a running Kong, use the decK CLI.
# Apply the configuration
deck gateway sync kong.yaml
# Check connectivity
deck gateway ping
| Setting Key | Description |
|---|---|
| issuer | URL of the Authorization Server that issued the JWT. Kong retrieves the public key from this server's JWKS endpoint to validate the token. |
| credential_claim | Specifies which claim in the JWT payload to use as the Consumer ID. sub, email, team, etc. can be used. |
| tokens_count_strategy | Which token count the rate limit is measured against. Select from total_tokens, prompt_tokens, or completion_tokens. |
| window_size | Quota reset interval in seconds. 3600 = 1 hour, 86400 = 1 day. |
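How tokens_count_strategy and window_size interact can be sketched as a fixed-window counter. This is an illustrative model only: the class and method names are hypothetical, a fixed window is assumed, and usage is charged after the response arrives because the token count is not known beforehand.

```python
class TokenWindowQuota:
    """Fixed-window token quota keyed by consumer (illustrative model)."""

    def __init__(self, limit, window_size, strategy="total_tokens"):
        self.limit = limit              # tokens allowed per window
        self.window_size = window_size  # window length in seconds
        self.strategy = strategy        # which usage field is counted
        self.used = {}                  # (consumer, window index) -> tokens used

    def _key(self, consumer, now):
        return consumer, int(now // self.window_size)

    def allowed(self, consumer, now):
        """A request is admitted while the current window still has quota left."""
        return self.used.get(self._key(consumer, now), 0) < self.limit

    def charge(self, consumer, usage, now):
        """Record the usage reported in a response body for this consumer."""
        key = self._key(consumer, now)
        self.used[key] = self.used.get(key, 0) + usage[self.strategy]


if __name__ == "__main__":
    quota = TokenWindowQuota(limit=100_000, window_size=3600)
    usage = {"prompt_tokens": 40_000, "completion_tokens": 70_000, "total_tokens": 110_000}
    quota.charge("agent-1", usage, now=10)
    print(quota.allowed("agent-1", now=20))    # same window, quota exhausted
    print(quota.allowed("agent-1", now=3600))  # next window, counter starts over
```

Switching the strategy to prompt_tokens in this example would count only 40,000 tokens against the same response, which is why the choice of strategy matters as much as the limit itself.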
Example 2: Team Cost Attribution Pipeline
By mapping the team claim of the JWT token to a Consumer Group, you can track token consumption and costs at the team level.
# Step 1: Define Consumer Groups
consumer_groups:
  - name: team-platform
  - name: team-data
  - name: team-frontend
# Step 2: Rate limit policy per Consumer Group (override)
plugins:
  - name: ai-rate-limiting-advanced
    consumer_group: team-platform
    config:
      limits:
        - limit: 500000        # platform team: 500,000 tokens per hour
          window_size: 3600
  - name: ai-rate-limiting-advanced
    consumer_group: team-data
    config:
      limits:
        - limit: 1000000       # data team: 1,000,000 tokens per hour
          window_size: 3600

# Step 3: Map the team claim to a Consumer Group in ai-mcp-oauth2 (add to the same plugins list)
plugins:
  - name: ai-mcp-oauth2
    config:
      issuer: https://auth.example.com
      credential_claim: sub
      consumer_group_claim: team   # JWT team claim → automatic Consumer Group mapping
      # Be sure to confirm this key name in the official documentation
Verify consumer_group_claim: Check the actual configuration key name and the supported version for consumer_group_claim in the official Kong ai-mcp-oauth2 plugin documentation. Key names may vary depending on the plugin version.
Once this pipeline is in place, costs are attributed through the following flow.
JWT received (team: "data")
    │
    ▼
ai-mcp-oauth2: maps the team claim → "team-data" Consumer Group
    │
    ▼
ai-rate-limiting-advanced: deducts from the "team-data" group quota
    │
    ▼
prometheus metrics: records token consumption with the label team="team-data"
    │
    ▼
Grafana dashboard: visualizes token consumption / cost per team
How is token consumption aggregated? Kong parses token usage (e.g., usage.total_tokens) from the response bodies of upstream LLM providers (e.g., OpenAI, Anthropic) and aggregates it. Note, however, that if the MCP server only brokers tool calls and does not call an LLM itself, the token aggregation point sits at the LLM API layer, not the MCP layer. MCP tool execution costs (such as external API call charges) are not tracked this way, so it is more practical to collect them from the billing data of the relevant service and merge them into FinOps reports.
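The aggregation step itself is simple enough to sketch. The example below assumes a list of hypothetical log records, each carrying the Consumer Group resolved by the gateway and the parsed LLM response body; the record shape and the price per 1,000 tokens are illustrative assumptions, not Kong output formats or real prices.

```python
from collections import defaultdict

# Hypothetical blended price per 1,000 tokens; real prices vary by model.
PRICE_PER_1K_TOKENS = 0.002


def attribute_costs(records, price_per_1k=PRICE_PER_1K_TOKENS):
    """Sum usage.total_tokens per consumer group and convert it to a cost."""
    totals = defaultdict(int)
    for record in records:
        usage = record["response_body"].get("usage") or {}
        # Responses without a usage block (e.g. pure tool calls) count as zero.
        totals[record["consumer_group"]] += usage.get("total_tokens", 0)
    return {
        group: {"total_tokens": tokens, "cost": round(tokens / 1000 * price_per_1k, 6)}
        for group, tokens in totals.items()
    }


if __name__ == "__main__":
    sample = [
        {"consumer_group": "team-data", "response_body": {"usage": {"total_tokens": 1500}}},
        {"consumer_group": "team-data", "response_body": {"usage": {"total_tokens": 500}}},
        {"consumer_group": "team-platform", "response_body": {"usage": {"total_tokens": 1000}}},
        {"consumer_group": "team-frontend", "response_body": {"result": "tool output"}},
    ]
    for group, summary in attribute_costs(sample).items():
        print(group, summary)
```

The team-frontend record illustrates the caveat above: a tool-only response carries no usage block, so it contributes nothing to token-based cost even though the tool call may have had its own cost.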
Example 3: Converting REST API to MCP Tool Without Code Changes
This is a method to expose an existing RESTful API managed by Kong as an MCP Tool. If you enable http-to-mcp mode, a Tool schema is automatically generated based on the OpenAPI specification.
services:
  - name: inventory-api
    url: https://inventory.internal/api
    plugins:
      - name: ai-mcp-proxy
        config:
          mode: http-to-mcp                  # REST → MCP Tool conversion mode
          openapi_spec_url: https://inventory.internal/openapi.json
          tool_name_prefix: inventory_       # Tool name prefix (avoids collisions)
Network accessibility of openapi_spec_url: The URL specified in openapi_spec_url must be directly reachable from the Kong instance. If you use an external URL, check the official documentation, because the behavior may differ depending on whether Kong fetches the specification at startup or on each request. In an internal network environment, you can specify it in Kubernetes FQDN format (https://inventory-svc.default.svc.cluster.local/openapi.json).
As a result of the conversion, the AI agent will call the inventory_get_product MCP Tool instead of GET /api/products/{id}. No modifications are required to the existing API code.
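Conceptually, the conversion maps each OpenAPI operation to an MCP Tool definition with a name, a description, and a JSON Schema for its input. The sketch below shows one plausible mapping and is not Kong's actual algorithm: it assumes the tool name is the prefix plus the operationId, and the sample specification is hypothetical.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

# Hypothetical OpenAPI fragment for the inventory API.
SPEC = {
    "paths": {
        "/products/{id}": {
            "get": {
                "operationId": "get_product",
                "summary": "Fetch a product by ID",
                "parameters": [
                    {"name": "id", "in": "path", "required": True, "schema": {"type": "string"}}
                ],
            }
        }
    }
}


def openapi_to_tools(spec, prefix=""):
    """Build MCP Tool definitions (name, description, inputSchema) from a spec."""
    tools = []
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            properties, required = {}, []
            for param in op.get("parameters", []):
                properties[param["name"]] = param.get("schema", {"type": "string"})
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                "name": prefix + op["operationId"],
                "description": op.get("summary", f"{method.upper()} {path}"),
                "inputSchema": {"type": "object", "properties": properties, "required": required},
            })
    return tools


if __name__ == "__main__":
    for tool in openapi_to_tools(SPEC, prefix="inventory_"):
        print(tool["name"], tool["inputSchema"])
```

Because the mapping is driven entirely by the specification, the quality of the generated tools depends on how complete the operationId, summary, and parameter schemas are in the OpenAPI document.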
Example 4: Granular Access Control at the MCP Tool Level
You can restrict specific teams from calling dangerous tools (create_issue, delete_branch). This feature is an MCP Tool ACL pattern introduced in Kong 3.12, which can be found in the "Introducing MCP Tool ACLs" announcement on the official Kong blog.
plugins:
  - name: acl
    config:
      allow:                     # groups allowed to reach the MCP route at all
        - team-platform
        - team-data
        - team-frontend
  - name: mcp-tool-acl           # Tool-level granular control
    config:
      rules:
        - tool: create_issue
          allow_groups:
            - team-platform
            - team-data
        - tool: delete_branch
          allow_groups:
            - team-platform      # only the platform team may delete branches
        - tool: read_file
          allow_groups:
            - team-platform
            - team-data
            - team-frontend      # read access is allowed for every team
What is MCP Tool ACL? Unlike the path-based access control of conventional API gateways, it is a pattern that defines permissions by MCP Tool name (e.g., create_issue, read_file). It lets you declaratively manage granular policies such as "reads are allowed for every team, writes only for the platform team."
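The rule evaluation behind a tool-level ACL can be sketched as a lookup on the tool name carried in an MCP tools/call request. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, and denying tools that have no rule is an assumed default.

```python
# Rules mirror the example configuration: tool name -> groups allowed to call it.
RULES = {
    "create_issue": {"team-platform", "team-data"},
    "delete_branch": {"team-platform"},
    "read_file": {"team-platform", "team-data", "team-frontend"},
}


def is_tool_allowed(tool, consumer_groups, rules=RULES):
    """Allow the call if the consumer belongs to at least one permitted group."""
    allowed_groups = rules.get(tool)
    if allowed_groups is None:
        return False  # assumption: tools without a rule are denied by default
    return bool(allowed_groups & set(consumer_groups))


def authorize(message, consumer_groups, rules=RULES):
    """Check a JSON-RPC message; only tools/call requests are subject to the ACL."""
    if message.get("method") != "tools/call":
        return True
    tool = message.get("params", {}).get("name")
    return is_tool_allowed(tool, consumer_groups, rules)


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "delete_branch", "arguments": {"branch": "old-feature"}},
    }
    print(authorize(call, ["team-platform"]))
    print(authorize(call, ["team-data"]))
```

The key difference from path-based control is visible in authorize: every call arrives on the same route, so the decision has to be made from the JSON-RPC body rather than from the URL.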
Pros and Cons Analysis
Advantages
| Item | Content |
|---|---|
| Single Control Plane | Handles authentication, rate limiting, cost tracking, and logging in a single layer. No need to redundantly implement security logic on each MCP server |
| Reuse Existing Kong Infrastructure | If your organization is already running Kong as an API Gateway, you can scale MCP traffic management without additional infrastructure |
| MCP Specification Compliance | Support for OAuth 2.1 Resource Server, PKCE, and Resource Indicators (RFC 8707) ensures compatibility with the latest MCP specifications |
| Granular Tool ACL | Access control is possible at the MCP Tool name level rather than the API path level |
| FinOps Visibility | Identify AI costs by team with Prometheus metrics and Konnect Analytics at the tool call level |
| Kubernetes Friendly | Deploying Kong Data Plane and MCP servers in the same cluster enables internal communication via Kubernetes FQDN |
Disadvantages and Precautions
| Item | Content | Response Plan |
|---|---|---|
| Enterprise License Costs | Core plugins such as ai-rate-limiting-advanced and ai-mcp-oauth2 are Enterprise-only. Costs can exceed $50,000 per year for mid-sized companies. | We recommend comparing features and costs with open source alternatives such as LiteLLM, IBM ContextForge, and Envoy AI Gateway before deciding on adoption. |
| Kong 3.12 or higher required | The AI MCP plugins are not supported on versions below 3.12 | We recommend thoroughly verifying the version upgrade in a staging environment before applying it to production |
| Stateful SSE Session | Session Affinity settings are required to properly proxy long-lived SSE connections | You can route the same session to the same instance using Kong's hash_on: consumer or hash_on: ip load balancing settings |
| Separate Authorization Server Required | Kong acts only as a Resource Server. It is a prerequisite to operate a separate IdP such as Keycloak, Auth0, or Okta. | Integration is easy if your organization has an existing IdP. For new implementations, there is an option to configure it alongside the open-source Keycloak. |
| MCP Tool Execution Costs Not Tracked | Cost calculation relies solely on the number of tokens in the LLM Provider response. Costs for the tool itself, such as external API calls, require separate aggregation. | You may consider collecting billing data for the relevant service separately and integrating it into FinOps reports. |
| Rapid MCP Specification Changes | The MCP specification was revised multiple times in 2025 alone, and you must continuously monitor the speed at which Kong plugins reflect the specifications. | We recommend regularly checking the Kong Gateway Changelog and the official MCP specification page. |
What is Session Affinity? It is a setting that causes a load balancer to always send requests from the same client to the same server instance. In connection-maintaining protocols like SSE, without this setting, connections may be dropped or state may be lost.
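The difference between hash-based affinity and round-robin can be shown with a short sketch. It uses plain modulo hashing for clarity, which is an assumption and not the algorithm Kong's balancer uses; the instance names and function names are hypothetical.

```python
import hashlib
from itertools import cycle, islice

INSTANCES = ["mcp-a", "mcp-b", "mcp-c"]  # hypothetical upstream targets


def pick_by_hash(hash_key, instances=INSTANCES):
    """Map a stable key (consumer ID or client IP) to one fixed instance."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


def round_robin_picks(n, instances=INSTANCES):
    """Return the instances chosen for n consecutive requests under round-robin."""
    return list(islice(cycle(instances), n))


if __name__ == "__main__":
    # Four requests from the same consumer always land on the same instance.
    print([pick_by_hash("agent-1") for _ in range(4)])
    # The same four requests under round-robin are spread across instances.
    print(round_robin_picks(4))
```

With hashing, every request carrying the same key reaches the instance that holds the session's SSE connection, whereas round-robin sends consecutive requests to different instances regardless of who sent them.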
The Most Common Mistakes in Practice
- Deploying only Kong without an Authorization Server: Kong's ai-mcp-oauth2 plugin is responsible only for token validation (Resource Server). A separate Authorization Server, such as Keycloak or Auth0, is required to issue tokens; if this is overlooked, the entire authentication flow will not function.
- Not setting Session Affinity for SSE sessions: If MCP clients maintain long-lived connections via SSE and Kong load balances with the default round-robin method, requests from the same session may be distributed to different instances, which can drop the connection.
- Setting rate limits based on request count: In MCP traffic, cost is proportional to token consumption, not to the number of requests. Applying token-based quotas with the tokens_count_strategy: total_tokens setting is more effective for practical cost control.
In Conclusion
Kong AI Gateway 3.12 is a realistic option for organizations running existing Kong infrastructure that allows them to declaratively manage authentication, rate limiting, and cost attribution for MCP traffic in a single layer. However, it requires prerequisites such as enterprise license costs, the operation of a separate Authorization Server, and Kong version upgrades. If you are already running Kong and an IdP and require rapid integration, you can configure the entire flow with just a few lines of YAML declaration; however, if these three prerequisites are not met, we recommend comparing it with open-source alternatives such as LiteLLM and IBM ContextForge.
3 Steps to Start Right Now:
- Install Kong Gateway 3.12 and prepare the decK CLI: You can check the latest version and installation method in the official Kong Gateway Changelog. After deploying Kong 3.12 to Docker or Kubernetes, it is recommended to verify the connection using deck gateway ping.
- Work through the GitHub MCP server security tutorial: By following the "Secure GitHub MCP Server traffic with Kong Gateway" tutorial in the official Kong documentation, you can try OAuth 2.1 validation and ACL configuration firsthand. By running a local Keycloak and GitHub MCP server with Docker Compose, you can test the entire flow without external dependencies.
- Visualize team metrics with Prometheus + Grafana: By enabling the prometheus plugin and connecting the /metrics endpoint to Grafana, you can view AI-related metrics such as kong_ai_llm_provider_latency_ms and kong_ai_llm_requests_total in real time. Filtering by the consumer_group label helps organize each team's token consumption into a dashboard.
Reference Materials
- Kong AI/MCP Gateway and Kong MCP Server Technical Breakdown | Kong Engineering Blog
- Introducing Kong's Enterprise MCP Gateway for Production-Ready AI | Kong
- Securing, Observing, and Governing MCP Servers with Kong AI Gateway | Kong
- AI MCP Proxy Plugin | Kong Official Documentation
- AI MCP OAuth2 Plugin | Kong Official Documentation
- AI Rate Limiting Advanced Plugin | Kong Official Documentation
- MCP Traffic Gateway | Kong Official Documentation
- How to: Secure GitHub MCP Server traffic with Kong Gateway | Kong Official Documentation
- Introducing MCP Tool ACLs: Fine-Grained Authorization for AI Agent Tools | Kong
- MCP Authorization Specification (2025-11-25) | Model Context Protocol Official
- OAuth for MCP - Emerging Enterprise Patterns for Agent Authorization | GitGuardian Blog
- MCP, OAuth 2.1, PKCE, and the Future of AI Authorization | Aembit
- Building a Secure MCP Server with OAuth 2.1 and Azure AD | Microsoft ISE Blog
- Streamline AI Usage with Token Rate-Limiting & Tiered Access | Kong Engineering
- Microsoft mcp-gateway | GitHub
- IBM ContextForge | GitHub
- Kong Gateway Changelog