In-depth Analysis of MCP (Model Context Protocol) Multi-Agent Delegation Patterns — Designing Agent Chain Security with OAuth Token Exchange (RFC 8693) and Unidirectional Authority Reduction
Production failures occur in unexpected places. This was the case when a team automated a document summarization pipeline using agents. The orchestrator agent intended to grant only read:documents to the sub-agent, but passed the original access token as is without a token exchange. The sub-agent eventually permanently deleted the file with delete:documents permissions, and only the orchestrator agent's identity remained in the audit logs. It was impossible to track what the sub-agent did or through which delegation path it obtained the permissions.
This article covers methods to structurally prevent exactly such situations. It analyzes the trust boundary issues that arise when an agent delegates authority to another agent in an MCP environment, and specifically examines a design that cryptographically verifies and tracks the entire delegation chain by combining OAuth 2.0 Token Exchange (RFC 8693), unidirectional authority reduction, and On-Behalf-Of flows. Backend and full-stack developers familiar with the basics of JWT and OAuth 2.0 can read this immediately.
After reading this article, you will be able to directly implement a design that structurally blocks authority escalation in multi-agent delegation chains, leaves auditable traces at each delegation step, and selectively revokes the authority of specific agents.
Key Concepts
Trust Boundaries and Why Delegation Is Necessary
In traditional single-service systems, once user authentication is performed, the session remains valid for the entire lifetime of the request. However, in multi-agent systems, the process involves multiple hops, such as "User → Orchestrator Agent → Sub-agent A → MCP Server → Sub-agent A-1." A new trust boundary is established at each hop, and whenever this boundary is crossed, it must be verified whether "the entity sending this request truly represents the original user's intent."
In MCP, identity propagation breaks structurally at each server interface. If an agent calls another agent without an explicit delegation mechanism, the receiving side has no way of knowing who originated the request or on whose behalf it is being made. This is the fundamental reason why the delegation pattern is necessary.
Trust Boundary: A boundary line between two components where different trust levels are applied. In an agent chain, this boundary occurs at every hop, and identity and authority must be re-verified when crossing the boundary.
Token Propagation — RFC 8693 and act Claim
The core mechanism for transferring identities in the delegation chain is the OAuth 2.0 Token Exchange (RFC 8693). When an agent exchanges its token for a new token, it can contain both the original user's identity (sub) and the current actor (act). As the delegation chain deepens, act claims are stacked in a nested structure.
{
  "sub": "user-alice-001",            // original user (never changes)
  "act": {
    "sub": "subagent-A1",             // current actor: the sub-agent making this call
    "client_id": "mcp-tool-caller",
    "act": {
      "sub": "orchestrator-agent-A"   // prior actor: the orchestrator that delegated to A1
    }
  },
  "scope": "read:documents",          // permissions actually granted
  "aud": "mcp-server-docs",           // MCP server that should accept this token
  "exp": 1775000000                   // expiration time (Unix timestamp)
}
Looking at this token, you can see at a glance that "Alice delegated to Orchestrator A, A delegated to Sub-Agent A1, and A1 is currently calling the MCP document server with read:documents authority." Per RFC 8693, the outermost act names the current actor and each nested act names an earlier actor in the chain. It is recommended that the MCP server record the entire sub and act chain in the audit log.
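To make the audit-log recommendation concrete, here is a minimal Python sketch that flattens nested act claims into a readable delegation path. It assumes the claims come from a token whose signature has already been verified, and it follows the RFC 8693 ordering in which the outermost act is the current actor. The function name delegation_path is illustrative, not part of any library.

```python
from typing import Any


def delegation_path(claims: dict[str, Any]) -> list[str]:
    """Return [user, first delegate, ..., current actor] from nested act claims."""
    actors: list[str] = []
    act = claims.get("act")
    while isinstance(act, dict):
        actors.append(str(act.get("sub", "<unknown>")))
        act = act.get("act")
    # The outermost act is the current actor and nested ones are earlier actors,
    # so reverse the list to get chronological order.
    return [str(claims["sub"]), *reversed(actors)]


claims = {
    "sub": "user-alice-001",
    "act": {"sub": "subagent-A1", "act": {"sub": "orchestrator-agent-A"}},
    "scope": "read:documents",
}
print(" -> ".join(delegation_path(claims)))
# prints: user-alice-001 -> orchestrator-agent-A -> subagent-A1
```

Writing this flattened path next to the raw claims keeps audit entries searchable by any agent in the chain.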
There is one important practical constraint. As act claims nest more deeply, the JWT payload grows linearly, by roughly 200 to 500 bytes per hop. It can reach the HTTP header size limit (typically 8KB), and the cost of signature verification also increases proportionally to the depth. This is the key reason for the recommendation to limit delegation depth to 3 to 4 hops.
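A quick way to observe this growth is to measure the encoded payload as actors are added. The sketch below is illustrative only: the helper names and the MAX_DELEGATION_DEPTH constant are assumptions, the toy actor entries are much smaller than production ones, and only the payload segment is measured (not the JWT header or signature).

```python
import base64
import json

MAX_DELEGATION_DEPTH = 4  # assumed cap, the upper end of the 3 to 4 hop guideline


def nest_actor(claims: dict, actor: dict) -> dict:
    """Return new claims where `actor` is the current actor and the old chain is nested."""
    updated = dict(claims)
    new_act = dict(actor)
    if "act" in claims:
        new_act["act"] = claims["act"]
    updated["act"] = new_act
    return updated


def act_depth(claims: dict) -> int:
    """Count how many actors are nested in the act claim."""
    depth, act = 0, claims.get("act")
    while isinstance(act, dict):
        depth += 1
        act = act.get("act")
    return depth


def can_delegate_further(claims: dict) -> bool:
    return act_depth(claims) < MAX_DELEGATION_DEPTH


def encoded_payload_size(claims: dict) -> int:
    """Size in bytes of the base64url-encoded JSON payload, without padding."""
    raw = json.dumps(claims, separators=(",", ":")).encode("utf-8")
    return len(base64.urlsafe_b64encode(raw).rstrip(b"="))


claims = {"sub": "user-alice-001", "scope": "read:documents", "aud": "mcp-server-docs"}
sizes = [encoded_payload_size(claims)]
for hop in range(1, MAX_DELEGATION_DEPTH + 1):
    claims = nest_actor(claims, {"sub": f"agent-{hop}", "client_id": "mcp-tool-caller"})
    sizes.append(encoded_payload_size(claims))
    print(f"depth={act_depth(claims)} payload={sizes[-1]} bytes")
```

Running the same measurement against real tokens from your IdP shows how close a given depth comes to the header limit of your proxies and servers.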
// RFC 8693 Token Exchange request example (TypeScript)
async function exchangeTokenForAgent(
  userToken: string,
  agentClientId: string,
  requestedScopes: string[]
): Promise<string> {
  const response = await fetch("https://auth.example.com/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "urn:ietf:params:oauth:grant-type:token-exchange",
      subject_token: userToken,
      subject_token_type: "urn:ietf:params:oauth:token-type:access_token",
      requested_token_type: "urn:ietf:params:oauth:token-type:access_token",
      audience: "mcp-server-docs",
      scope: requestedScopes.join(" "),
      client_id: agentClientId,
    }),
  });
  if (!response.ok) {
    const errorBody = await response.text();
    throw new Error(
      `Token exchange failed (${response.status}): ${errorBody}`
    );
  }
  const body = await response.json();
  if (!body.access_token) {
    throw new Error("Token exchange succeeded but no access_token returned");
  }
  return body.access_token; // delegated token that carries the act claim
}
One-way Authority Reduction: Biscuit and Macaroon Selection Criteria
The monotonic attenuation principle is simple: a subordinate agent can never hold broader authority than the agent that delegated to it. If an orchestrator holds read and write, it can delegate read, write, or both to a subordinate agent, but it can never delegate admin, which it does not hold itself.
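As a minimal sketch of this rule, the function below treats permissions as plain sets and refuses any delegation that is not a subset of the delegator's own permissions. The name attenuate_scopes is hypothetical, and a real system would enforce the same check at the token issuer rather than in agent code.

```python
def attenuate_scopes(parent_scopes: set[str], requested: set[str]) -> set[str]:
    """Return the delegated scopes, or raise if they exceed the parent's."""
    extra = requested - parent_scopes
    if extra:
        raise PermissionError(f"cannot delegate scopes the parent lacks: {sorted(extra)}")
    return set(requested)


orchestrator = {"read", "write"}
sub_agent = attenuate_scopes(orchestrator, {"read"})  # allowed: a strict subset

try:
    attenuate_scopes(sub_agent, {"read", "admin"})  # admin was never held upstream
    escalated = True
except PermissionError:
    escalated = False

print(sub_agent, escalated)
```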
There are two token formats that cryptographically enforce this principle. Macaroon adds caveats via the HMAC chain; it is simple to implement and supported by many languages. However, security vulnerabilities have been discovered in third-party caveats, and public verification (where the recipient independently confirms the signature) is difficult. Biscuit is a format designed specifically to overcome these limitations. Based on Ed25519 public-key cryptography, anyone with the public key can independently verify the signature, and it also securely handles third-party caveats. If you are starting a new project, Biscuit is a better choice.
The core property of a caveat is the same in both formats: anyone holding the token can add one (reducing privileges), but removing one (increasing privileges) is impossible without the original issuer's key.
Caveat: A verification condition attached to a token. Constraints such as "This token is valid only before 2026-04-15" or "This token allows read access only" can be cryptographically attached to the chain, and while addition is possible, removal can only be done by the original signer.
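The add-only property can be illustrated with a toy HMAC chain in the style of Macaroons. This is a simplified sketch using only the Python standard library, not the Macaroon wire format or any library's API; the token layout and function names are assumptions, and verify checks only the signature chain, not whether each caveat is actually satisfied.

```python
import hashlib
import hmac


def _chain(signature: bytes, message: str) -> bytes:
    """Derive the next signature from the previous one and a new message."""
    return hmac.new(signature, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, token_id: str) -> dict:
    return {"id": token_id, "caveats": [], "sig": _chain(root_key, token_id)}


def add_caveat(token: dict, caveat: str) -> dict:
    """Anyone holding the token can append a caveat; no root key is needed."""
    return {
        "id": token["id"],
        "caveats": [*token["caveats"], caveat],
        "sig": _chain(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict) -> bool:
    """Recompute the chain from the root key and compare signatures."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])


root_key = b"demo-root-key-not-for-production"
token = mint(root_key, "orchestrator-token")
narrowed = add_caveat(token, "scope = read")
narrowed = add_caveat(narrowed, "expires < 2026-04-15")

# Dropping a caveat without the root key leaves a signature that no longer matches.
tampered = {**narrowed, "caveats": narrowed["caveats"][:1]}
print(verify(root_key, narrowed), verify(root_key, tampered))
# prints: True False
```

Note that verify needs the root key, which is the public-verification limitation of Macaroons described above.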
# One-way authority reduction with Biscuit tokens (Python)
# The actual biscuit-python package API may differ from what is shown below.
# Key pair handling when appending blocks differs, so check the official docs.
# pip install biscuit-python
from biscuit_auth import BiscuitBuilder, BlockBuilder, KeyPair
root_keypair = KeyPair()  # in production, load from an environment variable or HSM
# Issue the root token for the orchestrator
root_token = (
    BiscuitBuilder()
    .fact('right("docs", "read")')
    .fact('right("logs", "write")')
    .fact("budget(100)")
    .build(root_keypair.private_key)
)
# Delegate to sub-agent A: read only, budget capped at 40.
# Checks are evaluated against facts the verifier supplies for each request;
# operation(...), resource(...) and spend(...) are illustrative fact names.
attenuated_token = root_token.append_block(
    BlockBuilder()
    .check('check if operation("read")')      # write requests no longer pass
    .check("check if spend($s), $s <= 40")    # budget capped at 40
)
# Re-delegate to sub-agent A-1: docs only, budget further capped at 10
further_attenuated = attenuated_token.append_block(
    BlockBuilder()
    .check('check if resource("docs"), operation("read")')  # logs no longer pass
    .check("check if spend($s), $s <= 10")
)
So far, we have examined three foundational concepts: trust boundaries, token propagation, and one-way authority reduction. Next is the On-Behalf-Of flow, which connects these three concepts in practice.
On-Behalf-Of (OBO) Flow — Presenting Two Identities Simultaneously
When an agent acts on behalf of a user, the user identity and the agent identity must be presented simultaneously. These two identities must be separated to independently track whether "User Alice made the request" and "Agent A processed it."
User (Alice)        IdP               Agent           MCP server
     │               │                  │                  │
     │── login ─────>│                  │                  │
     │<─ id_token ───│                  │                  │
     │── pass id_token ────────────────>│                  │
     │               │<─ Token Exchange─│                  │
     │               │  (RFC 8693 OBO)  │                  │
     │               │── act token ────>│                  │
     │               │                  │── API call ─────>│
     │               │                  │   (sub=Alice,    │
     │               │                  │    act=AgentA)   │
     │               │                  │<── response ─────│
     │               │                  │                  │── audit log
     │               │                  │                  │   (sub+act recorded)
The key point of this flow is that the IdP verifies both identities and encapsulates them into a single token. The MCP server records both sub (the original user) and act (the agent chain) from the received token in the audit log.
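The exchange request that presents both identities can be sketched as follows. The parameter names (subject_token, actor_token, and their type URNs) are the ones defined by RFC 8693; the function name, token values, and audience are placeholders, and how an IdP maps the actor token into the act claim depends on the IdP.

```python
from urllib.parse import parse_qs, urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
ID_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:id_token"


def build_obo_exchange_body(
    subject_token: str, actor_token: str, audience: str, scopes: list[str]
) -> str:
    """Form-encode a delegation-style token exchange request body."""
    return urlencode(
        {
            "grant_type": TOKEN_EXCHANGE_GRANT,
            "subject_token": subject_token,      # who the request is on behalf of
            "subject_token_type": ID_TOKEN_TYPE,
            "actor_token": actor_token,          # who is acting
            "actor_token_type": ACCESS_TOKEN_TYPE,
            "requested_token_type": ACCESS_TOKEN_TYPE,
            "audience": audience,
            "scope": " ".join(scopes),
        }
    )


body = build_obo_exchange_body(
    "alice-id-token", "agent-a-token", "mcp-server-docs", ["read:documents"]
)
params = parse_qs(body)
print(params["subject_token"], params["actor_token"])
```

The body is sent as application/x-www-form-urlencoded to the IdP's token endpoint, with the agent authenticating as an OAuth client.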
Practical Application
Example 1: Implementing Nested Delegation Chains with DelegateOS
DelegateOS is a TypeScript library that provides Ed25519 signature-based delegation tokens, one-way authority reduction, budget tracking, and MCP integration. The code below is a conceptual example illustrating the design philosophy of DelegateOS. Before actual implementation, it is recommended to check the GitHub repository to see if the package is currently published on npm.
import { DelegationChain, DelegationToken, Budget } from "@newtro/delegateos";
// orchestratorPrivateKey must be loaded from an environment variable or an
// HSM (Hardware Security Module). Never hardcode it in source.
// e.g. const orchestratorPrivateKey = loadKeyFromEnv("ORCHESTRATOR_PRIVATE_KEY");
// 1. Issue the orchestrator root token
const orchestratorChain = await DelegationChain.create({
  subject: "user-alice-001",
  actor: "orchestrator-agent",
  permissions: ["read", "write"],
  budget: new Budget({ tokens: 100, apiCalls: 500 }),
  expiresIn: "1h",
  signingKey: orchestratorPrivateKey,
});
// 2. Delegate to sub-agent A (read only, budget 40)
const subAgentAToken: DelegationToken = await orchestratorChain.delegate({
  to: "subagent-a",
  permissions: ["read"], // write removed
  budget: new Budget({ tokens: 40, apiCalls: 200 }),
  attestation: {
    taskId: "task-summarize-docs",
    reason: "Document summarization task",
    timestamp: Date.now(),
  },
});
// 3. Re-delegate to sub-agent A-1 (read:docs only, budget 10)
const subAgentA1Token: DelegationToken = await subAgentAToken.delegate({
  to: "subagent-a1",
  permissions: ["read:docs"], // resource scope narrowed further
  budget: new Budget({ tokens: 10, apiCalls: 50 }),
  attestation: {
    taskId: "task-summarize-docs-section-1",
    reason: "Summarize chapter 1",
    timestamp: Date.now(),
  },
});
// 4. Delegate to sub-agent B (write:logs only, budget 60)
const subAgentBToken: DelegationToken = await orchestratorChain.delegate({
  to: "subagent-b",
  permissions: ["write:logs"],
  budget: new Budget({ tokens: 60, apiCalls: 300 }),
  attestation: {
    taskId: "task-write-audit-log",
    reason: "Write audit log",
    timestamp: Date.now(),
  },
});
// 5. Pass the token when calling the MCP server
const mcpResponse = await mcpClient.callTool(
  "read_document",
  { documentId: "doc-001" },
  {
    headers: {
      Authorization: `Bearer ${subAgentA1Token.serialize()}`,
      "X-Delegation-Chain": subAgentA1Token.getChainProof(),
    },
  }
);
| Code Point | Description |
|---|---|
| permissions: ["read"] | Removes write from the parent (orchestrator) set ["read", "write"], applying one-way privilege reduction |
| budget: new Budget(...) | If the budget exceeds the parent's, the library automatically raises an exception |
| attestation | Builds an audit chain by generating a signed proof at each delegation step |
| getChainProof() | Proof data that lets the MCP server independently verify the entire delegation path |
| signingKey | Ed25519 private key loaded from an environment variable or HSM |
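The budget rule in the table can be illustrated independently of any library. The Python sketch below is not DelegateOS code: the Budget class is hypothetical, and it assumes that a child budget is deducted from what the parent has left, which the figures in the example (40 + 60 = 100) suggest but the source does not state.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    tokens: int
    api_calls: int

    def delegate(self, tokens: int, api_calls: int) -> "Budget":
        """Carve a child budget out of this one, or raise if it does not fit."""
        if tokens < 0 or api_calls < 0:
            raise ValueError("budgets must be non-negative")
        if tokens > self.tokens or api_calls > self.api_calls:
            raise ValueError("child budget exceeds what the parent has left")
        self.tokens -= tokens
        self.api_calls -= api_calls
        return Budget(tokens, api_calls)


orchestrator = Budget(tokens=100, api_calls=500)
sub_a = orchestrator.delegate(tokens=40, api_calls=200)
sub_a1 = sub_a.delegate(tokens=10, api_calls=50)      # taken out of sub-agent A's share
sub_b = orchestrator.delegate(tokens=60, api_calls=300)

try:
    orchestrator.delegate(tokens=1, api_calls=0)      # nothing left to hand out
    overspent = True
except ValueError:
    overspent = False

print(orchestrator, sub_a, sub_a1, sub_b, overspent)
```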
Example 2: Applying an Execution Ring with the Microsoft Agent Governance Toolkit
Microsoft AGT introduces the concept of execution rings modeled after CPU privilege rings. Ring 0 is the most trusted system level, while Ring 3 is the least trusted level, used for external agents. Allowed operations are defined for each ring, and if an agent in a less trusted ring (a higher ring number) attempts to use privileges reserved for a more trusted ring, the policy engine blocks it immediately.
from agent_governance_toolkit import AgentRuntime, ExecutionRing, Policy
# Policy definition: allowed actions per ring
policy = Policy.from_yaml("""
rings:
  ring0:  # orchestrator (highest trust)
    allowed_actions: ["*"]
    budget_limit: 10000
  ring1:  # internal sub-agents
    allowed_actions: ["read:*", "write:logs", "call:mcp_tools"]
    budget_limit: 1000
    max_delegation_depth: 2
  ring2:  # agents that call external APIs
    allowed_actions: ["read:public", "call:external_api"]
    budget_limit: 100
    max_delegation_depth: 0  # re-delegation forbidden
  ring3:  # unverified external agents
    allowed_actions: []  # all actions blocked
    require_human_approval: true
""")
runtime = AgentRuntime(policy=policy)
# Register agents and assign rings.
# orchestrator_jwt and subagent_a_jwt are identity tokens obtained beforehand.
orchestrator = runtime.register_agent(
    agent_id="orchestrator-001",
    ring=ExecutionRing.RING0,
    identity_token=orchestrator_jwt,
)
subagent_a = runtime.register_agent(
    agent_id="subagent-a",
    ring=ExecutionRing.RING1,
    identity_token=subagent_a_jwt,
)
# Run the delegation; the ring policy is applied automatically
async def run_task():
    async with runtime.delegation_context(
        delegator=orchestrator,
        delegatee=subagent_a,
        task_scope=["read:documents"],
    ) as ctx:
        result = await subagent_a.call_mcp_tool(
            "read_document",
            args={"id": "doc-001"},
            delegation_context=ctx,
            # The ring policy engine inspects every call in real time
        )
    # Audit logs are generated automatically (with EU AI Act, HIPAA, SOC 2 mappings)
    return result
Note: Ring 2 agents configured with max_delegation_depth: 0 cannot delegate any further. This setting prevents external API-calling agents from creating unexpected child agents.
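To show what the policy engine has to decide for each call, here is a standalone Python sketch of the ring checks. It does not use the Agent Governance Toolkit API; the RingPolicy class and both functions are hypothetical, wildcard matching is done with fnmatchcase, and the delegation depth for ring 0 (not given in the YAML) is an assumed value.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class RingPolicy:
    allowed_actions: tuple[str, ...]
    max_delegation_depth: int


POLICIES = {
    0: RingPolicy(("*",), 3),  # depth 3 is an assumption for this sketch
    1: RingPolicy(("read:*", "write:logs", "call:mcp_tools"), 2),
    2: RingPolicy(("read:public", "call:external_api"), 0),
    3: RingPolicy((), 0),
}


def is_allowed(ring: int, action: str) -> bool:
    """True if the action matches any allowed pattern for the ring."""
    return any(fnmatchcase(action, pattern) for pattern in POLICIES[ring].allowed_actions)


def may_delegate(ring: int, current_depth: int) -> bool:
    """True if an agent at this ring and chain depth may delegate once more."""
    return current_depth < POLICIES[ring].max_delegation_depth


print(is_allowed(1, "read:documents"), is_allowed(1, "delete:documents"))
print(may_delegate(1, 1), may_delegate(2, 0))
```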
Example 3: Delegating Third-Party APIs to Auth0 Token Vault
This is a pattern where, when a user grants an agent access to Google Drive, only a short-term token is issued via Token Vault instead of directly exposing the original OAuth token to the agent.
import { TokenVault } from "@auth0/agent-toolkit";
const vault = new TokenVault({
  domain: "your-tenant.auth0.com",
  clientId: process.env.AUTH0_CLIENT_ID,
});
// Called when the agent needs access to Google Drive
async function getGoogleDriveAccess(
  userToken: string,
  agentId: string
): Promise<string> {
  // Token Vault performs the RFC 8693 Token Exchange internally
  const { access_token } = await vault.exchangeForService({
    subjectToken: userToken,
    targetService: "google-drive",
    actor: agentId,
    scopes: ["https://www.googleapis.com/auth/drive.readonly"],
    maxDuration: 3600, // must not outlive the original user session
  });
  // The original Google OAuth token stays in the Vault and is never exposed to the agent
  return access_token; // only the short-lived delegated token is returned
}
// MCP tool implementation
const googleDriveTool = {
  name: "list_drive_files",
  handler: async (args: { folderId: string }, context: MCPContext) => {
    const driveToken = await getGoogleDriveAccess(
      context.userToken,
      context.agentId
    );
    // driveToken is a one-hour short-lived token, so damage is limited even if it leaks
    return await googleDriveClient.listFiles(args.folderId, driveToken);
  },
};
Pros and Cons Analysis
Advantages
| Item | Content |
|---|---|
| Enforcement of Least Privilege | Structurally blocks privilege escalation through the unidirectional privilege reduction principle. Even a bug in agent code cannot yield privileges beyond those the token carries. |
| Auditability | A signed attestation chain provides the evidence trail needed for HIPAA, SOC 2, and EU AI Act compliance. |
| Separation of Responsibilities | User identity (sub) and agent identity (act) are separated, enabling granular anomaly detection and policy enforcement. |
| Granular Revocation | You can revoke permissions without touching the entire session by invalidating only the token of a specific sub-agent. |
| Budget Control | Cryptographically enforces limits on cost and API call count to prevent unbounded agent consumption. |
| Offline Verification | Biscuit-based tokens reduce latency by allowing signature verification with a public key without network round trips. |
Disadvantages and Precautions
The items below are sorted by how often they come up in practice.
| Item | Content | Response Plan |
|---|---|---|
| ⚠️ Token Reuse · Missing Audience (Most Common) | Tokens without an audience are accepted as valid by other MCP servers, leaving them vulnerable to replay attacks. | Set aud to one specific MCP server using a resource indicator (RFC 8707), and bind a nonce to the token. |
| ⚠️ Direct Transfer of Original Tokens (Most Common) | If original tokens are passed to a sub-agent without a token exchange, one-way authority reduction breaks down completely and the audit chain disappears. | Mandate RFC 8693 Token Exchange at every delegation step. |
| Delegation Chain Splicing | A compromised intermediary agent can create unintended permission combinations by combining a subject_token and an actor_token from different contexts. | Bind audience and nonce to each token and verify context consistency at the token exchange endpoint. |
| Increased Complexity | As the delegation chain deepens, token validation logic and audit log management become more complex. JWT payloads also grow by hundreds of bytes per hop. | Limit the maximum delegation depth to 3 to 4 hops and abstract the logic behind a governance library. |
| Performance Overhead | Each hop adds a token exchange round trip. Latency may increase significantly when relying on an external IdP. | Use offline-verifiable Biscuit tokens or cache IdP responses briefly. |
| Immature Standards | IETF draft-klrc-aiagent-auth is still at the draft stage, and interoperability issues may arise from implementation differences across frameworks. | Implement the core logic on RFC 8693 and keep framework-specific adapters separate. |
| Token Lifetime Management | If an agent runs for a long time, a token that outlives the original user session may be issued. | Cap max_duration at the remaining lifetime of the original session. |
| Consent Complexity | It is difficult to obtain consent in a form users can understand about which agent accesses which resource. | Explicitly display the agent name, the accessed resource, and the validity period on the consent screen. |
Delegation Chain Splicing: An attack in which an attacker combines a user token from context A with an agent token from context B to create an unintended combination of permissions. nonce is the core mechanism for defending against this attack. For every token exchange request, a random value, such as a UUID, is generated as a nonce claim and included in the issued token. The receiving server checks the short-term cache to verify whether nonce matches the corresponding exchange context and whether the value has already been used. Tokens issued in a different context are immediately rejected due to a nonce mismatch.
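The nonce check described here can be sketched as a small single-use registry. This is an in-memory illustration with hypothetical names (NonceRegistry, issue, consume); a real deployment would more likely use a shared short-lived cache, and this sketch deliberately burns a nonce even when the context does not match.

```python
import time
import uuid
from typing import Optional


class NonceRegistry:
    """Short-lived, single-use nonces, each bound to one exchange context."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._pending: dict = {}  # nonce -> (context_id, expires_at)

    def issue(self, context_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        nonce = str(uuid.uuid4())
        self._pending[nonce] = (context_id, now + self._ttl)
        return nonce

    def consume(self, nonce: str, context_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._pending.pop(nonce, None)  # pop makes the nonce single-use
        if entry is None:
            return False  # unknown or already used
        bound_context, expires_at = entry
        return bound_context == context_id and now <= expires_at


registry = NonceRegistry()
nonce = registry.issue("exchange-ctx-A")
first_use = registry.consume(nonce, "exchange-ctx-A")
replay = registry.consume(nonce, "exchange-ctx-A")
spliced = registry.consume(registry.issue("exchange-ctx-A"), "exchange-ctx-B")
print(first_use, replay, spliced)
# prints: True False False
```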
The Most Common Mistakes in Practice
- Passing the orchestrator's own token to the sub-agent. If the original token is transferred without a token exchange, the one-way authority reduction principle is completely broken, and the act chain disappears from the audit log. This is exactly the incident described in the introduction.
- Not restricting the audience of the delegation token to a specific MCP server. Tokens without an audience are accepted as valid by other MCP servers, leaving them vulnerable to replay attacks. It is highly recommended to set aud using a resource indicator.
- Having an agent store and reuse OAuth tokens directly in memory or in a file. If such a token leaks, the damage is significant. You can minimize the damage from leaks by using a pattern where only short-lived tokens are issued through an external store such as the Auth0 Token Vault.
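The first two mistakes can be caught on the receiving side with a few claim checks. The sketch below assumes the token signature has already been verified and that the endpoint is only ever called by agents (a user calling directly would legitimately have no act claim); the function name and messages are illustrative.

```python
def validate_delegated_claims(claims: dict, expected_audience: str, now: float) -> list[str]:
    """Return a list of problems; an empty list means the claims look acceptable."""
    problems = []

    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        problems.append("aud does not name this server")

    if not isinstance(claims.get("act"), dict):
        problems.append("no act claim: token was not obtained through token exchange")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token is expired or has no exp")

    return problems


delegated = {
    "sub": "user-alice-001",
    "act": {"sub": "subagent-A1"},
    "aud": "mcp-server-docs",
    "exp": 2_000,
}
original = {"sub": "user-alice-001", "exp": 2_000}  # passed along without exchange

print(validate_delegated_claims(delegated, "mcp-server-docs", now=1_000))
print(validate_delegated_claims(original, "mcp-server-docs", now=1_000))
```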
In Conclusion
The core of multi-agent delegation design lies in the principle that "trust must always flow in a shrinking direction, and every delegation step must leave a verifiable trace." While it may seem complex, this principle can be implemented directly in code by combining RFC 8693's act claim, Biscuit's caveats, and DelegateOS's signed attestation chain.
Which of the three examples to choose depends on the situation.
| Situation | Recommended Approach |
|---|---|
| Open source project, requires granular control, TypeScript environment | DelegateOS (self-implemented based on concepts) |
| Enterprise environment that requires compliance automation (EU AI Act, SOC 2, HIPAA) | Microsoft Agent Governance Toolkit |
| Third-party API (Google, GitHub, etc.) delegation, SaaS environment, rapid adoption | Auth0 Token Vault |
| Common to all three | RFC 8693 Token Exchange + audience specified |
3 Steps to Start Right Now:
- Check the token flow of your existing MCP agent. If you pass the original token as is when calling a sub-agent, start by introducing an IdP that supports RFC 8693 Token Exchange (such as ZITADEL or Keycloak) and replacing the original token with a delegated token containing the act claim.
- Experiment with the boundary conditions of the delegation chain in a sandbox project. By directly observing how your selected library or IdP responds to authority escalation attempts, budget overruns, and audience mismatches, you can reduce unexpected gaps in your future production design.
- Design the audit log structure in advance, which can reduce audit effort later. If you log sub (original user), act chain (agent path), tool_name, timestamp, and budget_consumed as a structured record for each MCP server call, it becomes much easier to reconstruct evidence when a HIPAA or SOC 2 audit request comes in.
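A structured record with these five fields could look like the following Python sketch. The field names follow the list above; the function name, the JSON-lines format, and the choice to pass the act chain as an already flattened list are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    sub: str,
    act_chain: list[str],
    tool_name: str,
    budget_consumed: int,
    when: Optional[datetime] = None,
) -> str:
    """Serialize one MCP tool call as a single JSON log line."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "sub": sub,                      # original user
            "act_chain": act_chain,          # agent path, oldest delegate first
            "tool_name": tool_name,
            "timestamp": when.isoformat(),
            "budget_consumed": budget_consumed,
        },
        sort_keys=True,
    )


line = audit_record(
    "user-alice-001",
    ["orchestrator-agent-A", "subagent-A1"],
    "read_document",
    3,
    when=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
print(line)
```

One JSON object per line keeps the log easy to ship to most log pipelines and to filter by sub or by any agent in act_chain.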
Next Post: Prompt Injection Attacks on MCP Servers — Mechanisms by which External Content Takes Over Agents and Defense Design Patterns
Reference Materials
- Intelligent AI Delegation — Google DeepMind | arXiv
- DelegateOS: Cryptographic delegation tokens for multi-agent systems | Hacker News
- GitHub — newtro/delegateos
- Introducing the Agent Governance Toolkit | Microsoft Open Source Blog
- GitHub — microsoft/agent-governance-toolkit
- Authenticated Delegation and Authorized AI Agents | arXiv / MIT Media Lab
- The Ultimate Guide to MCP Auth: Identity, Consent, and Agent Security | Permit.io
- On-Behalf-Of authentication for AI agents | Scalekit
- Control the Chain, Secure the System: Fixing AI Agent Delegation | Okta
- Zero Trust for AI Agents: Delegation, Identity and Access Control | CyberArk
- Token Delegation and MCP server orchestration for multi-user AI systems | DEV Community
- MCP Authorization: Securing MCP Servers With Fine-Grained Access Control | Cerbos
- State of MCP Server Security 2025 | Astrix
- RFC 8693 — OAuth 2.0 Token Exchange | IETF
- IETF AI Agent Authentication and Authorization — draft-klrc-aiagent-auth | IETF
- GitHub — agentic-community/mcp-gateway-registry
- CoSAI WS4 — Secure Design of Agentic Systems | GitHub
- Advancing Multi-Agent Systems Through Model Context Protocol | arXiv