Designing Multi-Agent Orchestration with n8n AI Agent + MCP — Layered Architecture and Real-World Pitfalls

"Summarize today's emails, schedule the relevant meetings, and create issues in Linear" — I've personally experienced what happens when you try to cram a complex request like this into a single agent. The prompt grows bloated, the model gets confused about which of 20+ tools to use, and the results vary with every run. I started out thinking "I just need to write a better system prompt," but I quickly learned how naive that was in practice.

Combining n8n's AI Agent Tool node with MCP (Model Context Protocol) lets you solve this problem structurally. An orchestrator agent classifies intent, delegates domain responsibility to specialized sub-agents, and each agent binds only the MCP tools it needs. After reading this article, you'll be able to build a layered structure of one orchestrator and two or more sub-agents directly on the n8n canvas, and you'll know how to avoid the pitfalls that commonly appear in production environments.

This is especially useful if you've worked with n8n before, or if you're a backend developer who's grown tired of code-based frameworks like LangGraph or CrewAI.

Core Concepts

n8n AI Agent Node — An LLM Wrapper with a ReAct Loop

ReAct (Reasoning + Acting) loop: An autonomous execution pattern where an LLM cycles through reasoning about which tool to use → selecting and executing a tool → observing the result → deciding the next action.

The AI Agent node encapsulates this ReAct loop. System prompt, model selection, memory backend, and output schema can each be configured independently per node — making it possible to attach different models to different agents or use different memory strategies for each.
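As a rough illustration of the loop described above, here is a minimal sketch in plain JavaScript. It is not n8n's implementation: `runReactLoop`, `scriptedModel`, and the `tools` map are hypothetical names, and the "model" is a scripted stub standing in for a real LLM call.

```javascript
// Minimal ReAct-style loop. Hypothetical sketch, not n8n's internal code.
// `model` is any function that looks at the history and returns either
// { type: 'tool', name, args } or { type: 'final', answer }.
function runReactLoop(model, tools, userRequest, maxSteps = 5) {
  const history = [{ role: 'user', content: userRequest }];

  for (let step = 0; step < maxSteps; step++) {
    // Reason: the model decides on the next action from the history so far.
    const action = model(history);

    if (action.type === 'final') {
      return { answer: action.answer, steps: step };
    }

    // Act: run the selected tool, or report that it does not exist.
    const tool = tools[action.name];
    const observation = tool
      ? tool(action.args)
      : `Unknown tool: ${action.name}`;

    // Observe: feed the result back so the next decision can use it.
    history.push({ role: 'tool', name: action.name, content: observation });
  }

  // Guard against endless loops when the model never produces a final answer.
  return { answer: null, steps: maxSteps };
}

// Scripted stand-in for an LLM: call one tool, then answer with its result.
function scriptedModel(history) {
  const last = history[history.length - 1];
  if (last.role === 'user') {
    return { type: 'tool', name: 'count_unread', args: { label: 'inbox' } };
  }
  return { type: 'final', answer: `You have ${last.content} unread emails.` };
}

const tools = { count_unread: () => 3 };
const result = runReactLoop(scriptedModel, tools, 'How many unread emails?');
console.log(result.answer);
```

The `maxSteps` guard is the part worth carrying over to real agents: a step limit keeps a confused model from calling tools forever.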

AI Agent Node Components
├── System Prompt      : Define this agent's role and constraints
├── Model              : GPT-4o / Claude Sonnet / Gemini, etc.
├── Memory             : Window / Redis / pgvector
├── Output Schema      : Enforce structured output (JSON Schema)
└── Tools              : MCP Client, Code, HTTP Request, etc.

Choosing a memory backend can be confusing. Here's how I distinguish them: Redis for short-term session state, pgvector for long-term RAG context, and Window Memory for in-conversation history. Attaching just one per agent based on its role is sufficient.

The AI Agent Tool node goes one step further, allowing you to register another AI Agent as a tool. Thanks to this node, officially released in the first half of 2025, you can visually compose a hierarchical agent structure on a single canvas without splitting sub-workflows into separate canvases.

AI Agent Tool node: A node that registers an agent into another agent's tools array. When the orchestrator issues a tool_call, the corresponding sub-agent executes and returns a result. It completes a nested agent hierarchy on a single canvas without invoking separate workflows.

MCP — A Standard Contract for Tool Discovery and Execution

MCP (Model Context Protocol) is an open protocol designed by Anthropic that lets AI agents discover and invoke external tools and data sources through a consistent interface. Instead of implementing different API specs for each SaaS product directly, you use a single MCP server to receive and consume a list of tools.

n8n supports MCP in two directions:

Node	Direction	Role
`MCP Client Tool`	Consume	Calls tools from an external MCP server from within an agent
`MCP Server Trigger`	Expose	Publishes n8n workflows themselves as an MCP server — callable by Claude Desktop, Cursor, etc.

What makes the combination of these two nodes interesting is that n8n is not just a client that consumes MCP tools, but can also act as an MCP server that exposes integrations with 1,650+ external services. This makes it possible for Claude Desktop to send Slack messages or query Salesforce through n8n.

Orchestration Architecture — End-to-End Flow

The core flow of a multi-agent pipeline is as follows:

        User Request (natural language)
                       ↓
┌──────────────────────────────────────────────┐
│              Orchestrator Agent              │
│ (Intent classification · Task decomposition) │
└───────┬──────────────┬───────────────┬───────┘
        ↓              ↓               ↓
   Email Agent  Calendar Agent    Data Agent
  (MCP: Gmail)    (MCP: GCal)  (MCP: DB/Vector)
        ↓              ↓               ↓
        └──────────────┴───────────────┘
                       ↓
              Fan-In Node (Merge)
                       ↓
           Final Response Generation

Model layering is the main lever for reducing token costs: use a heavier model (GPT-4o, Claude Sonnet) for the orchestrator and lightweight models (GPT-4o-mini, Haiku) for the sub-agents that execute tasks. Intent classification accuracy determines the overall success rate of the pipeline, so the better model is justified there. At the execution stage each sub-agent works in a fixed domain with a small tool set, which is well within the capability of a lightweight model.

Structured Output: A method that forces agents to respond in a fixed format matching a JSON Schema instead of natural language. It greatly reduces the risk of LLM hallucination when passing data between agents.
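To make the idea concrete, the following sketch checks an agent's raw text output against an expected shape before anything downstream consumes it. `checkShape` and `parseAgentOutput` are hypothetical helpers written for this illustration, and the hand-rolled type check is a stand-in for a proper JSON Schema validator. The field names mirror the email agent's schema used in Example 1.

```javascript
// Hand-rolled shape check for an agent's structured output.
// Hypothetical helper; a real workflow would more likely use a JSON Schema validator.
function checkShape(value, shape) {
  const errors = [];
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return ['output is not a JSON object'];
  }
  for (const [field, expected] of Object.entries(shape)) {
    const actual = Array.isArray(value[field]) ? 'array' : typeof value[field];
    if (actual !== expected) {
      errors.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return errors;
}

// Parse raw model text and validate it before passing it downstream.
function parseAgentOutput(rawText, shape) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (err) {
    return { ok: false, errors: ['output is not valid JSON'] };
  }
  const errors = checkShape(parsed, shape);
  return errors.length === 0
    ? { ok: true, data: parsed }
    : { ok: false, errors };
}

const emailShape = { source: 'string', emails: 'array', summary: 'string' };

const good = parseAgentOutput(
  '{"source":"email","emails":[],"summary":"No new mail"}',
  emailShape
);
const bad = parseAgentOutput('{"source":"email","summary":42}', emailShape);

console.log(good.ok, bad.errors);
```

Rejecting a malformed payload at the boundary is cheaper than letting the next agent guess what a missing field was supposed to contain.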

Practical Application

Example 1: Business Assistant — Email, Calendar, and Task Integration

This is a scenario for handling composite requests like "Summarize today's meeting-related emails, schedule a team meeting for tomorrow, and create action items in Linear." It's a situation that comes up frequently in practice, and a single agent tends to fail at tool selection.

Orchestrator System Prompt

You are a business assistant orchestrator.
Analyze the user's request and route tasks to the appropriate sub-agents.
 
Available agents:
- email_agent: read/write Gmail messages
- calendar_agent: check/create Google Calendar events
- task_agent: create/update Linear issues and Notion pages
 
Always respond in the following JSON format:
{
  "intent": "string",
  "agents_to_invoke": ["email_agent", "calendar_agent"],
  "routing_reason": "string"
}

Once the orchestrator returns an agents_to_invoke array, a Switch node reads that array and branches to each sub-agent. Create the email agent branch with the condition {{ $json.agents_to_invoke.includes('email_agent') }} and wire the remaining agents the same way. Because a Switch node routes an item to the first matching output unless told otherwise, turn on its option to send data to all matching outputs when a request can involve more than one agent. Alternatively, use n8n's Split Out node to turn the array into one item per agent and fan the work out from there.
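The routing step can also be sketched in code. The function below is a hypothetical example of what a Code node placed between the orchestrator and the sub-agents might do: it drops agent names the model invented and emits one n8n-style item per valid agent. `fanOut` and `KNOWN_AGENTS` are names made up for this sketch, and `slack_agent` is a deliberately invalid entry.

```javascript
// Turn the orchestrator's routing decision into one item per sub-agent.
// Hypothetical sketch; KNOWN_AGENTS mirrors the names in the system prompt.
const KNOWN_AGENTS = ['email_agent', 'calendar_agent', 'task_agent'];

function fanOut(decision, userRequest) {
  const requested = Array.isArray(decision.agents_to_invoke)
    ? decision.agents_to_invoke
    : [];

  // Drop duplicates and any agent name the model invented.
  const valid = [...new Set(requested)].filter(name => KNOWN_AGENTS.includes(name));
  const rejected = requested.filter(name => !KNOWN_AGENTS.includes(name));

  // n8n items are objects of the form { json: {...} }.
  const items = valid.map(agent => ({
    json: { agent, intent: decision.intent, user_request: userRequest },
  }));

  return { items, rejected };
}

const decision = {
  intent: 'summarize_and_schedule',
  agents_to_invoke: ['email_agent', 'calendar_agent', 'slack_agent', 'email_agent'],
  routing_reason: 'Needs mail summary and a meeting slot',
};

const { items, rejected } = fanOut(decision, 'Summarize mail and book a meeting');
console.log(items.map(i => i.json.agent), rejected);
```

Surfacing `rejected` instead of silently ignoring it gives you an early signal that the orchestrator prompt and the actual agent list have drifted apart.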

Sub-Agent Configuration

Email Agent
├── System Prompt : Gmail only. Do not call other tools.
├── Model         : claude-haiku-4-5 (cost reduction)
├── Tools         : MCP Client Tool → Gmail MCP Server
└── Output Schema : { "source": "email", "emails": [...], "summary": "..." }
 
Calendar Agent
├── System Prompt : Google Calendar only.
├── Model         : gpt-4o-mini
├── Tools         : MCP Client Tool → GCal MCP Server
└── Output Schema : { "source": "calendar", "events": [...], "conflicts": [...] }
 
Task Agent
├── System Prompt : Linear / Notion only.
├── Model         : gpt-4o-mini
├── Tools         : MCP Client Tool → Linear MCP + Notion MCP
└── Output Schema : { "source": "task", "created_issues": [...], "pages": [...] }

It's a good idea to always include a source field in the Output Schema. The Fan-In node downstream uses this field as the basis for merging each agent's results.

Component	Reason for Choice
Advanced model for orchestrator	Intent classification errors cause full pipeline failure, so accuracy comes first
Lightweight model for sub-agents	The domain is fixed, so complex reasoning is unnecessary
Enforced structured output	Prevents parsing errors when merging agent results in the Fan-In node
MCP Client Tool	Binds only the tools each agent needs → blocks unnecessary tool exposure

Example 2: MCP Context Reducer Pattern

What if the MCP server you connect to the task agent in Example 1 exposes 50+ tools? Honestly, I initially thought "just give it all of them," until I hit a situation where GPT-4o repeatedly selected the wrong tools and the entire pipeline fell apart. The more tools there are, the lower the model's selection accuracy, and the more tokens are spent on tool descriptions the request never needs.

The solution proposed by the official n8n template (#4475) is the Context Reducer pattern.

User Request
     ↓
Preprocessing Agent (Context Reducer)
 - From the full tool list, selects only 3–5 tools needed for this request
 - Returns the selected tool list as JSON
     ↓
Execution Agent
 - Uses only the minimal tool set provided by the preprocessing agent
 - Greatly reduced context size → cost savings + improved accuracy

tool_manifest Injection Method

{{ $json.tool_manifest }} holds the tool list of the MCP server that the MCP Client Tool is connected to. Concretely, an HTTP Request node (or the MCP initialization step) sends the MCP tools/list request, which is a JSON-RPC method rather than a REST path, to retrieve each tool's name and description. A Code node then serializes the result as JSON and sets it as $json.tool_manifest, and that value is injected into the Context Reducer's prompt.
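A minimal sketch of the Code node step might look like the following. The response shape (a `result.tools` array whose entries carry `name`, `description`, and `inputSchema`) follows the MCP specification for `tools/list`. Everything else is an assumption for illustration: `buildToolManifest`, the 80-character description limit, and the two sample tools are made up.

```javascript
// Build a compact tool manifest from an MCP `tools/list` result.
// The sample tools and the truncation limit are illustrative assumptions.
function buildToolManifest(listResponse, maxDescriptionLength = 80) {
  const tools = listResponse?.result?.tools ?? [];
  return tools.map(tool => {
    const description = tool.description ?? '';
    return {
      name: tool.name,
      // Truncate long descriptions: the reducer only needs enough to choose.
      description:
        description.length > maxDescriptionLength
          ? description.slice(0, maxDescriptionLength - 3) + '...'
          : description,
      // Keep only parameter names, not the full JSON Schema.
      params: Object.keys(tool.inputSchema?.properties ?? {}),
    };
  });
}

const sampleResponse = {
  jsonrpc: '2.0',
  id: 1,
  result: {
    tools: [
      {
        name: 'create_issue',
        description: 'Create an issue in the tracker',
        inputSchema: {
          type: 'object',
          properties: { title: { type: 'string' }, teamId: { type: 'string' } },
        },
      },
      { name: 'list_teams', description: 'List teams', inputSchema: { type: 'object' } },
    ],
  },
};

const toolManifest = buildToolManifest(sampleResponse);
// In an n8n Code node this would be returned as [{ json: { tool_manifest: toolManifest } }].
console.log(JSON.stringify(toolManifest));
```

Stripping the manifest down to names, short descriptions, and parameter names keeps the reducer's own prompt small, which is the point of the pattern.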

Context Reducer Agent Prompt

You are a tool selector. Given the user's request and the full tool manifest,
select ONLY the tools needed to fulfill this request.
 
Full tool manifest:
{{ $json.tool_manifest }}
 
User request:
{{ $json.user_request }}
 
Respond with:
{
  "selected_tools": ["tool_a", "tool_b"],
  "reason": "short explanation"
}
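Before the execution agent receives the reduced tool set, it helps to check the reducer's answer against the real manifest, since a model can name tools that do not exist. The sketch below assumes the reducer output has the `selected_tools` shape from the prompt above. `applySelection`, `MAX_TOOLS`, and the sample manifest are hypothetical.

```javascript
// Keep only the tools the reducer selected AND that really exist in the manifest.
// Hypothetical helper; MAX_TOOLS reflects the 3 to 5 tool budget of the pattern.
const MAX_TOOLS = 5;

function applySelection(manifest, reducerOutput) {
  const selected = Array.isArray(reducerOutput.selected_tools)
    ? reducerOutput.selected_tools
    : [];
  const byName = new Map(manifest.map(tool => [tool.name, tool]));

  const tools = [];
  const unknown = [];
  for (const name of selected) {
    if (!byName.has(name)) {
      unknown.push(name); // the model named a tool that does not exist
    } else if (tools.length < MAX_TOOLS && !tools.includes(byName.get(name))) {
      tools.push(byName.get(name));
    }
  }
  return { tools, unknown };
}

const manifest = [
  { name: 'create_issue', description: 'Create an issue' },
  { name: 'update_issue', description: 'Update an issue' },
  { name: 'create_page', description: 'Create a page' },
];

const selection = applySelection(manifest, {
  selected_tools: ['create_issue', 'delete_everything', 'create_issue'],
  reason: 'User asked for a new issue',
});
console.log(selection.tools.map(t => t.name), selection.unknown);
```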

Combining this pattern with MCP Server Trigger turns the n8n workflow itself into an "intelligent MCP gateway." When an external AI client (such as Claude Desktop) calls it, n8n first reduces the context and responds with an optimized set of tools.

Example 3: Web Research Multi-Agent — Parallel Fan-Out/Fan-In

Research tasks require exploring multiple sources simultaneously, making parallel processing far more effective than a serial agent chain. I remember redesigning the architecture from scratch after initially building this serially and finding the response time was too long.

Research Orchestrator
│  (Topic analysis → Generate 3 search queries → Parallel branching)
│
├── Search Agent (MCP: Brave Search)
│    └── Output: { "source": "brave", "items": [...] }
│
├── Scraping Agent (MCP: Firecrawl)
│    └── Output: { "source": "firecrawl", "content": [...] }
│
└── Vector Search Agent (MCP: Supabase pgvector)
     └── Output: { "source": "supabase", "documents": [...] }
          │
          └──────────────────────┐
                                 ↓
                          Fan-In Node (Merge)
                                 ↓
                       Summarization & Citation Agent
                    (Integrate results → Generate report with sources)

Merging Results in the Fan-In Node

javascript

// n8n Code node (JavaScript)
// Each incoming item is one sub-agent's structured output, tagged by `source`.
const results = $input.all();

// Return the named array field from the first item with the given source,
// or an empty array when that agent produced no matching output.
const pick = (source, field) =>
  results.find(r => r.json.source === source)?.json[field] ?? [];

const merged = {
  search_results: pick('brave', 'items'),
  scraped_content: pick('firecrawl', 'content'),
  internal_docs: pick('supabase', 'documents'),
  collected_at: new Date().toISOString(),
};

return [{ json: merged }];

For this code to work correctly, the source field must be included in each agent's Output Schema. Fixing the values to 'brave', 'firecrawl', and 'supabase' respectively ensures there's no confusion at the Fan-In stage.

The key point is that the orchestrator is responsible only for generating the search queries, while the actual searching, scraping, and vector retrieval are handled independently by each agent. If one agent fails, the results from the others are preserved, provided the failing node is configured to continue on error instead of stopping the workflow.
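One way to make partial failure visible is to have the fan-in step record which sources came back empty or malformed. The following is a hypothetical variation on the merge code above, written as a plain function so it can run outside n8n. `mergeWithStatus`, `EXPECTED`, and the sample items are assumptions, and the plain objects stand in for what `$input.all()` would return.

```javascript
// Fan-in that also records which sources returned nothing usable.
// Hypothetical sketch; the source-to-field mapping follows the schemas above.
const EXPECTED = {
  brave: 'items',
  firecrawl: 'content',
  supabase: 'documents',
};

function mergeWithStatus(agentItems) {
  const bySource = {};
  const missing = [];

  for (const [source, field] of Object.entries(EXPECTED)) {
    const item = agentItems.find(i => i.json && i.json.source === source);
    const payload = item ? item.json[field] : undefined;
    if (Array.isArray(payload)) {
      bySource[source] = payload;
    } else {
      bySource[source] = [];
      missing.push(source); // agent failed, timed out, or broke the schema
    }
  }
  return { bySource, missing };
}

const sampleItems = [
  { json: { source: 'brave', items: [{ url: 'https://example.com' }] } },
  { json: { error: 'Firecrawl request timed out' } },
  { json: { source: 'supabase', documents: [] } },
];

const status = mergeWithStatus(sampleItems);
console.log(status.missing);
```

Passing `missing` on to the summarization agent lets the final report state which sources were unavailable instead of presenting an incomplete picture as complete.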

Pros and Cons

Advantages

Item	Details
Low-code accessibility	Complex multi-agent architectures can be composed visually without code
Single canvas	The `AI Agent Tool` node enables agent nesting without splitting into separate workflows
Standard protocol	MCP compliance ensures compatibility with various AI clients including Claude, GPT, and Gemini
Cost optimization	Model layering (mixing advanced and lightweight) can significantly reduce token costs
Rich integrations	Nearly any external service can be connected via 820 core + 830 community nodes
Memory flexibility	Free choice among Redis (short-term), pgvector (long-term RAG), and Window Memory (conversation context)

Disadvantages and Caveats

Item	Details	Mitigation
SSE connection stability	MCP Server Trigger is SSE-based — connection drops can occur in multi-webhook replica environments	Route `/mcp*` paths to a single webhook replica in Queue Mode
Reverse proxy configuration	Proxy buffering is enabled by default in nginx and similar tools, blocking the stream	Setting `proxy_buffering off` and `X-Accel-Buffering: no` is mandatory
Debugging complexity	Execution tracing becomes harder as agent call depth increases	Include a `trace_id` field in each agent's output and add an observability layer that logs execution metadata to an external DB like Postgres via a Code node
Context contamination	Passing unstructured data between agents increases the risk of LLM hallucination	Fix types by enforcing structured output (JSON Schema)
Accumulated latency	In a serial agent chain, each LLM call time adds up	Switch to Fan-Out parallel processing where possible

SSE (Server-Sent Events): An HTTP-based protocol that maintains a one-way streaming connection from server to client. Because MCP Server Trigger uses this, the stream will be interrupted unless buffering is disabled at the proxy layer.
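For the trace_id mitigation in the table above, a small sketch of the metadata each agent call could log is shown below. The record fields (`depth`, `elapsed_ms`, `output_keys`) and the helper names are illustrative assumptions, not a schema from n8n. `randomUUID` comes from Node's built-in `crypto` module. Inside an n8n Code node, built-in modules may need to be allowed in the instance configuration first.

```javascript
const { randomUUID } = require('crypto');

// Create one trace context per user request and pass it to every agent.
function newTrace() {
  return { trace_id: randomUUID(), started_at: Date.now() };
}

// Build the row that would be written to a log table.
// Column names here are illustrative assumptions.
function toLogRecord(trace, agentName, depth, output, now = Date.now()) {
  return {
    trace_id: trace.trace_id,
    agent: agentName,
    depth, // 0 = orchestrator, 1 = sub-agent, and so on
    elapsed_ms: now - trace.started_at,
    ok: !output.error,
    output_keys: Object.keys(output),
  };
}

const trace = newTrace();
const records = [
  toLogRecord(trace, 'orchestrator', 0, { intent: 'research' }),
  toLogRecord(trace, 'search_agent', 1, { source: 'brave', items: [] }),
  toLogRecord(trace, 'scraping_agent', 1, { error: 'timeout' }),
];
console.log(records);
```

Because every record carries the same `trace_id`, a single query on that column reconstructs the full call tree of one request, regardless of how deep the agent nesting goes.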

The Most Common Mistakes in Practice

1. Binding tools directly to the orchestrator: The orchestrator should be responsible for "routing" only. If an email tool is attached to the orchestrator, it may call it directly without making a branching decision.
2. Using natural language for inter-agent communication: Passing natural language like "check the emails" directly to a sub-agent leads to different interpretations per agent. Fixing types with a JSON Schema is far more reliable.
3. Using the same model for all agents: Using GPT-4o for simple tool calls at the execution stage is a waste of money. It is recommended to layer models: advanced models for intent classification, lightweight models for execution.

Closing Thoughts

By combining n8n's AI Agent Tool node with MCP, you can build production-grade multi-agent orchestration in a low-code way.

Three steps you can take right now:

1. Open template #4475 in n8n Cloud or a local instance: The MCP Server with AI Agent as Context Reducer template is the fastest starting point for understanding how MCP Client Tool and structured output are actually combined in practice.
2. Build a minimal pipeline with an orchestrator + 2 sub-agents: Register agents responsible for Gmail MCP and Google Calendar MCP respectively as AI Agent Tool nodes, and write routing instructions in the orchestrator's system prompt. That's all you need.
3. Fix each agent's output schema to a JSON Schema: It may seem tedious at first, but once the data flow between agents stabilizes, you'll notice a significant drop in debugging time.


Designing Multi-Agent Orchestration with n8n AI Agent + MCP — Layered Architecture and Real-World Pitfalls | DEV BAK - Tech Blog


