Designing Multi-Agent Orchestration with n8n AI Agent + MCP — Layered Architecture and Real-World Pitfalls
"Summarize today's emails, schedule the relevant meetings, and create issues in Linear" — I've personally experienced what happens when you try to cram a complex request like this into a single agent. The prompt grows bloated, the model gets confused about which of 20+ tools to use, and the results vary with every run. I started out thinking "I just need to write a better system prompt," but I quickly learned how naive that was in practice.
Combining n8n's AI Agent Tool node with MCP (Model Context Protocol) lets you solve this problem structurally. An orchestrator agent classifies intent and delegates each domain to a specialized sub-agent, and each agent binds only the MCP tools it needs. After reading this article, you'll be able to build a layered structure of one orchestrator plus two or more sub-agents directly on the n8n canvas, and you'll know how to avoid the pitfalls that commonly appear in production environments.
This is especially useful if you've worked with n8n before, or if you're a backend developer who's grown tired of code-based frameworks like LangGraph or CrewAI.
Core Concepts
n8n AI Agent Node — An LLM Wrapper with a ReAct Loop
ReAct (Reasoning + Acting) loop: An autonomous execution pattern where an LLM cycles through reasoning about which tool to use → selecting and executing a tool → observing the result → deciding the next action.
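To make the cycle concrete, here is a minimal sketch of a ReAct-style loop in plain JavaScript. It is not how n8n implements the node internally. The `decide` function is a hypothetical stand-in for the LLM, and the `count_unread` tool is invented for the demo; a real agent would call a model API asynchronously at that point.

```javascript
// Minimal ReAct-style loop. `decide` stands in for the LLM and is a
// hypothetical stub; a real agent would call a model API here.
function runReactLoop(decide, tools, userRequest, maxSteps = 5) {
  const history = [{ role: "user", content: userRequest }];
  for (let step = 0; step < maxSteps; step++) {
    // Reason: the model inspects the history and picks the next action.
    const action = decide(history);
    if (action.type === "final") {
      return { answer: action.content, steps: step };
    }
    // Act: look up and execute the chosen tool.
    const tool = tools[action.tool];
    const observation = tool
      ? tool(action.input)
      : `Unknown tool: ${action.tool}`;
    // Observe: feed the result back so the next decision can use it.
    history.push({ role: "tool", name: action.tool, content: observation });
  }
  // The step budget ran out before the model produced a final answer.
  return { answer: null, steps: maxSteps };
}

// Scripted stand-in for a model: call one tool, then answer.
function scriptedModel(history) {
  const last = history[history.length - 1];
  if (last.role === "user") {
    return { type: "tool", tool: "count_unread", input: "inbox" };
  }
  return { type: "final", content: `You have ${last.content} unread emails.` };
}

const demoTools = { count_unread: () => 3 };
const reactResult = runReactLoop(scriptedModel, demoTools, "How many unread emails?");
console.log(reactResult);
```

The `maxSteps` budget matters in practice: without an upper bound, a confused model can keep calling tools indefinitely.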
The AI Agent node encapsulates this ReAct loop. System prompt, model selection, memory backend, and output schema can each be configured independently per node — making it possible to attach different models to different agents or use different memory strategies for each.
AI Agent Node Components
├── System Prompt : Define this agent's role and constraints
├── Model : GPT-4o / Claude Sonnet / Gemini, etc.
├── Memory : Window / Redis / pgvector
├── Output Schema : Enforce structured output (JSON Schema)
└── Tools : MCP Client, Code, HTTP Request, etc.
Choosing a memory backend can be confusing. Here's how I distinguish them: Redis for short-term session state, pgvector for long-term RAG context, and Window Memory for in-conversation history. Attaching just one per agent based on its role is sufficient.
The AI Agent Tool node goes one step further, allowing you to register another AI Agent as a tool. Thanks to this node, officially released in the first half of 2025, you can visually compose a hierarchical agent structure on a single canvas without splitting sub-workflows into separate canvases.
AI Agent Tool node: A node that registers an agent into another agent's tools array. When the orchestrator issues a tool_call, the corresponding sub-agent executes and returns a result. It completes a nested agent hierarchy on a single canvas without invoking separate workflows.
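As a mental model for "an agent registered as a tool," the following sketch may help. It is purely conceptual: n8n wires this up visually, and the object shapes here (`asTool`, `run`, `handleToolCall`) are hypothetical names of mine, not n8n's internal API.

```javascript
// Conceptual model only. These object shapes are hypothetical and do not
// reflect n8n internals.
function asTool(agent) {
  return {
    name: agent.name,
    description: agent.description,
    // The parent sees an ordinary tool; invoking it runs the sub-agent.
    invoke: (input) => agent.run(input),
  };
}

// A stubbed sub-agent that returns a fixed, structured result.
const emailAgent = {
  name: "email_agent",
  description: "Read and summarize Gmail messages",
  run: (input) => ({ source: "email", summary: `Summary for: ${input}` }),
};

const orchestrator = {
  tools: [asTool(emailAgent)],
  // Dispatch a tool_call ({ name, input }) to the matching registered tool.
  handleToolCall(call) {
    const tool = this.tools.find((t) => t.name === call.name);
    if (!tool) throw new Error(`No tool registered as ${call.name}`);
    return tool.invoke(call.input);
  },
};

const toolCallResult = orchestrator.handleToolCall({
  name: "email_agent",
  input: "today's meeting emails",
});
console.log(toolCallResult);
```

From the orchestrator's point of view, the sub-agent is just another entry in its tool list, which is why the nesting composes on one canvas.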
MCP — A Standard Contract for Tool Discovery and Execution
MCP (Model Context Protocol) is an open protocol designed by Anthropic that lets AI agents discover and invoke external tools and data sources through a consistent interface. Instead of implementing different API specs for each SaaS product directly, you use a single MCP server to receive and consume a list of tools.
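Under the hood, MCP frames discovery and execution as JSON-RPC 2.0 messages, with `tools/list` for discovery and `tools/call` for execution. The sketch below only builds and reads those message shapes; it does not open a connection. The `send_message` tool, its arguments, and the sample response are hand-written examples, not output from a real server.

```javascript
// Request shapes follow MCP's JSON-RPC 2.0 framing. The tool name
// "send_message" and its arguments are hypothetical examples.
let nextId = 1;

// Discovery: ask the server which tools it offers.
function listToolsRequest() {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

// Execution: invoke one tool by name with structured arguments.
function callToolRequest(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// A sample tools/list response, hand-written for illustration.
const sampleListResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "send_message",
        description: "Send a message to a channel",
        inputSchema: {
          type: "object",
          properties: { channel: { type: "string" }, text: { type: "string" } },
          required: ["channel", "text"],
        },
      },
    ],
  },
};

const discovered = sampleListResponse.result.tools.map((t) => t.name);
const callRequest = callToolRequest(discovered[0], { channel: "#general", text: "hi" });
console.log(JSON.stringify(callRequest, null, 2));
```

Because every server answers the same two methods, an agent needs no per-service client code. It reads each tool's `inputSchema` and fills in `arguments` accordingly.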
n8n supports MCP in two directions:
| Node | Direction | Role |
|---|---|---|
| MCP Client Tool | Consume | Calls tools from an external MCP server from within an agent |
| MCP Server Trigger | Expose | Publishes n8n workflows themselves as an MCP server, callable by Claude Desktop, Cursor, etc. |
What makes the combination of these two nodes interesting is that n8n is not just a client that consumes MCP tools, but can also act as an MCP server that exposes integrations with 1,650+ external services. This makes it possible for Claude Desktop to send Slack messages or query Salesforce through n8n.
Orchestration Architecture — End-to-End Flow
The core flow of a multi-agent pipeline is as follows:
         User Request (natural language)
                        ↓
┌──────────────────────────────────────────────┐
│              Orchestrator Agent              │
│ (Intent classification · Task decomposition) │
└───────┬───────────────┬──────────────┬───────┘
        ↓               ↓              ↓
   Email Agent   Calendar Agent   Data Agent
  (MCP: Gmail)     (MCP: GCal) (MCP: DB/Vector)
        ↓               ↓              ↓
        └───────────────┴──────────────┘
                        ↓
               Fan-In Node (Merge)
                        ↓
            Final Response Generation
The layered strategy of using a heavier model (GPT-4o, Claude Sonnet) for the orchestrator and lightweight models (GPT-4o-mini, Haiku) for sub-agents at the execution stage is key to significantly reducing token costs. Since intent classification accuracy determines the overall success rate of the pipeline, using a better model is justified there; a fixed domain at the execution stage is well within the capability of a lightweight model.
Structured Output: A method that forces agents to respond in a fixed format matching a JSON Schema instead of natural language. It greatly reduces the risk of LLM hallucination when passing data between agents.
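Here is a rough sketch of what "enforcing" a shape buys you at the boundary between two agents. It is a tiny hand-rolled check, not a full JSON Schema validator, and the function names are my own. The field names mirror the calendar agent's output used in Example 1.

```javascript
// Tiny shape check, not a full JSON Schema validator.
// `shape` maps each required key to "string", "array", "object", etc.
function validateShape(value, shape) {
  const errors = [];
  for (const [key, expected] of Object.entries(shape)) {
    const actual = Array.isArray(value[key]) ? "array" : typeof value[key];
    if (actual !== expected) {
      errors.push(`${key}: expected ${expected}, got ${actual}`);
    }
  }
  return errors;
}

// Parse a raw agent reply and reject anything that is not the agreed shape.
function parseAgentOutput(raw, shape) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["not valid JSON"] };
  }
  if (parsed === null || typeof parsed !== "object") {
    return { ok: false, errors: ["not a JSON object"] };
  }
  const errors = validateShape(parsed, shape);
  return errors.length ? { ok: false, errors } : { ok: true, data: parsed };
}

const calendarShape = { source: "string", events: "array", conflicts: "array" };

const good = parseAgentOutput(
  '{"source":"calendar","events":[],"conflicts":[]}',
  calendarShape
);
const bad = parseAgentOutput("Sure! I scheduled the meeting.", calendarShape);
console.log(good.ok, bad.ok, bad.errors);
```

A free-text reply like the second one is exactly what breaks a downstream merge step, so rejecting it early is cheaper than debugging it later.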
Practical Application
Example 1: Business Assistant — Email, Calendar, and Task Integration
This is a scenario for handling composite requests like "Summarize today's meeting-related emails, schedule a team meeting for tomorrow, and create action items in Linear." It's a situation that comes up frequently in practice, and a single agent tends to fail at tool selection.
Orchestrator System Prompt
You are a business assistant orchestrator.
Analyze the user's request and route tasks to the appropriate sub-agents.
Available agents:
- email_agent: read/write Gmail messages
- calendar_agent: check/create Google Calendar events
- task_agent: create/update Linear issues and Notion pages
Always respond in the following JSON format:
{
  "intent": "string",
  "agents_to_invoke": ["email_agent", "calendar_agent"],
  "routing_reason": "string"
}
Once the orchestrator returns an agents_to_invoke array, a Switch node reads that array and branches to each sub-agent. Create the email agent branch with the condition {{ $json.agents_to_invoke.includes('email_agent') }} and connect the remaining agents the same way. Because a composite request usually matches several branches, enable the Switch node's option to send data to all matching outputs; otherwise only the first matching branch runs. Alternatively, n8n's Split Out node can decompose the array into one item per agent so that each requested agent is triggered independently.
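If you prefer to do the fan-out in code, the routing step can be sketched as a plain function like the one below. This is an illustration under my own assumptions: `KNOWN_AGENTS` and `toRoutingItems` are hypothetical names, and inside an n8n Code node the input would come from `$input` and the returned array would become the node's output items.

```javascript
// Plain-JavaScript sketch of the routing step. Names are hypothetical.
const KNOWN_AGENTS = ["email_agent", "calendar_agent", "task_agent"];

function toRoutingItems(orchestratorOutput) {
  const requested = orchestratorOutput.agents_to_invoke ?? [];
  // Guard against agent names the model made up.
  const unknown = requested.filter((name) => !KNOWN_AGENTS.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Orchestrator requested unknown agents: ${unknown.join(", ")}`);
  }
  // One item per requested agent, in the { json: ... } item shape n8n uses.
  return requested.map((agent) => ({
    json: { agent, intent: orchestratorOutput.intent },
  }));
}

const routingItems = toRoutingItems({
  intent: "summarize_and_schedule",
  agents_to_invoke: ["email_agent", "calendar_agent"],
  routing_reason: "Request mentions emails and a meeting",
});
console.log(routingItems);
```

Failing loudly on an unknown agent name is a deliberate choice here: a silent skip would make the final response look complete when part of the request was never handled.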
Sub-Agent Configuration
Email Agent
├── System Prompt : Gmail only. Do not call other tools.
├── Model : claude-haiku-4-5 (cost reduction)
├── Tools : MCP Client Tool → Gmail MCP Server
└── Output Schema : { "source": "email", "emails": [...], "summary": "..." }
Calendar Agent
├── System Prompt : Google Calendar only.
├── Model : gpt-4o-mini
├── Tools : MCP Client Tool → GCal MCP Server
└── Output Schema : { "source": "calendar", "events": [...], "conflicts": [...] }
Task Agent
├── System Prompt : Linear / Notion only.
├── Model : gpt-4o-mini
├── Tools : MCP Client Tool → Linear MCP + Notion MCP
└── Output Schema : { "source": "task", "created_issues": [...], "pages": [...] }
It's a good idea to always include a source field in the Output Schema. The Fan-In node downstream uses this field as the basis for merging each agent's results.
| Component | Reason for Choice |
|---|---|
| Advanced model for orchestrator | Intent classification errors cause full pipeline failure, so accuracy comes first |
| Lightweight model for sub-agents | The domain is fixed, so complex reasoning is unnecessary |
| Enforced structured output | Prevents parsing errors when merging agent results in the Fan-In node |
| MCP Client Tool | Binds only the tools each agent needs → blocks unnecessary tool exposure |
Example 2: MCP Context Reducer Pattern
What if there are 50+ tools when you connect an MCP server to the task agent in Example 1? Honestly, I initially thought "just give it all of them" — until I encountered a situation where GPT-4o repeatedly selected the wrong tools and the entire pipeline fell apart. The more tools there are, the lower the model's selection accuracy, and the more unnecessary tokens are wasted.
The solution proposed by the official n8n template (#4475) is the Context Reducer pattern.
User Request
↓
Preprocessing Agent (Context Reducer)
- From the full tool list, selects only 3–5 tools needed for this request
- Returns the selected tool list as JSON
↓
Execution Agent
- Uses only the minimal tool set provided by the preprocessing agent
- Greatly reduced context size → cost savings + improved accuracy
tool_manifest Injection Method
{{ $json.tool_manifest }} holds the tool list retrieved from the MCP server that the MCP Client Tool connects to. Specifically, an HTTP Request node (or the MCP initialization step) sends the protocol's tools/list request, which is a JSON-RPC method rather than a REST path, to retrieve tool names and descriptions. A Code node then serializes them as JSON and sets the result as $json.tool_manifest, which is injected into the Context Reducer's prompt.
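The serialization step might look roughly like this. I'm assuming the tool list arrives in the shape MCP uses for a `tools/list` result, `{ tools: [{ name, description, inputSchema }] }`. The function name, the truncation limit, and the sample tools are my own illustrative choices.

```javascript
// Build a compact manifest from a tools/list result. Dropping inputSchema and
// truncating descriptions keeps the Context Reducer's prompt small.
function buildToolManifest(listToolsResult, maxDescriptionLength = 120) {
  const entries = listToolsResult.tools.map((tool) => ({
    name: tool.name,
    description: (tool.description ?? "").slice(0, maxDescriptionLength),
  }));
  return JSON.stringify(entries, null, 2);
}

// Hand-written sample data; the tool names are hypothetical.
const sampleResult = {
  tools: [
    {
      name: "create_issue",
      description: "Create a new issue in a project",
      inputSchema: { type: "object", properties: { title: { type: "string" } } },
    },
    { name: "list_projects", inputSchema: { type: "object" } },
  ],
};

const toolManifest = buildToolManifest(sampleResult);
console.log(toolManifest);
```

The reducer only needs names and descriptions to choose tools, so the full input schemas can stay out of its prompt.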
Context Reducer Agent Prompt
You are a tool selector. Given the user's request and the full tool manifest,
select ONLY the tools needed to fulfill this request.
Full tool manifest:
{{ $json.tool_manifest }}
User request:
{{ $json.user_request }}
Respond with:
{
  "selected_tools": ["tool_a", "tool_b"],
  "reason": "short explanation"
}
Combining this pattern with MCP Server Trigger turns the n8n workflow itself into an "intelligent MCP gateway." When an external AI client (such as Claude Desktop) calls it, n8n first reduces the context and responds with an optimized set of tools.
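Before handing the reducer's selection to the execution agent, it seems worth checking it against the real tool list, since the model can return names that do not exist. A minimal sketch, with hypothetical function and tool names and an assumed cap of five tools taken from the "3–5 tools" guideline above:

```javascript
// Keep only tools that actually exist, and cap the set size.
function selectTools(allTools, reducerOutput, maxTools = 5) {
  const byName = new Map(allTools.map((tool) => [tool.name, tool]));
  const selected = [];
  const rejected = [];
  for (const name of reducerOutput.selected_tools ?? []) {
    if (byName.has(name)) {
      selected.push(byName.get(name));
    } else {
      rejected.push(name); // the model named a tool that does not exist
    }
  }
  return { selected: selected.slice(0, maxTools), rejected };
}

// Hypothetical tool list and reducer reply for illustration.
const allTools = [
  { name: "create_issue", description: "Create a new issue" },
  { name: "update_issue", description: "Update an existing issue" },
  { name: "create_page", description: "Create a page" },
];
const reducerReply = {
  selected_tools: ["create_issue", "archive_everything"],
  reason: "User wants a new action item",
};

const toolSelection = selectTools(allTools, reducerReply);
console.log(toolSelection);
```

Logging the `rejected` names is a cheap way to notice when the reducer's prompt or the manifest has drifted out of sync with the server.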
Example 3: Web Research Multi-Agent — Parallel Fan-Out/Fan-In
Research tasks require exploring multiple sources simultaneously, making parallel processing far more effective than a serial agent chain. I remember redesigning the architecture from scratch after initially building this serially and finding the response time was too long.
Research Orchestrator
│ (Topic analysis → Generate 3 search queries → Parallel branching)
│
├── Search Agent (MCP: Brave Search)
│ └── Output: { "source": "brave", "items": [...] }
│
├── Scraping Agent (MCP: Firecrawl)
│ └── Output: { "source": "firecrawl", "content": [...] }
│
└── Vector Search Agent (MCP: Supabase pgvector)
└── Output: { "source": "supabase", "documents": [...] }
│
└──────────────────────┐
↓
Fan-In Node (Merge)
↓
Summarization & Citation Agent
(Integrate results → Generate report with sources)
Merging Results in the Fan-In Node
// n8n Code Node (JavaScript)
// Collect every item that reached this node after the Merge step.
const results = $input.all();
// Pick each agent's payload by its fixed source value. The ?? [] fallback
// keeps the merge working when an agent returned nothing.
const merged = {
  search_results: results.find(r => r.json.source === 'brave')?.json.items ?? [],
  scraped_content: results.find(r => r.json.source === 'firecrawl')?.json.content ?? [],
  internal_docs: results.find(r => r.json.source === 'supabase')?.json.documents ?? [],
  collected_at: new Date().toISOString()
};
// A Code node returns an array of items, each wrapped in a json key.
return [{ json: merged }];
For this code to work correctly, the source field must be included in each agent's Output Schema. Fixing the values to 'brave', 'firecrawl', and 'supabase' respectively ensures there's no confusion at the Fan-In stage.
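The key point is that the orchestrator is responsible only for generating the search queries, while the actual searching, scraping, and vector retrieval are handled completely independently by each agent. Even if one agent fails, the results from the others are preserved, provided the failing node is configured to continue on error rather than stop the workflow; the merge code then simply sees an empty array for that source.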
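To check the failure-tolerance claim outside n8n, the merge logic can be pulled into a plain function and fed a result set with one source missing. This is a sketch: `mergeResults` and the `missing_sources` field are my additions, and the sample items are made up.

```javascript
// Same merge logic as the Fan-In Code node, as a testable function.
// `items` uses n8n's item shape: [{ json: {...} }, ...].
function mergeResults(items) {
  const expected = [
    { source: "brave", field: "items", target: "search_results" },
    { source: "firecrawl", field: "content", target: "scraped_content" },
    { source: "supabase", field: "documents", target: "internal_docs" },
  ];
  const merged = { missing_sources: [] };
  for (const { source, field, target } of expected) {
    const match = items.find((item) => item.json.source === source);
    if (!match) merged.missing_sources.push(source); // record the gap
    merged[target] = match?.json[field] ?? [];
  }
  return merged;
}

// The scraping agent failed, so only two results arrive.
const partialResults = [
  { json: { source: "brave", items: [{ title: "Result A" }] } },
  { json: { source: "supabase", documents: [{ id: 1 }] } },
];

const mergedPartial = mergeResults(partialResults);
console.log(mergedPartial);
```

Recording which sources are missing lets the summarization agent say so explicitly instead of presenting a partial report as complete.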
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Low-code accessibility | Complex multi-agent architectures can be composed visually without code |
| Single canvas | The AI Agent Tool node enables agent nesting without splitting into separate workflows |
| Standard protocol | MCP compliance ensures compatibility with various AI clients including Claude, GPT, and Gemini |
| Cost optimization | Model layering (mixing advanced and lightweight) can significantly reduce token costs |
| Rich integrations | Nearly any external service can be connected via 820 core + 830 community nodes |
| Memory flexibility | Free choice among Redis (short-term), pgvector (long-term RAG), and Window Memory (conversation context) |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| SSE connection stability | MCP Server Trigger is SSE-based — connection drops can occur in multi-webhook replica environments | Route /mcp* paths to a single webhook replica in Queue Mode |
| Reverse proxy configuration | Proxy buffering is enabled by default in nginx and similar tools, blocking the stream | Setting proxy_buffering off and X-Accel-Buffering: no is mandatory |
| Debugging complexity | Execution tracing becomes harder as agent call depth increases | Include a trace_id field in each agent's output and add an observability layer that logs execution metadata to an external DB like Postgres via a Code node |
| Context contamination | Passing unstructured data between agents increases the risk of LLM hallucination | Fix types by enforcing structured output (JSON Schema) |
| Accumulated latency | In a serial agent chain, each LLM call time adds up | Switch to Fan-Out parallel processing where possible |
SSE (Server-Sent Events): An HTTP-based protocol that maintains a one-way streaming connection from server to client. Because MCP Server Trigger uses this, the stream will be interrupted unless buffering is disabled at the proxy layer.
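For the debugging row in the table above, the trace_id idea can be sketched as a small wrapper around each agent call. Everything here is an assumption for illustration: the `withTrace` name, the log record fields, and the in-memory `traceLog` array, which stands in for the external database.

```javascript
// Wrap an agent call so its output carries the trace_id and a log record is
// written whether the call succeeds or fails.
function withTrace(traceId, agentName, runAgent, log) {
  const startedAt = Date.now();
  let status = "ok";
  try {
    const output = runAgent();
    return { ...output, trace_id: traceId };
  } catch (err) {
    status = "error";
    throw err;
  } finally {
    // In production this record would go to an external store such as Postgres.
    log.push({
      trace_id: traceId,
      agent: agentName,
      status,
      duration_ms: Date.now() - startedAt,
    });
  }
}

const traceLog = [];
const tracedOutput = withTrace(
  "req-001", // hypothetical ID generated once per user request
  "calendar_agent",
  () => ({ source: "calendar", events: [], conflicts: [] }),
  traceLog
);
console.log(tracedOutput, traceLog);
```

Generating the ID once at the entry point and passing it down means a single query on `trace_id` reconstructs the whole call tree for one request.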
The Most Common Mistakes in Practice
- Binding tools directly to the orchestrator — The orchestrator should be responsible for "routing" only. If an email tool is attached to the orchestrator, it may call it directly without making a branching decision.
- Using natural language for inter-agent communication — Passing natural language like "check the emails" directly to a sub-agent leads to different interpretations per agent. Fixing types with a JSON Schema is far more reliable.
- Using the same model for all agents — Using GPT-4o for simple tool calls at the execution stage is a waste of money. It is recommended to layer models: advanced models for intent classification, lightweight models for execution.
Closing Thoughts
By combining n8n's AI Agent Tool node with MCP, you can build production-grade multi-agent orchestration in a low-code way.
Three steps you can take right now:
- Open template #4475 in n8n Cloud or a local instance. The MCP Server with AI Agent as Context Reducer template is the fastest starting point for understanding how MCP Client Tool and structured output are actually combined in practice.
- Build a minimal pipeline with an orchestrator + 2 sub-agents. Register agents responsible for Gmail MCP and Google Calendar MCP respectively as AI Agent Tool nodes, and write routing instructions in the orchestrator's system prompt. That's all you need.
- Fix each agent's output schema to a JSON Schema. It may seem tedious at first, but once the data flow between agents stabilizes, you'll notice a significant drop in debugging time.
References
- AI Agent Tool Node Official Documentation | n8n Docs
- MCP Client Tool Node Official Documentation | n8n Docs
- MCP Server Trigger Node Official Documentation | n8n Docs
- MCP Server with AI Agent as Context Reducer Workflow Template #4475 | n8n
- Multi-Agent Patterns Using MCP Trigger & Client | n8n Community