LangGraph Supervisor Pattern: How to Stay in Control in a Multi-Agent System
The most common mistake when first designing a multi-agent system is connecting agents loosely under the vague expectation that "they'll figure out how to collaborate." I thought the same thing at first, and the result was always the same: you can't tell where the control flow is, you can't trace where it failed, and debugging inevitably leads you to redesign everything from scratch.
The Supervisor Pattern starts from the opposite direction. It is a centralized control structure where a single orchestrator (Supervisor) decomposes the overall task, delegates to specialized Worker agents, then collects and integrates the results. As of 2026, this pattern has become the de facto default for multi-agent production systems, and LangGraph, CrewAI, and the OpenAI Agents SDK all support it as a first-class concept.
This article covers what the Supervisor Pattern is, how it differs from Swarm and Pipeline, how to actually write the code, and what pitfalls you'll encounter when operating it. By the end, you'll be able to judge for yourself whether your system truly needs a Supervisor or whether a simple Pipeline is sufficient — and you'll be able to avoid the cost of getting that judgment wrong.
Core Concepts
Supervisor, Swarm, Pipeline — A Question of Where the Control Flow Lives
The difference between the three patterns can be summarized in one sentence each:
- Swarm: Agents hand off directly to each other. Control flow is distributed.
- Pipeline: Agents are passed through in a fixed linear order. Control flow is predetermined.
- Supervisor: All control returns to the orchestrator. Control flow is centralized.
         User Request
               │
               ▼
        ┌─────────────┐
        │ Supervisor  │ ← routing · decomposition · termination decisions
        │ (LLM-based) │
        └──────┬──────┘
               │ delegate
   ┌───────────┼───────────┐
   ▼           ▼           ▼
Worker A    Worker B    Worker C
(search)    (coding)  (validation)
   │           │           │
   └───────────┼───────────┘
               ▼
Aggregate results → return to Supervisor → final response
Swarm looks convenient, but once you have more than four agents, tracking which agent did what and where becomes very difficult. Pipeline works well when the steps are clear, but it hits its limits the moment you need to dynamically decide "what to do next."
Orchestrator: A higher-level component that coordinates the execution of multiple agents. Supervisor is the representative implementation of an LLM-based orchestrator.
Why LLM-Based Routing Instead of Rule-Based?
"If it's just routing, can't you write it with if-else?" is a question I often get. In simple cases, that's correct. But real user requests are mostly not neatly categorized.
A rule-based router only works properly when the input fits precisely into predefined categories. An LLM-based Supervisor can interpret ambiguous requests based on context, dynamically decide the next step based on intermediate results, and make reasonable judgments on unexpected input. Flows like re-delegating when a research agent's results are insufficient, or escalating to a validation agent when a code agent's execution fails, are difficult to hard-code as rules.
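To make the limitation concrete, here is a minimal sketch of a keyword router. The `ROUTES` table and the `rule_based_route` function are hypothetical names invented for this illustration; a real rule set would be larger, but it tends to fail in the same two ways.

```python
from typing import Optional

# Hypothetical keyword table; a real system would carry many more rules.
ROUTES = {
    "search_agent": ("search", "find", "look up"),
    "code_agent": ("code", "script", "python"),
}


def rule_based_route(request: str) -> Optional[str]:
    """Return the first worker whose keywords appear in the request, else None."""
    text = request.lower()
    for worker, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return worker
    return None


# A request that fits a category is routed correctly.
print(rule_based_route("Write a Python script that sorts a list"))
# An ambiguous request matches no rule at all.
print(rule_based_route("Why did last quarter's numbers drop?"))
# A request that mentions two domains goes to whichever rule is checked first.
print(rule_based_route("Find the bug in this code"))
```

The second and third calls show the two failure modes: no match for an uncategorized request, and a wrong match when a request touches two domains. Those are the cases where an LLM-based Supervisor earns its cost.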
Three-Phase Mechanism: Decompose → Delegate → Aggregate
What a Supervisor does breaks down into three broad phases. It sounds simple when described, but the key is that this is a cyclical structure.
- Decomposition: Breaks down the user request into meaningful sub-tasks.
- Delegation: Routes each sub-task to a domain-specialized Worker.
- Aggregation: Receives Worker results and decides "is this sufficient, or do I need to delegate again?"
This final decision loop is what sets a Supervisor apart from a Pipeline. A Pipeline has a fixed A→B→C sequence, but a Supervisor can look at B's result and call A again instead of C.
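The cycle can be sketched in plain Python without any framework. In this sketch `run_supervisor`, `decide`, and the stub Workers are all hypothetical; `decide` stands in for the LLM call that a real Supervisor would make, and `max_round_trips` is an assumed safety cap rather than a library setting.

```python
from typing import Callable, Dict, List, Optional

Worker = Callable[[str], str]


def run_supervisor(
    request: str,
    workers: Dict[str, Worker],
    decide: Callable[[str, List[str]], Optional[str]],
    max_round_trips: int = 10,
) -> List[str]:
    """Delegate until `decide` returns None or the round-trip budget runs out."""
    results: List[str] = []
    for _ in range(max_round_trips):
        next_worker = decide(request, results)  # decomposition and routing
        if next_worker is None:  # aggregation judged the results sufficient
            break
        results.append(workers[next_worker](request))  # delegation
    return results


calls = {"research": 0}


def research(request: str) -> str:
    """Stub Worker: returns a thin result on the first call, a detailed one after."""
    calls["research"] += 1
    return "research: thin" if calls["research"] == 1 else "research: detailed"


def write(request: str) -> str:
    """Stub Worker: always produces a finished draft."""
    return "draft: done"


def decide(request: str, results: List[str]) -> Optional[str]:
    """Stand-in for the LLM: re-delegate research while the result is thin."""
    if not results or results[-1] == "research: thin":
        return "research"
    if results[-1] == "research: detailed":
        return "write"
    return None


trace = run_supervisor("report on agents", {"research": research, "write": write}, decide)
print(trace)
```

The research Worker is called twice before the writer runs, which is the re-delegation step that a fixed A→B→C Pipeline cannot express.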
Hierarchical Scaling: When You Have More Than Six Agents
When there are too many Workers, a single Supervisor becomes hard to manage. At that point, you can consider a 2-level hierarchical structure — placing Sub-Supervisors by domain, like a research sub-team and a writing sub-team, with a top-level Supervisor coordinating them.
Top-Level Supervisor
├── Research Sub-Supervisor
│   ├── Search Worker
│   └── Data Extraction Worker
└── Writing Sub-Supervisor
    ├── Draft Writing Worker
    └── Validation Worker
However, if you have three or fewer Workers, this structure is over-abstraction. A sequential Pipeline is simpler and faster. It's worth checking the Worker count threshold before introducing a Supervisor.
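As a rough illustration of what the two-level structure buys, the sketch below models the tree as a plain dictionary. The `TEAMS` layout, `route`, and `choices_per_decision` are hypothetical names for this example only; they are not part of LangGraph or CrewAI.

```python
from typing import Dict, List

# Hypothetical team layout mirroring the tree in this section.
TEAMS: Dict[str, List[str]] = {
    "research": ["search_worker", "data_extraction_worker"],
    "writing": ["draft_writing_worker", "validation_worker"],
}


def route(team: str, worker: str) -> List[str]:
    """Return the delegation path: top level, then sub-supervisor, then worker."""
    if team not in TEAMS:
        raise KeyError(f"unknown team: {team}")
    if worker not in TEAMS[team]:
        raise KeyError(f"{worker} is not on the {team} team")
    return ["top_level_supervisor", f"{team}_sub_supervisor", worker]


def choices_per_decision() -> Dict[str, int]:
    """Compare how many options a single routing decision has to weigh."""
    flat = sum(len(workers) for workers in TEAMS.values())
    widest = max(len(TEAMS), *(len(workers) for workers in TEAMS.values()))
    return {"flat": flat, "hierarchical": widest}


print(route("research", "search_worker"))
print(choices_per_decision())
```

With only four Workers the hierarchy saves little, at the price of an extra hop on every delegation. That is consistent with the advice to skip it for small teams; the gap widens as Workers are added.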
Practical Application
Example 1: Building a Basic Supervisor with LangGraph
The langgraph-supervisor package, released by the LangChain team in early 2025, provides a create_supervisor() factory function that automates topology wiring, so you no longer have to connect each graph edge by hand.
In the code below, search_tool is DuckDuckGoSearchRun from langchain_community.tools and python_repl_tool is PythonREPLTool from langchain_experimental.tools; both can be swapped for any other tools. A fully runnable example can be found in the langchain-ai/langgraph-supervisor-py repository.
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_experimental.tools import PythonREPLTool
from langgraph_supervisor import create_supervisor
from langgraph.prebuilt import create_react_agent

search_tool = DuckDuckGoSearchRun()
python_repl_tool = PythonREPLTool()

# Define Worker agents
search_agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_tool],
    name="search_agent",
    prompt="You are an agent that performs web searches. Return search results in JSON format.",
)
code_agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=[python_repl_tool],
    name="code_agent",
    prompt="You are an agent that writes and executes code. Return the code and its execution results.",
)

# Create the Supervisor; a more powerful model than the Workers is recommended
supervisor = create_supervisor(
    agents=[search_agent, code_agent],
    model=ChatOpenAI(model="gpt-4o"),
    prompt=(
        "You are a Supervisor coordinating a team. "
        "Delegate to search_agent when search is needed, and to code_agent when code writing is needed. "
        "When the task is complete, answer the user directly."
    ),
)

app = supervisor.compile()
result = app.invoke({
    "messages": [{"role": "user", "content": "Write and run Python code that prints the first 10 Fibonacci numbers."}]
})
# The last message in the returned state is the Supervisor's final answer
print(result["messages"][-1].content)
| Element | Description |
|---|---|
| create_react_agent | Creates a Worker agent that can use Tools via the ReAct pattern of repeated reasoning → action → observation |
| create_supervisor | Takes a list of Workers and creates a Supervisor graph with routing logic |
| Separate model | Supervisor uses gpt-4o, Workers use gpt-4o-mini. This split is the key to distributing costs |
| Termination | create_supervisor routes through handoff tool calls, so the loop ends when the Supervisor answers without handing off to a Worker. The literal FINISH string is a convention from older hand-built Supervisor graphs and is not parsed by the library |
Example 2: Building a Hierarchical Supervisor with CrewAI
CrewAI can implement Supervisor behavior with just the Process.hierarchical mode and manager_llm setting. If you're familiar with role-based agent design, the barrier to entry is lower than with LangGraph.
If LangGraph is an approach of "directly designing the graph structure," CrewAI is closer to a declarative approach of "declare the role and goal, and let the framework determine the collaboration method." Neither is inherently better — it comes down to which level of abstraction the team is more comfortable with.
# Written against CrewAI releases that accept LangChain models and tools directly;
# newer releases may expect CrewAI's own LLM and tool classes instead.
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()

# Define Worker agents
researcher = Agent(
    role="Research Specialist",
    goal="Gather the latest information on a given topic",
    backstory="An agent specialized in data collection and analysis.",
    llm=ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_tool],
)
writer = Agent(
    role="Content Writing Specialist",
    goal="Write a clear report based on the collected information",
    backstory="An agent specialized in technical document writing.",
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

# Define tasks
research_task = Task(
    description="Gather the latest information on AI agent trends",
    expected_output="Research results including 5 key trends",
    agent=researcher,
)
write_task = Task(
    description="Write a 500-character summary report based on the research results",
    expected_output="A completed summary report",
    agent=writer,
)

# Activate Supervisor with hierarchical mode
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical,  # Manager LLM acts as Supervisor
    manager_llm=ChatOpenAI(model="gpt-4o"),
    verbose=True,
)
result = crew.kickoff()
| Element | Description |
|---|---|
| Process.hierarchical | The Supervisor (manager) dynamically determines task order and delegation. Unlike Process.sequential, the flow is not fixed |
| manager_llm | The Manager LLM that acts as Supervisor. It is recommended to assign a high-performance model separately from the Workers |
| verbose=True | Prints the routing decision process to the console. Useful for validating the initial design |
Example 3: Preventing Token Explosion with Context Compression
This is a situation frequently encountered in practice — once sub-agent round trips exceed 10, the Supervisor's context starts to become saturated with Worker result messages. I personally experienced a noticeable drop in Supervisor response quality around the 10-round-trip mark while running a research pipeline. After that, I started passing a summary rather than the full transcript at each handoff point.
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage

summarizer_llm = ChatOpenAI(model="gpt-4o-mini")


def compress_worker_result(worker_output: str, max_chars: int = 500) -> str:
    """Compresses Worker results before passing them to the Supervisor."""
    # Short outputs are passed through unchanged to avoid an extra LLM call
    if len(worker_output) <= max_chars:
        return worker_output
    summary_prompt = (
        f"Summarize the following agent execution results to within {max_chars} characters, "
        "keeping only the essentials.\n"
        "Include only figures, conclusions, and information needed for the next steps.\n\n"
        f"Results:\n{worker_output}"
    )
    response = summarizer_llm.invoke([SystemMessage(content=summary_prompt)])
    return response.content
compress_worker_result is a standalone utility function, so you can verify its behavior independently before attaching it to a Supervisor node. Inside the Supervisor node, iterate through Worker messages, pass them through this function to compress them, and then forward the result to the LLM.
In pipelines where Worker outputs such as search results or code execution logs frequently exceed thousands of characters, this approach can significantly reduce token consumption. Depending on the conditions, reductions of 70–90% are achievable. However, the tradeoff of an added 0.5–1.5 second delay at the summarization step must be accounted for.
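The "iterate through Worker messages and compress them" step might look like the sketch below. It is framework-neutral and makes two assumptions: messages are plain dictionaries with a hypothetical "worker" role, and a simple `truncate` function stands in for the LLM summarizer so the example runs offline. In a real graph you would pass compress_worker_result as the `compress` argument and adapt the role check to your message types.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def truncate(text: str, max_chars: int = 500) -> str:
    """Stand-in compressor: hard truncation instead of an LLM summary."""
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."


def compress_history(
    messages: List[Message],
    compress: Callable[[str], str] = truncate,
) -> List[Message]:
    """Compress Worker outputs; leave user and Supervisor messages untouched."""
    return [
        {**message, "content": compress(message["content"])}
        if message["role"] == "worker"  # hypothetical role label for Worker output
        else message
        for message in messages
    ]


history = [
    {"role": "user", "content": "Summarize AI agent trends."},
    {"role": "worker", "content": "x" * 4000},
]
compact = compress_history(history)
before = sum(len(m["content"]) for m in history)
after = sum(len(m["content"]) for m in compact)
print(f"{before} -> {after} characters ({1 - after / before:.0%} smaller)")
```

Measuring the before and after sizes like this on your own transcripts is a quick way to check whether the savings justify the added summarization latency.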
Pros and Cons
Advantages
In practice, "traceability" and "quality gates" matter overwhelmingly. Teams that have operated with Swarm without a Supervisor already know how painful it is to trace "where and why something failed." The items below are the benefits that structurally resolve that pain.
| Item | Description |
|---|---|
| Traceability | Since all routing decisions pass through the Supervisor, the entire execution path is easy to trace with LangSmith or OpenTelemetry |
| Quality Gate | The Supervisor reviews Worker results before deciding whether to re-delegate, providing a clear point for controlling intermediate result quality |
| Domain Isolation | Each Worker focuses only on its own domain, simplifying prompts and making individual replacement easy |
| Ease of Debugging | The control flow is simpler than Swarm, making it easier to pinpoint where errors occur |
Disadvantages and Caveats
The list of disadvantages looks long, but in practice "bottleneck" and "context accumulation" are by far the most frequent issues. The remaining items can mostly be addressed once during the initial design phase and rarely need attention afterward.
| Item | Description | Mitigation |
|---|---|---|
| Bottleneck | When the Supervisor uses a frontier model, every routing decision incurs the full inference cost and latency | Use cheaper models for Workers, batch routing decisions |
| Single Point of Failure | If the Supervisor goes down, the entire workflow stops | Secure a recovery path with retry logic and state checkpoints |
| Context Accumulation Degradation | Routing accuracy noticeably drops after more than 8–12 sub-agent round trips | Summarize and pass Worker results at each handoff |
| Increased Token Cost | Since all Worker results route through the Supervisor, there are more LLM calls compared to Swarm | Distribute costs with context compression + lightweight Worker models |
Single Point of Failure: A structural vulnerability where a single component failure causes the entire system to stop functioning. In the Supervisor pattern, the Supervisor itself is that point. It is recommended to consider retry logic and state checkpoints from the beginning of design.
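A minimal, framework-neutral sketch of those two safeguards follows. The function names and the JSON checkpoint layout are assumptions made for this example; LangGraph has its own checkpointer interface, which would normally replace the file handling shown here.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, List


def call_with_retry(step: Callable[[], str], attempts: int = 3, base_delay: float = 0.5) -> str:
    """Call `step`, retrying with exponential backoff; re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def save_checkpoint(path: Path, state: Dict[str, List[str]]) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def load_checkpoint(path: Path) -> Dict[str, List[str]]:
    """Return the saved state, or an empty state on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}
```

A typical use would wrap each Supervisor routing call in call_with_retry and call save_checkpoint after every completed Worker step, so a restarted workflow can skip the steps listed under "completed".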
Most Common Mistakes in Practice
- Introducing a Supervisor with three or fewer Workers: this is over-engineering. A sequential Pipeline is simpler and faster, so count your Workers first.
- Using the same model for both Supervisor and Workers: the Supervisor makes the routing decisions, so it benefits from a high-performance model, while Workers can run on lightweight, domain-specialized models to keep costs down.
- Passing Worker results to the Supervisor as-is without compression: honestly, almost everyone makes this mistake during the initial build. You only feel the need for context compression after seeing Supervisor response quality noticeably degrade around the 10-round-trip mark.
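The Worker-count rule of thumb from this article can be written down as a small helper. The function name `recommend_pattern` is hypothetical, and the thresholds (three or fewer, more than six) are this article's heuristics rather than hard limits of any framework.

```python
def recommend_pattern(worker_count: int, needs_dynamic_routing: bool) -> str:
    """Map this article's rules of thumb to a pattern name."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Fixed steps or a very small team: a sequential Pipeline is enough.
    if not needs_dynamic_routing or worker_count <= 3:
        return "pipeline"
    # Too many Workers for one Supervisor: add Sub-Supervisors per domain.
    if worker_count > 6:
        return "hierarchical supervisor"
    return "supervisor"


for count in (2, 5, 8):
    print(count, recommend_pattern(count, needs_dynamic_routing=True))
```

Treat the output as a starting point for a design discussion; the latency and cost tradeoffs described earlier still need to be weighed for the specific workload.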
Closing Thoughts
The Supervisor Pattern replaces the vague design of "the agents will figure out how to collaborate" with a clear structure where "the control flow always returns to one place." Traceability, quality gates, and domain isolation are the three advantages that have made this pattern the production default.
Three steps you can start with right now:
- Install the packages: after running pip install langgraph-supervisor langchain-openai, search for the "Hierarchical Agent Teams" tutorial in the official LangGraph documentation and run the example code to quickly grasp the overall flow.
- Split an existing single agent into a Supervisor + 2 Workers: if you already have an agent built, divide the task into two stages, "information gathering" and "result generation", make each stage a Worker, and let a Supervisor coordinate between them to experience the benefits of the pattern directly.
- Connect LangSmith to check execution traces: setting the LANGCHAIN_TRACING_V2=true environment variable, together with a LangSmith API key in LANGCHAIN_API_KEY, lets you visually trace the Supervisor's routing decisions and each Worker call. Seeing the trace for the first time gives you a clear "ah, so this is how it flows" realization, and that is the moment you truly understand this pattern.
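For the tracing step, the environment variables can also be set from Python before the graph is invoked. This is a small sketch: the helper name `enable_langsmith_tracing` and the project name "supervisor-demo" are hypothetical, and the API key must come from your own LangSmith account.

```python
import os


def enable_langsmith_tracing(api_key: str, project: str = "supervisor-demo") -> None:
    """Set the LangSmith environment variables for the current process."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    os.environ["LANGCHAIN_PROJECT"] = project  # groups runs under one project name


# Call this once at startup, before compiling or invoking the Supervisor graph.
enable_langsmith_tracing(api_key="your-langsmith-api-key")
```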
References
- LangGraph Supervisor Pattern: Orchestrating Multi-Agent Teams in 2026 | CallSphere Blog
- GitHub - langchain-ai/langgraph-supervisor-py
- Hierarchical Agent Teams | LangGraph Official Tutorial
- langgraph-supervisor · PyPI
- How to Use the Supervisor Pattern for Multi-Agent Voice AI Systems | LiveKit
- Supervisor pattern | LiveKit Documentation
- Multi-Agent AI Orchestration Patterns: Production Guide | Lushbinary
- Multi-Agent Orchestration in LangGraph: Supervisor vs Swarm, Tradeoffs and Architecture
- Swarm vs. Supervisor: Multi-Agent Architecture Guide | Augment Code
- Agent system design patterns | Databricks on AWS
- Multi-Agent collaboration patterns with Strands Agents and Amazon Nova | AWS Blog
- Multi-Agent in Production 2026: 3 Patterns That Survived
- Architecting efficient context-aware multi-agent framework for production | Google Developers Blog
- LangGraph Multi-Agent Collaboration in Practice: Supervisor Pattern and Task Dispatch
- The Multi-Agent Trap | Towards Data Science