LangGraph Supervisor Pattern: How to Stay in Control in a Multi-Agent System

The most common mistake when first designing a multi-agent system is connecting agents loosely under the vague expectation that "they'll figure out how to collaborate." I thought the same thing at first, and the result was always the same: you can't tell where the control flow is, you can't trace where it failed, and debugging inevitably leads you to redesign everything from scratch.

The Supervisor Pattern starts from the opposite direction. It is a centralized control structure where a single orchestrator (Supervisor) decomposes the overall task, delegates to specialized Worker agents, then collects and integrates the results. As of 2026, this pattern has become the de facto default for multi-agent production systems, and LangGraph, CrewAI, and the OpenAI Agents SDK all support it as a first-class concept.

This article covers what the Supervisor Pattern is, how it differs from Swarm and Pipeline, how to actually write the code, and what pitfalls you'll encounter when operating it. By the end, you'll be able to judge for yourself whether your system truly needs a Supervisor or whether a simple Pipeline is sufficient — and you'll be able to avoid the cost of getting that judgment wrong.

Core Concepts

Supervisor, Swarm, Pipeline — A Question of Where the Control Flow Lives

The difference between the three patterns can be summarized in one sentence each:

Swarm: Agents hand off directly to each other. Control flow is distributed.
Pipeline: Agents are passed through in a fixed linear order. Control flow is predetermined.
Supervisor: All control returns to the orchestrator. Control flow is centralized.

User Request
    │
    ▼
┌──────────────┐
│  Supervisor  │  ← routing · decomposition · termination decisions
│  (LLM-based) │
└──────┬───────┘
       │  delegate
  ┌────┴────────────────┐
  ▼         ▼           ▼
Worker A  Worker B   Worker C
(search)  (coding)  (validation)
  │         │           │
  └────┬────┴───────────┘
       ▼
  Aggregate results → return to Supervisor → final response

Swarm looks convenient, but once you have more than four agents, tracking which agent did what and where becomes very difficult. Pipeline works well when the steps are clear, but it hits its limits the moment you need to dynamically decide "what to do next."

Orchestrator: A higher-level component that coordinates the execution of multiple agents. The Supervisor is the most common LLM-based implementation of an orchestrator.
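The difference in where control flow lives can be sketched in a few lines of framework-free Python. This is a minimal illustration under simplifying assumptions, not any framework's API: a worker is just a function over a text state, and `run_pipeline` and `run_swarm` are hypothetical names. In the Pipeline, the order lives in the `steps` list and is fixed before execution starts. In the Swarm, each agent returns the name of the next agent, so the route is scattered across the agents themselves.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical shapes: a pipeline step maps state -> state,
# a swarm agent maps state -> (new state, name of next agent or None to stop).
Step = Callable[[str], str]
SwarmAgent = Callable[[str], Tuple[str, Optional[str]]]


def run_pipeline(steps: List[Step], task: str) -> str:
    """Pipeline: the execution order is fixed by the list before anything runs."""
    state = task
    for step in steps:
        state = step(state)
    return state


def run_swarm(
    agents: Dict[str, SwarmAgent], entry: str, task: str, max_hops: int = 10
) -> Tuple[str, List[str]]:
    """Swarm: each agent picks the next hop itself, so no single place owns the route."""
    state: str = task
    current: Optional[str] = entry
    path: List[str] = []
    for _ in range(max_hops):  # cap hops so a handoff cycle cannot run forever
        if current is None:
            break
        path.append(current)
        state, current = agents[current](state)
    return state, path
```

To know which agents ran in the Swarm, you have to record `path` as you go; the Pipeline's route can be read straight off `steps`. A Supervisor, by contrast, keeps every routing decision in one place.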

Why LLM-Based Routing Instead of Rule-Based?

"If it's just routing, can't you write it with if-else?" is a question I often get. In simple cases, that's correct. But real user requests are mostly not neatly categorized.

A rule-based router only works properly when the input fits precisely into predefined categories. An LLM-based Supervisor can interpret ambiguous requests based on context, dynamically decide the next step based on intermediate results, and make reasonable judgments on unexpected input. Flows like re-delegating when a research agent's results are insufficient, or escalating to a validation agent when a code agent's execution fails, are difficult to hard-code as rules.
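To make the limitation concrete, here is a deliberately naive keyword router. The keyword table, the Worker names, and `rule_based_route` are all hypothetical. The second request is really a debugging task for code_agent, but the word "find" sends it to search_agent; the third contains no listed keyword, so it falls through entirely. An LLM-based Supervisor reads intent rather than surface words, which is exactly what a table like this cannot do.

```python
from typing import Dict, List

# Hypothetical routing table: the first Worker with a matching keyword wins.
ROUTES: Dict[str, List[str]] = {
    "search_agent": ["search", "find", "latest", "news"],
    "code_agent": ["code", "python", "script", "run"],
}


def rule_based_route(request: str) -> str:
    """Return the first Worker whose keyword appears as a substring, else "unknown"."""
    text = request.lower()
    # Dicts keep insertion order, so search_agent's keywords are checked first.
    for worker, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return worker
    return "unknown"


for request in [
    "Write Python code that prints Fibonacci numbers",  # routed correctly
    "Find out why my script crashes",                   # debugging task, misrouted by "find"
    "What changed in the newest LangGraph release?",    # no keyword matches at all
]:
    print(f"{request!r} -> {rule_based_route(request)}")
```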

Three-Phase Mechanism: Decompose → Delegate → Aggregate

What a Supervisor does breaks down into three broad phases. It sounds simple when described, but the key is that this is a cyclical structure.

Decomposition: Breaks down the user request into meaningful sub-tasks.
Delegation: Routes each sub-task to a domain-specialized Worker.
Aggregation: Receives Worker results and decides "is this sufficient, or do I need to delegate again?"

This final decision loop is what sets a Supervisor apart from a Pipeline. A Pipeline has a fixed A→B→C sequence, but a Supervisor can look at B's result and call A again instead of C.
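The cycle can be sketched without any framework. This is a minimal illustration under stated assumptions, not how LangGraph implements it: `decide` stands in for the Supervisor's LLM call (decomposition is folded into it for brevity), Workers are plain functions, and every name here is hypothetical. The demo reproduces the case from the paragraph above: after the code Worker (B) fails, the Supervisor goes back to search (A) instead of moving on to validation (C).

```python
from typing import Callable, Dict, List, Optional, Tuple

Worker = Callable[[str], str]
# Stand-in for the Supervisor LLM: sees the task and all results so far,
# returns the next Worker name, or None when the results are sufficient.
Decide = Callable[[str, List[str]], Optional[str]]


def supervise(task: str, workers: Dict[str, Worker], decide: Decide,
              max_steps: int = 8) -> Tuple[List[str], List[str]]:
    """Delegate, aggregate, decide again, until `decide` returns None."""
    trace: List[str] = []    # routing decisions, in order
    results: List[str] = []  # aggregated Worker outputs
    for _ in range(max_steps):  # hard cap so a confused router cannot loop forever
        worker = decide(task, results)
        if worker is None:
            break
        trace.append(worker)
        results.append(workers[worker](task))
    return trace, results


# Hypothetical demo: the code Worker's first attempt fails.
code_replies = iter(["code: error, missing base cases", "code: 0 1 1 2 3 5 8 13 21 34"])
workers: Dict[str, Worker] = {
    "search": lambda task: "search: F(n) = F(n-1) + F(n-2), F(0)=0, F(1)=1",
    "code": lambda task: next(code_replies),
    "validate": lambda task: "validate: ok",
}


def demo_decide(task: str, results: List[str]) -> Optional[str]:
    if not results:
        return "search"
    last = results[-1]
    if last.startswith("search:"):
        return "code"
    if last.startswith("code: error"):
        return "search"   # B failed: call A again instead of C
    if last.startswith("code:"):
        return "validate"
    return None           # validation passed: finish


trace, results = supervise("first 10 Fibonacci numbers", workers, demo_decide)
print(trace)
```

A Pipeline would have run search, code, and validate exactly once each; here the route is decided one step at a time from the accumulated results.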

Hierarchical Scaling: When You Have More Than Six Agents

When there are too many Workers, a single Supervisor becomes hard to manage. At that point, you can consider a 2-level hierarchical structure — placing Sub-Supervisors by domain, like a research sub-team and a writing sub-team, with a top-level Supervisor coordinating them.

Top-Level Supervisor
  ├── Research Sub-Supervisor
  │     ├── Search Worker
  │     └── Data Extraction Worker
  └── Writing Sub-Supervisor
        ├── Draft Writing Worker
        └── Validation Worker

However, with three or fewer Workers, a Supervisor of any kind is usually over-engineering: if the steps run in a predictable order, a sequential Pipeline is simpler and faster. Check your Worker count before introducing a Supervisor.
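For larger teams, the two-level structure above can be sketched the same way. In this framework-free illustration, all names are hypothetical and the keyword lambdas stand in for each Supervisor's LLM routing call. Each `Team` bundles a Sub-Supervisor with its Workers and exposes a single callable, so the top-level Supervisor only ever chooses between two teams rather than four Workers. The langgraph-supervisor README describes an equivalent approach: compile a team's `create_supervisor` graph with a name and pass it as one of the agents of a higher-level `create_supervisor`. Check this against the version you install.

```python
from typing import Callable, Dict

Worker = Callable[[str], str]
Router = Callable[[str], str]  # stand-in for an LLM routing call: task -> member name


class Team:
    """A Sub-Supervisor plus its Workers; the level above sees it as one callable."""

    def __init__(self, name: str, workers: Dict[str, Worker], route: Router) -> None:
        self.name = name
        self.workers = workers
        self.route = route

    def __call__(self, task: str) -> str:
        member = self.route(task)
        return f"{self.name}/{member}: {self.workers[member](task)}"


research = Team(
    "research",
    {"search": lambda t: "found sources", "extract": lambda t: "extracted table"},
    route=lambda t: "extract" if "table" in t else "search",
)
writing = Team(
    "writing",
    {"draft": lambda t: "draft ready", "validate": lambda t: "checked"},
    route=lambda t: "validate" if "check" in t else "draft",
)


def top_level(task: str) -> str:
    """Top-level Supervisor: chooses between two teams, never between four Workers."""
    teams: Dict[str, Team] = {"research": research, "writing": writing}
    team = "writing" if ("draft" in task or "check" in task) else "research"
    return teams[team](task)


print(top_level("pull the revenue table from the report"))
print(top_level("check the draft for factual errors"))
```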

Practical Application

Example 1: Building a Basic Supervisor with LangGraph

LangGraph provides an official package for this pattern, langgraph-supervisor. Its create_supervisor() factory function handles the topology wiring, so you no longer have to connect each graph edge by hand as in the past.

The code below uses DuckDuckGoSearchRun from langchain_community.tools as search_tool and PythonREPLTool from langchain_experimental.tools as python_repl_tool; either can be swapped for another LangChain tool. A fully runnable example can be found in the langchain-ai/langgraph-supervisor-py repository.

python

from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_experimental.tools import PythonREPLTool
from langgraph_supervisor import create_supervisor
from langgraph.prebuilt import create_react_agent
 
search_tool = DuckDuckGoSearchRun()
python_repl_tool = PythonREPLTool()
 
# Define Worker agents
search_agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_tool],
    name="search_agent",
    prompt="You are an agent that performs web searches. Return search results in JSON format."
)
 
code_agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=[python_repl_tool],
    name="code_agent",
    prompt="You are an agent that writes and executes code. Return the code and its execution results."
)
 
# Create the Supervisor; using a more powerful model than the Workers is recommended
supervisor = create_supervisor(
    agents=[search_agent, code_agent],
    model=ChatOpenAI(model="gpt-4o"),
    prompt=(
        "You are a Supervisor coordinating a team. "
        "Delegate to search_agent when search is needed, and to code_agent when code writing is needed. "
        "When the task is complete, reply to the user with the final answer instead of delegating again."
    )
)

app = supervisor.compile()

result = app.invoke({
    "messages": [{"role": "user", "content": "Write and run Python code that prints the first 10 Fibonacci numbers."}]
})

# The Supervisor's final reply is the last message in the returned state
print(result["messages"][-1].content)

Element	Description
`create_react_agent`	Creates a Worker agent that can use Tools via the ReAct pattern of repeated reasoning → action → observation
`create_supervisor`	Takes a list of Workers and creates a Supervisor graph with routing logic
Separate `model`	The Supervisor uses `gpt-4o` and the Workers use `gpt-4o-mini`; this split is the main lever for keeping costs down
Termination	`create_supervisor` delegates through auto-generated handoff tools (named like `transfer_to_search_agent`), and the run ends when the Supervisor replies without calling one. It does not scan for a literal `FINISH` string; that keyword is a convention of hand-built supervisors that use structured routing output

Example 2: Building a Hierarchical Supervisor with CrewAI

CrewAI can implement Supervisor behavior with just the Process.hierarchical mode and manager_llm setting. If you're familiar with role-based agent design, the barrier to entry is lower than with LangGraph.

If LangGraph is an approach of "directly designing the graph structure," CrewAI is closer to a declarative approach of "declare the role and goal, and let the framework determine the collaboration method." Neither is inherently better — it comes down to which level of abstraction the team is more comfortable with.

python

from crewai import Agent, Task, Crew, Process, LLM
from langchain_community.tools import DuckDuckGoSearchRun

# Depending on your CrewAI version, a LangChain tool may need to be wrapped as a CrewAI tool first
search_tool = DuckDuckGoSearchRun()

# Define Worker agents (CrewAI's own LLM class; a plain model-name string also works)
researcher = Agent(
    role="Research Specialist",
    goal="Gather the latest information on a given topic",
    backstory="An agent specialized in data collection and analysis.",
    llm=LLM(model="gpt-4o-mini"),
    tools=[search_tool]
)

writer = Agent(
    role="Content Writing Specialist",
    goal="Write a clear report based on the collected information",
    backstory="An agent specialized in technical document writing.",
    llm=LLM(model="gpt-4o-mini")
)

# Define tasks
research_task = Task(
    description="Gather the latest information on AI agent trends",
    expected_output="Research results including 5 key trends",
    agent=researcher
)

write_task = Task(
    description="Write a 500-character summary report based on the research results",
    expected_output="A completed summary report",
    agent=writer
)

# Activate Supervisor behavior with hierarchical mode
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical,   # the manager LLM acts as Supervisor
    manager_llm=LLM(model="gpt-4o"),
    verbose=True
)

result = crew.kickoff()
print(result)

Setting	Description
`Process.hierarchical`	The Supervisor (manager) dynamically determines task order and delegation. Unlike `Process.sequential`, the flow is not fixed
`manager_llm`	The Manager LLM that acts as Supervisor. It is recommended to assign a high-performance model separately from the Workers
`verbose=True`	Prints the routing decision process to the console. Useful for validating the initial design

Example 3: Preventing Token Explosion with Context Compression

This situation comes up often in practice: once sub-agent round trips exceed about 10, the Supervisor's context starts to fill up with Worker result messages. I saw a noticeable drop in Supervisor response quality around the 10-round-trip mark while running a research pipeline, and after that I started passing a summary rather than the full transcript at each handoff.

python

from langchain_openai import ChatOpenAI

summarizer_llm = ChatOpenAI(model="gpt-4o-mini")

def compress_worker_result(worker_output: str, max_chars: int = 500) -> str:
    """Compresses Worker results before passing them to the Supervisor."""
    if len(worker_output) <= max_chars:
        return worker_output  # already short enough: skip the extra LLM call

    summary_prompt = f"""
Summarize the following agent execution results to within {max_chars} characters, keeping only the essentials.
Include only figures, conclusions, and information needed for the next steps.

Results:
{worker_output}
"""
    # A plain string is sent to the chat model as a single user message
    response = summarizer_llm.invoke(summary_prompt)
    # LLMs do not reliably respect length limits, so enforce the cap explicitly
    return response.content[:max_chars]

compress_worker_result is a standalone utility function, so you can verify its behavior independently before attaching it to a Supervisor node. Inside the Supervisor node, iterate through Worker messages, pass them through this function to compress them, and then forward the result to the LLM.
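A minimal sketch of that loop, assuming messages are plain dicts with `role`, `name`, and `content` keys (an assumption; LangChain messages are objects exposing `.name` and `.content`). The helpers `compress_history` and `truncate` are hypothetical; `truncate` is an offline stand-in so the sketch runs without an API key, and in practice you would pass `compress_worker_result` as `compress`. If you use `create_supervisor`, it also has an `output_mode` option (`"full_history"` or `"last_message"`) that controls how much of each Worker's transcript is added to the shared history; compression then shortens whatever remains.

```python
from typing import Callable, Dict, List, Set

# Hypothetical message shape used only for this sketch.
Message = Dict[str, str]


def compress_history(messages: List[Message], compress: Callable[[str], str],
                     worker_names: Set[str]) -> List[Message]:
    """Return a new history in which every Worker message went through `compress`."""
    slimmed: List[Message] = []
    for msg in messages:
        if msg.get("name") in worker_names:
            msg = {**msg, "content": compress(msg["content"])}  # copy; never mutate the input
        slimmed.append(msg)
    return slimmed


def truncate(text: str, max_chars: int = 20) -> str:
    """Offline stand-in for compress_worker_result: plain truncation, no LLM call."""
    return text if len(text) <= max_chars else text[:max_chars]


history: List[Message] = [
    {"role": "user", "name": "user", "content": "Summarize AI agent trends"},
    {"role": "assistant", "name": "search_agent", "content": "x" * 5000},
    {"role": "assistant", "name": "supervisor", "content": "Delegating to code_agent"},
]
slim = compress_history(history, truncate, {"search_agent", "code_agent"})
print([len(m["content"]) for m in slim])
```

Only the Worker message shrinks; the user request and the Supervisor's own messages pass through untouched, so the routing context stays intact.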

In pipelines where Worker outputs such as search results or code execution logs often run to thousands of characters, this approach can substantially reduce token consumption; depending on the conditions, reductions of 70–90% are achievable. The tradeoff is latency: each summarization call adds roughly 0.5–1.5 seconds, which you need to account for.
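Whether you land near the top or the bottom of that range depends mostly on how long the Worker outputs are. The back-of-the-envelope function below is hypothetical and uses character counts as a rough proxy for tokens. It assumes every output longer than `max_chars` is compressed to exactly `max_chars` and shorter ones pass through unchanged, mirroring the early return in `compress_worker_result`.

```python
from typing import List


def estimated_reduction(output_lengths: List[int], max_chars: int = 500) -> float:
    """Fraction of characters saved if each output over max_chars is cut down to max_chars."""
    original = sum(output_lengths)
    if original == 0:
        return 0.0
    compressed = sum(min(length, max_chars) for length in output_lengths)
    return 1 - compressed / original


print(f"{estimated_reduction([4000, 2500, 300]):.1%}")  # long search/log outputs
print(f"{estimated_reduction([600, 400, 300]):.1%}")    # outputs that mostly fit already
```

For outputs of 4,000, 2,500, and 300 characters the estimate is about 81%; for outputs of 600, 400, and 300 characters it is under 8%, because most of them already fit within the limit and only pay the latency cost when they do not.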

Pros and Cons

Advantages

In practice, "traceability" and "quality gates" matter overwhelmingly. Teams that have operated with Swarm without a Supervisor already know how painful it is to trace "where and why something failed." The items below are the benefits that structurally resolve that pain.

Item	Description
Traceability	Since all routing decisions pass through the Supervisor, the entire execution path is easy to trace with LangSmith or OpenTelemetry
Quality Gate	The Supervisor reviews Worker results before deciding whether to re-delegate, providing a clear point for controlling intermediate result quality
Domain Isolation	Each Worker focuses only on its own domain, simplifying prompts and making individual replacement easy
Ease of Debugging	The control flow is simpler than Swarm, making it easier to pinpoint where errors occur

Disadvantages and Caveats

On paper the disadvantages outnumber the advantages, but in practice "bottleneck" and "context accumulation" are by far the most frequent issues. The remaining items can mostly be addressed once during the initial design phase and rarely need attention afterward.

Item	Description	Mitigation
Bottleneck	When the Supervisor uses a frontier model, every routing decision incurs the full inference cost and latency	Use cheaper models for Workers, batch routing decisions
Single Point of Failure	If the Supervisor goes down, the entire workflow stops	Secure a recovery path with retry logic and state checkpoints
Context Accumulation Degradation	Routing accuracy noticeably drops after more than 8–12 sub-agent round trips	Summarize and pass Worker results at each handoff
Increased Token Cost	Since all Worker results route through the Supervisor, there are more LLM calls compared to Swarm	Distribute costs with context compression + lightweight Worker models

Single Point of Failure: A structural vulnerability where a single component failure causes the entire system to stop functioning. In the Supervisor pattern, the Supervisor itself is that point. It is recommended to consider retry logic and state checkpoints from the beginning of design.
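A minimal sketch of both mitigations in plain Python, assuming the workflow can be broken into named steps (for example, successive Supervisor calls). `run_with_recovery` and its parameters are hypothetical. Completed steps are recorded in a checkpoint dict so a rerun resumes where it stopped, and failed steps are retried with exponential backoff; treating every exception as transient is a simplification. In LangGraph itself, the built-in counterpart to the checkpoint is compiling the graph with a checkpointer, for example `supervisor.compile(checkpointer=InMemorySaver())` with `InMemorySaver` from `langgraph.checkpoint.memory`, and invoking it with a `thread_id` in the config.

```python
import time
from typing import Callable, Dict, List


def run_with_recovery(
    steps: List[str],
    run_step: Callable[[str], str],
    checkpoint: Dict[str, str],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> Dict[str, str]:
    """Run steps in order, skipping ones already in `checkpoint` and retrying failures."""
    for step in steps:
        if step in checkpoint:  # finished in an earlier run: resume instead of redoing it
            continue
        for attempt in range(max_retries):
            try:
                checkpoint[step] = run_step(step)  # persist progress as soon as a step succeeds
                break
            except Exception:  # simplification: treat every failure as transient
                if attempt == max_retries - 1:
                    raise  # retries exhausted; earlier steps stay in `checkpoint`
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, ... by default
    return checkpoint
```

Because the checkpoint survives a raised exception, calling `run_with_recovery` again with the same dict skips everything that already succeeded and restarts at the step that failed.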

Most Common Mistakes in Practice

Introducing a Supervisor with three or fewer Workers: this is over-engineering. A sequential Pipeline is simpler and faster, so check the Worker count first.
Using the same model for both Supervisor and Workers: the Supervisor makes the routing decisions, so it benefits from a high-performance model, while Workers can usually run on lighter, domain-specialized models that keep costs down.
Passing Worker results to the Supervisor uncompressed: honestly, almost everyone makes this mistake during the initial build. You only feel the need for context compression after watching Supervisor response quality degrade noticeably around the 10-round-trip mark.

Closing Thoughts

The Supervisor Pattern replaces the vague design of "the agents will figure out how to collaborate" with a clear structure where "the control flow always returns to one place." Traceability, quality gates, and domain isolation are the three advantages that have made this pattern the production default.

Three steps you can start with right now:

Install the packages: run pip install langgraph-supervisor langchain-openai (plus langchain-community and langchain-experimental for the tools used in Example 1), then look up the "Hierarchical Agent Teams" tutorial in the official LangGraph documentation and run its example code to get a quick feel for the overall flow.
Split an existing single agent into a Supervisor + 2 Workers: if you already have an agent, divide its task into two stages, "information gathering" and "result generation", make each stage a Worker, and let a Supervisor coordinate between them. Even though two Workers is below the threshold discussed above, this is a cheap way to experience the pattern firsthand.
Connect LangSmith to inspect execution traces: setting the LANGCHAIN_TRACING_V2=true environment variable, together with your LangSmith API key in LANGCHAIN_API_KEY, lets you visually trace the Supervisor's routing decisions and each Worker call. Seeing a trace for the first time gives you a clear "ah, so this is how it flows" moment, and that is when this pattern really clicks.
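If you prefer to configure tracing from Python rather than the shell, a sketch like the following should work, provided it runs before the graph is invoked. `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, and `LANGCHAIN_PROJECT` are LangSmith's environment variables; the project name is only an example, and newer LangSmith releases also accept `LANGSMITH_`-prefixed equivalents.

```python
import os

# Turn on LangSmith tracing for LangChain/LangGraph calls made in this process.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Optional: group runs under a named project in the LangSmith UI (example name only).
os.environ.setdefault("LANGCHAIN_PROJECT", "supervisor-pattern-demo")

# Tracing also needs a LangSmith API key; keep it in the environment, not in source code.
if not os.environ.get("LANGCHAIN_API_KEY"):
    print("LANGCHAIN_API_KEY is not set; set it before running the graph.")
```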

