© 2026 DEV BAK - TECH BLOG. All rights reserved.
DEV BAK - TECH BLOG
AI

LangGraph Supervisor Pattern: How to Stay in Control in a Multi-Agent System

The most common mistake when first designing a multi-agent system is connecting agents loosely under the vague expectation that "they'll figure out how to collaborate." I thought the same thing at first, and the result was always the same: you can't tell where the control flow is, you can't trace where it failed, and debugging inevitably leads you to redesign everything from scratch.

The Supervisor Pattern starts from the opposite direction. It is a centralized control structure where a single orchestrator (Supervisor) decomposes the overall task, delegates to specialized Worker agents, then collects and integrates the results. As of 2026, this pattern has become the de facto default for multi-agent production systems, and LangGraph, CrewAI, and the OpenAI Agents SDK all support it as a first-class concept.

This article covers what the Supervisor Pattern is, how it differs from Swarm and Pipeline, how to actually write the code, and what pitfalls you'll encounter when operating it. By the end, you'll be able to judge for yourself whether your system truly needs a Supervisor or whether a simple Pipeline is sufficient — and you'll be able to avoid the cost of getting that judgment wrong.


Core Concepts

Supervisor, Swarm, Pipeline — A Question of Where the Control Flow Lives

The difference between the three patterns can be summarized in one sentence each:

  • Swarm: Agents hand off directly to each other. Control flow is distributed.
  • Pipeline: Agents are passed through in a fixed linear order. Control flow is predetermined.
  • Supervisor: All control returns to the orchestrator. Control flow is centralized.

User Request
    │
    ▼
┌─────────────┐
│  Supervisor  │  ← routing · decomposition · termination decisions
│  (LLM-based) │
└──────┬──────┘
       │  delegate
  ┌────┴────────────────┐
  ▼         ▼           ▼
Worker A  Worker B   Worker C
(search)  (coding)  (validation)
  │         │           │
  └────┬────┴───────────┘
       ▼
  Aggregate results → return to Supervisor → final response

Swarm looks convenient, but once you have more than four agents, tracking which agent did what and where becomes very difficult. Pipeline works well when the steps are clear, but it hits its limits the moment you need to dynamically decide "what to do next."
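To make "where the control flow lives" concrete, here is a minimal sketch of the three patterns using plain functions. The Workers, the handoff table, and the `plan` argument are hypothetical stand-ins; in a real system an LLM would make the routing choices.

```python
from typing import Callable, Optional

# Hypothetical stand-in Workers: each takes text and returns text.
WORKERS: dict[str, Callable[[str], str]] = {
    "search": lambda task: f"search({task})",
    "code": lambda task: f"code({task})",
    "validate": lambda task: f"validate({task})",
}


def run_pipeline(task: str) -> list[str]:
    """Pipeline: the order is fixed before the run starts."""
    trace = []
    for name in ["search", "code", "validate"]:
        task = WORKERS[name](task)
        trace.append(name)
    return trace


# Swarm: each agent names its own successor, so the route is spread
# across the agents themselves.
SWARM_HANDOFFS: dict[str, Optional[str]] = {
    "search": "code",
    "code": "validate",
    "validate": None,
}


def run_swarm(task: str, start: str = "search") -> list[str]:
    trace = []
    current: Optional[str] = start
    while current is not None:
        task = WORKERS[current](task)
        trace.append(current)
        current = SWARM_HANDOFFS[current]
    return trace


def run_supervisor(task: str, plan: list[str]) -> list[str]:
    """Supervisor: control returns to the center after every Worker call."""
    trace = []
    for name in plan:  # stand-in for an LLM choosing the next Worker each turn
        trace.append("supervisor")
        task = WORKERS[name](task)
        trace.append(name)
    trace.append("supervisor")  # final aggregation and response
    return trace
```

In the Supervisor trace every other entry is `"supervisor"`, which is exactly what makes the execution path easy to follow afterwards.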

Orchestrator: A higher-level component that coordinates the execution of multiple agents. A Supervisor is the typical LLM-based implementation of an orchestrator.

Why LLM-Based Routing Instead of Rule-Based?

"If it's just routing, can't you write it with if-else?" is a question I often get. In simple cases, that's correct. But most real user requests don't fall neatly into categories.

A rule-based router only works properly when the input fits precisely into predefined categories. An LLM-based Supervisor can interpret ambiguous requests based on context, dynamically decide the next step based on intermediate results, and make reasonable judgments on unexpected input. Flows like re-delegating when a research agent's results are insufficient, or escalating to a validation agent when a code agent's execution fails, are difficult to hard-code as rules.
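A minimal sketch of the rule-based alternative shows where it breaks down. The keywords and agent names here are hypothetical, chosen only to illustrate the failure modes.

```python
def rule_based_route(request: str) -> str:
    """Route by keyword match; returns "unknown" when no rule fires."""
    text = request.lower()
    if "search" in text or "find" in text:
        return "search_agent"
    if "code" in text or "python" in text:
        return "code_agent"
    return "unknown"


# Fits a predefined category: routed as intended.
print(rule_based_route("Search for LangGraph release notes"))

# A coding request, but "find" matches first and it is misrouted to search.
print(rule_based_route("Find the bug in this Python code"))

# No keyword matches at all, so the router has nothing to say.
print(rule_based_route("Why does my script crash on startup?"))
```

Adding more keywords fixes individual cases but not the underlying problem: the rules see words, not intent or intermediate results.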

Three-Phase Mechanism: Decompose → Delegate → Aggregate

What a Supervisor does breaks down into three broad phases. It sounds simple when described, but the key is that this is a cyclical structure.

  • Decomposition: Breaks down the user request into meaningful sub-tasks.
  • Delegation: Routes each sub-task to a domain-specialized Worker.
  • Aggregation: Receives Worker results and decides "is this sufficient, or do I need to delegate again?"

This final decision loop is what sets a Supervisor apart from a Pipeline. A Pipeline has a fixed A→B→C sequence, but a Supervisor can look at B's result and call A again instead of C.
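The cycle can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a framework API: `decompose`, `workers`, and `is_sufficient` are hypothetical callables standing in for LLM calls, and this sketch re-delegates to the same Worker, whereas a real Supervisor could pick a different one.

```python
from typing import Callable


def supervise(
    request: str,
    decompose: Callable[[str], list[tuple[str, str]]],
    workers: dict[str, Callable[[str], str]],
    is_sufficient: Callable[[str], bool],
    max_round_trips: int = 10,
) -> list[str]:
    """Decompose, delegate, and aggregate, re-delegating weak results."""
    results: list[str] = []
    queue = decompose(request)  # phase 1: (worker_name, sub_task) pairs
    round_trips = 0
    while queue and round_trips < max_round_trips:
        worker_name, sub_task = queue.pop(0)
        output = workers[worker_name](sub_task)  # phase 2: delegation
        round_trips += 1
        if is_sufficient(output):  # phase 3: aggregation decision
            results.append(output)
        else:
            # Not good enough: put the sub-task back at the front of the queue.
            queue.insert(0, (worker_name, sub_task + " (retry)"))
    return results
```

The `max_round_trips` cap is a design choice of this sketch: without an upper bound, a Worker that never produces a sufficient result would keep the loop running forever.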

Hierarchical Scaling: When You Have More Than Six Agents

When there are too many Workers, a single Supervisor becomes hard to manage. At that point, you can consider a 2-level hierarchical structure — placing Sub-Supervisors by domain, like a research sub-team and a writing sub-team, with a top-level Supervisor coordinating them.

Top-Level Supervisor
  ├── Research Sub-Supervisor
  │     ├── Search Worker
  │     └── Data Extraction Worker
  └── Writing Sub-Supervisor
        ├── Draft Writing Worker
        └── Validation Worker

However, with three or fewer Workers, even a single Supervisor is over-abstraction, let alone a hierarchy. A sequential Pipeline is simpler and faster. It's worth checking the Worker count before introducing a Supervisor.
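The thresholds and the two-level structure can be sketched as follows. The cut-offs simply restate this article's rule of thumb, and every `choose` function is a hypothetical keyword stub standing in for an LLM routing call.

```python
from typing import Callable

Worker = Callable[[str], str]


def suggest_topology(worker_count: int) -> str:
    """Rule of thumb from this article, keyed on Worker count alone."""
    if worker_count <= 3:
        return "pipeline"
    if worker_count <= 6:
        return "single supervisor"
    return "hierarchical supervisor"


class SubSupervisor:
    """Owns one domain's Workers and picks among them."""

    def __init__(self, workers: dict[str, Worker], choose: Callable[[str], str]):
        self.workers = workers
        self.choose = choose  # stand-in for an LLM routing call

    def handle(self, task: str) -> str:
        return self.workers[self.choose(task)](task)


research = SubSupervisor(
    workers={
        "search": lambda task: f"[search] {task}",
        "extract": lambda task: f"[extract] {task}",
    },
    choose=lambda task: "extract" if "table" in task else "search",
)
writing = SubSupervisor(
    workers={
        "draft": lambda task: f"[draft] {task}",
        "validate": lambda task: f"[validate] {task}",
    },
    choose=lambda task: "validate" if "check" in task else "draft",
)

SUB_TEAMS = {"research": research, "writing": writing}


def top_level_supervisor(task: str) -> str:
    """The top level chooses only a team; the team resolves the Worker."""
    team = "writing" if ("write" in task or "check" in task) else "research"
    return SUB_TEAMS[team].handle(task)
```

The point of the split is that the top-level Supervisor only ever chooses between two teams, no matter how many Workers sit underneath them.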


Practical Application

Example 1: Building a Basic Supervisor with LangGraph

LangChain has published the official langgraph-supervisor package since early 2025. Its create_supervisor() factory function automates topology wiring, so you no longer have to connect each graph edge by hand.

In the code below, search_tool is DuckDuckGoSearchRun from langchain_community.tools and python_repl_tool is PythonREPLTool from langchain_experimental.tools; swap in your own tools as needed. PythonREPLTool executes arbitrary code on the host, so run it only in a sandboxed environment. A fully runnable example can be found in the langchain-ai/langgraph-supervisor-py repository.

```python
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_experimental.tools import PythonREPLTool
from langgraph_supervisor import create_supervisor
from langgraph.prebuilt import create_react_agent

search_tool = DuckDuckGoSearchRun()
python_repl_tool = PythonREPLTool()  # runs arbitrary code: use a sandbox

# Define Worker agents
search_agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_tool],
    name="search_agent",
    prompt="You are an agent that performs web searches. Return search results in JSON format."
)

code_agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=[python_repl_tool],
    name="code_agent",
    prompt="You are an agent that writes and executes code. Return the code and its execution results."
)

# Create Supervisor: a more powerful model than the Workers is recommended
supervisor = create_supervisor(
    agents=[search_agent, code_agent],
    model=ChatOpenAI(model="gpt-4o"),
    prompt=(
        "You are a Supervisor coordinating a team. "
        "Delegate to search_agent when search is needed, and to code_agent when code writing is needed. "
        "When the task is complete, answer the user directly with the final result."
    )
)

app = supervisor.compile()

result = app.invoke({
    "messages": [{"role": "user", "content": "Write and run Python code that prints the first 10 Fibonacci numbers."}]
})

# The last message in the returned state is the Supervisor's final answer.
print(result["messages"][-1].content)
```

| Element | Description |
| --- | --- |
| create_react_agent | Creates a Worker agent that uses tools through the ReAct loop of reasoning → action → observation |
| create_supervisor | Takes a list of Workers and builds a Supervisor graph with routing logic |
| Separate models | The Supervisor uses gpt-4o and the Workers use gpt-4o-mini, which keeps the expensive model on routing decisions only |
| Termination | create_supervisor routes by having the Supervisor call handoff tools, and the loop ends when the Supervisor replies without calling one. It does not match a sentinel string such as FINISH, so the prompt asks for a direct final answer instead |

Example 2: Building a Hierarchical Supervisor with CrewAI

CrewAI can implement Supervisor behavior with just the Process.hierarchical mode and manager_llm setting. If you're familiar with role-based agent design, the barrier to entry is lower than with LangGraph.

Where LangGraph has you design the graph structure directly, CrewAI is more declarative: you declare roles and goals and let the framework decide how the agents collaborate. Neither is inherently better. It comes down to which level of abstraction the team is more comfortable with.

```python
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun

# Assumption: the installed CrewAI version accepts LangChain chat models and
# tools directly. Newer releases may expect CrewAI-native LLM and tool objects
# instead, so check the documentation for your version.
search_tool = DuckDuckGoSearchRun()

# Define Worker agents
researcher = Agent(
    role="Research Specialist",
    goal="Gather the latest information on a given topic",
    backstory="An agent specialized in data collection and analysis.",
    llm=ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_tool]
)

writer = Agent(
    role="Content Writing Specialist",
    goal="Write a clear report based on the collected information",
    backstory="An agent specialized in technical document writing.",
    llm=ChatOpenAI(model="gpt-4o-mini")
)

# Define tasks
research_task = Task(
    description="Gather the latest information on AI agent trends",
    expected_output="Research results including 5 key trends",
    agent=researcher
)

write_task = Task(
    description="Write a 500-character summary report based on the research results",
    expected_output="A completed summary report",
    agent=writer
)

# Activate Supervisor with hierarchical mode
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical,   # Manager LLM acts as Supervisor
    manager_llm=ChatOpenAI(model="gpt-4o"),
    verbose=True
)

result = crew.kickoff()
print(result)
```
| Element | Description |
| --- | --- |
| Process.hierarchical | The Supervisor (manager) dynamically determines task order and delegation. Unlike Process.sequential, the flow is not fixed |
| manager_llm | The Manager LLM that acts as Supervisor. It is recommended to assign a high-performance model separately from the Workers |
| verbose=True | Prints the routing decision process to the console. Useful for validating the initial design |

Example 3: Preventing Token Explosion with Context Compression

This comes up often in practice: once sub-agent round trips exceed about 10, the Supervisor's context fills up with Worker result messages. While running a research pipeline, I saw a noticeable drop in Supervisor response quality around the 10-round-trip mark. Since then, I pass a summary rather than the full transcript at each handoff.

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

# A lightweight model is enough for summarization.
summarizer_llm = ChatOpenAI(model="gpt-4o-mini")


def compress_worker_result(worker_output: str, max_chars: int = 500) -> str:
    """Compress a Worker result before passing it to the Supervisor."""
    # Short outputs pass through untouched, so no extra LLM call is made.
    if len(worker_output) <= max_chars:
        return worker_output

    instructions = (
        f"Summarize the following agent execution results in at most {max_chars} characters, "
        "keeping only the essentials. Include only figures, conclusions, and information "
        "needed for the next steps."
    )
    response = summarizer_llm.invoke([
        SystemMessage(content=instructions),
        HumanMessage(content=f"Results:\n{worker_output}"),
    ])
    # The model may overshoot the limit, so enforce it as a hard cap.
    return response.content[:max_chars]
```

compress_worker_result is a standalone utility function, so you can verify its behavior independently before attaching it to a Supervisor node. Inside the Supervisor node, iterate through Worker messages, pass them through this function to compress them, and then forward the result to the LLM.
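The wiring described above might look like the following sketch. It assumes messages are plain dicts whose `name` key identifies the Worker that produced them, which is an assumption of this example, not a LangGraph requirement. The `compress` argument is injectable, so compress_worker_result can be passed in for production and a cheap stub used in tests.

```python
from typing import Callable


def compress_history(
    messages: list[dict],
    compress: Callable[[str], str],
    worker_names: set[str],
) -> list[dict]:
    """Return a copy of the history with Worker outputs compressed."""
    compressed = []
    for message in messages:
        if message.get("name") in worker_names:
            # Copy instead of mutating, so the full transcript is still
            # available for logging and debugging.
            message = {**message, "content": compress(message["content"])}
        compressed.append(message)
    return compressed


history = [
    {"role": "user", "content": "Summarize recent agent frameworks."},
    {"role": "assistant", "name": "search_agent", "content": "result " * 200},
]
# A truncating stub stands in for the LLM summarizer here.
short = compress_history(history, lambda text: text[:100], {"search_agent"})
print(len(history[1]["content"]), "->", len(short[1]["content"]))
```

User messages are left alone on purpose: only Worker outputs tend to be long and repetitive enough to be worth a summarization call.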

In pipelines where Worker outputs such as search results or code execution logs frequently run to thousands of characters, this approach can significantly reduce token consumption. Depending on the conditions, reductions of 70–90% are achievable. The trade-off is an added delay of roughly 0.5–1.5 seconds at each summarization step, which must be accounted for.
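A back-of-the-envelope calculation shows where a figure in that range can come from. The numbers are illustrative assumptions (10 round trips, 4,000-character Worker outputs, a 500-character cap), characters are used as a rough proxy for tokens, and the summarizer's own input cost is ignored.

```python
def supervisor_input_chars(round_trips: int, chars_per_result: int) -> int:
    """Total characters of Worker output the Supervisor reads over a run.

    On turn k the Supervisor re-reads the k results accumulated so far,
    so the total grows quadratically with the number of round trips.
    """
    return sum(k * chars_per_result for k in range(1, round_trips + 1))


raw = supervisor_input_chars(10, 4000)        # 55 * 4000 = 220000
compressed = supervisor_input_chars(10, 500)  # 55 * 500  = 27500
reduction = 1 - compressed / raw

print(f"raw={raw}, compressed={compressed}, reduction={reduction:.1%}")
```

Under these assumptions the reduction is 87.5%. With 2,000-character outputs and the same cap it would be 75%, so the saving depends mostly on how verbose the Workers are.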


Pros and Cons

Advantages

In practice, "traceability" and "quality gates" matter most. Teams that have operated a Swarm without a Supervisor already know how painful it is to trace "where and why something failed." The items below are the benefits that structurally resolve that pain.

| Item | Description |
| --- | --- |
| Traceability | Since all routing decisions pass through the Supervisor, the entire execution path is easy to trace with LangSmith or OpenTelemetry |
| Quality Gate | The Supervisor reviews Worker results before deciding whether to re-delegate, providing a clear point for controlling intermediate result quality |
| Domain Isolation | Each Worker focuses only on its own domain, simplifying prompts and making individual replacement easy |
| Ease of Debugging | The control flow is simpler than Swarm, making it easier to pinpoint where errors occur |

Disadvantages and Caveats

The list of disadvantages looks long, but in practice "bottleneck" and "context accumulation" are by far the most frequent issues. The remaining items can mostly be addressed once during the initial design phase and rarely need attention afterward.

| Item | Description | Mitigation |
| --- | --- | --- |
| Bottleneck | When the Supervisor uses a frontier model, every routing decision incurs the full inference cost and latency | Use cheaper models for Workers, batch routing decisions |
| Single Point of Failure | If the Supervisor goes down, the entire workflow stops | Secure a recovery path with retry logic and state checkpoints |
| Context Accumulation Degradation | Routing accuracy noticeably drops after more than 8–12 sub-agent round trips | Summarize and pass Worker results at each handoff |
| Increased Token Cost | Since all Worker results route through the Supervisor, there are more LLM calls compared to Swarm | Distribute costs with context compression + lightweight Worker models |

Single Point of Failure: A structural vulnerability where a single component failure causes the entire system to stop functioning. In the Supervisor pattern, the Supervisor itself is that point. It is recommended to consider retry logic and state checkpoints from the beginning of design.
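As a minimal sketch of those two safeguards using only the standard library: the function names and the JSON file layout are hypothetical, and LangGraph ships its own checkpointer mechanism that you would normally use instead of hand-rolling one.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def call_with_retry(fn: Callable[[], Any], attempts: int = 3, base_delay: float = 0.5) -> Any:
    """Call fn, retrying with exponential backoff; re-raise on the last failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file first, then swap it in, to avoid torn writes."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def load_checkpoint(path: Path, default: dict) -> dict:
    """Resume from the last saved state, or start from default."""
    return json.loads(path.read_text()) if path.exists() else default
```

A Supervisor loop would wrap each routing call in call_with_retry and call save_checkpoint after every completed round trip, so a crash costs at most one round trip of work.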

Most Common Mistakes in Practice

  1. Introducing a Supervisor with three or fewer Workers. This is over-engineering: a sequential Pipeline is simpler and faster, so check the Worker count first.
  2. Using the same model for both Supervisor and Workers. Routing decisions benefit from a high-performance model, while Workers can run on lightweight, domain-specialized models. Splitting them this way keeps costs down.
  3. Passing Worker results to the Supervisor as-is without compression. Almost everyone makes this mistake during the initial build, and the need for context compression only becomes obvious once Supervisor response quality visibly degrades around the 10-round-trip mark.

Closing Thoughts

The Supervisor Pattern replaces the vague design of "the agents will figure out how to collaborate" with a clear structure where "the control flow always returns to one place." Traceability, quality gates, and domain isolation are the three advantages that have made this pattern the production default.

Three steps you can start with right now:

  1. Install the packages. Run pip install langgraph-supervisor langchain-openai, then find the "Hierarchical Agent Teams" tutorial in the official LangGraph documentation and run its example code to quickly grasp the overall flow.
  2. Split an existing single agent into a Supervisor + 2 Workers. If you already have an agent built, divide its work into two stages, "information gathering" and "result generation", make each one a Worker, and put a Supervisor between them to experience the benefits of the pattern directly.
  3. Connect LangSmith to check execution traces. With a LangSmith API key configured, setting the LANGCHAIN_TRACING_V2=true environment variable lets you visually trace the Supervisor's routing decisions and each Worker call. Seeing the trace for the first time gives you a clear "ah, so this is how it flows" realization, and that is the moment you truly understand this pattern.
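For the tracing step, the variables can also be set from Python before the graph is built. This is a small sketch: the project name is a hypothetical placeholder, and the API key is deliberately read from the environment rather than hard-coded.

```python
import os

# LANGCHAIN_TRACING_V2 is the switch named in step 3; setdefault keeps any
# value that is already exported in the shell.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
os.environ.setdefault("LANGCHAIN_PROJECT", "supervisor-demo")  # hypothetical name

# Tracing also needs a LangSmith API key; warn early if it is missing.
if not os.environ.get("LANGCHAIN_API_KEY"):
    print("LANGCHAIN_API_KEY is not set; traces will not be uploaded.")
```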

References

  • LangGraph Supervisor Pattern: Orchestrating Multi-Agent Teams in 2026 | CallSphere Blog
  • GitHub - langchain-ai/langgraph-supervisor-py
  • Hierarchical Agent Teams | LangGraph Official Tutorial
  • langgraph-supervisor · PyPI
  • How to Use the Supervisor Pattern for Multi-Agent Voice AI Systems | LiveKit
  • Supervisor pattern | LiveKit Documentation
  • Multi-Agent AI Orchestration Patterns: Production Guide | Lushbinary
  • Multi-Agent Orchestration in LangGraph: Supervisor vs Swarm, Tradeoffs and Architecture
  • Swarm vs. Supervisor: Multi-Agent Architecture Guide | Augment Code
  • Agent system design patterns | Databricks on AWS
  • Multi-Agent collaboration patterns with Strands Agents and Amazon Nova | AWS Blog
  • Multi-Agent in Production 2026: 3 Patterns That Survived
  • Architecting efficient context-aware multi-agent framework for production | Google Developers Blog
  • LangGraph Multi-Agent Collaboration in Practice: Supervisor Pattern and Task Dispatch
  • The Multi-Agent Trap | Towards Data Science