LangGraph vs CrewAI: Reach for State Machine Orchestration When You Have More Than 3 Agents
A single agent is simple: you send a prompt and get a result. It remains manageable even with a second agent attached. However, the moment the number of agents exceeds three, entirely different problems emerge. State is lost on every handoff, token costs in naive implementations balloon to O(n²) in the number of agents, and a single error by the first agent contaminates the entire pipeline. 2025 was the year the industry fully confronted "agent operationalization" — according to an arXiv paper (2601.13671), over 400 companies are operating multi-agent systems in production.
LangGraph's directed graph-based state machine and CrewAI's role-based team model originate from fundamentally different philosophies, and the choice between them determines the system's complexity, maintainability, and cost. This section helps you understand the operating principles of each framework through code examples and presents criteria for deciding which one to choose in different situations.
LangGraph vs. CrewAI Key Differences: 30-Second Summary
| | LangGraph | CrewAI |
|---|---|---|
| Analogy | State diagram | Team organization chart |
| Design Unit | Node (function) + Edge (condition) | Agent (role) + Task |
| State Management | Explicit with `TypedDict` + reducer | Declarative with `context` parameter |
| Branches/Loops | Free-form with conditional edges | Restricted to `Process.hierarchical` |
| Human-in-the-loop | Native support via `interrupt_before`/`interrupt_after` | No native support |
| Strengths | Complex conditional branching, human approval, loops | Team collaboration structure with clear roles |
| Learning Curve | Medium-high (requires understanding of graph concepts) | Low (intuitive role abstraction) |
Key Concepts
What is Multi-Agent Orchestration
Multi-agent orchestration is an architectural pattern that coordinates multiple AI agents to cooperate and handle complex tasks. It distributes tasks that are difficult for a single agent to handle (complex reasoning, separation of specialized domains, parallel processing) across multiple agents.
Orchestration methods are broadly classified into three types:
| Method | Description | Suitable Situation |
|---|---|---|
| Supervisor | Central agent delegates to subordinate agents | Where roles are clearly separated |
| Peer-to-Peer | Agents communicate directly with each other | When dynamic collaboration is required |
| Event-driven | Loosely coupled via message bus (Kafka, etc.) | Large-scale distributed system |
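Stripped of any framework, the Supervisor pattern is just routing: a central component classifies the request and delegates to a specialist. The sketch below is a framework-free illustration of that idea — the handler names are hypothetical, and the keyword check stands in for what would be an LLM intent classifier in a real system.

```python
# Hypothetical, framework-free sketch of the Supervisor pattern:
# a central router inspects the request and delegates to a specialist handler.
def payment_handler(query: str) -> str:
    return f"[payment] handling: {query}"

def shipping_handler(query: str) -> str:
    return f"[shipping] handling: {query}"

HANDLERS = {"payment": payment_handler, "shipping": shipping_handler}

def supervisor(query: str) -> str:
    # In a real system an LLM would classify the intent;
    # a naive keyword check stands in here.
    intent = "payment" if "refund" in query else "shipping"
    return HANDLERS[intent](query)

print(supervisor("I need a refund for order #123"))
# [payment] handling: I need a refund for order #123
```

The frameworks discussed below differ mainly in how this routing step is expressed: LangGraph encodes it as a conditional edge, CrewAI delegates it to a manager LLM.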
Four Problems Created by Exceeding 3 Agents
Why does adding agents, which seems simple, suddenly become complicated? When the number of agents n exceeds 3, four problems arise simultaneously.
1. State Propagation Problem — How should the context collected by Agent A be passed to B, C, and D? Passing the entire conversation history as is results in token explosion, while injecting only the necessary chunks leads to information loss.
2. Increased Token Cost — In a naive implementation where the entire history is passed to each agent, the cost increases to O(n²). While passing only a summary to each agent can reduce it to O(n), without a deliberate design, it usually leads to the worst-case scenario.
3. Cascade Failure — Errors or hallucinations from upstream agents propagate downstream without verification. If research results are passed directly to the writing agent without a fact-checking agent, incorrect information ends up in the final output.
4. Coordination Complexity — As the number of agents increases, "who executes when" must be clearly defined. Relying on implicit order leads to race conditions during parallel execution.
Cascade Failure: A phenomenon where the failure of a single component causes a chain reaction of failures in downstream components. The same pattern appears in microservices architecture.
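The O(n²)-vs-O(n) claim in problem 2 can be checked with back-of-the-envelope arithmetic. The token figures below are hypothetical; only the growth rates matter.

```python
# Hypothetical illustration: context tokens each agent must read in a naive
# "pass the full history" pipeline vs. a "pass a fixed summary" pipeline.
def naive_context_tokens(n_agents: int, tokens_per_step: int = 1_000) -> int:
    # Agent k reads everything produced by agents 1..k, so total reads
    # grow as 1 + 2 + ... + n = n(n+1)/2, i.e. O(n^2)
    return sum(k * tokens_per_step for k in range(1, n_agents + 1))

def summarized_context_tokens(n_agents: int, summary_tokens: int = 200,
                              tokens_per_step: int = 1_000) -> int:
    # Each agent reads only its own input plus one fixed-size summary: O(n)
    return n_agents * (tokens_per_step + summary_tokens)

print(naive_context_tokens(4))       # 10000
print(summarized_context_tokens(4))  # 4800
```

At 4 agents the gap is roughly 2x; at 10 agents the naive design reads 55,000 tokens against 12,000 for the summarized one, which is why the cost curve feels sudden once you pass three agents.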
LangGraph: Directed Graph-Based State Machine
LangGraph models workflows as mathematical directed graphs (DAGs or Cyclic Graphs). A Directed Acyclic Graph (DAG) is a directed graph without loops, while a Cyclic Graph is a graph with loops—cycles are essential in workflows such as retries or re-execution after human approval.
You only need to understand three key concepts:
- Node: Agent or logic function
- Edge: State transition condition (`"approve" → END`, `"reject" → "code_review"`)
- State: Defines the schema with `TypedDict` and controls how updates are merged with an `Annotated` reducer
`TypedDict` is the standard way to add type hints to Python dictionaries. `Annotated[list[str], operator.add]` is the syntax for declaring a reducer that says "when this field is updated, append to the list instead of overwriting" — ensuring that the previous agent's results are not lost during a handoff.
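The merge step can be sketched by hand. The following is a minimal illustration of the assumed behavior, not LangGraph's actual internals: it reads the reducer out of the `Annotated` metadata and applies it when merging a node's returned diff into the shared state, using only the standard library.

```python
# Sketch (assumed semantics, not LangGraph internals): merging a node's
# returned diff into shared state, honoring per-field reducers.
import operator
from typing import Annotated, TypedDict, get_type_hints

class SharedState(TypedDict):
    errors: Annotated[list[str], operator.add]  # reducer: accumulate across handoffs
    review: str                                 # plain field: last write wins

def merge(state: dict, diff: dict) -> dict:
    hints = get_type_hints(SharedState, include_extras=True)
    merged = dict(state)
    for key, value in diff.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(state[key], value) if reducer else value
    return merged

s = {"errors": ["lint warning"], "review": "v1"}
s = merge(s, {"errors": ["missing test"], "review": "v2"})
print(s)  # {'errors': ['lint warning', 'missing test'], 'review': 'v2'}
```

Note how `errors` accumulated both entries while `review` was simply overwritten — that is exactly the handoff guarantee the reducer provides.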
The key point is that the state becomes a single source of truth that runs through all agents. Each node reads a portion of the state and returns only the changes (diff).
One of LangGraph's key differentiators is human-in-the-loop support. Workflows can be paused before or after a specific node using the `interrupt_before` or `interrupt_after` parameters and resumed after human review. A typical example is a workflow where deployment proceeds only after a human manually verifies the security scan results. CrewAI has no native equivalent.
Reducer: A pure function that takes the current state and a new value and returns the next state. If operator.add is specified as the reducer, the list is accumulated instead of overwritten.
CrewAI: Role-based Team Orchestration
CrewAI starts from a different philosophy. It models agents as "digital team members." Each agent has three attributes:
- role: This agent's title ("Technology Researcher", "Fact Checker")
- goal: The goal this agent must achieve
- backstory: The background story that defines the agent's expertise and conduct.
Handoffs between tasks are handled declaratively via the `context=[previous_task]` parameter. `allow_delegation=False` is an important option — if delegation is left enabled, agents can re-delegate work to other agents, creating unexpected task chains and making it difficult to track which agent actually performed the work.
The execution order is selected from Process.sequential (sequential) or Process.hierarchical (Manager LLM delegated).
Practical Application
Example 1: Building a Code Review Pipeline with LangGraph
Implement a loop of Code Review → Test Analysis → Security Scan → Conditional Approval/Rework. The key is the cycle that returns to the code review stage when a security issue is detected.
Note: The code below is an example to demonstrate LangGraph's state flow. LLM calls are omitted and hardcoded return values are used. In production code, import `ChatOpenAI` from `langchain_openai` and replace the hardcoded values with `llm.invoke(...)` calls at each node.
```python
from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
import operator

# 1. Shared state schema — every agent shares this type
class ReviewState(TypedDict):
    code: str
    review_result: str
    test_result: str
    security_result: str
    errors: Annotated[list[str], operator.add]  # reducer: accumulate instead of overwrite
    approved: bool
    iteration: int

# 2. Agent nodes — pure functions that return only a state diff
def code_review_agent(state: ReviewState) -> dict:
    # Real implementation: llm.invoke(f"Review the following code: {state['code']}")
    return {
        "review_result": "Structural improvement needed: dependency injection pattern not applied",
        "iteration": state["iteration"] + 1
    }

def test_agent(state: ReviewState) -> dict:
    return {"test_result": "Coverage 78% — boundary value tests missing"}

def security_scan_agent(state: ReviewState) -> dict:
    return {"security_result": "SQL Injection vulnerability detected (line 42)"}

# 3. Conditional edge function — decides the next node
def approval_gate(
    state: ReviewState,
) -> Literal["approve", "reject", "reject_final"]:
    has_vulnerability = "vulnerability" in state["security_result"]
    iteration_limit_reached = state["iteration"] >= 3
    if has_vulnerability and iteration_limit_reached:
        # On hitting the iteration cap, fail out rather than approve vulnerable code
        # In production, connect this to an alert + manual review request
        return "reject_final"
    if has_vulnerability:
        return "reject"
    return "approve"

# 4. Graph assembly
builder = StateGraph(ReviewState)
builder.add_node("code_review", code_review_agent)
builder.add_node("test", test_agent)
builder.add_node("security_scan", security_scan_agent)
builder.set_entry_point("code_review")
builder.add_edge("code_review", "test")
builder.add_edge("test", "security_scan")
builder.add_conditional_edges(
    "security_scan",
    approval_gate,
    {
        "approve": END,
        "reject": "code_review",  # loop back to the first stage on failure
        "reject_final": END       # fail out on iteration overrun (never approve vulnerable code)
    }
)

# 5. Checkpointing — restart from the failed node on intermediate failure
# Human-in-the-loop example:
# graph = builder.compile(checkpointer=checkpointer, interrupt_before=["security_scan"])
# → execution pauses before the security scan; a human reviews, then resumes via graph.invoke()
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# Run
config = {"configurable": {"thread_id": "review-001"}}
result = graph.invoke(
    {"code": "...", "errors": [], "iteration": 0, "approved": False},
    config=config
)
```

| Code Point | Role |
|---|---|
| `Annotated[list[str], operator.add]` | Accumulates the `errors` list without overwriting — prevents data loss during handoffs |
| `add_conditional_edges` | Dynamically decides the next node based on the security scan result |
| `MemorySaver` | Saves state at every step — restart from the latest checkpoint on intermediate failure |
| `reject_final` termination path | Fails out without approving vulnerable code once the iteration cap is reached |
| `interrupt_before` (comment) | Human-in-the-loop: pause execution before a specific node and resume after human review |
Example 2: Building a Content Creation Pipeline with CrewAI
This is a four-stage pipeline of Researcher → Fact-checker → Writer → Editor. Each agent focuses solely on its role and automatically receives the previous results via `context`.
```python
from crewai import Agent, Task, Crew, Process

# 1. Define role-based agents
researcher = Agent(
    role="Tech Researcher",
    goal="Collect the latest trends and real-world cases in multi-agent orchestration",
    backstory=(
        "A tech journalist with 10 years of experience. Trusts only arXiv, GitHub, "
        "and official docs; never quotes marketing blogs verbatim."
    ),
    allow_delegation=False,  # if enabled, agents re-delegate work, creating unexpected task chains
    verbose=True
)
fact_checker = Agent(
    role="Fact Checker",
    goal="Verify the accuracy of research results and eliminate numerical and citation errors",
    backstory="A former academic researcher. Doubts every unsourced claim and demands verification.",
    allow_delegation=False
)
writer = Agent(
    role="Technical Writer",
    goal="Turn verified information into a developer-friendly markdown blog post",
    backstory="A tech blogger with 5 years of experience, active in open-source communities.",
    allow_delegation=False
)
editor = Agent(
    role="Editor",
    goal="Polish the draft's readability, logical flow, and style to final publication quality",
    backstory="A senior editor with 10 years in tech media. Puts reader-friendliness first.",
    allow_delegation=False
)

# 2. Define tasks — context handles the handoff
research_task = Task(
    description="Research multi-agent orchestration patterns in LangGraph and CrewAI",
    expected_output="Core concepts, real-world cases, pros and cons, and a reference list",
    agent=researcher
)
fact_check_task = Task(
    description="Verify the figures and factual claims in the research; list corrections for any errors found",
    expected_output="A list of verified facts and corrected errors",
    agent=fact_checker,
    context=[research_task]  # auto-inject the researcher's output as context
)
write_task = Task(
    description="Write a 2,000+ character technical blog draft (markdown) from the verified information",
    expected_output="A markdown post with title, intro, core concepts, practical examples, and conclusion",
    agent=writer,
    context=[research_task, fact_check_task]  # inject both task results
)
edit_task = Task(
    description="Review the draft and polish readability, logical flow, and wording to final quality",
    expected_output="The final, fully edited markdown post",
    agent=editor,
    context=[write_task]
)

# 3. Assemble and run the Crew
crew = Crew(
    agents=[researcher, fact_checker, writer, editor],
    tasks=[research_task, fact_check_task, write_task, edit_task],
    process=Process.sequential,  # sequential execution
    verbose=True
)
result = crew.kickoff()
print(result.raw)
```

Example 3: Supervisor Pattern — LangGraph and CrewAI Compared
The pattern in which a triage agent classifies intents and delegates to specialist agents, implemented in both frameworks.
LangGraph — branching with conditional edges:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class SupportState(TypedDict):
    query: str
    intent: str

# classify_intent and the worker nodes (payment_agent, shipping_agent,
# general_agent) are defined elsewhere
def triage_agent(state: SupportState) -> dict:
    intent = classify_intent(state["query"])  # "payment" | "shipping" | "general"
    return {"intent": intent}

def route_by_intent(state: SupportState) -> str:
    return state["intent"]  # the return value is the next node's name

builder = StateGraph(SupportState)
builder.add_node("triage", triage_agent)
builder.add_node("payment_agent", payment_agent)
builder.add_node("shipping_agent", shipping_agent)
builder.add_node("general_agent", general_agent)
builder.set_entry_point("triage")
builder.add_conditional_edges(
    "triage",
    route_by_intent,
    {
        "payment": "payment_agent",
        "shipping": "shipping_agent",
        "general": "general_agent"
    }
)
```

CrewAI — delegating to a manager with `Process.hierarchical`:
```python
from crewai import Agent, Task, Crew, Process

payment_agent = Agent(
    role="Payment Specialist",
    goal="Handle payment-related customer inquiries",
    backstory="5 years in payment systems. Specializes in refunds, payment errors, and card issues.",
    allow_delegation=False
)
shipping_agent = Agent(
    role="Shipping Specialist",
    goal="Handle shipment tracking and delivery-related inquiries",
    backstory="A logistics systems expert. Covers delivery delays, lost parcels, and address changes.",
    allow_delegation=False
)
manager = Agent(
    role="Customer Service Manager",
    goal="Analyze customer inquiries and delegate them to the right specialist",
    backstory="Head of customer service. Receives every inquiry and routes it to the specialist team.",
)
handle_query_task = Task(
    description="Handle the customer inquiry and provide a complete answer",
    expected_output="A complete and accurate answer to the customer inquiry",
    agent=manager
)
crew = Crew(
    agents=[payment_agent, shipping_agent],
    tasks=[handle_query_task],
    process=Process.hierarchical,  # the manager LLM automatically delegates to the right agent
    manager_agent=manager,
    verbose=True
)
```

LangGraph specifies branch conditions in code, making routing predictable and easy to test. CrewAI's `Process.hierarchical` is simple to set up because the manager LLM makes decisions in natural language, but the trade-off is that delegation decisions depend on the LLM.
Pros and Cons Analysis
Advantages
| Item | LangGraph | CrewAI |
|---|---|---|
| State Management | Type-safe immutable state with TypedDict + reducer | Declarative handoff with context parameter |
| Complex Branching | Arbitrary branches and loops via conditional edges | Manager delegation via `Process.hierarchical` |
| Restart/Recovery | Checkpoint-based mid-run restart, time-travel debugging | Task-level retries |
| Human-in-the-loop | Native support via `interrupt_before`/`interrupt_after` | No native support |
| Parallel Execution | Fan-out via the `Send()` API | Crew-level parallelism |
| Learning Curve | Requires understanding of graph concepts | Intuitive through role-based abstraction |
| Production Cases | 400+ companies including LinkedIn, Uber (arXiv 2601.13671) | Widely used as a role-based pipeline |
Disadvantages and Precautions
| Item | Content | Response Plan |
|---|---|---|
| LangGraph Learning Curve | Prior understanding of graph design and state reducer concepts required | Learn by following the simple graph in the official tutorial → complex examples |
| No direct agent-to-agent communication in CrewAI | Agents cannot communicate directly outside the Manager↔Worker layer | Work around by storing intermediate results in shared storage (file/DB) |
| Token Cost Explosion | Increases to O(n²) due to context accumulation during handoff in naive implementations | Reduced to O(n) by injecting only the chunks needed by each agent and inserting summary agents |
| Cascade Failure | Upstream errors propagate downstream unchecked | Insert retry logic, fallback nodes, and fact-checking agents |
| Lack of Observability | Cannot track what happened in which agent | Trace all agent calls with LangSmith or Langfuse |
| JS/TS Support | LangGraph.js available / CrewAI is Python only | Consider LangGraph.js or AutoGen for JS/TS stacks |
Observability: The ability to observe the internal state of a system from the outside. In multi-agent systems, this refers to tracking the inputs, outputs, latency, and token usage of each agent.
Time Travel Debugging: Using LangGraph checkpointer features, you can revert to a specific past state and execute different paths. It is a powerful tool for debugging agent behavior.
The Most Common Mistakes in Practice
- Postponing state schema design — If you build agents first and try to bolt state on later, context is lost at every handoff. Design the shared state schema first.
- Omitting infinite-loop prevention — When building loops with conditional edges in LangGraph, a design without an iteration counter or maximum retry count will loop indefinitely under certain conditions. Always specify an exit condition and a failure-exit node.
- Deploying to production without observability — Local debugging with verbose output works, but in production, without LangSmith or Langfuse you cannot tell which agent failed or how much it cost. Establish observability before feature development.
In Conclusion
In agent orchestration, three principles are more important than framework selection: First, state design—define a shared state schema among agents first. Second, failure isolation—design retry/fallback nodes so that individual agent failures do not kill the entire pipeline. Third, observability—go into production only after establishing a system to track all agent calls. LangGraph excels at complex conditional branching, loops, and workflows requiring human approval, while CrewAI shines when intuitively modeling team collaboration structures with clearly separated roles.
3 Steps to Start Right Now:
- LangGraph basic practice: After `pip install langgraph`, run the minimal example below as-is and watch the state flow through a 2-node graph:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    value: str

def node_a(state):
    return {"value": "processed by A"}

def node_b(state):
    return {"value": state["value"] + " → processed by B"}

g = StateGraph(State)
g.add_node("a", node_a)
g.add_node("b", node_b)
g.set_entry_point("a")
g.add_edge("a", "b")
g.add_edge("b", END)
print(g.compile().invoke({"value": ""}))
# Output: {'value': 'processed by A → processed by B'}
```

- CrewAI role design practice: After `pip install crewai`, pick a process of 3 or more steps that your team currently handles manually, define Agents and Tasks using the `researcher → reviewer → writer` pattern, and run `crew.kickoff()`.
- Observability wiring: Whichever framework you use, attach LangSmith first, verify the agent call trace, and only then move to production. Tracing is enabled only when both environment variables are set together:

```shell
export LANGCHAIN_API_KEY="your-api-key"
export LANGCHAIN_TRACING_V2="true"
```
Next Post: Context Compression Strategy to Halve Token Costs in Agent Pipelines — Practical Comparison of Summary Agents, Chunk Injection, and Prompt Caching.
Reference Materials
- LangGraph vs CrewAI vs AutoGen: Complete Guide 2026 | DEV Community
- CrewAI vs LangGraph vs AutoGen | DataCamp
- Mastering LangGraph State Management in 2025 | Sparkco
- LangGraph Review: Agentic State Machine 2025 | Sider AI
- CrewAI Process Types | DeepWiki
- Choosing the right orchestration pattern for multi-agent systems | Kore.ai
- The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption | arXiv
- Multi-Agent Coordination Patterns | Claude Blog
- AI Agent Orchestration Patterns | Azure Architecture Center
- Multi-Agent Collaboration via Evolving Orchestration | arXiv
- Production Multi-Agent System with LangGraph | Markaicode
- Four Design Patterns for Event-Driven Multi-Agent Systems | Confluent
- Unlocking exponential value with AI agent orchestration | Deloitte