Flow Engineering: How to Design Flow, from LLM Workflows to Organizational Architecture
There are moments when you look at a codebase and think, "Why is this logic so tangled?" A few years ago it took me two weeks to ship a single new feature, and at first I assumed messy code was to blame. It turned out that a single PR had to pass reviews from three teams, and because there was only one shared QA environment, several days were lost just sequencing deployments. Only much later did I realize the root cause: the workflow itself was blocked.
Flow Engineering is an approach that treats "where work gets stuck" as a design problem rather than a code-quality problem. In AI agent workflows, microservices architectures, and team deployment pipelines alike, the key is to visualize the flow and eliminate bottlenecks. The effect can be dramatic: in the Weights & Biases case study, a task stuck at 17% accuracy with a simple prompt rose to 91% after the flow was designed.
In this article, we examine how Flow Engineering works in three contexts (AI/LLM workflows, software architecture, and organizational processes) and apply it hands-on with LangGraph code.
Key Concepts
What Is "Flow"?
When you first encounter Flow Engineering, the concepts may feel a bit abstract. However, if you place the three contexts side by side, the common philosophy becomes clear:
Core Philosophy of Flow Engineering: Work must flow toward value without bottlenecks. The starting point of design is to visualize where that flow gets blocked.
| Context | Definition | Key Target |
|---|---|---|
| AI/LLM Engineering | Designs workflow graphs that break complex tasks into steps and let the LLM validate its own output | Prompt → agent nodes/edges |
| Software Architecture | Builds a system optimized for the flow of change by combining Wardley Mapping, DDD, and Team Topologies | Organizational structure ↔ technology structure |
| Organization/Process | Measures how efficiently the four Flow Items (Feature, Defect, Risk, Debt) move through the deployment pipeline | Cycle time (work start to completion), throughput |
If you look at the three contexts, you realize that they are ultimately asking the same question: "Where is the current work getting stuck?"
AI/LLM Context: Beyond the Limitations of a Single Prompt
Everyone has probably, at least once, thought "I just need to write a better prompt," tried to handle a complex task with a single LLM call, and received unstable results. The fix is simple: break the task into steps and design the flow so that the LLM performs self-refinement at each one.
LangGraph is a graph-based state machine that represents this flow using nodes and edges. Since the release of v1.0, it has established itself as the de facto standard framework supporting cycles, memory, and tool calls.
from langgraph.graph import StateGraph, END
from typing import TypedDict

# Initialization required before running:
# from langchain_openai import ChatOpenAI
# from langchain_community.tools import DuckDuckGoSearchRun
# llm = ChatOpenAI(model="gpt-4o")
# search_web = DuckDuckGoSearchRun().invoke

class ResearchState(TypedDict):
    query: str
    search_results: list[str]
    draft: str
    review_feedback: str
    final_answer: str

def search_node(state: ResearchState) -> ResearchState:
    results = search_web(state["query"])
    return {**state, "search_results": results}

def draft_node(state: ResearchState) -> ResearchState:
    draft = llm.invoke(f"Write an answer based on the following material: {state['search_results']}")
    return {**state, "draft": draft.content}

def review_node(state: ResearchState) -> ResearchState:
    # Self-validation: critically review the draft
    feedback = llm.invoke(f"Find the problems with this answer: {state['draft']}")
    return {**state, "review_feedback": feedback.content}

def revise_node(state: ResearchState) -> ResearchState:
    final = llm.invoke(
        f"Improve the draft by incorporating the feedback.\nDraft: {state['draft']}\nFeedback: {state['review_feedback']}"
    )
    return {**state, "final_answer": final.content}

graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("draft", draft_node)
graph.add_node("review", review_node)
graph.add_node("revise", revise_node)
graph.set_entry_point("search")
graph.add_edge("search", "draft")
graph.add_edge("draft", "review")
graph.add_edge("review", "revise")
graph.add_edge("revise", END)
app = graph.compile()

Self-Refine: A pattern in which an LLM critiques and improves its own output. It is implemented either with a separate validation agent or by re-calling the same model with a different prompt.
Initially I removed the review node and connected draft directly to revise, and without a verification step the modification request led the LLM to ruin a perfectly good draft. It amounted to requesting improvements without knowing what needed improving. Explicitly enforcing the order search → draft → review → revise matters far more than you might think.
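Taking this one step further, the review step can also decide whether the draft is good enough to ship or needs another pass. The following is a minimal sketch, not part of the original example, that reuses the nodes above and uses LangGraph's add_conditional_edges to loop revisions back through review; the quality_check routing function and the "PASS" convention are assumptions for illustration.

from langgraph.graph import StateGraph, END

def quality_check(state: ResearchState) -> str:
    # Assumption: the review prompt is adjusted so the model answers "PASS"
    # when it finds no remaining problems with the draft.
    return "done" if "PASS" in state["review_feedback"] else "needs_revision"

def revise_loop_node(state: ResearchState) -> ResearchState:
    # Loop variant of revise_node: writes the improved text back to "draft"
    # so that review always re-checks the latest version.
    improved = llm.invoke(
        f"Improve the draft by incorporating the feedback.\n"
        f"Draft: {state['draft']}\nFeedback: {state['review_feedback']}"
    )
    return {**state, "draft": improved.content, "final_answer": improved.content}

loop_graph = StateGraph(ResearchState)
loop_graph.add_node("search", search_node)  # reused from the example above
loop_graph.add_node("draft", draft_node)    # reused from the example above
loop_graph.add_node("review", review_node)  # reused from the example above
loop_graph.add_node("revise", revise_loop_node)
loop_graph.set_entry_point("search")
loop_graph.add_edge("search", "draft")
loop_graph.add_edge("draft", "review")
# review either finishes the flow or routes the draft back through revise.
loop_graph.add_conditional_edges(
    "review", quality_check, {"needs_revision": "revise", "done": END}
)
loop_graph.add_edge("revise", "review")
loop_app = loop_graph.compile()

Because revise loops back to review, you would also want to cap iterations in practice; by default LangGraph's recursion limit stops runaway loops with an error rather than letting them spin forever.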
Software Architecture Context: A System Optimized for the Flow of Change
If the principle of "explicitly designing steps" worked in LLM workflows, the same question applies to software architecture: "Where does change get stuck?" Susanne Kaiser's Architecture for Flow answers this question by combining three methodologies:
| Methodology | Role | Questions to Answer |
|---|---|---|
| Wardley Mapping | Understand the strategic landscape | Which components should we build ourselves, and which should we buy? |
| Domain-Driven Design | Decompose the problem space | Where should the boundaries be drawn? |
| Team Topologies | Design team interactions | How will teams collaborate? |
Honestly, at first I found it hard to see why three methodologies had to be used together, but a Wardley Mapping example made it click immediately. Place the question "Should we build the authentication module ourselves or buy a SaaS like Auth0?" on a Wardley Map, and authentication already sits in the commodity region: building it ourselves offers no differentiation and only adds maintenance burden. Our service's core recommendation algorithm, on the other hand, sits closer to genesis (the region of novel innovation), so building it in-house is the right call. The role of Wardley Mapping is to make this kind of decision consistently across the entire system.
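To make the idea concrete, here is a small illustrative sketch; the component names, evolution scores, and thresholds are all hypothetical, not part of Wardley's method. Each component gets a position on the evolution axis from genesis (0.0) to commodity (1.0), and that position drives the build-versus-buy default.

from dataclasses import dataclass

@dataclass
class Component:
    name: str
    evolution: float  # 0.0 = genesis (novel), 1.0 = commodity (utility)

# Hypothetical positions on the evolution axis, for illustration only.
components = [
    Component("recommendation algorithm", 0.15),
    Component("order management", 0.55),
    Component("authentication", 0.90),
]

def build_or_buy(c: Component) -> str:
    """Rule of thumb: commoditized components are bought, novel ones built."""
    if c.evolution >= 0.7:
        return "buy / use SaaS (commodity: building it adds no differentiation)"
    if c.evolution <= 0.3:
        return "build in-house (genesis: this is where we differentiate)"
    return "evaluate case by case (product/custom transition zone)"

for c in components:
    print(f"{c.name}: {build_or_buy(c)}")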
When these three elements work together, a structure is created where code changes can be deployed independently without the need for approval from other teams.
Organizational/Process Context: Flow Framework and Four Flow Items
Even if the architecture is well-designed, you cannot know where bottlenecks occur without measuring how actual work flows through the pipeline. Mik Kersten's Flow Framework is a methodology that measures how efficiently four Flow Items flow through the deployment pipeline:
Flow Items:
├── Feature → creates new business value
├── Defect  → fixes quality problems
├── Risk    → security, compliance, architectural debt
└── Debt    → pays down technical debt

Flow Metrics: Consists of Flow Velocity (number of items completed per unit of time), Flow Time (time from request to deployment for an item), Flow Efficiency (ratio of active work time to total elapsed time), and Flow Distribution (the mix of the four item types).
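As a quick illustration of Flow Distribution (the sample data and helper below are hypothetical), simply counting completed items by type shows at a glance whether a team is quietly spending all of its capacity on defects:

from collections import Counter

# Hypothetical list of items completed in one month.
completed = ["feature", "feature", "defect", "defect", "defect", "debt", "risk", "feature"]

def flow_distribution(items: list[str]) -> dict[str, float]:
    """Share of each Flow Item type among completed work, in percent."""
    counts = Counter(items)
    total = len(items)
    return {item_type: round(count / total * 100, 1) for item_type, count in counts.items()}

print(flow_distribution(completed))
# e.g. {'feature': 37.5, 'defect': 37.5, 'debt': 12.5, 'risk': 12.5}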
Practical Application
Example 1: Multi-Agent Research Workflow
This is an orchestrator-subagent structure borrowed from IBM's LangGraph + Granite example. The orchestrator analyzes the task, specialized subagents (search, code generation) execute in parallel, and the results are integrated in the validation and editing stages.
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

# Example initialization:
# from langchain_openai import ChatOpenAI
# from langchain_community.tools import DuckDuckGoSearchRun
# llm = ChatOpenAI(model="gpt-4o")
# search_tool = DuckDuckGoSearchRun()

class MultiAgentState(TypedDict):
    task: str
    task_analysis: str  # task analysis produced by the orchestrator
    search_result: str
    code_result: str
    validation_result: str
    final_output: str
    errors: Annotated[list[str], operator.add]  # safely accumulated across parallel nodes

def orchestrator(state: MultiAgentState) -> dict:
    """Analyze the task and produce shared context for the subagents."""
    analysis = llm.invoke(
        f"Analyze the following task and summarize its key requirements: {state['task']}"
    )
    return {"task_analysis": analysis.content}

def search_agent(state: MultiAgentState) -> dict:
    """Search specialist: uses the orchestrator's analysis as context."""
    result = search_tool.invoke(
        f"{state['task_analysis']}\n\nSearch query: {state['task']}"
    )
    return {"search_result": result}

def code_agent(state: MultiAgentState) -> dict:
    """Code-generation specialist."""
    result = llm.invoke(
        f"Analysis: {state['task_analysis']}\n\nWrite code that meets the requirements: {state['task']}"
    )
    return {"code_result": result.content}

def validation_agent(state: MultiAgentState) -> dict:
    """Runs after both the search and code results arrive: both branches must finish first."""
    combined = f"Search: {state['search_result']}\nCode: {state['code_result']}"
    validation = llm.invoke(f"Validate the following results: {combined}")
    return {"validation_result": validation.content}

def editor_agent(state: MultiAgentState) -> dict:
    """Final integration."""
    final = llm.invoke(
        f"""Integrate the following results into a final answer:
- Task analysis: {state['task_analysis']}
- Search result: {state['search_result']}
- Code result: {state['code_result']}
- Validation result: {state['validation_result']}"""
    )
    return {"final_output": final.content}

builder = StateGraph(MultiAgentState)
builder.add_node("orchestrator", orchestrator)
builder.add_node("search", search_agent)
builder.add_node("code", code_agent)
builder.add_node("validation", validation_agent)
builder.add_node("editor", editor_agent)
builder.set_entry_point("orchestrator")
# Fan-out: orchestrator → search and code run in the same superstep
builder.add_edge("orchestrator", "search")
builder.add_edge("orchestrator", "code")
# Fan-in: validation starts only after the superstep in which both search and code finish
builder.add_edge("search", "validation")
builder.add_edge("code", "validation")
builder.add_edge("validation", "editor")
builder.add_edge("editor", END)
multi_agent_app = builder.compile()

LangGraph executes internally in supersteps. Once the orchestrator completes, search and code run concurrently in the same superstep, and the validation superstep begins only after both nodes have finished. Thanks to this, you do not need to implement any synchronization yourself, as long as you attach a state-merge annotation such as Annotated[list, operator.add] wherever parallel nodes write to the same key.
One caveat: the orchestrator is only meaningful if task_analysis is actually consumed by the subagents. The first time I used this pattern, the orchestrator was effectively useless: I put the analysis into the state and nothing ever read it.
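For completeness, here is a minimal usage sketch; the task string is made up, and llm and search_tool must be initialized as in the comments above.

# Run the graph once; LangGraph handles the fan-out/fan-in ordering internally.
result = multi_agent_app.invoke({
    "task": "Compare Python retry libraries and show a usage example",
    "errors": [],
})
print(result["final_output"])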
Example 2: Finding Development Bottlenecks with Flow Metrics
As the BDC (Business Development Bank of Canada) team discovered when measuring flow with Axify, most delays occur before and after development (waiting on planning, waiting on QA) rather than during development itself. A simple Flow Metrics calculation looks like this:
interface FlowItem {
  id: string;
  type: "feature" | "defect" | "risk" | "debt";
  requestedAt: Date;
  startedAt: Date | null;
  completedAt: Date | null;
}

interface FlowMetrics {
  flowTime: number;       // request to completion (days)
  waitTime: number;       // request to start (days)
  activeTime: number;     // actual working time (days)
  flowEfficiency: number; // activeTime / flowTime × 100 (%)
}

function calculateFlowMetrics(item: FlowItem): FlowMetrics | null {
  if (!item.startedAt || !item.completedAt) return null;
  const msPerDay = 1000 * 60 * 60 * 24;
  const flowTime = (item.completedAt.getTime() - item.requestedAt.getTime()) / msPerDay;
  const waitTime = (item.startedAt.getTime() - item.requestedAt.getTime()) / msPerDay;
  const activeTime = (item.completedAt.getTime() - item.startedAt.getTime()) / msPerDay;
  const flowEfficiency = (activeTime / flowTime) * 100;
  return { flowTime, waitTime, activeTime, flowEfficiency };
}

function analyzeBottleneck(items: FlowItem[]): void {
  const metrics = items
    .map((item) => ({ item, metrics: calculateFlowMetrics(item) }))
    .filter((r) => r.metrics !== null);
  const avgWaitTime =
    metrics.reduce((sum, r) => sum + r.metrics!.waitTime, 0) / metrics.length;
  const avgEfficiency =
    metrics.reduce((sum, r) => sum + r.metrics!.flowEfficiency, 0) / metrics.length;
  console.log(`Average wait time: ${avgWaitTime.toFixed(1)} days`);
  console.log(`Average flow efficiency: ${avgEfficiency.toFixed(1)}%`);
  // In Mik Kersten's research, Flow Efficiency for knowledge work typically
  // falls in the 15-40% range; below 15% signals overwhelming wait time.
  if (avgEfficiency < 15) {
    console.log("⚠️ Bottleneck detected: wait time vastly exceeds actual work time.");
    console.log("→ Worth reviewing the planning, QA, and review processes.");
  }
}

It was quite a shock to realize that what had looked like "developers being slow" was, once I saw these numbers, really just waiting on process. A Flow Efficiency in the 10% range means that out of an 8-hour day, only about 48 minutes go into actually writing code, with the rest spent waiting for approvals or for the next sprint.
Pros and Cons Analysis
Advantages
| Item | Content | Precautions |
|---|---|---|
| Bottleneck Visualization | Break down workflows into measurable units to discover hidden latency points | Incorrectly set measurement points can point to the wrong bottlenecks |
| LLM Quality Improvement | Step decomposition plus self-verification raises accuracy far above a single prompt (the 17% → 91% case) | As the number of nodes grows, debugging effort and costs grow with it |
| Team Autonomy | Flow-centric design reduces inter-team dependencies, increasing the chance of independent deployment | Adopting Team Topologies entails organizational change, so executive support is required |
| Business Linkage | Flow Metrics let you tie engineering performance directly to business impact | Verify the connection to business impact when designing metrics to prevent distortion |
| Scalability | Horizontal scaling is easy with parallel agent execution and an event-driven pattern | Horizontal scaling without Observability infrastructure makes it difficult to trace the cause of failures |
Disadvantages and Precautions
| Item | Content | Response Plan |
|---|---|---|
| The Paradox of Measurement | Poorly designed Flow Metrics lead to distortions where meaningless metrics, such as commit counts, are optimized | It is recommended to first verify the connection to business impact when designing metrics |
| Increased complexity | Multi-agent workflows are much more difficult to debug than a single LLM call | It is recommended to build the Observability infrastructure first and then add agents |
| Increased Costs | Token costs can grow faster than linearly when chaining LLM calls | Consider tiered caching and mixing in low-cost models (see the sketch below the table) |
| Organizational Change | The adoption of Team Topologies and Architecture for Flow entails a change in organizational culture, not just a technical issue | Driving this solely through the technology team without management support has a high probability of failure |
| Requirements Variability | Even with a stabilized flow, frequently changing requirements incur workflow redesign costs | Separate the interfaces of frequently changing nodes so they can be swapped out flexibly |
Observability: The degree to which the internal state of a system can be inferred from external outputs (logs, traces, metrics). In agent workflows, tracking which node made which decision based on which input is key. LangSmith is widely used in the LangChain ecosystem, while OpenTelemetry is commonly used for general purposes.
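On the cost row above, one mitigation is mixing model tiers. A minimal sketch, with the caveat that the model names and the routing split are assumptions rather than a recommendation from the sources: run the high-volume drafting step on a cheap model and reserve the stronger model for verification, where judgment matters most. It reuses ResearchState from the first example.

from langchain_openai import ChatOpenAI

# Assumption for illustration: a cheap model for drafting, a stronger one for review.
cheap_llm = ChatOpenAI(model="gpt-4o-mini")
strong_llm = ChatOpenAI(model="gpt-4o")

def draft_cheap(state: ResearchState) -> ResearchState:
    # High-volume generation step: cost grows with every loop iteration,
    # so this is where the low-cost model pays off.
    draft = cheap_llm.invoke(f"Write an answer based on: {state['search_results']}")
    return {**state, "draft": draft.content}

def review_strong(state: ResearchState) -> ResearchState:
    # Verification gates quality for the whole flow, so spend tokens here.
    feedback = strong_llm.invoke(f"Find the problems with this answer: {state['draft']}")
    return {**state, "review_feedback": feedback.content}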
The Most Common Mistakes in Practice
- Starting optimization without measurement: It is common to overhaul processes based solely on the feeling that "our team's flow is slow," often improving areas that are not the actual bottleneck. Collect Flow Metrics first and decide what to improve after reviewing the data.
- Expecting too much from a single LLM call: Trying to handle a complex task in one call produces unstable results. A structure that breaks the task into 3 to 5 steps and verifies each one is far more stable. As seen above, adding a single review node can drastically change output quality.
- Deploying agents to production without Observability: When a workflow that ran fine locally misbehaves in production, debugging is virtually impossible if you cannot see which node received what input. The figure from the LangChain report that 89% of companies have already implemented Observability is not without reason.
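Getting baseline tracing in place is cheap. In the LangChain ecosystem, for example, LangSmith tracing is enabled through environment variables (the project name below is made up); after that, every node execution in the compiled graph is recorded without code changes:

import os

# Enable LangSmith tracing for every LangGraph/LangChain call in this process.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "flow-engineering-demo"  # hypothetical project name

# From here on, app.invoke(...) runs are traced node by node in LangSmith,
# so you can see which node received which input when production misbehaves.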
In Conclusion
Designing flow is ultimately the process of looking honestly at "what we are currently waiting for." The essence of Flow Engineering is eliminating that waiting through code, processes, and organizational structure. Whether in an AI agent workflow, a microservices architecture, or a team's development process, it helps to first visualize the points where the flow gets stuck and start designing from there.
Here are three steps you can start right now, depending on your situation:
- Measure your team's Flow Time. Pull the last three months of issues from GitHub, Jira, or Linear and calculate the interval from request date to deployment date. You will immediately see the average Flow Time for Features and Defects, and which stage (planning wait, review, QA) takes the longest. If your team's process feels slow, start here (a small sketch follows this list).
- If you use LLMs, break one complex prompt into three steps. Divide it into "Search → Draft → Review" or "Analysis → Generation → Validation," define each step as a LangGraph node, and connect them; the change in accuracy is immediately visible. If your LLM output quality is unstable, start here.
- To approach this from the architecture side, read Susanne Kaiser's Architecture for Flow or Mik Kersten's Project to Product. Both explain the concepts with concrete examples and make excellent starting points for team discussion. If deployment dependencies between teams are complex, start here.
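As a starter for the first step, here is a minimal sketch; the issue records are hypothetical, and in practice they would come from your tracker's CSV export or API.

from datetime import date
from statistics import median

# Hypothetical issue export: (type, requested, deployed)
issues = [
    ("feature", date(2024, 1, 3), date(2024, 1, 24)),
    ("feature", date(2024, 1, 10), date(2024, 2, 2)),
    ("defect",  date(2024, 1, 15), date(2024, 1, 19)),
    ("defect",  date(2024, 2, 1),  date(2024, 2, 12)),
]

def median_flow_time(records, item_type: str) -> float:
    """Median days from request to deployment for one Flow Item type."""
    durations = [(done - requested).days for t, requested, done in records if t == item_type]
    return median(durations)

for t in ("feature", "defect"):
    print(f"{t}: median Flow Time {median_flow_time(issues, t)} days")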
Reference Materials
- Flow Engineering is All You Need | Medium
- What is Architecture for Flow? | Susanne Kaiser
- Architecture for Flow Official Site
- Architecture for Flow | O'Reilly
- AFLOW: ICLR 2025 Paper | arXiv
- FLOW: Modularized Agentic Workflow Automation | OpenReview ICLR 2025
- LLM Workflows: Patterns, Tools & Production Architecture | Morph
- State of Agent Engineering | LangChain
- LangGraph Official Documentation
- Flow Framework Official Site
- What are Flow Metrics? | DX
- Flow Architectures | O'Reilly
- Understanding flow in software development | Swarmia
- Dissecting 'architecting for fast, sustainable flow' | microservices.io