Oh My OpenCode (oh-my-openagent) Configuration That Cuts Multi-Agent AI Coding API Costs to ~$11/Month with Category Routing
"Spend sparingly on expensive judgment; use cheap execution generously."
💡 This article is current as of May 2026. Oh My OpenCode (OMO) was rebranded to oh-my-openagent in early 2026, and the GitHub repository has moved to code-yeongyu/oh-my-openagent. The pre-rebrand version uses oh-my-opencode.json as its config file; the current post-rebrand version uses opencode.json.
Honestly, I was pretty shocked when I got my first bill after trying a multi-agent system. "This much money just to write a line of code?" When I was running everything on Claude Opus, monthly API costs reached into the hundreds of dollars. After properly configuring Oh My OpenCode (OMO), I ended up running the same scale of projects for ~$11 a month.
The core idea is simple: "You don't need the same tier of model for every agent." Route a high-performance model to the orchestrator that does strategic planning, and route cheaper models to executors that handle repetitive tasks like file navigation or documentation lookup. This article walks through a concrete configuration that combines category-based model routing with Git Worktree isolation to dramatically cut costs in practice.
Core Concepts
3-Tier Architecture: Who Does What
OMO transforms a single AI agent into a virtual development team with clearly separated roles. When I first saw the architecture diagram I thought, "Is this really necessary?" — but looking at the cost breakdown changed my mind.
| Tier | Agents | Role | Recommended Model Tier |
|---|---|---|---|
| Planning | Prometheus, Metis, Momus | Requirements analysis, strategy — no code written | High-performance |
| Orchestration | Sisyphus, Atlas | Task distribution, sub-agent lifecycle management | High-performance |
| Execution | Oracle, Librarian, Frontend Engineer, Explorer, 10+ others | Actual code writing, research, and search | Budget |
The agent names come from Greek mythology, but the role boundaries matter more than the names. The principle that orchestrators never write code directly is the foundation of the entire structure. By strictly maintaining the boundary between planning and execution, you can focus the tokens consumed by expensive models purely on decision-making.
Inter-agent communication happens via tool calls. Sisyphus is said to "spawn" executors, but in practice this means the orchestrator delegates work by calling specified tools and receives results back.
Why Costs Drop: The MVI Principle
MVI (Minimum Viable Intelligence): The principle of choosing the least capable model that can still perform a given task. The analogy "using Claude Opus for file search is like moving a flower pot with an excavator" fits perfectly.
Tasks like file navigation and symbol search are essentially just calling the right tool with the right arguments. Claude Haiku or Gemini Flash handle them just fine. For repetitive codebase navigation and symbol search workloads on mid-sized projects, cases have been reported where token consumption per task dropped from 8,000 tokens to around 750 tokens (roughly a 90% reduction). Results will vary by task type, but it's clear there's no reason to throw expensive models at repetitive work.
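As a quick sanity check on the "roughly 90%" figure, the arithmetic can be written out. This is only a worked calculation using the two token counts quoted above; the function name `reduction_pct` is made up for illustration.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage drop in tokens per task."""
    return (before - after) / before * 100


before_tokens = 8_000  # tokens per task in the reported case
after_tokens = 750

print(f"{reduction_pct(before_tokens, after_tokens):.1f}% fewer tokens per task")
# prints "90.6% fewer tokens per task"
```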
Why Category-Based Routing Is the Key
Specifying a model for each agent individually is tedious to maintain — especially once you have more than ten executor agents. The category-based routing OMO adopts solves this problem cleanly.
The model for each agent is resolved in this priority order:

Per-agent override → User override → Category fallback → System default

When Sisyphus spawns a sub-agent, it specifies a category (execution, research, etc.) rather than a model name, and the model mapped to that category is applied automatically. If you later swap out a budget model, you only need to change one line in the category configuration.
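To make the fallback chain concrete, here is a minimal sketch of how such a lookup could work. It is not OMO's actual implementation: the function `resolve_model`, its parameters, and the `SYSTEM_DEFAULT` value are all hypothetical, and treating "user override" as a single session-level choice is my assumption.

```python
from typing import Optional

# Hypothetical default, for illustration only.
SYSTEM_DEFAULT = "anthropic/claude-haiku-4-5"


def resolve_model(
    agent: str,
    category: str,
    agent_overrides: dict[str, str],
    user_override: Optional[str],
    category_models: dict[str, str],
) -> str:
    """Return the first model found, checking the most specific source first."""
    if agent in agent_overrides:        # 1. per-agent override
        return agent_overrides[agent]
    if user_override is not None:       # 2. user override
        return user_override
    if category in category_models:     # 3. category fallback
        return category_models[category]
    return SYSTEM_DEFAULT               # 4. system default


category_models = {
    "execution": "google/gemini-2.0-flash",
    "research": "anthropic/claude-haiku-4-5",
}
print(resolve_model("explorer", "execution", {}, None, category_models))
# prints "google/gemini-2.0-flash"
```

The point of the sketch is the ordering: swapping the budget model means editing one entry in `category_models`, and every agent without a more specific override picks it up.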
Practical Application
Example 1: Per-Category Model Routing Configuration
In opencode.json at the project root, you map categories to models. The concurrency values directly affect cost, so configure them carefully.
    {
      "categories": {
        "planning": {
          "model": "anthropic/claude-opus-4-7",
          "concurrency": 1
        },
        "execution": {
          "model": "google/gemini-2.0-flash",
          "concurrency": 4
        },
        "research": {
          "model": "anthropic/claude-haiku-4-5",
          "concurrency": 6
        }
      }
    }

Model identifier formats vary by platform. The example above is for OpenRouter. If you're using Vertex AI or a direct API, check the model identifiers for that platform separately.
| Field | Description |
|---|---|
| `model` | The default model applied to all executors in that category |
| `concurrency` | Number of agents that can run simultaneously; orchestrators must be limited to 1 |
The reason for capping orchestrator concurrency at 1 is simple: if multiple orchestrators spawn executors simultaneously, each one multiplies the number of running executors, so costs balloon quickly and agents can conflict with each other. I initially set it to 2 thinking "wouldn't parallel be faster?", then I saw the bill and immediately reverted.
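Before running anything, it can help to check the config mechanically. The sketch below loads a config shaped like the example above and flags orchestrator categories whose concurrency is above 1. The helper names, the choice of `planning` as the orchestrator category, and the default of 1 for a missing `concurrency` are my assumptions, not documented OMO behavior.

```python
import json

CONFIG_TEXT = """
{
  "categories": {
    "planning":  {"model": "anthropic/claude-opus-4-7",  "concurrency": 1},
    "execution": {"model": "google/gemini-2.0-flash",    "concurrency": 4},
    "research":  {"model": "anthropic/claude-haiku-4-5", "concurrency": 6}
  }
}
"""


def check_concurrency(config: dict, orchestrator_categories=("planning",)) -> list[str]:
    """Return one warning per orchestrator category allowed to run in parallel."""
    warnings = []
    for name, settings in config.get("categories", {}).items():
        limit = settings.get("concurrency", 1)  # assumed default when missing
        if name in orchestrator_categories and limit > 1:
            warnings.append(f"{name}: concurrency {limit} should be 1")
    return warnings


def max_parallel_agents(config: dict) -> int:
    """Worst-case number of agents running at once if every category is saturated."""
    return sum(c.get("concurrency", 1) for c in config.get("categories", {}).values())


config = json.loads(CONFIG_TEXT)
print(check_concurrency(config))    # prints []
print(max_parallel_agents(config))  # prints 11
```

The second number is worth a glance: with these values, up to 11 agents could be billing at the same moment.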
Example 2: Defining an Executor Agent for Repetitive Tasks
Define executors as Markdown files in the .opencode/agents/ directory. Keep YAML frontmatter under 10 lines to avoid wasting context window.
    ---
    description: Agent dedicated to codebase exploration and symbol search
    mode: subagent
    model: anthropic/claude-haiku-4-5
    temperature: 0.0
    ---
    You are a codebase explorer. Your only job is to search, read,
    and summarise code. Never write or modify files.
    Always return structured findings to the orchestrator.

Writing the system prompt in English is intentional. Most models currently show higher tool-call accuracy and response format consistency with English instructions. Korean prompts do work, but for executors that run repetitive tasks and make frequent tool calls, English is recommended.
Applying temperature: 0.0 to executors increases result consistency for repetitive tasks. It prevents situations where the same file is explored twice and returned in different formats, and also reduces unnecessary retries.
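The "under 10 lines" rule is easy to enforce with a small check. The following is a rough sketch that counts the lines between the two `---` delimiters of an agent file. `frontmatter_lines` is a made-up helper and `AGENT_FILE` is a sample string, not a file read from disk.

```python
def frontmatter_lines(markdown: str) -> list[str]:
    """Return the lines between the opening and closing '---' delimiters."""
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return lines[1:i]
    return []  # no closing delimiter: treat as having no frontmatter


AGENT_FILE = """---
description: Agent dedicated to codebase exploration and symbol search
mode: subagent
model: anthropic/claude-haiku-4-5
temperature: 0.0
---
You are a codebase explorer. Your only job is to search, read,
and summarise code. Never write or modify files.
"""

count = len(frontmatter_lines(AGENT_FILE))
print(count, "frontmatter lines;", "ok" if count < 10 else "too long")
# prints "4 frontmatter lines; ok"
```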
Example 3: Parallel Isolated Execution with Git Worktree
Git Worktree is a feature that lets you check out multiple branches from a single repository into separate directories simultaneously. It gives each agent an independent workspace while sharing the Git object store, achieving isolation without wasting disk space.
    # Prepare worktrees for parallel execution
    # (assumes the feature/search-* branches already exist; use -b <branch> to create one)
    git worktree add ../worktree-explorer-a feature/search-a
    git worktree add ../worktree-explorer-b feature/search-b
    git worktree add ../worktree-explorer-c feature/search-c

    # Always clean up after work is done (leaving them around wastes disk space)
    git worktree remove ../worktree-explorer-a
    git worktree remove ../worktree-explorer-b
    git worktree remove ../worktree-explorer-c

The execution flow looks like this:
    Sisyphus (Orchestrator, Opus)
    ├── Explorer × 3 (Haiku, worktree-A/B/C in parallel) ← file navigation
    ├── Librarian (Haiku) ← documentation lookup
    └── Frontend Engineer (Flash) ← UI code writing

Atlas consolidates the results from each worktree. "Consolidation" here is not a Git merge; it is the process of combining the results each executor returns (structured exploration findings, code snippets, dependency lists, etc.) at the agent level into a single context. When the /start-work command is run, Atlas reads the plan Prometheus wrote and processes TODOs one by one, spawning an executor from the appropriate category for each TODO. Expensive models are focused exclusively on the planning and orchestration phases.
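If you script the worktree setup yourself, one way to make sure cleanup always happens is to wrap it in a context manager. This is a sketch under my own assumptions, not something OMO ships: `worktrees`, `add_cmd`, `remove_cmd`, and `dry_run` are hypothetical names, and only the `git worktree add` and `git worktree remove` commands come from the example above. The demo at the bottom uses a fake runner so nothing touches a real repository.

```python
import subprocess
from contextlib import contextmanager


def add_cmd(path: str, branch: str) -> list[str]:
    return ["git", "worktree", "add", path, branch]


def remove_cmd(path: str) -> list[str]:
    return ["git", "worktree", "remove", path]


@contextmanager
def worktrees(specs: dict[str, str], run=subprocess.run):
    """Create one worktree per path/branch pair and always remove them afterwards."""
    created = []
    try:
        for path, branch in specs.items():
            run(add_cmd(path, branch), check=True)
            created.append(path)
        yield created
    finally:
        # Runs even if an executor raised, so no worktree is left behind.
        for path in reversed(created):
            run(remove_cmd(path), check=False)


# Dry run: record the commands instead of executing them.
recorded = []


def dry_run(cmd, check=True):
    recorded.append(cmd)


with worktrees({"../worktree-explorer-a": "feature/search-a"}, run=dry_run) as paths:
    print("working in", paths)
print(recorded)
```

Dropping the `run=dry_run` argument makes the same code call git for real, in which case the branches need to exist first.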
Open-Source Model Combination Examples
You don't have to use only OpenAI or Anthropic APIs. Using open-source alternative models can cut costs further.
The following is a configuration example reported by actual users. Model names and API availability change quickly, so before adopting any model, verify its current availability and tool-call support directly.
| Agent | Model Family | Role |
|---|---|---|
| Sisyphus, Atlas | Kimi K2 family | Orchestration |
| Prometheus, Metis | GLM-4 family | Planning |
| Librarian, Explorer | MiniMax family | Repetitive lookup |
Cases have been reported where this configuration enables operation at roughly $11/month. This assumes a small-to-mid-sized project with dozens of agent runs per day. The difference versus running everything on Claude Opus is significant, but estimating the cost against your own usage patterns and project scale will give you a more accurate picture.
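For estimating against your own usage, a back-of-the-envelope calculation is usually enough. The sketch below multiplies runs per day, tokens per run, and a price per million tokens. Every number in it is a placeholder I made up, not a quoted price or a measured workload, and it uses a single blended price rather than separate input and output rates.

```python
def monthly_cost(runs_per_day: int, tokens_per_run: int,
                 usd_per_million_tokens: float, days: int = 30) -> float:
    """Estimated monthly spend for one category, in USD."""
    total_tokens = runs_per_day * tokens_per_run * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder workload and prices: replace with your own dashboard numbers.
estimate = {
    "orchestration": monthly_cost(runs_per_day=10, tokens_per_run=20_000,
                                  usd_per_million_tokens=1.00),
    "execution": monthly_cost(runs_per_day=40, tokens_per_run=5_000,
                              usd_per_million_tokens=0.25),
}
for category, cost in estimate.items():
    print(f"{category}: ${cost:.2f}")
print(f"total: ${sum(estimate.values()):.2f}")
```

Plugging in your provider's current prices and your real run counts will tell you quickly whether a figure like $11/month is realistic for your project.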
Pros and Cons
Pros
| Item | Details |
|---|---|
| Cost optimization | Automatically routes repetitive and simple tasks to budget models, dramatically reducing API costs |
| Provider independence | Not locked into a single LLM vendor; can select the optimal model based on task characteristics |
| Parallel processing | Git Worktree-based isolation allows multiple executors to run simultaneously → faster throughput |
| Recursive delegation prevention | Explicit role boundaries in the 3-tier structure prevent agent infinite loops |
| Context independence | Each executor operates in an independent session, so prior conversation tokens don't accumulate |
Cons and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Initial token overhead | Even simple tasks can consume 15,000–25,000 tokens due to context initialization | Single agents are more economical for short one-shot tasks |
| Misconfiguration billing spikes | Reports of $350+ bills from concurrency/routing misconfiguration | Start all category concurrencies at 1 and increase incrementally |
| Parallel cost multiplication | 5 parallel agents consume roughly 5× the tokens of a single agent | Explicitly cap executor concurrency |
| Local model compatibility | Small local models (7B–13B) have low tool-call reliability and may fail to exit orchestration loops | Only use models with verified tool-call support in orchestration |
| Learning curve | Stable operation requires understanding category mapping, Worktrees, and concurrency limits | Start with BYOA mode using a single custom agent and expand from there |
BYOA (Bring Your Own Agent): A custom agent definition mode supported by OMO. You register an agent simply by adding a Markdown file to the `.opencode/agents/` directory. It's a way to extend team-specific workflows without touching the existing system agents.
The Most Common Mistakes in Practice
- Over-writing executor frontmatter. Including lengthy descriptions, examples, and caveats in agent definitions means that content is included in context on every run, wasting the very tokens you were trying to save. Keep YAML frontmatter under 10 lines, and limit system prompts to the core role definition.
- Deploying local models to the orchestrator without verifying tool-call reliability. When I tried attaching a local 7B model to Sisyphus to cut costs, I ran straight into malformed JSON tool calls and runs that could not exit the orchestration loop. When introducing a new model, running a simple standalone tool-call test first will save you from grief later.
- Forgetting to clean up Git Worktrees. Adding worktrees on every run without removing them causes disk waste to pile up. Get into the habit of checking remaining worktrees with `git worktree list` and cleaning up with `git worktree remove` when work is done.
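For the worktree cleanup habit, the machine-readable output of `git worktree list --porcelain` is easier to script against than the default listing. Here is a small sketch that parses that output and picks out leftover explorer worktrees. The helper names and the `worktree-explorer-` prefix filter are my assumptions, and `SAMPLE` is hand-written text in the porcelain layout rather than output captured from a real repository.

```python
def parse_worktrees(porcelain: str) -> list[dict[str, str]]:
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    entries, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():          # a blank line ends one worktree record
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value
    if current:
        entries.append(current)
    return entries


def leftover_paths(entries: list[dict[str, str]], prefix: str = "worktree-explorer-") -> list[str]:
    """Paths of worktrees whose directory name starts with the given prefix."""
    return [e["worktree"] for e in entries
            if e["worktree"].rsplit("/", 1)[-1].startswith(prefix)]


SAMPLE = """worktree /home/me/project
HEAD 1111111111111111111111111111111111111111
branch refs/heads/main

worktree /home/me/worktree-explorer-a
HEAD 2222222222222222222222222222222222222222
branch refs/heads/feature/search-a
"""

print(leftover_paths(parse_worktrees(SAMPLE)))
# prints ['/home/me/worktree-explorer-a']
```

In a real script, the input would come from `subprocess.run(["git", "worktree", "list", "--porcelain"], capture_output=True, text=True, check=True).stdout`, and each returned path could be passed to `git worktree remove`.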
Closing Thoughts
The essence of orchestrator-executor separation is the principle: "Spend sparingly on expensive judgment; use cheap execution generously." The configuration itself requires nothing more than a few lines of JSON and Markdown agent definition files, and you don't need to aim for a perfect setup from the start.
Three steps you can try right now:
- Copy the example `opencode.json` from the `dev` branch of the `oh-my-openagent` GitHub repo (code-yeongyu/oh-my-openagent), add it to your project root, and set `concurrency` to 1 for all categories. This is the safest way to experience the structure first without worrying about costs.
- Create one exploration-only executor Markdown file in `.opencode/agents/`. Three lines (`mode: subagent`, `model: anthropic/claude-haiku-4-5`, `temperature: 0.0`) are all you need for a basic setup.
- After using it for a couple of days, check the per-category token consumption ratios in your API dashboard. Once you start seeing noticeable savings in the `execution` and `research` categories, that's the time to gradually increase `concurrency` and add parallel Worktree configurations.
The first wall you'll hit when integrating this setup into a CI/CD pipeline is branch conflicts between concurrently running agents. How to structure your Git strategy in a multi-agent environment is a topic worth covering separately.