OpenCode Multi-Provider Model Routing Strategy That Cuts Your Monthly AI Coding Agent Bill by 40%+

Have you ever broken into a cold sweat looking at your end-of-month bill after using AI coding tools? When I started using Claude Sonnet as my primary model, I threw the same frontier model at everything — architecture design, boilerplate generation, you name it — and ended up with a charge nearly twice what I expected. In practice, working solo, after applying this strategy, I've kept my cloud API costs under $10 per month. This is based on moderate individual developer usage; the absolute numbers will differ for team environments, but the savings ratio should be roughly similar.

Honestly, there's no reason to process complex design-phase reasoning and generating a single line of test code with the same model. There's a way to maintain the same quality for half your current spend, and that's the multi-provider tiering strategy that automatically assigns models based on task complexity. An open-source terminal-based AI coding agent called OpenCode lets you implement this with a single JSON file.

This post covers patterns you can apply immediately in practice: three-tier model layering configuration, simplifying setup with a LiteLLM gateway, and privacy strategies for sensitive codebases. The structure lets you grasp the concepts first, then pick the example that fits your situation and follow along directly.

Core Concepts

OpenCode's Provider-Agnostic Architecture

OpenCode is a MIT-licensed open-source AI coding agent written in Go. It's a terminal-native tool, and unlike SaaS tools such as Claude Code or Cursor that are locked to specific cloud vendors, it's designed to freely mix any provider within the same agent loop. The key is that you can connect 75+ LLMs through a single opencode.json file.

Provider-Agnostic: An architectural design approach that avoids lock-in to a specific LLM vendor's API, instead enabling any model to be swapped in through a standard interface (typically an OpenAI-compatible API — an interface callable the same way as ChatGPT)

There are two key fields to focus on in the config file:

model: The primary model used for main tasks
small_model: A lightweight model automatically assigned to repetitive, simple tasks

How small_model gets triggered is probably the thing you're most curious about — I was confused about this at first too, wondering "what criteria decides when to use the smaller model?" OpenCode internally delegates auxiliary subtasks like file summarization, title generation, and simple completions to small_model. These are "peripheral tasks" that run alongside the main agent loop, not the primary tasks. The trigger is based on task type, not token count. So complex architecture design always goes to model unless you switch manually with /model.

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "anthropic/claude-sonnet-4-6",
  "small_model": "ollama/qwen3:30b-a3b",
  "provider": {
    "ollama": {
      "name": "Ollama",
      "baseURL": "http://localhost:11434/v1",
      "models": {
        "qwen3:30b-a3b": { "name": "Qwen3 30B MoE" },
        "devstral": { "name": "Devstral Small 24B" }
      }
    }
  }
}

Save this file as opencode.json in your project root to apply it only to that project, or place it at ~/.config/opencode/config.json to apply it globally across all projects.

The Cost Escalation Tiering Principle

Honestly, at first I thought "can't I just use one good model?" But when you actually analyze your task types, most time is spent on boilerplate writing, test generation, and simple refactoring — genuinely complex design decisions account for only 10–20% of total work. The key is using expensive models only for that 10–20%.

Cost Tier	Model Type	Suitable Tasks
Free (Tier 1)	Ollama local models	Code editing, boilerplate, simple implementations
Low-cost (Tier 2)	Gemini Flash, Claude Haiku	Test generation, iterative processing, documentation
High-cost (Tier 3)	Claude Opus, Sonnet	Architecture design, complex reasoning, critical decisions

Cost Escalation: A staged cost investment strategy that starts with the cheapest option and only escalates to a higher-tier model when task requirements demand it

The Maturity of Local Models in 2025–2026

Just one or two years ago, local models were at the "you can use them, but you'll end up going back to cloud anyway" stage — but that's changed now. On SWE-bench, Qwen3 30B-A3B scores 73.4% and Devstral Small 24B scores 68%. Ollama is the tool that lets you run these models locally; it has a built-in OpenAI-compatible API server accessible at http://localhost:11434/v1 — you can install it at ollama.com.

The "7x cost efficiency of Devstral vs. Claude Sonnet" figure is a comparison based on API pricing. It refers to the difference in per-token costs between using Devstral via cloud API versus using Sonnet, and running locally with Ollama makes the API cost itself zero. Hardware costs (electricity, GPU depreciation) are separate, of course. With an RTX 4090, the math works out to recouping the savings within 3–6 months, but if you already have a high-performance Mac or GPU, you can benefit immediately with no additional cost.

Local Model	SWE-bench	Minimum Hardware	Characteristics
Qwen3 30B-A3B (MoE)	73.4%	24GB VRAM	General-purpose coding, balanced reasoning
Devstral Small 24B	68%	32GB RAM (Mac) / RTX 4090	Coding-specialized, Mistral-based
Gemma 4 27B	-	24GB VRAM	Google, verified OpenCode compatible

Practical Application

Before looking at the examples, it's worth figuring out which configuration fits your situation first.

Individual developer, setting up for the first time → Example 1 (3-tier layering, simplest)
Multiple team members sharing the same config, or frequently switching providers → Example 2 (LiteLLM gateway)
Want complex automated workflows → Example 3 (agent separation)
Fintech, healthcare, or other environments where code must not leave your infrastructure → Example 4 (privacy-first)

Example 1: Three-Tier Model Layering Configuration

This is the simplest configuration to start with. Assign an Ollama local model to small_model, and switch to cloud only when complex tasks arise by using the /model command within the session.

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "anthropic/claude-sonnet-4-6",
  "small_model": "ollama/qwen3:30b-a3b",
  "provider": {
    "anthropic": {
      "apiKey": "$ANTHROPIC_API_KEY"
    },
    "ollama": {
      "name": "Ollama",
      "baseURL": "http://localhost:11434/v1",
      "models": {
        "qwen3:30b-a3b": { "name": "Qwen3 30B MoE" },
        "devstral": { "name": "Devstral Small 24B" }
      }
    }
  }
}

The "apiKey": "$ANTHROPIC_API_KEY" part is how OpenCode reads shell environment variables at runtime. If you copy this value as-is, you must have export ANTHROPIC_API_KEY=sk-ant-... set in your shell, or have a .env file in the project root. Without it, the literal string $ANTHROPIC_API_KEY gets sent as the API key, causing an authentication error. I missed this myself initially and spent a while confused.

The baseURL of http://localhost:11434/v1 for ollama/qwen3:30b-a3b is the default address that opens when you run ollama serve after installing Ollama. If you're in a Docker environment or changed the port, you'll need to update this address accordingly.

Task Type	Model Applied	Reason
Architecture planning, complex design	Claude Opus 4.7 (manual switch)	Requires advanced reasoning
General code editing, implementation	Qwen3 30B (Ollama, small_model)	Free, sufficient quality
Test generation, documentation	Claude Haiku (small_model alternative)	Low-cost iterative processing

When you want to switch models during a session, enter the /model command in the TUI or use the variant_cycle keybinding for real-time switching.

Now that you have the concepts down, let's look more deeply at the configuration for each scenario.

Example 2: Simplifying Configuration with a LiteLLM Gateway

The most tedious part of multi-provider setup is managing API keys and URL configuration for each provider. By running a LiteLLM proxy locally, OpenCode only needs to point at a single endpoint, and LiteLLM handles the actual routing.

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "gateway/claude-sonnet-4-6",
  "small_model": "gateway/ollama/qwen3:30b-a3b",
  "provider": {
    "gateway": {
      "name": "LiteLLM Gateway",
      "baseURL": "http://localhost:4000/v1",
      "apiKey": "sk-local",
      "models": {
        "claude-sonnet-4-6": {},
        "ollama/qwen3:30b-a3b": {},
        "ollama/devstral": {}
      }
    }
  }
}

The "apiKey": "sk-local" might look odd, but that's because it's a local proxy. When you run LiteLLM locally, you can configure it to pass through any arbitrary value without actual API key validation. Since nothing is going external, there's no security risk — any string will work.

LiteLLM Proxy: An open-source gateway that bundles 100+ LLMs — Anthropic, OpenAI, Ollama, Gemini, and more — behind a single OpenAI-compatible endpoint. Install with pip install litellm and run locally with litellm --config config.yaml

The real advantage of this pattern is that you can swap providers or add new models by editing only the LiteLLM config file, without touching the OpenCode configuration at all. Especially useful for configs shared across a team.

Example 3: Role-Specialized Agent Configuration (Oh My OpenCode)

Going further, you can set up a "virtual team" structure where orchestrator, planner, executor, and researcher roles are each assigned to different models. This agents field is part of the Oh My OpenCode configuration schema (an agent workflow framework that sits on top of OpenCode). This is distinct from adding directly to the base OpenCode opencode.json — you need to install Oh My OpenCode separately for this configuration to work.

json

{
  "$schema": "https://opencode.ai/config.json",
  "agents": {
    "orchestrator": {
      "model": "anthropic/claude-haiku-4-5",
      "description": "작업 분배 및 계획 수립"
    },
    "fixer": {
      "model": "ollama/devstral",
      "description": "버그 수정 및 코드 편집"
    },
    "oracle": {
      "model": "ollama/qwen3:30b-a3b",
      "description": "코드 분석 및 리뷰"
    },
    "architect": {
      "model": "anthropic/claude-opus-4-7",
      "description": "아키텍처 결정 및 설계"
    }
  }
}

The description field isn't just a comment — it's metadata that Oh My OpenCode references when deciding which agent to route a task to. The high-frequency orchestrator is handled by inexpensive Haiku, while the actually heavy computation is handled by free local models.

Example 4: Privacy-First Configuration for Sensitive Codebases

In environments like fintech or healthcare where code must not be sent to external servers, it's safest to start with a 100% local Ollama configuration.

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen3:30b-a3b",
  "small_model": "ollama/devstral",
  "provider": {
    "ollama": {
      "name": "Ollama (Local Only)",
      "baseURL": "http://localhost:11434/v1",
      "models": {
        "qwen3:30b-a3b": { "name": "Qwen3 30B MoE" },
        "devstral": { "name": "Devstral Small 24B" }
      }
    }
  }
}

The http://localhost:11434/v1 in baseURL is Ollama's default address. If you've installed it locally on the standard port, you can use this address as-is. If you're in a Docker Compose environment or changed the port, update it accordingly.

Removing the cloud provider section entirely eliminates any chance of sensitive code accidentally leaking to external services.

Pros and Cons Analysis

Advantages

Item	Details
Cost reduction	40–60% savings possible vs. uniform frontier model deployment; immediate benefit if you already own high-performance hardware
Privacy guarantee	With local models, code is never sent to external servers — suitable for confidential enterprise codebases
Offline usage	Work is possible with local models alone, without an internet connection
Provider neutrality	Freely choose and swap the most advantageous model without lock-in to any specific vendor
Flexible switching	Real-time model switching during a session via the `/model` command in TUI or keybindings

Disadvantages and Caveats

Item	Details	Mitigation
Reduced token generation speed	GitHub issue #4182: reports of <0.5 tokens/sec via OpenCode vs. 12 tokens/sec running Ollama directly — an ongoing bug	Await official patch; try routing via LiteLLM gateway as a temporary workaround
Tool call quality variance	Some local models become confused on basic tool calls like file operations	Choose models based on instruction-following quality rather than SWE-bench scores
Hardware requirements	Qwen3 30B MoE needs 24GB VRAM; Devstral 24B needs 32GB RAM or RTX 4090; CPU inference is 10–50x slower	Use smaller cloud models (Haiku, Gemini Flash) as alternatives if hardware falls short
Context window limits	Models with fewer than 64K tokens are unsuitable for multi-file work	Check context size first when selecting a model
Initial setup complexity	Multi-provider configuration requires trial and error to understand each model's characteristics	Start with just the two fields `model` + `small_model` and expand gradually

instruction-following: A model's ability to accurately follow given instructions (e.g., "only edit this file", "respond in JSON only"). In coding agents, this is as practically important a metric as SWE-bench scores

Common Pitfalls in Practice

Mistaking a speed issue for a model problem — There is currently an ongoing speed degradation bug with OpenCode's Ollama routing (#4182). If a local model seems unusually slow, check this issue before swapping models. Routing through a LiteLLM gateway has worked as a temporary workaround in some cases.
Choosing a model without checking the context window — I did this myself at first, picking based on benchmark scores alone, and ended up with the context getting truncated mid-way through a multi-file task, causing the agent to behave erratically. It's recommended to first confirm whether a model supports 64K tokens or more.
Attempting a multi-agent configuration from the start — The agent separation setup in Example 3 is powerful, but it's far more stable to start with just the two fields model + small_model, measure actual cost and quality, and then expand incrementally. The more complex the configuration, the harder it is to trace where problems arise.

Closing Thoughts

More important than the model strategy itself is the habit of measuring your own work patterns with data. Real optimization begins only when you track what you're spending on which tasks.

Three steps you can take right now:

You can install Ollama and pull the Qwen3 30B-A3B model. Run ollama pull qwen3:30b-a3b to download the model and ollama serve to start the local server — it'll be accessible as an OpenAI-compatible API at http://localhost:11434/v1. If your hardware doesn't reach 24GB VRAM, try devstral or a smaller model first.
You can add a single line "small_model": "ollama/qwen3:30b-a3b" to opencode.json in your project root. Leave your existing cloud main model in place and only switch small_model to Ollama — local processing will start handling simple repetitive tasks. Observe which tasks get offloaded locally and patterns will emerge.
After about a week of use, check how your cloud API costs have changed. Anthropic lets you view daily and per-model token usage and costs in console.anthropic.com → Usage tab. If you feel the savings, consider gradually expanding to a LiteLLM gateway or agent separation configuration.

References

Official Documentation

Community Guides

Benchmarks and Model Analysis

#OpenCode#Ollama#LiteLLM#멀티프로바이더#모델라우팅#로컬LLM#AI코딩에이전트#비용최적화#프로바이더중립#OhMyOpenCode

OpenCode Multi-Provider Model Routing Strategy That Cuts Your Monthly AI Coding Agent Bill by 40%+ | DEV BAK - 기술블로그

OpenCode Multi-Provider Model Routing Strategy That Cuts Your Monthly AI Coding Agent Bill by 40%+

Core Concepts

OpenCode's Provider-Agnostic Architecture

Provider-Agnostic: An architectural design approach that avoids lock-in to a specific LLM vendor's API, instead enabling any model to be swapped in through a standard interface (typically an OpenAI-compatible API — an interface callable the same way as ChatGPT)

There are two key fields to focus on in the config file:

model: The primary model used for main tasks
small_model: A lightweight model automatically assigned to repetitive, simple tasks

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "anthropic/claude-sonnet-4-6",
  "small_model": "ollama/qwen3:30b-a3b",
  "provider": {
    "ollama": {
      "name": "Ollama",
      "baseURL": "http://localhost:11434/v1",
      "models": {
        "qwen3:30b-a3b": { "name": "Qwen3 30B MoE" },
        "devstral": { "name": "Devstral Small 24B" }
      }
    }
  }
}

Save this file as opencode.json in your project root to apply it only to that project, or place it at ~/.config/opencode/config.json to apply it globally across all projects.

The Cost Escalation Tiering Principle

Cost Tier	Model Type	Suitable Tasks
Free (Tier 1)	Ollama local models	Code editing, boilerplate, simple implementations
Low-cost (Tier 2)	Gemini Flash, Claude Haiku	Test generation, iterative processing, documentation
High-cost (Tier 3)	Claude Opus, Sonnet	Architecture design, complex reasoning, critical decisions

Cost Escalation: A staged cost investment strategy that starts with the cheapest option and only escalates to a higher-tier model when task requirements demand it

The Maturity of Local Models in 2025–2026

Local Model	SWE-bench	Minimum Hardware	Characteristics
Qwen3 30B-A3B (MoE)	73.4%	24GB VRAM	General-purpose coding, balanced reasoning
Devstral Small 24B	68%	32GB RAM (Mac) / RTX 4090	Coding-specialized, Mistral-based
Gemma 4 27B	-	24GB VRAM	Google, verified OpenCode compatible

Practical Application

Before looking at the examples, it's worth figuring out which configuration fits your situation first.

Individual developer, setting up for the first time → Example 1 (3-tier layering, simplest)
Multiple team members sharing the same config, or frequently switching providers → Example 2 (LiteLLM gateway)
Want complex automated workflows → Example 3 (agent separation)
Fintech, healthcare, or other environments where code must not leave your infrastructure → Example 4 (privacy-first)

Example 1: Three-Tier Model Layering Configuration

This is the simplest configuration to start with. Assign an Ollama local model to small_model, and switch to cloud only when complex tasks arise by using the /model command within the session.

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "anthropic/claude-sonnet-4-6",
  "small_model": "ollama/qwen3:30b-a3b",
  "provider": {
    "anthropic": {
      "apiKey": "$ANTHROPIC_API_KEY"
    },
    "ollama": {
      "name": "Ollama",
      "baseURL": "http://localhost:11434/v1",
      "models": {
        "qwen3:30b-a3b": { "name": "Qwen3 30B MoE" },
        "devstral": { "name": "Devstral Small 24B" }
      }
    }
  }
}

Task Type	Model Applied	Reason
Architecture planning, complex design	Claude Opus 4.7 (manual switch)	Requires advanced reasoning
General code editing, implementation	Qwen3 30B (Ollama, small_model)	Free, sufficient quality
Test generation, documentation	Claude Haiku (small_model alternative)	Low-cost iterative processing

When you want to switch models during a session, enter the /model command in the TUI or use the variant_cycle keybinding for real-time switching.

Now that you have the concepts down, let's look more deeply at the configuration for each scenario.

Example 2: Simplifying Configuration with a LiteLLM Gateway

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "gateway/claude-sonnet-4-6",
  "small_model": "gateway/ollama/qwen3:30b-a3b",
  "provider": {
    "gateway": {
      "name": "LiteLLM Gateway",
      "baseURL": "http://localhost:4000/v1",
      "apiKey": "sk-local",
      "models": {
        "claude-sonnet-4-6": {},
        "ollama/qwen3:30b-a3b": {},
        "ollama/devstral": {}
      }
    }
  }
}

LiteLLM Proxy: An open-source gateway that bundles 100+ LLMs — Anthropic, OpenAI, Ollama, Gemini, and more — behind a single OpenAI-compatible endpoint. Install with pip install litellm and run locally with litellm --config config.yaml

Example 3: Role-Specialized Agent Configuration (Oh My OpenCode)

json

{
  "$schema": "https://opencode.ai/config.json",
  "agents": {
    "orchestrator": {
      "model": "anthropic/claude-haiku-4-5",
      "description": "작업 분배 및 계획 수립"
    },
    "fixer": {
      "model": "ollama/devstral",
      "description": "버그 수정 및 코드 편집"
    },
    "oracle": {
      "model": "ollama/qwen3:30b-a3b",
      "description": "코드 분석 및 리뷰"
    },
    "architect": {
      "model": "anthropic/claude-opus-4-7",
      "description": "아키텍처 결정 및 설계"
    }
  }
}

Example 4: Privacy-First Configuration for Sensitive Codebases

In environments like fintech or healthcare where code must not be sent to external servers, it's safest to start with a 100% local Ollama configuration.

json

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen3:30b-a3b",
  "small_model": "ollama/devstral",
  "provider": {
    "ollama": {
      "name": "Ollama (Local Only)",
      "baseURL": "http://localhost:11434/v1",
      "models": {
        "qwen3:30b-a3b": { "name": "Qwen3 30B MoE" },
        "devstral": { "name": "Devstral Small 24B" }
      }
    }
  }
}

Removing the cloud provider section entirely eliminates any chance of sensitive code accidentally leaking to external services.

Pros and Cons Analysis

Advantages

Item	Details
Cost reduction	40–60% savings possible vs. uniform frontier model deployment; immediate benefit if you already own high-performance hardware
Privacy guarantee	With local models, code is never sent to external servers — suitable for confidential enterprise codebases
Offline usage	Work is possible with local models alone, without an internet connection
Provider neutrality	Freely choose and swap the most advantageous model without lock-in to any specific vendor
Flexible switching	Real-time model switching during a session via the `/model` command in TUI or keybindings

Disadvantages and Caveats

Item	Details	Mitigation
Reduced token generation speed	GitHub issue #4182: reports of <0.5 tokens/sec via OpenCode vs. 12 tokens/sec running Ollama directly — an ongoing bug	Await official patch; try routing via LiteLLM gateway as a temporary workaround
Tool call quality variance	Some local models become confused on basic tool calls like file operations	Choose models based on instruction-following quality rather than SWE-bench scores
Hardware requirements	Qwen3 30B MoE needs 24GB VRAM; Devstral 24B needs 32GB RAM or RTX 4090; CPU inference is 10–50x slower	Use smaller cloud models (Haiku, Gemini Flash) as alternatives if hardware falls short
Context window limits	Models with fewer than 64K tokens are unsuitable for multi-file work	Check context size first when selecting a model
Initial setup complexity	Multi-provider configuration requires trial and error to understand each model's characteristics	Start with just the two fields `model` + `small_model` and expand gradually

instruction-following: A model's ability to accurately follow given instructions (e.g., "only edit this file", "respond in JSON only"). In coding agents, this is as practically important a metric as SWE-bench scores

Common Pitfalls in Practice

Mistaking a speed issue for a model problem — There is currently an ongoing speed degradation bug with OpenCode's Ollama routing (#4182). If a local model seems unusually slow, check this issue before swapping models. Routing through a LiteLLM gateway has worked as a temporary workaround in some cases.
Choosing a model without checking the context window — I did this myself at first, picking based on benchmark scores alone, and ended up with the context getting truncated mid-way through a multi-file task, causing the agent to behave erratically. It's recommended to first confirm whether a model supports 64K tokens or more.
Attempting a multi-agent configuration from the start — The agent separation setup in Example 3 is powerful, but it's far more stable to start with just the two fields model + small_model, measure actual cost and quality, and then expand incrementally. The more complex the configuration, the harder it is to trace where problems arise.

Closing Thoughts

More important than the model strategy itself is the habit of measuring your own work patterns with data. Real optimization begins only when you track what you're spending on which tasks.

Three steps you can take right now:

You can install Ollama and pull the Qwen3 30B-A3B model. Run ollama pull qwen3:30b-a3b to download the model and ollama serve to start the local server — it'll be accessible as an OpenAI-compatible API at http://localhost:11434/v1. If your hardware doesn't reach 24GB VRAM, try devstral or a smaller model first.
You can add a single line "small_model": "ollama/qwen3:30b-a3b" to opencode.json in your project root. Leave your existing cloud main model in place and only switch small_model to Ollama — local processing will start handling simple repetitive tasks. Observe which tasks get offloaded locally and patterns will emerge.
After about a week of use, check how your cloud API costs have changed. Anthropic lets you view daily and per-model token usage and costs in console.anthropic.com → Usage tab. If you feel the savings, consider gradually expanding to a LiteLLM gateway or agent separation configuration.

References

Official Documentation

Community Guides

Benchmarks and Model Analysis

#OpenCode#Ollama#LiteLLM#멀티프로바이더#모델라우팅#로컬LLM#AI코딩에이전트#비용최적화#프로바이더중립#OhMyOpenCode

Core Concepts

OpenCode's Provider-Agnostic Architecture

The Cost Escalation Tiering Principle

The Maturity of Local Models in 2025–2026

Practical Application

Example 1: Three-Tier Model Layering Configuration

Example 2: Simplifying Configuration with a LiteLLM Gateway

Example 3: Role-Specialized Agent Configuration (Oh My OpenCode)

Example 4: Privacy-First Configuration for Sensitive Codebases

Pros and Cons Analysis

Advantages

Disadvantages and Caveats

Common Pitfalls in Practice

Closing Thoughts

References

Core Concepts

OpenCode's Provider-Agnostic Architecture

The Cost Escalation Tiering Principle

The Maturity of Local Models in 2025–2026

Practical Application

Example 1: Three-Tier Model Layering Configuration

Example 2: Simplifying Configuration with a LiteLLM Gateway

Example 3: Role-Specialized Agent Configuration (Oh My OpenCode)

Example 4: Privacy-First Configuration for Sensitive Codebases

Pros and Cons Analysis

Advantages

Disadvantages and Caveats

Common Pitfalls in Practice

Closing Thoughts

References

Recommended Posts

Oh My OpenCode (oh-my-openagent) Configuration That Cuts Multi-Agent AI Coding API Costs to ~$11/Month with Category Routing

Why AI Is Blocking Your PR Reviews — Clearing the Bottleneck with Tools, Process, and Architecture

AI Code Review That Reasons Over the Entire Repository Beyond PR Diffs — How Codebase Semantic Graphs Catch Cross-File Bugs

OpenCode Plan/Build Mode: Making AI Show You the Plan Before Touching Your Code

Running an AI Coding Agent in the Terminal Without the Cloud — Connecting Local LLMs with Ollama + OpenCode

Building a TypeScript LSP Self-Correction Loop with OpenCode — AI That Catches Its Own Type Errors