Running an AI Coding Agent in the Terminal Without the Cloud — Connecting Local LLMs with Ollama + OpenCode

I'll be honest: I gave up on setting up a local LLM more than once. I relied on cloud APIs with the attitude of "why not just use the Claude API," until one day I found myself completely stuck trying to make an urgent code fix on an airplane. That's when I started seriously digging into an offline AI coding environment.

After about a month of trial and error, I finally landed on a setup that works pretty well, and I've documented it here. This is a great read for anyone comfortable with the terminal but new to local LLMs. OpenCode is a terminal-based AI coding agent built in Go by the SST team, and the structure for connecting it offline is surprisingly simple. The key is that OpenCode supports OpenAI-compatible API endpoints, so all you need to do is point baseURL at your local LLM server address. All computation runs on your own machine with no cloud dependency, and not a single line of your code leaves your computer.

By the end of this post, you'll understand how to connect Ollama or LM Studio to OpenCode, how to choose a model that fits your hardware, and the pitfalls that come up most often in practice. If code privacy matters to you or you want to automate your development workflow without API costs, this setup is a genuinely practical option.

Core Concepts

How OpenCode Connects to a Local LLM

OpenCode's architecture in one sentence: "wherever the config file points is where the model lives." At runtime it reads opencode.json, recognizes the baseURL as the address of a local server, and from that point on everything flows exactly the same as with a cloud API.

[opencode.json config] ──read at startup──▶ [OpenCode TUI]
                                                   │
                                                   ▼
                                [Local LLM server (Ollama / LM Studio)]
                                                   │
                                                   ▼
                                [Model files: Qwen3-Coder, Llama 3, Gemma 3, etc.]

OpenAI-compatible API: An interface that follows the /v1/chat/completions endpoint specification defined by OpenAI. Most local LLM servers — Ollama, LM Studio, and others — support this spec, so you can switch providers by replacing only the baseURL without changing any client code.

OpenCode offers two agent modes you can toggle with the Tab key. The Build agent has full permissions including file editing and shell execution, while the Plan agent is read-only and requires user approval before running bash commands. When setting up an offline environment for the first time, it's safer to verify behavior in Plan mode first. That's the order I followed, and it saved me from a lot of mistakes.

Choosing a Model for Your Hardware

Model selection determines almost all of the perceived performance in an offline environment. Here are realistic options organized by memory:

Environment	Recommended Model	Notes
8GB VRAM (discrete GPU)	7B 4-bit quantized model	RTX 3060/4060, etc.
16GB RAM (unified memory)	Qwen2.5-Coder 7B, Gemma 3 4B	M1/M2 MacBook baseline
32GB RAM (unified memory)	Qwen3-Coder 30B, Gemma 3 27B	M2 Pro/Max or better
RTX 4060 Ti (16GB VRAM)	Llama 3.1 8B (~60 tokens/s)	High-performance discrete GPU

Looking at the table, you might be confused to see "unified memory" and "VRAM" side by side — one thing is worth clarifying. Apple Silicon's unified memory architecture lets the CPU and GPU share the same memory pool, so all 16GB of RAM is available for loading models. With a discrete GPU, the model only fits within the VRAM capacity; anything that spills over to system RAM causes a dramatic speed drop. That's why the same 16GB feels different on an M1 MacBook versus an RTX 3060 (12GB VRAM).

I mostly use qwen2.5-coder:7b on an M2 MacBook Pro with 16GB, and it runs at a perfectly usable speed for straightforward refactoring and file edits. That said, if you're expecting complex multi-step tasks, you may be a little disappointed.

Practical Application

Example 1: Offline Setup with Ollama + Qwen2.5-Coder (Fastest Start)

This is the most commonly used combination. Ollama handles automatic GPU detection and OpenAI-compatible API in one package, keeping configuration minimal.

bash

# 1. Install Ollama (macOS)
brew install ollama
 
# 2. Download a model
ollama pull qwen2.5-coder:7b      # For 16GB RAM, ~4.5GB
# ollama pull qwen3-coder:30b     # Recommended if you have 32GB RAM, ~20GB
 
# 3. Install OpenCode
npm install -g opencode-ai
# If you use pnpm: pnpm add -g opencode-ai
# If you prefer Homebrew on macOS: brew install opencode
#   (the brew method lets the package manager handle automatic updates)
 
# 4. Write the config file
# Path: ~/.config/opencode/opencode.json

The config file is the crux of this — I wasted two hours on it when I first started. Let me walk through each field.

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:7b": { "name": "Qwen2.5 Coder 7B" }
      }
    }
  },
  "disabled_providers": ["openai", "anthropic", "gemini"]
}

Config Key	Role
`npm`	The adapter package OpenCode uses internally to communicate with this provider. `@ai-sdk/openai-compatible` is the official adapter for OpenAI-compatible APIs.
`baseURL`	The local server address OpenCode sends queries to
`disabled_providers`	Disables cloud providers so you don't get errors when no API keys are present
`models`	Explicitly registers models when OpenCode can't auto-detect the model list

bash

# 5-A. Ollama v0.15+ — if you want to start immediately without writing a config file
ollama launch opencode
 
# 5-B. If you wrote the config file manually
opencode

ollama launch opencode is a feature available since Ollama v0.15 that automatically handles the connection setup between Ollama and OpenCode. If you're setting this up for the first time, this is the recommended way to verify that things are working. However, if you need fine-grained control — like choosing a specific model or disabling certain providers — the JSON config file approach is more flexible.

Example 2: Offline Environment with LM Studio and GPU Acceleration (GUI-Based)

LM Studio lets you manage models through a GUI, making it accessible even for those less comfortable with the terminal. Once you start the local server, it exposes an OpenAI-compatible API at http://localhost:1234/v1. The structure is identical to Example 1 — only the baseURL port number and model path differ.

Load the model you want in the LM Studio app, click "Start Server," then write the config file.

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        "baseURL": "http://localhost:1234/v1"
      },
      "models": {
        "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF": {
          "name": "Llama 3.1 8B"
        }
      }
    }
  },
  "disabled_providers": ["openai", "anthropic", "gemini"]
}

If you have a discrete GPU, LM Studio automatically leverages GPU layers, so the same model will run noticeably faster than CPU inference. I once watched a colleague run this on an RTX 4060 with LM Studio, and even with the same Llama 3.1 8B model, it felt clearly faster than CPU inference on my MacBook.

Example 3: Shared Team Offline LLM Environment (Internal Corporate Network — Intermediate and Above)

This example involves environment variables and internal IP configuration, so it's aimed at intermediate-level users. The setup deploys Ollama on an internal server in an air-gapped network and shares it across the entire team. Since code never leaves the environment, this is viable even in domains with strict data security requirements such as finance, healthcare, or public sector. I know someone who set this up as a shared team environment and has been using it successfully.

bash

# Run Ollama on the internal server with team-wide access
OLLAMA_HOST=0.0.0.0:11434 ollama serve

Security note: Binding to 0.0.0.0 allows access from any host on the same network. Even within a corporate intranet, it is recommended to restrict allowed IPs with firewall rules. An Ollama server left open without authentication is something your security team will flag.

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "internal-llm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Internal LLM Server",
      "options": {
        "baseURL": "http://192.168.1.100:11434/v1"
      },
      "models": {
        "qwen3-coder:30b": { "name": "Qwen3-Coder 30B (Shared)" }
      }
    }
  }
}

Each team member sets the baseURL in their opencode.json to point to the internal server IP. Replace 192.168.1.100 with your actual server IP.

The Most Common Mistakes in Practice

Omitting disabled_providers — If you don't disable cloud providers, you'll either get errors about missing API keys or inadvertently send requests to the cloud. When running in offline-only mode, always declare this explicitly.
Opening OpenCode while the Ollama server isn't running — If the model connection fails when you launch OpenCode, the first thing to check is whether ollama serve is running in a terminal. I spent a long time confused by this myself at first.
Not typing the model name exactly — The model name you put in the models key in opencode.json must exactly match the name shown by ollama list. For example, qwen2.5-coder and qwen2.5-coder:7b may be recognized as different models.

Pros and Cons

Advantages

Item	Details
Complete privacy	Code and context are never sent to an external server
No API costs	Unlimited queries after initial setup, zero fees
Offline operation	Works on airplanes, air-gapped networks, and unstable internet
No rate limits	Query as much as your hardware allows
Reduced latency	Token generation starts immediately with no API round-trip
No vendor lock-in	Switch models and providers at any time

Disadvantages and Caveats

Item	Details	Mitigation
Model quality gap	Inferior complex reasoning performance compared to GPT-4o and Claude Sonnet	Focus on simpler tasks like file editing and refactoring
Hardware costs	"Free" but GPU VRAM, power, and maintenance costs apply	Use hardware you already own; consider running alongside cloud API
Setup complexity	More initial configuration steps than cloud API key approach	Can be simplified to a single command with `ollama launch opencode`
Model download size	~4.5GB for 7B, 20GB+ for 30B required upfront	Download in advance when on a good network
Multi-step autonomous task limitations	Cloud models still outperform for complex agentic tasks	Use Plan mode and proceed step by step with manual confirmation

Quantization: A technique that compresses model weights from 32-bit floating point to 4-bit integers and similar formats. It improves memory usage and speed at the cost of minor quality degradation. Most models you download from Ollama are 4-bit quantized versions in GGUF format.

TUI (Terminal User Interface): A graphical interface composed of text in a terminal environment. OpenCode is a TUI app operated entirely by keyboard without a mouse.

Closing Thoughts

The core of an offline OpenCode setup is simple — start a local LLM server and point the baseURL in opencode.json at that address.

If code privacy matters to you, you need to develop without internet access, or you want to automate simple editing tasks without API costs, this setup is a sufficiently practical choice. I always prefer starting in Plan mode: begin by having it read files and answer simple questions to verify behavior, then move to Build mode once you're comfortable. That approach has saved me from a lot of mistakes.

Three steps you can try right now:

Run brew install ollama and then ollama pull qwen2.5-coder:7b to download a model. If you're on a 16GB RAM machine, that combination is enough to get started.
Write ~/.config/opencode/opencode.json following the example format above. The key points are setting baseURL to http://localhost:11434/v1 and listing your cloud providers in disabled_providers.
Run opencode from your project folder and explore in Plan mode first. Use the Tab key to switch between Build and Plan modes.

References

#Ollama#OpenCode#로컬LLM#LMStudio#OpenAI호환API#양자화#TUI#오프라인AI#AI코딩에이전트#프라이버시

Running an AI Coding Agent in the Terminal Without the Cloud — Connecting Local LLMs with Ollama + OpenCode | DEV BAK - 기술블로그

Running an AI Coding Agent in the Terminal Without the Cloud — Connecting Local LLMs with Ollama + OpenCode

Core Concepts

How OpenCode Connects to a Local LLM

[opencode.json config] ──read at startup──▶ [OpenCode TUI]
                                                   │
                                                   ▼
                                [Local LLM server (Ollama / LM Studio)]
                                                   │
                                                   ▼
                                [Model files: Qwen3-Coder, Llama 3, Gemma 3, etc.]

OpenAI-compatible API: An interface that follows the /v1/chat/completions endpoint specification defined by OpenAI. Most local LLM servers — Ollama, LM Studio, and others — support this spec, so you can switch providers by replacing only the baseURL without changing any client code.

Choosing a Model for Your Hardware

Model selection determines almost all of the perceived performance in an offline environment. Here are realistic options organized by memory:

Environment	Recommended Model	Notes
8GB VRAM (discrete GPU)	7B 4-bit quantized model	RTX 3060/4060, etc.
16GB RAM (unified memory)	Qwen2.5-Coder 7B, Gemma 3 4B	M1/M2 MacBook baseline
32GB RAM (unified memory)	Qwen3-Coder 30B, Gemma 3 27B	M2 Pro/Max or better
RTX 4060 Ti (16GB VRAM)	Llama 3.1 8B (~60 tokens/s)	High-performance discrete GPU

Practical Application

Example 1: Offline Setup with Ollama + Qwen2.5-Coder (Fastest Start)

This is the most commonly used combination. Ollama handles automatic GPU detection and OpenAI-compatible API in one package, keeping configuration minimal.

bash

# 1. Install Ollama (macOS)
brew install ollama
 
# 2. Download a model
ollama pull qwen2.5-coder:7b      # For 16GB RAM, ~4.5GB
# ollama pull qwen3-coder:30b     # Recommended if you have 32GB RAM, ~20GB
 
# 3. Install OpenCode
npm install -g opencode-ai
# If you use pnpm: pnpm add -g opencode-ai
# If you prefer Homebrew on macOS: brew install opencode
#   (the brew method lets the package manager handle automatic updates)
 
# 4. Write the config file
# Path: ~/.config/opencode/opencode.json

The config file is the crux of this — I wasted two hours on it when I first started. Let me walk through each field.

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:7b": { "name": "Qwen2.5 Coder 7B" }
      }
    }
  },
  "disabled_providers": ["openai", "anthropic", "gemini"]
}

Config Key	Role
`npm`	The adapter package OpenCode uses internally to communicate with this provider. `@ai-sdk/openai-compatible` is the official adapter for OpenAI-compatible APIs.
`baseURL`	The local server address OpenCode sends queries to
`disabled_providers`	Disables cloud providers so you don't get errors when no API keys are present
`models`	Explicitly registers models when OpenCode can't auto-detect the model list

bash

# 5-A. Ollama v0.15+ — if you want to start immediately without writing a config file
ollama launch opencode
 
# 5-B. If you wrote the config file manually
opencode

Example 2: Offline Environment with LM Studio and GPU Acceleration (GUI-Based)

Load the model you want in the LM Studio app, click "Start Server," then write the config file.

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        "baseURL": "http://localhost:1234/v1"
      },
      "models": {
        "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF": {
          "name": "Llama 3.1 8B"
        }
      }
    }
  },
  "disabled_providers": ["openai", "anthropic", "gemini"]
}

Example 3: Shared Team Offline LLM Environment (Internal Corporate Network — Intermediate and Above)

bash

# Run Ollama on the internal server with team-wide access
OLLAMA_HOST=0.0.0.0:11434 ollama serve

Security note: Binding to 0.0.0.0 allows access from any host on the same network. Even within a corporate intranet, it is recommended to restrict allowed IPs with firewall rules. An Ollama server left open without authentication is something your security team will flag.

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "internal-llm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Internal LLM Server",
      "options": {
        "baseURL": "http://192.168.1.100:11434/v1"
      },
      "models": {
        "qwen3-coder:30b": { "name": "Qwen3-Coder 30B (Shared)" }
      }
    }
  }
}

Each team member sets the baseURL in their opencode.json to point to the internal server IP. Replace 192.168.1.100 with your actual server IP.

The Most Common Mistakes in Practice

Omitting disabled_providers — If you don't disable cloud providers, you'll either get errors about missing API keys or inadvertently send requests to the cloud. When running in offline-only mode, always declare this explicitly.
Opening OpenCode while the Ollama server isn't running — If the model connection fails when you launch OpenCode, the first thing to check is whether ollama serve is running in a terminal. I spent a long time confused by this myself at first.
Not typing the model name exactly — The model name you put in the models key in opencode.json must exactly match the name shown by ollama list. For example, qwen2.5-coder and qwen2.5-coder:7b may be recognized as different models.

Pros and Cons

Advantages

Item	Details
Complete privacy	Code and context are never sent to an external server
No API costs	Unlimited queries after initial setup, zero fees
Offline operation	Works on airplanes, air-gapped networks, and unstable internet
No rate limits	Query as much as your hardware allows
Reduced latency	Token generation starts immediately with no API round-trip
No vendor lock-in	Switch models and providers at any time

Disadvantages and Caveats

Item	Details	Mitigation
Model quality gap	Inferior complex reasoning performance compared to GPT-4o and Claude Sonnet	Focus on simpler tasks like file editing and refactoring
Hardware costs	"Free" but GPU VRAM, power, and maintenance costs apply	Use hardware you already own; consider running alongside cloud API
Setup complexity	More initial configuration steps than cloud API key approach	Can be simplified to a single command with `ollama launch opencode`
Model download size	~4.5GB for 7B, 20GB+ for 30B required upfront	Download in advance when on a good network
Multi-step autonomous task limitations	Cloud models still outperform for complex agentic tasks	Use Plan mode and proceed step by step with manual confirmation

Quantization: A technique that compresses model weights from 32-bit floating point to 4-bit integers and similar formats. It improves memory usage and speed at the cost of minor quality degradation. Most models you download from Ollama are 4-bit quantized versions in GGUF format.

TUI (Terminal User Interface): A graphical interface composed of text in a terminal environment. OpenCode is a TUI app operated entirely by keyboard without a mouse.

Closing Thoughts

The core of an offline OpenCode setup is simple — start a local LLM server and point the baseURL in opencode.json at that address.

Three steps you can try right now:

Run brew install ollama and then ollama pull qwen2.5-coder:7b to download a model. If you're on a 16GB RAM machine, that combination is enough to get started.
Write ~/.config/opencode/opencode.json following the example format above. The key points are setting baseURL to http://localhost:11434/v1 and listing your cloud providers in disabled_providers.
Run opencode from your project folder and explore in Plan mode first. Use the Tab key to switch between Build and Plan modes.

References

#Ollama#OpenCode#로컬LLM#LMStudio#OpenAI호환API#양자화#TUI#오프라인AI#AI코딩에이전트#프라이버시

Core Concepts

How OpenCode Connects to a Local LLM

Choosing a Model for Your Hardware

Practical Application

Example 1: Offline Setup with Ollama + Qwen2.5-Coder (Fastest Start)

Example 2: Offline Environment with LM Studio and GPU Acceleration (GUI-Based)

Example 3: Shared Team Offline LLM Environment (Internal Corporate Network — Intermediate and Above)

The Most Common Mistakes in Practice

Pros and Cons

Advantages

Disadvantages and Caveats

Closing Thoughts

References

Core Concepts

How OpenCode Connects to a Local LLM

Choosing a Model for Your Hardware

Practical Application

Example 1: Offline Setup with Ollama + Qwen2.5-Coder (Fastest Start)

Example 2: Offline Environment with LM Studio and GPU Acceleration (GUI-Based)

Example 3: Shared Team Offline LLM Environment (Internal Corporate Network — Intermediate and Above)

The Most Common Mistakes in Practice

Pros and Cons

Advantages

Disadvantages and Caveats

Closing Thoughts

References

Recommended Posts

OpenCode Plan/Build Mode: Making AI Show You the Plan Before Touching Your Code

OpenCode Multi-Provider Model Routing Strategy That Cuts Your Monthly AI Coding Agent Bill by 40%+

Oh My OpenCode (oh-my-openagent) Configuration That Cuts Multi-Agent AI Coding API Costs to ~$11/Month with Category Routing

Building a TypeScript LSP Self-Correction Loop with OpenCode — AI That Catches Its Own Type Errors

OpenCode vs Claude Code: Comparing Terminal AI Agents and Choosing the Right One for Your Team

Building a Local LLM Infrastructure with Ollama + Hermes — $0 API Costs, Zero Data Leakage