Running an AI Coding Agent in the Terminal Without the Cloud — Connecting Local LLMs with Ollama + OpenCode
I'll be honest: I gave up on setting up a local LLM more than once. I relied on cloud APIs with the attitude of "why not just use the Claude API," until one day I found myself completely stuck trying to make an urgent code fix on an airplane. That's when I started seriously digging into an offline AI coding environment.
After about a month of trial and error, I finally landed on a setup that works pretty well, and I've documented it here. This is a great read for anyone comfortable with the terminal but new to local LLMs. OpenCode is a terminal-based AI coding agent built in Go by the SST team, and the structure for connecting it offline is surprisingly simple. The key is that OpenCode supports OpenAI-compatible API endpoints, so all you need to do is point baseURL at your local LLM server address. All computation runs on your own machine with no cloud dependency, and not a single line of your code leaves your computer.
By the end of this post, you'll understand how to connect Ollama or LM Studio to OpenCode, how to choose a model that fits your hardware, and the pitfalls that come up most often in practice. If code privacy matters to you or you want to automate your development workflow without API costs, this setup is a genuinely practical option.
Core Concepts
How OpenCode Connects to a Local LLM
OpenCode's architecture in one sentence: "wherever the config file points is where the model lives." At runtime it reads opencode.json, recognizes the baseURL as the address of a local server, and from that point on everything flows exactly the same as with a cloud API.
[opencode.json config] ──read at startup──▶ [OpenCode TUI]
│
▼
[Local LLM server (Ollama / LM Studio)]
│
▼
[Model files: Qwen3-Coder, Llama 3, Gemma 3, etc.]OpenAI-compatible API: An interface that follows the
/v1/chat/completionsendpoint specification defined by OpenAI. Most local LLM servers — Ollama, LM Studio, and others — support this spec, so you can switch providers by replacing only thebaseURLwithout changing any client code.
OpenCode offers two agent modes you can toggle with the Tab key. The Build agent has full permissions including file editing and shell execution, while the Plan agent is read-only and requires user approval before running bash commands. When setting up an offline environment for the first time, it's safer to verify behavior in Plan mode first. That's the order I followed, and it saved me from a lot of mistakes.
Choosing a Model for Your Hardware
Model selection determines almost all of the perceived performance in an offline environment. Here are realistic options organized by memory:
| Environment | Recommended Model | Notes |
|---|---|---|
| 8GB VRAM (discrete GPU) | 7B 4-bit quantized model | RTX 3060/4060, etc. |
| 16GB RAM (unified memory) | Qwen2.5-Coder 7B, Gemma 3 4B | M1/M2 MacBook baseline |
| 32GB RAM (unified memory) | Qwen3-Coder 30B, Gemma 3 27B | M2 Pro/Max or better |
| RTX 4060 Ti (16GB VRAM) | Llama 3.1 8B (~60 tokens/s) | High-performance discrete GPU |
Looking at the table, you might be confused to see "unified memory" and "VRAM" side by side — one thing is worth clarifying. Apple Silicon's unified memory architecture lets the CPU and GPU share the same memory pool, so all 16GB of RAM is available for loading models. With a discrete GPU, the model only fits within the VRAM capacity; anything that spills over to system RAM causes a dramatic speed drop. That's why the same 16GB feels different on an M1 MacBook versus an RTX 3060 (12GB VRAM).
I mostly use qwen2.5-coder:7b on an M2 MacBook Pro with 16GB, and it runs at a perfectly usable speed for straightforward refactoring and file edits. That said, if you're expecting complex multi-step tasks, you may be a little disappointed.
Practical Application
Example 1: Offline Setup with Ollama + Qwen2.5-Coder (Fastest Start)
This is the most commonly used combination. Ollama handles automatic GPU detection and OpenAI-compatible API in one package, keeping configuration minimal.
# 1. Install Ollama (macOS)
brew install ollama
# 2. Download a model
ollama pull qwen2.5-coder:7b # For 16GB RAM, ~4.5GB
# ollama pull qwen3-coder:30b # Recommended if you have 32GB RAM, ~20GB
# 3. Install OpenCode
npm install -g opencode-ai
# If you use pnpm: pnpm add -g opencode-ai
# If you prefer Homebrew on macOS: brew install opencode
# (the brew method lets the package manager handle automatic updates)
# 4. Write the config file
# Path: ~/.config/opencode/opencode.jsonThe config file is the crux of this — I wasted two hours on it when I first started. Let me walk through each field.
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"qwen2.5-coder:7b": { "name": "Qwen2.5 Coder 7B" }
}
}
},
"disabled_providers": ["openai", "anthropic", "gemini"]
}| Config Key | Role |
|---|---|
npm |
The adapter package OpenCode uses internally to communicate with this provider. @ai-sdk/openai-compatible is the official adapter for OpenAI-compatible APIs. |
baseURL |
The local server address OpenCode sends queries to |
disabled_providers |
Disables cloud providers so you don't get errors when no API keys are present |
models |
Explicitly registers models when OpenCode can't auto-detect the model list |
# 5-A. Ollama v0.15+ — if you want to start immediately without writing a config file
ollama launch opencode
# 5-B. If you wrote the config file manually
opencodeollama launch opencode is a feature available since Ollama v0.15 that automatically handles the connection setup between Ollama and OpenCode. If you're setting this up for the first time, this is the recommended way to verify that things are working. However, if you need fine-grained control — like choosing a specific model or disabling certain providers — the JSON config file approach is more flexible.
Example 2: Offline Environment with LM Studio and GPU Acceleration (GUI-Based)
LM Studio lets you manage models through a GUI, making it accessible even for those less comfortable with the terminal. Once you start the local server, it exposes an OpenAI-compatible API at http://localhost:1234/v1. The structure is identical to Example 1 — only the baseURL port number and model path differ.
Load the model you want in the LM Studio app, click "Start Server," then write the config file.
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"npm": "@ai-sdk/openai-compatible",
"name": "LM Studio (local)",
"options": {
"baseURL": "http://localhost:1234/v1"
},
"models": {
"lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF": {
"name": "Llama 3.1 8B"
}
}
}
},
"disabled_providers": ["openai", "anthropic", "gemini"]
}If you have a discrete GPU, LM Studio automatically leverages GPU layers, so the same model will run noticeably faster than CPU inference. I once watched a colleague run this on an RTX 4060 with LM Studio, and even with the same Llama 3.1 8B model, it felt clearly faster than CPU inference on my MacBook.
Example 3: Shared Team Offline LLM Environment (Internal Corporate Network — Intermediate and Above)
This example involves environment variables and internal IP configuration, so it's aimed at intermediate-level users. The setup deploys Ollama on an internal server in an air-gapped network and shares it across the entire team. Since code never leaves the environment, this is viable even in domains with strict data security requirements such as finance, healthcare, or public sector. I know someone who set this up as a shared team environment and has been using it successfully.
# Run Ollama on the internal server with team-wide access
OLLAMA_HOST=0.0.0.0:11434 ollama serveSecurity note: Binding to
0.0.0.0allows access from any host on the same network. Even within a corporate intranet, it is recommended to restrict allowed IPs with firewall rules. An Ollama server left open without authentication is something your security team will flag.
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"internal-llm": {
"npm": "@ai-sdk/openai-compatible",
"name": "Internal LLM Server",
"options": {
"baseURL": "http://192.168.1.100:11434/v1"
},
"models": {
"qwen3-coder:30b": { "name": "Qwen3-Coder 30B (Shared)" }
}
}
}
}Each team member sets the baseURL in their opencode.json to point to the internal server IP. Replace 192.168.1.100 with your actual server IP.
The Most Common Mistakes in Practice
-
Omitting
disabled_providers— If you don't disable cloud providers, you'll either get errors about missing API keys or inadvertently send requests to the cloud. When running in offline-only mode, always declare this explicitly. -
Opening OpenCode while the Ollama server isn't running — If the model connection fails when you launch OpenCode, the first thing to check is whether
ollama serveis running in a terminal. I spent a long time confused by this myself at first. -
Not typing the model name exactly — The model name you put in the
modelskey inopencode.jsonmust exactly match the name shown byollama list. For example,qwen2.5-coderandqwen2.5-coder:7bmay be recognized as different models.
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Complete privacy | Code and context are never sent to an external server |
| No API costs | Unlimited queries after initial setup, zero fees |
| Offline operation | Works on airplanes, air-gapped networks, and unstable internet |
| No rate limits | Query as much as your hardware allows |
| Reduced latency | Token generation starts immediately with no API round-trip |
| No vendor lock-in | Switch models and providers at any time |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Model quality gap | Inferior complex reasoning performance compared to GPT-4o and Claude Sonnet | Focus on simpler tasks like file editing and refactoring |
| Hardware costs | "Free" but GPU VRAM, power, and maintenance costs apply | Use hardware you already own; consider running alongside cloud API |
| Setup complexity | More initial configuration steps than cloud API key approach | Can be simplified to a single command with ollama launch opencode |
| Model download size | ~4.5GB for 7B, 20GB+ for 30B required upfront | Download in advance when on a good network |
| Multi-step autonomous task limitations | Cloud models still outperform for complex agentic tasks | Use Plan mode and proceed step by step with manual confirmation |
Quantization: A technique that compresses model weights from 32-bit floating point to 4-bit integers and similar formats. It improves memory usage and speed at the cost of minor quality degradation. Most models you download from Ollama are 4-bit quantized versions in GGUF format.
TUI (Terminal User Interface): A graphical interface composed of text in a terminal environment. OpenCode is a TUI app operated entirely by keyboard without a mouse.
Closing Thoughts
The core of an offline OpenCode setup is simple — start a local LLM server and point the baseURL in opencode.json at that address.
If code privacy matters to you, you need to develop without internet access, or you want to automate simple editing tasks without API costs, this setup is a sufficiently practical choice. I always prefer starting in Plan mode: begin by having it read files and answer simple questions to verify behavior, then move to Build mode once you're comfortable. That approach has saved me from a lot of mistakes.
Three steps you can try right now:
- Run
brew install ollamaand thenollama pull qwen2.5-coder:7bto download a model. If you're on a 16GB RAM machine, that combination is enough to get started. - Write
~/.config/opencode/opencode.jsonfollowing the example format above. The key points are settingbaseURLtohttp://localhost:11434/v1and listing your cloud providers indisabled_providers. - Run
opencodefrom your project folder and explore in Plan mode first. Use theTabkey to switch between Build and Plan modes.
References
- OpenCode Official Site | opencode.ai
- OpenCode Official Docs — Config | opencode.ai
- OpenCode Official Docs — Providers | opencode.ai
- OpenCode Official Docs — Models | opencode.ai
- GitHub — opencode-ai/opencode
- Building a Local AI Coding Environment with Ollama and OpenCode | DevelopersIO
- I Built a Local AI Coding Agent Home Lab Setup With OpenCode and Ollama | Virtualization Howto
- Run Gemma 4 Locally + OpenCode: Free, Offline, Unlimited Vibe Coding | Popular AI Tools
- OpenCode: A model-neutral AI coding assistant for OpenShift Dev Spaces | Red Hat Developer
- Drive a Local LLM From Your Terminal With OpenCode and LM Studio | Abishek Lakandri
- OpenCode with Local LLMs — Can a 16 GB GPU Compete With The Cloud? | Patshead.com
- OpenCode: AI-Assisted Coding with Free and Local LLMs | Infralovers
- OpenCode Quickstart | LiteLLM Docs
- OpenCode Tutorial – Build a Free Local AI Coding Environment with Ollama + Qwen3-Coder | AI Sparkup
- Local Agentic Development with Ollama and OpenCode | DEV Community
- OpenCode: The Open Source AI Coding Agent Reviewed | DEV Community