Privacy Policy© 2026 DEV BAK - TECH BLOG. All rights reserved.
DEV BAK - TECH BLOG
AI

How to Make LLMs Directly Call Your Internal REST APIs: TypeScript MCP Server Implementation and the Gateway Pattern

Have you ever tried to introduce an AI agent to your team, only to get stuck on the question "so how do we connect our internal APIs?" I started out trying to paste entire internal API docs into the LLM, or putting raw fetch code directly into the prompt. It works—exactly once. From the second request onward, the LLM starts inventing endpoint URLs, forgetting authentication headers, or sending parameters that don't exist.

Wrapping with an MCP (Model Context Protocol) server is an architectural pattern that lets LLMs call internal systems in a structured way without touching existing API logic. Instead of cramming API specs into a prompt, the LLM calls functions directly through a standardized Tool interface. By the end of this article, you'll be able to write TypeScript code that exposes existing REST APIs as MCP Tools. This guide assumes a basic Node.js + TypeScript environment is already set up.

With Claude Code, Cursor, and GitHub Copilot all operating as MCP clients today, exposing internal systems via MCP is also an exercise in building AI productivity infrastructure for your entire team.


Core Concepts

What Is MCP Wrapping?

The idea itself is simple. You add a thin JSON-RPC-based interface layer in front of an existing REST API. Through this layer, the LLM can clearly identify "which functions it can call" and "what each parameter is," and invoke them in a type-safe manner.

Terminology: JSON-RPC is a remote procedure call protocol that uses JSON. MCP builds on this to define a standard communication method between LLMs and external tools.

MCP provides three basic abstractions:

Abstraction Role Internal API Mapping Example
Tool A function through which the LLM triggers side effects or requests computation createTicket, triggerDeploy
Resource Read-only data exposure Internal wiki, API docs, codebase
Prompt Reusable prompt templates Domain-specific instructions

When wrapping internal APIs, Tool is what matters most in practice. The work mostly involves mapping internal API endpoints to Tools one-to-one, or grouping multiple endpoints into a single meaningful Tool.

Transport: stdio vs Streamable HTTP

Transport choice determines how the server is deployed. As of March 2025, the legacy HTTP+SSE approach has been officially deprecated, leaving two options:

Transport Use Case Characteristics
stdio Local development, single-user CLI tools Standard I/O between processes, simple setup
Streamable HTTP Team- or org-scale remote servers HTTP-based, OAuth 2.1 required, scalable

For exposing internal APIs to an entire team, the recommended approach is Streamable HTTP + OAuth 2.1. A practical sequence is to start with stdio for fast local testing, then switch to Streamable HTTP when promoting to a shared team server.

The Key to Schema Design

I initially exposed the nested object structures of my internal APIs directly as Tool parameters, and ran into a situation where the LLM was sending metadata.labels[0].value as a nonexistent field called labelsValue. After that, I adopted parameter flattening as a principle. In practice, schema ambiguity = LLM Tool miscalls is an equation that holds up pretty often.

Before — exposing nested objects directly (LLM frequently miscalls):

typescript
{
  metadata: z.object({
    labels: z.array(z.object({
      key: z.string(),
      value: z.string()
    }))
  })
}

After — flattened (LLM fills in accurately):

typescript
{
  label_key: z.string().describe("Label key (e.g., env, team)"),
  label_value: z.string().describe("Label value (e.g., production, backend)")
}

The principles for a good Tool schema in summary:

  • Flatten parameters as much as possible — simple fields over nested objects
  • Use describe() to specify the meaning and allowed values of each parameter
  • Make active use of enum types to constrain the LLM's choices

Pros and Cons Analysis

It's worth examining the tradeoffs before deciding to adopt MCP wrapping.

Advantages

Item Description
Fast integration Add only an MCP interface layer without rewriting existing API logic
Multi-client support Callable identically from any MCP-compatible client: Claude, GPT, Copilot, Cursor, etc.
OpenAPI automation Minimize manual Tool writing if you already have a spec
Standard security layer OAuth 2.1-based authentication enforced at the protocol level
Agent productivity Automate complex internal tasks with natural-language instructions

Disadvantages and Caveats

Item Description Mitigation
Prompt injection Malicious instructions in Tool descriptions or return values can cause the LLM to behave unintentionally Design with the principle of not trusting returned data; output sanitization
Context pollution Too many Tools fill the LLM context window with Tool descriptions, degrading performance Separate servers by domain; consider dynamic Tool loading
Schema complexity Complex internal API objects require redesign to simplify Flattened parameter structures, active use of enums
Missing audit logs Without records of AI agent API calls, compliance auditing is impossible Log all calls at the MCP server or gateway level
Over-privileged access A single MCP server holding broader API access than necessary Apply the principle of least privilege per Tool

Terminology: A Tool Poisoning Attack is an attack method that embeds malicious LLM instructions into an MCP Tool's description or return value to cause the agent to perform unintended actions. Because data coming from outside (user input, external API responses) can appear in Tool return values, this is a threat that can't be ignored even in internal systems.

The Most Common Mistakes in Practice

  1. Writing lazy Tool descriptions — A one-liner like "creates a ticket" makes it hard for the LLM to understand when and how to use the Tool. Using describe() to specify concrete examples and allowed values for each parameter directly determines Tool call accuracy.

  2. Accepting auth tokens as Tool parameters — Having the LLM handle tokens directly risks exposing them in logs or having them stolen via prompt injection. It's recommended to inject credentials from server environment variables and never expose them in the Tool interface.

  3. Deploying stdio servers to the whole team as-is — Using a local stdio server as-is for team-wide use exposes internal APIs with no authentication, rate limiting, or audit logs. Once you reach team scale, switching to Streamable HTTP + OAuth 2.1 is recommended.


Practical Application

Direct TypeScript SDK Implementation: Wrapping an Internal Issue Tracker as a Tool

Wrap the REST API of Jira, Linear, or your own ticket system as an MCP Tool, and just saying "create a P1 ticket for this bug" to your editor AI will actually create the issue.

First, install the dependencies:

bash
pnpm add @modelcontextprotocol/sdk zod axios

Next, the server code. Where internalClient comes from can be confusing at first glance — it's an axios instance initialized from environment variables:

typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import axios from "axios";
 
// Auth token is injected from environment variables — not accepted as a Tool parameter
const internalClient = axios.create({
  baseURL: process.env.INTERNAL_API_URL,
  headers: {
    Authorization: `Bearer ${process.env.API_TOKEN}`
  }
});
 
const server = new McpServer({ name: "internal-issue-tracker", version: "1.0.0" });
 
server.tool(
  "create_ticket",
  "Creates a new ticket in the internal issue tracker",
  {
    title: z.string().describe("Ticket title (clear and concise)"),
    priority: z.enum(["P1", "P2", "P3"]).describe("Priority: P1=urgent, P2=high, P3=normal"),
    assignee: z.string().email().optional().describe("Assignee's email address")
  },
  async ({ title, priority, assignee }) => {
    const res = await internalClient.post("/issues", { title, priority, assignee });
    return {
      content: [{ type: "text", text: `Ticket created: ${res.data.id} — ${res.data.url}` }]
    };
  }
);
 
// Start server — uses stdio transport for local development
const transport = new StdioServerTransport();
await server.connect(transport);
Point Description
z.enum(["P1", "P2", "P3"]) Locks the choices so the LLM cannot insert arbitrary values
.describe() Communicates the meaning of each parameter directly to the LLM
axios.create(...) Handles auth headers at the client level — not exposed in the Tool interface
server.connect(transport) This line is required for the server to actually run

Automatic OpenAPI Spec Conversion: Building an MCP Server Without Manual Work

If your internal API already has an OpenAPI (Swagger) spec, you don't need to write Tools by hand. In a TypeScript environment, you can use a CLI tool:

bash
npx openapi-mcp-generator --input ./api-spec.yaml --output ./mcp-server

In the Python ecosystem, FastMCP's OpenAPI integration is the fastest option:

python
from fastmcp import FastMCP
from your_internal_api import app  # existing FastAPI app
 
# Convert FastAPI app to an MCP server
mcp = FastMCP.from_fastapi(app=app)
 
if __name__ == "__main__":
    mcp.run()

One thing worth being honest about: writing from your_internal_api import app on one line looks simple, but in reality it brings along all the dependencies, environment variables, DB connections, etc. of that FastAPI app. If you can run it in the same Python environment as the existing app, this is the fastest approach; if the environments are separated, it's more practical to export the OpenAPI spec file and process it with openapi-mcp-generator.

The operationId, summary, and description from your OpenAPI spec map directly to Tool names and descriptions. Since spec quality becomes MCP Tool quality, this is especially effective for teams that already maintain a well-managed OpenAPI spec.

Scaling to Team Size: The MCP Gateway Pattern

When you have multiple teams and each domain starts running its own MCP server, authentication and audit logs become fragmented. Honestly, at this point the thought crosses your mind — "can't we just hardcode tokens in each server?" — but that choice comes back to bite you during a security audit.

The MCP Gateway is a pattern that solves this problem. The Gateway acts as a reverse proxy at the MCP protocol level. When an AI agent connects to the single endpoint (the Gateway), the Gateway handles OAuth 2.1 authentication and then internally routes Tool calls to each domain MCP server. The domain servers don't need to be exposed to the internet — they only need to exist behind the Gateway.

[Claude / AI Agent]
        |
        ↓ (single connection point)
[MCP Gateway]
  - OAuth 2.1 authentication handling
  - Rate limiting
  - Audit logs (records all Tool calls)
  - Routes to domain-specific servers
        |
   _____|_____
  |     |     |
  ↓     ↓     ↓
[HR   [CI/CD [Analytics
 API]  API]   API]
 MCP   MCP    MCP
 Srv   Srv    Srv
(internal network only)

Tools validated by real-world use cases:

Tool Characteristics
Kong AI MCP Proxy Bridges existing HTTP APIs to MCP; integrated rate limiting and authentication
Azure API Management + Entra ID MCP + AD federation in Microsoft stack environments
mcp-gateway-registry Open-source gateway registry with Keycloak/Entra integration

Behind the roughly 970x growth in MCP SDK monthly downloads over 18 months is this kind of enterprise proliferation of the Gateway pattern. As of 2026, this pattern is becoming the de facto standard for in-house AI infrastructure at team scale and above.


Closing Thoughts

Wrapping internal APIs with MCP creates infrastructure that lets your entire team interact with internal systems through natural language via AI agents, while leaving existing code untouched.

Here are 3 steps you can start with right now. Choose based on your situation:

  1. If your internal API has an OpenAPI spec, you can generate skeleton code first with npx openapi-mcp-generator --input ./your-api-spec.yaml --output ./mcp-server. In this case, you can skip step 2 (manual Tool writing).

  2. If you don't have a spec, install pnpm add @modelcontextprotocol/sdk zod axios and use the TypeScript example above as a reference to convert one API your team uses daily (ticket creation, deployment status check, etc.) into a single Tool. Even one Tool is enough to experience an agent turning a natural-language instruction into an actual API call.

  3. Connect a local server to Claude Code or Cursor's MCP settings and use it directly. Watching which parameters the agent fills when it calls a Tool will immediately reveal what needs to be improved in your schema. After going through this step, you'll feel firsthand why describe() and enums matter.


References

Recommended starting points:

  • Wrapping an Existing API with MCP: How to Expose Your Current APIs to LLMs | Gun.io
  • How to build MCP servers with TypeScript SDK | DEV Community
  • OpenAPI 🤝 FastMCP | FastMCP Official Docs

For deeper learning:

  • Should you wrap MCP around your existing API? | Scalekit
  • MCP Best Practices: Architecture & Implementation Guide | modelcontextprotocol.info
  • From OpenAPI Spec to MCP Server: A Practical Guide | Xata
  • API MCP Server Architecture Guide | Stainless
  • What Is an MCP Gateway and Why Your Enterprise Needs One in 2026 | Composio
  • Advanced authentication and authorization for MCP Gateway | Red Hat Developer
  • Understanding Authorization in MCP | MCP Official Docs
  • MCP Server Security Best Practices: 2026 Engineering Guide | Digital Applied
  • Model Context Protocol has prompt injection security problems | Simon Willison
  • From REST to MCP: An Empirical Study of API Wrapping | arXiv
#MCP#TypeScript#REST-API#LLM#Gateway패턴#OpenAPI#OAuth2#Zod#AI에이전트#JSON-RPC
Share

Table of Contents

Core ConceptsWhat Is MCP Wrapping?Transport:The Key to Schema DesignPros and Cons AnalysisAdvantagesDisadvantages and CaveatsThe Most Common Mistakes in PracticePractical ApplicationDirect TypeScript SDK Implementation: Wrapping an Internal Issue Tracker as a ToolAutomatic OpenAPI Spec Conversion: Building an MCP Server Without Manual WorkScaling to Team Size: The MCP Gateway PatternClosing ThoughtsReferences

Recommended Posts

Type-Safe LLM Response Validation with Pydantic AI
AI

Type-Safe LLM Response Validation with Pydantic AI

If you've ever wired an LLM into production, you've probably hit this situation at least once. You carefully wrote a system prompt telling GPT to respond in JSO...

June 7, 202622 min read
Cutting Long-Horizon Agent Costs by 60–90%: Caching, Compression, and Routing Strategies
AI

Cutting Long-Horizon Agent Costs by 60–90%: Caching, Compression, and Routing Strategies

I still remember the shock of receiving that first bill after putting an AI agent into production. A simple chatbot would have been predictable, but agents were...

June 7, 202624 min read
AI Writes It, AI Reviews It: Building a `/code-review ultra` Multi-Agent Pipeline
AI

AI Writes It, AI Reviews It: Building a `/code-review ultra` Multi-Agent Pipeline

Honestly, when I first heard about this concept, my reaction was "does that actually work?" It's already remarkable that an agent can write code on its own — bu...

June 7, 202620 min read
7 Major Patterns of Agentic AI Design
AI

7 Major Patterns of Agentic AI Design

Use + ReAct | KB, ticket DB, and other external systems with repeated lookups | | Response writing | Response agent | Reflection | Self-review of tone and accu...

June 6, 20269 min read
Open-Weight vs Closed AI 2026: Now That the Benchmark Gap Has Narrowed, the Criteria for Choosing Has Changed
AI

Open-Weight vs Closed AI 2026: Now That the Benchmark Gap Has Narrowed, the Criteria for Choosing Has Changed

To be honest, until a year ago I thought closed models would maintain an overwhelming lead for some time. It seemed only natural to plug in an OpenAI API key to...

June 6, 202623 min read
Running Qwen3-Coder Locally: Setting Up an SWE-bench 70% AI Coding Agent with a Single RTX 3090
AI

Running Qwen3-Coder Locally: Setting Up an SWE-bench 70% AI Coding Agent with a Single RTX 3090

After watching my cloud AI bills double two months in a row, I started seriously looking for alternatives. Honestly, it wasn't so much a bias of "how good could...

June 6, 202622 min read