Privacy Policy© 2026 DEV BAK - TECH BLOG. All rights reserved.
DEV BAK - TECH BLOG
AI

Pydantic AI: Implementing Type-Safe LLM Tool Calls in Python AI Agents

Catching runtime errors at write time with RunContext · output_type · dependency injection

When you layer LLM-powered features onto Python code, a nagging anxiety tends to set in at some point: "What happens if the LLM calls this tool function with the wrong type?" I remember spending a long time debugging this myself when first attaching an agent to a FastAPI project — an int showed up where a str was expected. Errors that only blow up at runtime, bugs that slip through tests only to surface for the first time in an actual LLM response. You've probably been there at least once.

The root cause is simple. LLMs pass arguments as JSON, and without type validation, that JSON just flows straight into your Python functions.

python
# The old approach with no type validation — errors only caught at runtime
def process_tool_call(args: dict):
    amount = args["amount"]   # LLM passed "22.5" (str), but it slips through
    return amount * 1.1        # TypeError: can't multiply sequence by non-int of type 'float'
 
# The Pydantic AI approach — argument types are guaranteed, and on error the LLM is asked to retry
@agent.tool
def process_amount(ctx: RunContext[str], amount: float) -> float:
    return amount * 1.1        # Pydantic has validated it — float is guaranteed

Pydantic AI solves this problem by "extending the type system all the way up to the agent layer." Python type hints and Pydantic validation apply across every layer: tool function arguments, return values, dependency injection, and LLM output parsing. Mistakes in your own code are caught at write time by the IDE or type checker, and malformed values generated by the LLM are caught at the validation boundary and sent back to the model for a retry instead of crashing deep inside your functions. Just as FastAPI transformed the web API development experience, Pydantic AI is pulling AI agent development in the same direction. Let me show you concretely how type errors get caught by your IDE and Pydantic before they ever reach your business logic.


Core Concepts

Why Tool Calls Become a Type Problem

A bit of background: "Tool Calls" (also called "Function Calling") is a mechanism by which an LLM executes external functions to retrieve information or perform actions. Here's how it works: we provide the LLM with function signatures and descriptions in JSON Schema form, the LLM selects the appropriate function based on the conversation flow and generates arguments as JSON, and it's our code's job to receive that JSON and invoke the actual Python function.

This is where the problem arises. The JSON the LLM generates may not always match the types we expect. "22.5" (a string) can land where a float is expected, or a null can arrive where a number belongs. In hand-rolled dispatch code, and in many setups built on frameworks like LangChain, you end up handling this yourself with defensive code. Pydantic AI solves it at the framework level.
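To see the failure mode and the fix side by side without calling a model, here is a minimal sketch that uses plain Pydantic rather than Pydantic AI itself. The JSON payloads stand in for LLM-generated arguments and are made up for illustration; `validate_call` is Pydantic's decorator for checking function arguments against their type hints.

```python
import json

from pydantic import ValidationError, validate_call


@validate_call
def process_amount(amount: float) -> float:
    """Adds 10% to the amount; arguments are validated before the body runs."""
    return amount * 1.1


# Hypothetical tool-call payloads, as an LLM might emit them.
stringly_typed = json.loads('{"amount": "22.5"}')
null_value = json.loads('{"amount": null}')

# A numeric string is coerced to float in Pydantic's default (lax) mode.
coerced_result = process_amount(**stringly_typed)

# A null cannot be coerced, so validation fails before any arithmetic happens.
try:
    process_amount(**null_value)
    null_errors = []
except ValidationError as exc:
    null_errors = exc.errors()

print(coerced_result)
print(null_errors[0]["type"], null_errors[0]["msg"])
```

The structured error list is the kind of information a framework can hand back to the model so that it can correct its arguments on the next attempt.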

Three Core Mechanisms

There are three key components you need to understand to work with Pydantic AI. Each has a clearly defined role, and once you've internalized them, the code becomes much easier to read.

① @agent.tool / @agent.tool_plain — Register a function as an LLM tool

The difference between the two decorators is straightforward. @agent.tool takes a RunContext as its first argument and can receive injected dependencies, while @agent.tool_plain registers a plain function that needs no context. A JSON schema is automatically generated from the type hints in the function signature and passed to the LLM. The docstring supplies the tool description, and parameter descriptions are extracted from it when it follows a recognized docstring style (Google, NumPy, or Sphinx).

Think of it like this: just as type hints on a FastAPI route function validate the incoming request, here they validate the JSON arguments passed in by the LLM.
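The idea of deriving a tool schema from a signature can be sketched with the standard library alone. This is a deliberately simplified illustration, not Pydantic AI's implementation: `sketch_tool_schema` and `JSON_TYPES` are hypothetical names, and only a handful of primitive types are handled.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Assumed mapping for this sketch; real schema generation covers far more types.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Builds a minimal tool description from a function's hints and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # the return type is not part of the arguments
    signature = inspect.signature(func)
    properties = {name: {"type": JSON_TYPES[hint]} for name, hint in hints.items()}
    required = [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(city: str, days: int = 1) -> str:
    """Fetches the weather for a given city."""
    return f"{city}: sunny for {days} day(s)"


schema = sketch_tool_schema(get_weather)
print(schema)
```

Parameters with defaults drop out of `required`, which mirrors how an optional argument is usually presented to the model.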

② RunContext[DepsType] — A container for receiving dependencies in a type-safe way

It plays a similar role to FastAPI's Depends, but the key difference is that the type safety extends all the way into the LLM layer. It's a pattern where tool functions receive things like database connections, API clients, and configuration objects from the outside rather than creating them themselves. If deps_type is declared incorrectly, mypy/pyright catches it immediately.

③ Agent[ResultType] — Automatically validates LLM output against a Pydantic model

Agent is generic over both its dependency type and its output type. Pass a Pydantic model as output_type and LLM responses are automatically parsed and validated against that model, with the static type of result.output inferred to match. On a validation failure, the Pydantic error message is sent back to the LLM to prompt a retry. The retries parameter caps the number of retry attempts (retries=2 in the example), which guards against infinite loops.
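The validate-and-retry loop can be sketched without a real model. The code below is an illustration under stated assumptions, not the framework's internals: `fake_llm` is a hypothetical stand-in that returns an invalid payload first and a valid one once it receives feedback, and `run_with_retries` is an invented helper name.

```python
from typing import Optional, Tuple

from pydantic import BaseModel, ValidationError


class WeatherResult(BaseModel):
    city: str
    temperature: float
    condition: str


def fake_llm(prompt: str, feedback: Optional[str]) -> str:
    """Hypothetical model: wrong type on the first try, corrected after feedback."""
    if feedback is None:
        return '{"city": "Seoul", "temperature": "warm", "condition": "Sunny"}'
    return '{"city": "Seoul", "temperature": 22.5, "condition": "Sunny"}'


def run_with_retries(prompt: str, retries: int = 2) -> Tuple[WeatherResult, int]:
    """Validates the model output, feeding validation errors back until it passes."""
    feedback: Optional[str] = None
    for attempt in range(1, retries + 2):  # one initial attempt plus `retries` retries
        raw = fake_llm(prompt, feedback)
        try:
            return WeatherResult.model_validate_json(raw), attempt
        except ValidationError as exc:
            feedback = str(exc)  # the error text becomes the retry prompt
    raise RuntimeError(f"Output still invalid after {retries} retries")


weather, attempts = run_with_retries("Tell me the weather in Seoul")
print(weather.temperature, attempts)
```

The retry cap is what turns a potentially endless correction loop into a bounded one that fails loudly.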

python
import os
from pydantic_ai import Agent, RunContext
from pydantic import BaseModel
 
MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")
 
class WeatherResult(BaseModel):
    city: str
    temperature: float
    condition: str
 
agent = Agent(
    MODEL,
    deps_type=str,              # Type of dependency to inject (an API key here)
    output_type=WeatherResult,  # Validation type for LLM output
    retries=2,                  # Max retry attempts on validation failure
)
 
@agent.tool_plain
def get_time() -> str:
    """Returns the current UTC time."""
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat()
 
@agent.tool
def get_weather(ctx: RunContext[str], city: str) -> dict:
    """Fetches the weather for a given city.
 
    Args:
        city: The name of the city to look up.
    """
    api_key = ctx.deps  # Dependency injected in a type-safe manner
    return {"city": city, "temperature": 22.5, "condition": "Sunny"}
 
result = agent.run_sync("Tell me the weather in Seoul", deps="my-api-key")
print(result.output.temperature)  # The type system guarantees this is a float

One particularly useful aspect in practice is that docstrings are automatically populated into the description field of the JSON schema. A single docstring lets you control when and how the LLM uses a tool.

Automatic Schema Generation: Type Hints → JSON Schema

Honestly, this is the most convenient part. There's no need to write JSON schemas by hand.

python
from pydantic import BaseModel, Field
from typing import Optional
 
class SearchParams(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=10, ge=1, le=100, description="Maximum number of results")
    language: Optional[str] = Field(default=None, description="Language code (e.g. ko, en)")
 
@agent.tool
def search_docs(ctx: RunContext[str], params: SearchParams) -> list[str]:
    """Searches documents."""
    ...

One thing to watch out for: constraints like Field(ge=1, le=100) serve two purposes simultaneously. First, they include the constraint in the JSON schema to hint to the LLM that it should generate a value between 1 and 100. Second, if the LLM ignores the hint and generates an out-of-range value, Pydantic catches it at runtime and feeds the error back to the LLM. Since LLMs don't always follow the schema perfectly, think of Pydantic's runtime validation as the last line of defense.
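Both roles of a constraint can be checked directly with Pydantic, independent of any agent. A small sketch that redefines the same model so it runs on its own:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class SearchParams(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=10, ge=1, le=100, description="Maximum number of results")
    language: Optional[str] = Field(default=None, description="Language code (e.g. ko, en)")


# Role 1: the constraint appears in the JSON schema as a hint to the model.
schema = SearchParams.model_json_schema()
max_results_schema = schema["properties"]["max_results"]
print(max_results_schema["minimum"], max_results_schema["maximum"])

# Role 2: an out-of-range value is still rejected at validation time.
try:
    SearchParams(query="pydantic", max_results=500)
    error_types = []
except ValidationError as exc:
    error_types = [error["type"] for error in exc.errors()]
print(error_types)
```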


Practical Application

The examples below increase in complexity in stages. Pick whichever fits your situation.

  • Example 1 (⭐): When you only need LLM output validation — structuring unstructured text
  • Example 2 (⭐⭐): When you have DB or external API dependencies — a customer support agent
  • Example 3 (⭐⭐): When adding AI streaming to a FastAPI project
  • Example 4 (⭐⭐⭐): When composing multiple agents — multi-agent orchestration

Example 1: Extracting Structured Data from Unstructured Text ⭐

The value of type validation is maximized when extracting needed information from unstructured text like receipts, contracts, and emails. This is especially useful in pipelines that insert the extracted data directly into a database.

python
import asyncio
import os
from pydantic import BaseModel, field_validator  # Pydantic v2 syntax
from pydantic_ai import Agent
from typing import Optional
 
MODEL = os.getenv("LLM_MODEL", "openai:gpt-4.1")
 
class InvoiceData(BaseModel):
    vendor: str
    amount: float
    date: str
    items: list[str]
    currency: Optional[str] = "USD"
 
    @field_validator('amount')  # Default 'after' mode: runs once the value is already a float
    @classmethod
    def amount_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError('Amount must be positive')
        return v
 
agent = Agent(MODEL, output_type=InvoiceData, retries=2)
 
async def extract_invoice(raw_text: str) -> InvoiceData:
    result = await agent.run(f"Please extract information from the following receipt:\n{raw_text}")
    return result.output  # InvoiceData instance, type guaranteed
 
async def main():
    raw_text = """
    Vendor: TechSolutions Inc.
    Amount: $550.00
    Date: 2026-05-15
    Items: Cloud server costs, Monitoring tool license
    """
    invoice = await extract_invoice(raw_text)
 
    print(f"Vendor: {invoice.vendor}")           # str
    print(f"Amount: ${invoice.amount:,.2f}")      # float
    print(f"Items: {', '.join(invoice.items)}")   # list[str]
 
    # model_dump() returns a plain dict that is ready for a DB insert.
    # `db` stands in for your project's own async DB client:
    # await db.insert("invoices", invoice.model_dump())
    print(invoice.model_dump())
 
asyncio.run(main())

In Pydantic v2, use @field_validator instead of @validator. Pydantic AI is built on Pydantic v2, where the v1-style @validator still runs but is deprecated and emits a deprecation warning, so new code should use @field_validator.

| Element | Role |
| --- | --- |
| output_type=InvoiceData | Parses and validates LLM output as InvoiceData |
| @field_validator | Additional business logic validation (e.g. positive amount check) |
| model_dump() | Converts the Pydantic model to a dict for direct use in DB inserts |
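How a field validator interacts with type coercion is easy to verify in isolation. The sketch below uses a hypothetical `Payment` model and an invented `error_types` helper. In the default "after" mode the validator receives a value Pydantic has already coerced to float, so it only has to express the business rule.

```python
from typing import Any, List

from pydantic import BaseModel, ValidationError, field_validator


class Payment(BaseModel):
    amount: float

    @field_validator("amount")  # default mode is "after": v is already a float
    @classmethod
    def amount_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError("Amount must be positive")
        return v


def error_types(value: Any) -> List[str]:
    """Returns the Pydantic error types raised for a candidate amount."""
    try:
        Payment(amount=value)
        return []
    except ValidationError as exc:
        return [error["type"] for error in exc.errors()]


ok = Payment(amount="550.00")          # numeric string is coerced before the validator runs
negative = error_types(-5)             # rejected by the custom business rule
not_a_number = error_types("$550.00")  # rejected by type parsing; the validator never runs

print(ok.amount, negative, not_a_number)
```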

Example 2: Composing a Customer Support Agent with Dependency Injection ⭐⭐

This is a situation that comes up often in practice: when tool functions need to access a DB or external API, the key question is how to pass the dependencies. Using RunContext also makes swapping in mocks during tests clean and straightforward.

python
import os
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
 
MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")
 
# AsyncDatabase and ExternalAPIClient are clients defined in your actual project.
# e.g. asyncpg.Connection, httpx.AsyncClient, etc.
@dataclass
class SupportDeps:
    db: "AsyncDatabase"
    user_id: int
    api_client: "ExternalAPIClient"
 
agent = Agent(
    MODEL,
    deps_type=SupportDeps,
    system_prompt="You are a customer support agent. Provide only accurate information.",
    retries=2,
)
 
@agent.tool
async def get_order_status(ctx: RunContext[SupportDeps], order_id: str) -> str:
    """Retrieves the status of an order.
 
    Args:
        order_id: The order ID to look up (e.g. ORD-2026-001234).
    """
    order = await ctx.deps.db.fetch_order(ctx.deps.user_id, order_id)
    if not order:
        return "The requested order could not be found."
    return f"Order {order_id} status: {order.status} (estimated arrival: {order.estimated_arrival})"
 
@agent.tool
async def get_product_info(ctx: RunContext[SupportDeps], product_id: str) -> dict:
    """Retrieves product information.
 
    Args:
        product_id: The product ID to look up.
    """
    return await ctx.deps.api_client.fetch_product(product_id)
 
# Production usage
async def handle_request(user_id: int, message: str):
    deps = SupportDeps(db=real_db, user_id=user_id, api_client=real_client)
    result = await agent.run(message, deps=deps)
    return result.output
 
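# During testing: swap in mock objects for the dependencies.
# Note: this still calls the configured model; Pydantic AI's TestModel can stand in for it.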
async def test_order_status():
    test_deps = SupportDeps(db=mock_db, user_id=99999, api_client=mock_client)
    result = await agent.run("What is the status of order ORD-2026-001234?", deps=test_deps)
    assert "status" in result.output

Because the type of ctx.deps is fixed as SupportDeps, your IDE provides autocomplete when calling ctx.deps.db.fetch_order and catches typos. You might be tempted to use a dict instead of a dataclass at first, but a structured type (dataclass or BaseModel) gives mypy and pyright named, typed attributes to check, which a plain dict does not.
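The testing benefit is easiest to see when the tool's logic is exercised against a fake dependency with no model involved. This is a standard-library sketch under assumptions: `Order`, `OrderStore`, `FakeOrderStore`, and `order_status_text` are hypothetical names, and `SupportDeps` is trimmed to the two fields this logic needs.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Tuple


@dataclass
class Order:
    status: str
    estimated_arrival: str


class OrderStore(Protocol):
    """The interface the tool relies on; real and fake clients both satisfy it."""

    async def fetch_order(self, user_id: int, order_id: str) -> Optional[Order]: ...


@dataclass
class SupportDeps:
    db: OrderStore
    user_id: int


class FakeOrderStore:
    """In-memory stand-in for the database client."""

    def __init__(self, orders: Dict[Tuple[int, str], Order]) -> None:
        self._orders = orders

    async def fetch_order(self, user_id: int, order_id: str) -> Optional[Order]:
        return self._orders.get((user_id, order_id))


async def order_status_text(deps: SupportDeps, order_id: str) -> str:
    """The body of the tool, written so it can be tested without an LLM."""
    order = await deps.db.fetch_order(deps.user_id, order_id)
    if order is None:
        return "The requested order could not be found."
    return f"Order {order_id} status: {order.status} (estimated arrival: {order.estimated_arrival})"


fake_db = FakeOrderStore({(99999, "ORD-2026-001234"): Order("shipped", "2026-06-02")})
deps = SupportDeps(db=fake_db, user_id=99999)
found = asyncio.run(order_status_text(deps, "ORD-2026-001234"))
missing = asyncio.run(order_status_text(deps, "ORD-0000"))
print(found)
print(missing)
```

Keeping the tool body in a function that takes the deps object directly makes the assertion deterministic, since no model output is involved.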

Example 3: FastAPI Integration with Real-Time Streaming ⭐⭐

When adding an AI streaming endpoint to a FastAPI project, you can share the same Pydantic models directly, resulting in almost no code duplication. If you've used FastAPI's Depends before, the dependency injection flow will feel familiar.

python
import os
from fastapi import FastAPI, Depends
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from pydantic_ai import Agent
 
MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")
app = FastAPI()
 
class ChatRequest(BaseModel):
    message: str
    user_id: int
 
# SupportDeps is the dataclass from Example 2; import it from wherever you defined it.
agent = Agent(MODEL, deps_type=SupportDeps, retries=2)
 
# get_db() and get_client() are factory functions defined in your actual project.
def get_deps(request: ChatRequest) -> SupportDeps:
    return SupportDeps(
        db=get_db(),
        user_id=request.user_id,
        api_client=get_client(),
    )
 
@app.post("/chat/stream")
async def chat_stream(
    request: ChatRequest,
    deps: SupportDeps = Depends(get_deps),
):
    async def generate():
        async with agent.run_stream(request.message, deps=deps) as response:
            # delta=True yields only newly generated text; the default yields the full text so far
            async for chunk in response.stream_text(delta=True):
                yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"
 
    return StreamingResponse(generate(), media_type="text/event-stream")
 
@app.post("/chat")
async def chat(
    request: ChatRequest,
    deps: SupportDeps = Depends(get_deps),
):
    result = await agent.run(request.message, deps=deps)
    return {"response": result.output}

What's interesting is that ChatRequest does double duty: FastAPI uses it to validate the incoming request body, and its fields feed straight into the SupportDeps object that Pydantic AI injects into tools. Since both sides share the same Pydantic ecosystem, there's no need to write duplicate model definitions.
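One detail worth handling in a streaming endpoint is Server-Sent Events framing: a text chunk that contains a newline has to be split across several `data:` lines, or the client will see a truncated message. A standard-library sketch, where `fake_deltas` is a hypothetical stand-in for the text deltas coming from the agent:

```python
import asyncio
from typing import AsyncIterator, List


def sse_frame(text: str) -> str:
    """Formats text as one SSE message, emitting one `data:` line per text line."""
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"


async def fake_deltas() -> AsyncIterator[str]:
    """Hypothetical stream of text deltas, including one with an embedded newline."""
    for delta in ["Hello", ", wor", "ld\nSecond line"]:
        yield delta


async def sse_stream(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wraps each delta in an SSE frame and appends a final [DONE] marker."""
    async for delta in deltas:
        yield sse_frame(delta)
    yield sse_frame("[DONE]")


async def collect() -> List[str]:
    return [frame async for frame in sse_stream(fake_deltas())]


frames = asyncio.run(collect())
for frame in frames:
    print(repr(frame))
```

A generator shaped like `sse_stream` can be passed to StreamingResponse in place of the inline `generate()` function.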

Example 4: Multi-Agent Orchestration ⭐⭐⭐

This is a pattern where you register one agent as a tool for another agent. Taking a code review system as an example, analysis and security review are separated into independent agents, and a coordinator combines them.

python
import os
from pydantic_ai import Agent, RunContext
 
FAST_MODEL = os.getenv("FAST_LLM_MODEL", "openai:gpt-4.1")
MAIN_MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")
 
# Sub-agents — use str type when passing an API key as deps
code_analyzer = Agent(FAST_MODEL, deps_type=str, output_type=str)
security_reviewer = Agent(MAIN_MODEL, deps_type=str, output_type=str)
 
coordinator = Agent(
    MAIN_MODEL,
    deps_type=str,
    system_prompt="You oversee code review. Synthesize the analysis and security review results to produce a final review.",
    retries=2,
)
 
@coordinator.tool
async def analyze_code(ctx: RunContext[str], code: str) -> str:
    """Analyzes code quality and structure.
 
    Args:
        code: The source code to analyze.
    """
    result = await code_analyzer.run(
        f"Please analyze the following code:\n{code}",
        deps=ctx.deps,  # Pass the API key directly to the sub-agent
    )
    return result.output
 
@coordinator.tool
async def review_security(ctx: RunContext[str], code: str) -> str:
    """Reviews code for security vulnerabilities.
 
    Args:
        code: The source code to review.
    """
    result = await security_reviewer.run(
        f"Please review this from a security perspective:\n{code}",
        deps=ctx.deps,
    )
    return result.output
 
async def review_code(user_code: str, api_key: str) -> str:
    result = await coordinator.run(
        f"Please review the following code:\n{user_code}",
        deps=api_key,
    )
    return result.output

One thing to watch out for: when passing deps to sub-agents, the types must be consistent. Here, all agents are unified on deps_type=str (an API key). In a real project where each agent needs different dependencies, you can have the coordinator convert to the appropriate type before passing it along.
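When the agents do need different dependencies, the conversion can live in a small, typed function that the coordinator's tool calls before delegating. A minimal sketch; `CoordinatorDeps`, `SecurityDeps`, and `to_security_deps` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinatorDeps:
    api_key: str
    repo_url: str
    max_findings: int


@dataclass(frozen=True)
class SecurityDeps:
    api_key: str
    max_findings: int


def to_security_deps(deps: CoordinatorDeps) -> SecurityDeps:
    """Narrows the coordinator's deps to only what the security reviewer needs."""
    return SecurityDeps(api_key=deps.api_key, max_findings=deps.max_findings)


coordinator_deps = CoordinatorDeps(
    api_key="my-api-key",
    repo_url="https://example.com/repo.git",
    max_findings=5,
)
security_deps = to_security_deps(coordinator_deps)
print(security_deps)
```

Inside a coordinator tool, the sub-agent call would then receive `deps=to_security_deps(ctx.deps)`, and a type checker can confirm that each agent gets the type it declared.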


Pros and Cons Analysis

Reflecting on my experience using it in real projects, I see two decisive questions. One is "how important is type safety for this project," and the other is "how deeply is this integrated with the existing Python ecosystem (FastAPI, Pydantic)." If both matter to you, Pydantic AI is a very strong choice.

Advantages

| Item | Description |
| --- | --- |
| End-to-end type safety | IDE autocomplete and mypy/pyright validation apply across all layers: agents, tools, and output |
| Automatic schema generation | JSON schemas passed to the LLM are generated automatically from type hints and docstrings alone |
| Built-in retry logic | When the LLM returns invalid arguments, Pydantic errors are fed back and retries happen automatically. The upper limit is configurable via the retries parameter |
| Clean DI pattern | RunContext injects DBs, APIs, and config into tool functions in a type-safe way, making test substitution easy |
| Model-agnostic API | Switch between OpenAI, Anthropic, Gemini, Bedrock, and Ollama through the same interface |
| FastAPI-friendly | Shares the same Pydantic models and async patterns, integrating naturally into existing FastAPI projects |

Disadvantages and Caveats

| Item | Description | Mitigation |
| --- | --- | --- |
| Ecosystem size | Community roughly 15× smaller than LangChain; fewer pre-built integrations | Build custom integrations with @agent.tool directly; use GitHub Discussions |
| No security/compliance support | No built-in RBAC, prompt injection detection, or guardrails | Add a separate security layer (middleware, gateway) |
| Limited access to provider advanced features | The least-common-denominator abstraction can make it hard to leverage provider-specific advanced features | Write @agent.tool_plain wrappers that call the provider SDK directly |
| Graph workflows are not part of the core Agent API | The Agent API alone is not well-suited for complex state machines or workflows with heavy conditional branching; graph support lives in the companion pydantic-graph library | Use pydantic-graph, or consider using LangGraph alongside it |

The Most Common Mistakes in Practice

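  1. Using the v1-style @validator instead of @field_validator: Pydantic AI is built on Pydantic v2. The v1-style @validator still runs there, but it is deprecated and emits a deprecation warning. In v2, use @field_validator('field_name'), adding mode='before' only when you need to inspect the raw input before type coercion.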

  2. Mismatch between deps_type and the actual deps type: For example, declaring Agent(deps_type=SupportDeps) but passing agent.run(deps={"db": ...}) — a dict. pyright will catch it, but it can also go undetected until runtime, so using a dataclass or BaseModel from the start is strongly recommended.

  3. Writing sparse docstrings: The docstring of an @agent.tool function is the tool description sent to the LLM. Writing just a single vague line makes it hard for the LLM to judge when it should use the tool. Including thorough parameter descriptions and usage examples significantly improves call accuracy.
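To make the docstring point concrete, the sketch below separates a Google-style docstring into the tool description and per-parameter descriptions, which is roughly the information that ends up in front of the model. It is a simplified illustration rather than the framework's parser: `split_google_docstring` is a hypothetical helper and it only understands a single `Args:` section.

```python
import inspect
from typing import Callable, Dict, List, Tuple


def split_google_docstring(func: Callable) -> Tuple[str, Dict[str, str]]:
    """Splits a docstring into (description, {parameter: description})."""
    doc = inspect.getdoc(func) or ""  # getdoc also removes common indentation
    description: List[str] = []
    params: Dict[str, str] = {}
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line.startswith(" ") and ":" in line:
            name, _, text = line.strip().partition(":")
            params[name.strip()] = text.strip()
        elif line.strip():
            in_args = False
            description.append(line.strip())
    return " ".join(description), params


def get_order_status(order_id: str) -> str:
    """Retrieves the shipping status of one order for the current user.

    Use this when the user asks where an order is or when it will arrive.

    Args:
        order_id: The order ID to look up (e.g. ORD-2026-001234).
    """
    return "..."


tool_description, param_descriptions = split_google_docstring(get_order_status)
print(tool_description)
print(param_descriptions)
```

A description that says when to use the tool, plus a parameter description with an example value, gives the model far more to work with than a single vague line.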


Closing Thoughts

Using Pydantic AI in real projects, what I noticed wasn't so much that type errors decreased — it was that the timing of when they occur shifted. LLM response parsing failures that used to blow up unexpectedly at runtime now surface much earlier, as red squiggles in the IDE or as Pydantic retry feedback. The productivity impact of being able to share Pydantic models without duplicate definitions when using it alongside FastAPI was particularly tangible.

Three steps you can take right now:

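  1. Install and run your first agent: Install with pip install pydantic-ai (all providers), or the slimmer pip install 'pydantic-ai-slim[openai]' (or 'pydantic-ai-slim[anthropic]'), then copy the weather example above and run agent.run_sync() with a real API key. You'll need the API key environment variable for whichever provider the model string names: ANTHROPIC_API_KEY for the example's default model, or OPENAI_API_KEY if you set LLM_MODEL to an OpenAI model.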

  2. Add type safety to existing tool functions: If you've already written LLM tool functions, you can refactor them to use @agent.tool and RunContext for a dependency injection pattern. Running mypy alongside will surface type mismatches immediately.

  3. Add output type validation: If you have code that receives LLM responses as str and parses them manually, try defining a Pydantic BaseModel and specifying it as output_type. Retry logic gets attached automatically, and parsing failure cases decrease significantly.

