Pydantic AI: Implementing Type-Safe LLM Tool Calls in Python AI Agents
Catching runtime errors at write time with RunContext · output_type · dependency injection
When you layer LLM-powered features onto Python code, a nagging anxiety tends to set in at some point: "What happens if the LLM calls this tool function with the wrong type?" I remember spending a long time debugging this myself when first attaching an agent to a FastAPI project — an int showed up where a str was expected. Errors that only blow up at runtime, bugs that slip through tests only to surface for the first time in an actual LLM response. You've probably been there at least once.
The root cause is simple. LLMs pass arguments as JSON, and without type validation, that JSON just flows straight into your Python functions.
# The old approach with no type validation: errors only surface at runtime
def process_tool_call(args: dict):
    amount = args["amount"]  # LLM passed "22.5" (str), but it slips through
    return amount * 1.1  # TypeError: can't multiply sequence by non-int of type 'float'

# The Pydantic AI approach: arguments are validated against the type hints, and on error the LLM is asked to retry
@agent.tool
def process_amount(ctx: RunContext[str], amount: float) -> float:
    return amount * 1.1  # Pydantic has validated it, so amount is a float

Pydantic AI solves this problem by "extending the type system all the way up to the agent layer." Python type hints and Pydantic validation apply across every layer (tool function arguments, return values, dependency injection, and LLM output parsing), so mistakes in your own code are flagged by the IDE or type checker at write time, and malformed LLM data is rejected at the tool boundary instead of failing somewhere deep inside your functions. Just as FastAPI transformed the web API development experience, Pydantic AI is pulling AI agent development in the same direction. Let me show you concretely how type errors get caught by your IDE and by Pydantic before they reach your business logic.
Core Concepts
Why Tool Calls Become a Type Problem
A bit of background: "Tool Calls" (also called "Function Calling") is a mechanism by which an LLM executes external functions to retrieve information or perform actions. Here's how it works: we provide the LLM with function signatures and descriptions in JSON Schema form, the LLM selects the appropriate function based on the conversation flow and generates arguments as JSON, and it's our code's job to receive that JSON and invoke the actual Python function.
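The three steps above can be sketched with nothing but the standard library. The schema layout, the TOOLS registry, and the simulated llm_response payload are hypothetical stand-ins for illustration, not Pydantic AI's or any provider's actual wire format:

```python
import json

# Hypothetical tool; a real agent would register it with a framework.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Step 1: the function's signature and description, in JSON Schema form, go to the LLM.
TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Fetches the weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
TOOLS = {"get_weather": get_weather}

# Step 2: the LLM replies with a tool name and JSON-encoded arguments (simulated here).
llm_response = {"tool": "get_weather", "arguments": '{"city": "Seoul"}'}

# Step 3: our code decodes the JSON and invokes the actual Python function.
def dispatch(response: dict) -> str:
    func = TOOLS[response["tool"]]
    kwargs = json.loads(response["arguments"])
    return func(**kwargs)  # nothing here checks kwargs against the type hints

tool_output = dispatch(llm_response)
print(tool_output)
```

Note that dispatch() forwards whatever the JSON contained, which is exactly the gap that validation has to close.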
This is where the problem arises. The JSON the LLM generates may not always match the types we expect. "22.5" (a string) can land where a float is expected, or a null can arrive where a number belongs. In hand-rolled tool-calling code, you have to handle this yourself with defensive checks. Pydantic AI handles it at the framework level.
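To make those failure modes concrete, here is a sketch of the defensive code a tool needs when nothing validates its arguments. The function names and payloads are made up for illustration:

```python
import json

def apply_markup_unchecked(args: dict) -> float:
    return args["amount"] * 1.1  # trusts whatever the JSON contained

def apply_markup_checked(args: dict) -> float:
    amount = args.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError(f"amount must be a number, got {type(amount).__name__}")
    return amount * 1.1

payloads = ['{"amount": "22.5"}', '{"amount": null}', '{"amount": 20}']
outcomes = []
for raw in payloads:
    args = json.loads(raw)
    try:
        outcomes.append(("ok", apply_markup_checked(args)))
    except ValueError as exc:
        outcomes.append(("rejected", str(exc)))

# The unchecked version fails later, with a less helpful error.
unchecked_error = None
try:
    apply_markup_unchecked(json.loads(payloads[0]))
except TypeError as exc:
    unchecked_error = str(exc)
```

Every tool would need its own version of these checks, which is the repetition a framework-level validator removes.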
Three Core Mechanisms
There are three key components to understand in Pydantic AI. Each has a clearly defined role, and once you've internalized them, the code becomes much easier to read.
① @agent.tool / @agent.tool_plain — Register a function as an LLM tool
The difference between the two decorators is straightforward. @agent.tool takes a RunContext as its first argument and can receive injected dependencies, while @agent.tool_plain registers a plain function that does not receive the context. A JSON schema is automatically generated from the type hints in the function signature and passed to the LLM. The docstring summary becomes the tool description, and parameter descriptions are extracted from Google-, NumPy-, or Sphinx-style docstring sections.
Think of it like this: just as type hints on a FastAPI route function validate the incoming request, here they validate the JSON arguments passed in by the LLM.
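How type hints can turn into a schema is easy to sketch with inspect and typing. This is a simplified illustration that handles only a few primitive types and assumes every parameter is annotated. It is not how Pydantic AI generates schemas internally, and build_schema is a made-up helper:

```python
import inspect
from typing import get_type_hints

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_schema(func) -> dict:
    """Derive a minimal JSON-Schema-like dict from a function signature."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES[hints[name]]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the LLM must supply it
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search(query: str, max_results: int = 10) -> list:
    """Searches documents.

    Args:
        query: Search query
        max_results: Maximum number of results
    """
    return []

schema = build_schema(search)
```

Here schema lists query as required and max_results as optional, purely from the signature.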
② RunContext[DepsType] — A container for receiving dependencies in a type-safe way
It plays a similar role to FastAPI's Depends, but the key difference is that the type safety extends all the way into the LLM layer. It's a pattern where tool functions receive things like database connections, API clients, and configuration objects from the outside rather than creating them themselves. If deps_type is declared incorrectly, mypy/pyright catches it immediately.
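The idea of a typed dependency container can be sketched with a generic dataclass. Ctx below is a hypothetical stand-in for RunContext that exposes only a deps attribute; the real class carries more state:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

DepsT = TypeVar("DepsT")

@dataclass
class Ctx(Generic[DepsT]):
    """Hypothetical stand-in for RunContext: a typed wrapper around the deps."""
    deps: DepsT

@dataclass
class AppDeps:
    api_key: str
    timeout_s: float

def get_weather(ctx: Ctx[AppDeps], city: str) -> str:
    # A type checker knows ctx.deps is AppDeps, so a typo such as
    # ctx.deps.api_kye would be flagged before the code runs.
    return f"{city}: requested with key ending in {ctx.deps.api_key[-4:]}"

message = get_weather(Ctx(deps=AppDeps(api_key="sk-test-1234", timeout_s=5.0)), "Seoul")
```

The tool never constructs its own client or reads configuration; it only consumes what the caller injected.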
③ Agent[ResultType] — Automatically validates LLM output against a Pydantic model
Pass a Pydantic model as output_type and LLM responses are automatically parsed and validated against it; the result's output attribute is then typed accordingly, because Agent is generic over its dependency and output types. On a parsing failure, the Pydantic error message is sent back to the LLM as feedback to prompt a retry. Specifying a maximum retry count like retries=2 guards against infinite loops.
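The validate-then-retry loop can be sketched as follows. The fake_llm function simulates a model that corrects itself once it sees the error message, and parse_weather is hand-rolled validation standing in for a Pydantic model; none of this is Pydantic AI's actual implementation:

```python
import json
from dataclasses import dataclass

@dataclass
class WeatherResult:
    city: str
    temperature: float

def parse_weather(raw: str) -> WeatherResult:
    """Hand-rolled validation standing in for Pydantic model validation."""
    data = json.loads(raw)
    if not isinstance(data.get("city"), str):
        raise ValueError("city: expected a string")
    temp = data.get("temperature")
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        raise ValueError("temperature: expected a number")
    return WeatherResult(city=data["city"], temperature=float(temp))

def fake_llm(prompt: str) -> str:
    """Simulated model: answers badly until it sees validation feedback."""
    if "expected a number" in prompt:
        return '{"city": "Seoul", "temperature": 22.5}'
    return '{"city": "Seoul", "temperature": "warm"}'

def run_with_retries(prompt: str, retries: int = 2) -> tuple[WeatherResult, int]:
    attempts = 0
    while True:
        attempts += 1
        try:
            return parse_weather(fake_llm(prompt)), attempts
        except ValueError as exc:
            if attempts > retries:
                raise  # retry budget exhausted, surface the error
            prompt += f"\nYour last answer was invalid: {exc}. Please fix it."

result, attempts = run_with_retries("Weather in Seoul?")
```

In this sketch the first answer fails validation, the error text is appended to the prompt, and the second attempt succeeds; the retries cap is what stops a model that never corrects itself.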
import os
from datetime import datetime, timezone

from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")

class WeatherResult(BaseModel):
    city: str
    temperature: float
    condition: str

agent = Agent(
    MODEL,
    deps_type=str,  # Type of dependency to inject (an API key here)
    output_type=WeatherResult,  # Validation type for LLM output
    retries=2,  # Max retry attempts on validation failure
)

@agent.tool_plain
def get_time() -> str:
    """Returns the current UTC time."""
    return datetime.now(timezone.utc).isoformat()

@agent.tool
def get_weather(ctx: RunContext[str], city: str) -> dict:
    """Fetches the weather for a given city.

    Args:
        city: The name of the city to look up
    """
    api_key = ctx.deps  # Dependency injected in a type-safe manner (unused in this stub)
    return {"city": city, "temperature": 22.5, "condition": "Sunny"}  # Stubbed response

result = agent.run_sync("Tell me the weather in Seoul", deps="my-api-key")
print(result.output.temperature)  # Validated as a float by WeatherResult

One particularly useful aspect in practice is that docstrings are automatically populated into the description fields of the JSON schema. A single docstring lets you control when and how the LLM uses a tool.
Automatic Schema Generation: Type Hints → JSON Schema
Honestly, this is the most convenient part. There's no need to write JSON schemas by hand.
from pydantic import BaseModel, Field
from typing import Optional

class SearchParams(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=10, ge=1, le=100, description="Maximum number of results")
    language: Optional[str] = Field(default=None, description="Language code (e.g. ko, en)")

@agent.tool
def search_docs(ctx: RunContext[str], params: SearchParams) -> list[str]:
    """Searches documents."""
    ...

One thing to watch out for: constraints like Field(ge=1, le=100) serve two purposes simultaneously. First, they include the constraint in the JSON schema to hint to the LLM that it should generate a value between 1 and 100. Second, if the LLM ignores the hint and generates an out-of-range value, Pydantic catches it at runtime and feeds the error back to the LLM. Since LLMs don't always follow the schema perfectly, think of Pydantic's runtime validation as the last line of defense.
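The two roles of a constraint can be illustrated without any framework. In this sketch, MAX_RESULTS_SCHEMA is an assumed schema fragment for max_results (the hint the LLM sees), and check_against_schema is a made-up helper playing the part of runtime validation:

```python
# Role 1: the constraint appears in the schema as a hint to the LLM.
MAX_RESULTS_SCHEMA = {"type": "integer", "minimum": 1, "maximum": 100, "default": 10}

# Role 2: the same bounds are enforced at runtime when the arguments arrive.
def check_against_schema(name: str, value, schema: dict) -> list[str]:
    """Return human-readable errors; an empty list means the value is acceptable."""
    if isinstance(value, bool) or not isinstance(value, int):
        return [f"{name}: expected an integer"]
    errors = []
    if value < schema["minimum"]:
        errors.append(f"{name}: must be >= {schema['minimum']}")
    if value > schema["maximum"]:
        errors.append(f"{name}: must be <= {schema['maximum']}")
    return errors

in_range = check_against_schema("max_results", 25, MAX_RESULTS_SCHEMA)
too_large = check_against_schema("max_results", 500, MAX_RESULTS_SCHEMA)
```

An error list like too_large is the kind of message that gets sent back to the model so it can correct the argument on the next attempt.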
Practical Application
The examples below increase in complexity in stages. Pick whichever fits your situation.
- Example 1 (⭐): When you only need LLM output validation — structuring unstructured text
- Example 2 (⭐⭐): When you have DB or external API dependencies — a customer support agent
- Example 3 (⭐⭐): When adding AI streaming to a FastAPI project
- Example 4 (⭐⭐⭐): When composing multiple agents — multi-agent orchestration
Example 1: Extracting Structured Data from Unstructured Text ⭐
The value of type validation is maximized when extracting needed information from unstructured text like receipts, contracts, and emails. This is especially useful in pipelines that insert the extracted data directly into a database.
import asyncio
import os
from typing import Optional

from pydantic import BaseModel, field_validator  # Pydantic v2 syntax
from pydantic_ai import Agent

MODEL = os.getenv("LLM_MODEL", "openai:gpt-4.1")

class InvoiceData(BaseModel):
    vendor: str
    amount: float
    date: str
    items: list[str]
    currency: Optional[str] = "USD"

    @field_validator('amount', mode='before')  # Pydantic v2: @field_validator
    @classmethod
    def amount_must_be_positive(cls, v):
        v = float(v)
        if v <= 0:
            raise ValueError('Amount must be positive')
        return v

agent = Agent(MODEL, output_type=InvoiceData, retries=2)

async def extract_invoice(raw_text: str) -> InvoiceData:
    result = await agent.run(f"Please extract information from the following receipt:\n{raw_text}")
    return result.output  # InvoiceData instance, already validated

async def main():
    raw_text = """
    Vendor: TechSolutions Inc.
    Amount: $550.00
    Date: 2026-05-15
    Items: Cloud server costs, Monitoring tool license
    """
    invoice = await extract_invoice(raw_text)
    print(f"Vendor: {invoice.vendor}")  # str
    print(f"Amount: ${invoice.amount:,.2f}")  # float
    print(f"Items: {', '.join(invoice.items)}")  # list[str]
    # The data is already validated before the DB insert.
    # db is a placeholder for your project's own async DB client; it is not defined in this snippet.
    await db.insert("invoices", invoice.model_dump())

asyncio.run(main())

In Pydantic v2, use @field_validator instead of @validator. Pydantic AI is built on Pydantic v2, where the v1-style @validator still runs but is deprecated and emits a deprecation warning.
| Element | Role |
|---|---|
| output_type=InvoiceData | Parses and validates LLM output as InvoiceData |
| @field_validator | Additional business logic validation (e.g. positive amount check) |
| model_dump() | Converts the Pydantic model to a dict for direct use in DB inserts |
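Since db.insert in the example is a placeholder, here is one way the final step could look with the standard library's sqlite3. A plain dataclass stands in for the validated InvoiceData model, asdict plays the role of model_dump(), and the items list is left out because SQLite has no list column type. The table layout is an assumption made for this sketch:

```python
import sqlite3
from dataclasses import asdict, dataclass

@dataclass
class InvoiceRow:
    """Plain-dataclass stand-in for the validated InvoiceData model."""
    vendor: str
    amount: float
    date: str
    currency: str = "USD"

invoice = InvoiceRow(vendor="TechSolutions Inc.", amount=550.0, date="2026-05-15")
row = asdict(invoice)  # plays the role of invoice.model_dump()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor TEXT, amount REAL, date TEXT, currency TEXT)")
# Named placeholders keep the insert parameterized; values never touch the SQL string.
conn.execute("INSERT INTO invoices VALUES (:vendor, :amount, :date, :currency)", row)
stored = conn.execute("SELECT vendor, amount, currency FROM invoices").fetchone()
conn.close()
```

Because the dict keys match the column placeholders, a validated model can go into the insert without any per-field conversion code.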
Example 2: Composing a Customer Support Agent with Dependency Injection ⭐⭐
This is a situation that comes up often in practice: when tool functions need to access a DB or external API, the key question is how to pass the dependencies. Using RunContext also makes swapping in mocks during tests clean and straightforward.
import os
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext

MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")

# AsyncDatabase and ExternalAPIClient are clients defined in your actual project.
# e.g. asyncpg.Connection, httpx.AsyncClient, etc.
@dataclass
class SupportDeps:
    db: "AsyncDatabase"
    user_id: int
    api_client: "ExternalAPIClient"

agent = Agent(
    MODEL,
    deps_type=SupportDeps,
    system_prompt="You are a customer support agent. Provide only accurate information.",
    retries=2,
)

@agent.tool
async def get_order_status(ctx: RunContext[SupportDeps], order_id: str) -> str:
    """Retrieves the status of an order.

    Args:
        order_id: The order ID to look up (e.g. ORD-2026-001234)
    """
    order = await ctx.deps.db.fetch_order(ctx.deps.user_id, order_id)
    if not order:
        return "The requested order could not be found."
    return f"Order {order_id} status: {order.status} (estimated arrival: {order.estimated_arrival})"

@agent.tool
async def get_product_info(ctx: RunContext[SupportDeps], product_id: str) -> dict:
    """Retrieves product information.

    Args:
        product_id: The product ID to look up
    """
    return await ctx.deps.api_client.fetch_product(product_id)

# real_db, real_client, mock_db, and mock_client below are placeholders for objects
# created elsewhere in your project or test suite.

# Production usage
async def handle_request(user_id: int, message: str):
    deps = SupportDeps(db=real_db, user_id=user_id, api_client=real_client)
    result = await agent.run(message, deps=deps)
    return result.output

# During testing: just swap in mock objects (the configured model is still called)
async def test_order_status():
    test_deps = SupportDeps(db=mock_db, user_id=99999, api_client=mock_client)
    result = await agent.run("What is the status of order ORD-2026-001234?", deps=test_deps)
    assert "status" in result.output

Because the type of ctx.deps is fixed as SupportDeps, your IDE provides autocomplete when calling ctx.deps.db.fetch_order and catches typos. You might be tempted to use a dict instead of a dataclass at first, but a structured type (dataclass or BaseModel) gives mypy and pyright declared fields to check, which a plain dict does not.
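The mock-swapping idea can also be exercised without calling a model at all, by testing the tool's logic directly. In this sketch, FakeContext is a hypothetical stand-in for RunContext that exposes only deps, the trimmed SupportDeps omits the API client, and AsyncMock from the standard library plays the database:

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace
from unittest.mock import AsyncMock

@dataclass
class SupportDeps:
    db: object
    user_id: int

@dataclass
class FakeContext:
    """Hypothetical stand-in for RunContext, exposing only .deps."""
    deps: SupportDeps

async def get_order_status(ctx: FakeContext, order_id: str) -> str:
    order = await ctx.deps.db.fetch_order(ctx.deps.user_id, order_id)
    if not order:
        return "The requested order could not be found."
    return f"Order {order_id} status: {order.status}"

mock_db = AsyncMock()
mock_db.fetch_order.return_value = SimpleNamespace(status="shipped")
ctx = FakeContext(deps=SupportDeps(db=mock_db, user_id=99999))

found = asyncio.run(get_order_status(ctx, "ORD-2026-001234"))

# Same tool, different injected behaviour: the order does not exist.
mock_db.fetch_order.return_value = None
missing = asyncio.run(get_order_status(ctx, "ORD-0000-000000"))
```

Because the tool receives its database through the context, both branches can be covered deterministically, with no network access and no API key.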
Example 3: FastAPI Integration with Real-Time Streaming ⭐⭐
When adding an AI streaming endpoint to a FastAPI project, you can share the same Pydantic models directly, resulting in almost no code duplication. If you've used FastAPI's Depends before, the dependency injection flow will feel familiar.
import os

from fastapi import FastAPI, Depends
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from pydantic_ai import Agent

# SupportDeps is the dataclass from Example 2; import it from wherever it lives in your project.

MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")
app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    user_id: int

agent = Agent(MODEL, deps_type=SupportDeps, retries=2)

# get_db() and get_client() are factory functions defined in your actual project.
def get_deps(request: ChatRequest) -> SupportDeps:
    return SupportDeps(
        db=get_db(),
        user_id=request.user_id,
        api_client=get_client(),
    )

@app.post("/chat/stream")
async def chat_stream(
    request: ChatRequest,
    deps: SupportDeps = Depends(get_deps),
):
    async def generate():
        async with agent.run_stream(request.message, deps=deps) as response:
            # delta=True yields only newly generated text; the default yields the full text so far
            async for chunk in response.stream_text(delta=True):
                yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")

@app.post("/chat")
async def chat(
    request: ChatRequest,
    deps: SupportDeps = Depends(get_deps),
):
    result = await agent.run(request.message, deps=deps)
    return {"response": result.output}

What's interesting is that the same ChatRequest model validates the HTTP request body in FastAPI and supplies the user_id that get_deps uses to build the agent's SupportDeps. Since both sides live in the same Pydantic ecosystem, there's no need to write duplicate model definitions.
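The Server-Sent Events framing used by the streaming endpoint can be tried in isolation. In this sketch, fake_token_stream is a made-up generator that simulates a model emitting incremental text pieces, so no FastAPI app or API key is needed:

```python
import asyncio
from typing import AsyncIterator

async def fake_token_stream() -> AsyncIterator[str]:
    """Simulated model output, delivered as incremental text pieces."""
    for piece in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield piece

async def sse_events(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each chunk in SSE framing: a 'data:' line followed by a blank line."""
    async for chunk in chunks:
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"  # end-of-stream sentinel chosen by the endpoint, not part of SSE itself

async def collect() -> list[str]:
    return [event async for event in sse_events(fake_token_stream())]

events = asyncio.run(collect())
```

One caveat of this simple framing: a chunk that itself contains a newline would need to be split across several data: lines to remain a valid event.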
Example 4: Multi-Agent Orchestration ⭐⭐⭐
This is a pattern where you register one agent as a tool for another agent. Taking a code review system as an example, analysis and security review are separated into independent agents, and a coordinator combines them.
import os

from pydantic_ai import Agent, RunContext

FAST_MODEL = os.getenv("FAST_LLM_MODEL", "openai:gpt-4.1")
MAIN_MODEL = os.getenv("LLM_MODEL", "anthropic:claude-sonnet-4-20250514")

# Sub-agents: use str as deps_type when the dependency is just an API key
code_analyzer = Agent(FAST_MODEL, deps_type=str, output_type=str)
security_reviewer = Agent(MAIN_MODEL, deps_type=str, output_type=str)

coordinator = Agent(
    MAIN_MODEL,
    deps_type=str,
    system_prompt="You oversee code review. Synthesize the analysis and security review results to produce a final review.",
    retries=2,
)

@coordinator.tool
async def analyze_code(ctx: RunContext[str], code: str) -> str:
    """Analyzes code quality and structure.

    Args:
        code: The source code to analyze
    """
    result = await code_analyzer.run(
        f"Please analyze the following code:\n{code}",
        deps=ctx.deps,  # Pass the API key directly to the sub-agent
    )
    return result.output

@coordinator.tool
async def review_security(ctx: RunContext[str], code: str) -> str:
    """Reviews code for security vulnerabilities.

    Args:
        code: The source code to review
    """
    result = await security_reviewer.run(
        f"Please review this from a security perspective:\n{code}",
        deps=ctx.deps,
    )
    return result.output

async def review_code(user_code: str, api_key: str) -> str:
    result = await coordinator.run(
        f"Please review the following code:\n{user_code}",
        deps=api_key,
    )
    return result.output

One thing to watch out for: when passing deps to sub-agents, the types must be consistent. Here, all agents are unified on deps_type=str (an API key). In a real project where each agent needs different dependencies, you can have the coordinator convert to the appropriate type before passing it along.
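When sub-agents need different dependencies, the conversion step might look like the sketch below. All three dataclasses and both converter functions are hypothetical names invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class CoordinatorDeps:
    api_key: str
    repo_url: str
    scanner_token: str

@dataclass
class AnalyzerDeps:
    api_key: str

@dataclass
class SecurityDeps:
    api_key: str
    scanner_token: str

def to_analyzer_deps(deps: CoordinatorDeps) -> AnalyzerDeps:
    # The analyzer only needs the API key, so nothing else is handed over.
    return AnalyzerDeps(api_key=deps.api_key)

def to_security_deps(deps: CoordinatorDeps) -> SecurityDeps:
    return SecurityDeps(api_key=deps.api_key, scanner_token=deps.scanner_token)

coordinator_deps = CoordinatorDeps("sk-test", "https://example.com/repo.git", "scan-123")
analyzer_deps = to_analyzer_deps(coordinator_deps)
security_deps = to_security_deps(coordinator_deps)
```

Inside a coordinator tool, the call would then pass something like deps=to_analyzer_deps(ctx.deps), so each sub-agent's declared deps type matches what it actually receives and a type checker can verify the hand-off.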
Pros and Cons Analysis
Reflecting on my experience using it in real projects, I see two deciding factors. One is how important type safety is for the project, and the other is how deeply the project is integrated with the existing Python ecosystem (FastAPI, Pydantic). If both apply, Pydantic AI is, in my experience, the strongest fit.
Advantages
| Item | Description |
|---|---|
| End-to-end type safety | IDE autocomplete and mypy/pyright validation apply across all layers — agents, tools, and output |
| Automatic schema generation | JSON schemas passed to the LLM are generated automatically from type hints and docstrings alone |
| Built-in retry logic | When the LLM returns invalid arguments, Pydantic errors are fed back as feedback and retries happen automatically. The upper limit is configurable via the retries parameter |
| Clean DI pattern | RunContext injects DBs, APIs, and config into tool functions in a type-safe way, making test substitution easy |
| Model-agnostic API | Switch between OpenAI, Anthropic, Gemini, Bedrock, and Ollama through the same interface |
| FastAPI-friendly | Shares the same Pydantic models and async patterns, integrating naturally into existing FastAPI projects |
Disadvantages and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Ecosystem size | Community roughly 15× smaller than LangChain; fewer pre-built integrations | Build custom integrations with @agent.tool directly; use GitHub Discussions |
| No security/compliance support | No built-in RBAC, prompt injection detection, or guardrails | Add a separate security layer (middleware, gateway) |
| Limited access to provider advanced features | The least-common-denominator abstraction can make it hard to leverage provider-specific advanced features | Write @agent.tool_plain wrappers that call the provider SDK directly |
| Graph workflows sit outside the agent API | Agents on their own are not well-suited for complex state machines or workflows with heavy conditional branching | Consider the companion pydantic-graph library, or use LangGraph alongside it |
The Most Common Mistakes in Practice
- Using @validator instead of @field_validator: Pydantic AI is built on Pydantic v2, where the v1-style @validator is deprecated and emits a deprecation warning. Use @field_validator('field_name') instead, adding mode='before' only when the validator must see the raw input before type coercion.
- Mismatch between deps_type and the actual deps type: For example, declaring Agent(deps_type=SupportDeps) but passing agent.run(deps={"db": ...}) with a dict. pyright will catch it if you run a type checker, but otherwise it goes undetected until a tool touches ctx.deps at runtime, so using a dataclass or BaseModel from the start is strongly recommended.
- Writing sparse docstrings: The docstring of an @agent.tool function is the tool description sent to the LLM. Writing just a single vague line makes it hard for the LLM to judge when it should use the tool. Including thorough parameter descriptions and usage examples significantly improves call accuracy.
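The second mistake is easy to reproduce in plain Python. In this sketch, greet is a made-up function standing in for a tool that reads from its dependencies:

```python
from dataclasses import dataclass

@dataclass
class SupportDeps:
    user_id: int

def greet(deps: SupportDeps) -> str:
    return f"Hello, user {deps.user_id}"

ok = greet(SupportDeps(user_id=42))

# A dict has keys, not attributes, so the same call fails only once it runs.
try:
    greet({"user_id": 42})  # a type checker would flag this argument
    dict_error = None
except AttributeError as exc:
    dict_error = str(exc)
```

Without a type checker in the loop, nothing complains until the attribute access executes, which in an agent means the first time the LLM decides to call that tool.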
Closing Thoughts
Using Pydantic AI in real projects, what I noticed wasn't so much that type errors decreased — it was that the timing of when they occur shifted. LLM response parsing failures that used to blow up unexpectedly at runtime now surface much earlier, as red squiggles in the IDE or as Pydantic retry feedback. The productivity impact of being able to share Pydantic models without duplicate definitions when using it alongside FastAPI was particularly tangible.
Three steps you can take right now:
- Install and run your first agent: Install with pip install pydantic-ai (or, for a smaller install, pip install 'pydantic-ai-slim[openai]' or 'pydantic-ai-slim[anthropic]'), then copy the weather example above and run agent.run_sync() with a real API key. You'll need the OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable, matching the model you choose.
- Add type safety to existing tool functions: If you've already written LLM tool functions, you can refactor them to use @agent.tool and RunContext for a dependency injection pattern. Running mypy alongside will surface type mismatches immediately.
- Add output type validation: If you have code that receives LLM responses as str and parses them manually, try defining a Pydantic BaseModel and specifying it as output_type. Retry logic gets attached automatically, and parsing failure cases decrease significantly.