Type-Safe LLM Response Validation with Pydantic AI
If you've ever wired an LLM into production, you've probably hit this situation at least once. You carefully write a system prompt telling GPT to respond in JSON, and a KeyError still blows up at runtime. Or result["block_card"] is True on some days and "true" on others. Code that receives LLM output as a dictionary always carries this kind of time bomb.
Pydantic AI is a Python agent framework that forces LLM responses through Pydantic BaseModel parsing and validation, catching type mismatches and missing fields at the validation boundary, before they reach your business logic. The Pydantic team released it in late 2024 and shipped the stable v1.0 in September 2025; it was listed as "Trial" on the Thoughtworks Technology Radar 2025. Its philosophy, "bring the DX that FastAPI gave web API development to GenAI agent development," is woven throughout the codebase, so Python backend developers who've used FastAPI will find the learning curve gentler than expected.
By the end of this post you'll have practical techniques you can use immediately: making mypy pass on LLM integration code, testing agents in CI without real API costs, and attaching LLM analysis to FastAPI endpoints without extra serialization code. I'll also give you an honest take on when to choose Pydantic AI versus LangChain or LangGraph.
Table of Contents
- Core Concepts — type-safe output, dependency injection, model-agnostic design
- Real-World Application — banking agent, FastAPI integration
- Testing Without Cost
- Which Framework Should You Choose?
- Pros and Cons
- Closing Thoughts
Core Concepts
Type-Safe Output — Turning LLM Responses into Python Objects with output_type
Traditional agent frameworks return LLM responses as strings or loosely typed dictionaries. Pydantic AI is different. Declare a Pydantic model as output_type, and its JSON Schema is automatically injected into the LLM prompt; the response is then returned as a fully validated Python object.
```python
import asyncio
from typing import Literal

from pydantic import BaseModel
from pydantic_ai import Agent


class AnalysisResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]  # literal type, not str
    confidence: float  # expected 0.0 to 1.0 (the range is not enforced here)
    summary: str


agent = Agent('openai:gpt-4o', output_type=AnalysisResult)


async def main():
    result = await agent.run('Analyze this review: The delivery was way too slow')
    print(result.output.confidence)  # guaranteed float, IDE autocomplete works
    print(result.output.sentiment)   # one of "positive" | "negative" | "neutral"


asyncio.run(main())
```

At first I wondered "how well does injecting a JSON Schema into the prompt actually work in practice?" It turns out to be far more stable than I expected. When the LLM returns output that fails validation, the validation error message is fed back to the LLM and a retry is triggered. The default is one retry (two attempts total), adjustable with Agent(retries=3). If validation still fails, Pydantic AI raises an exception (UnexpectedModelBehavior in current releases) instead of handing you malformed data.
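To make the retry feedback concrete, here is a minimal sketch that uses only Pydantic, with no LLM call involved. It shows the kind of per-field error detail that validation produces for a bad reply. The helper name validate_reply and the sample payloads are illustrative assumptions, and the exact text Pydantic AI sends back to the model may be formatted differently.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class AnalysisResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    summary: str


def validate_reply(raw_json: str) -> tuple[Optional[AnalysisResult], list[str]]:
    """Return (model, []) on success or (None, error messages) on failure."""
    try:
        return AnalysisResult.model_validate_json(raw_json), []
    except ValidationError as exc:
        # Each entry names the offending field and the reason it was rejected.
        messages = [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
        return None, messages


# A well-formed reply parses into a typed object.
good, good_errors = validate_reply(
    '{"sentiment": "negative", "confidence": 0.92, "summary": "Slow delivery"}'
)

# A malformed reply: unknown label, non-numeric confidence, missing summary.
bad, bad_errors = validate_reply('{"sentiment": "angry", "confidence": "high"}')

if __name__ == "__main__":
    print(good)
    for message in bad_errors:
        print(message)
```

Running the script prints one line per rejected field, which is roughly the information a model needs to correct itself on the next attempt.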
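Note that structured output is implemented differently across LLM providers, and Pydantic AI abstracts that away. Depending on the provider and the configured output mode, the schema is delivered either as the parameters of an output tool (tool calling, which the documentation describes as the default mode) or through a native mechanism such as OpenAI's response_format={"type": "json_schema"}. From the developer's perspective, all you declare is output_type.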
Structured Output — A technique for prompting an LLM to return a response conforming to a specific JSON schema rather than free text. Pydantic AI automates this process and handles retries with error feedback when validation fails.
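If you are curious what schema is derived from a model, Pydantic can show it directly. The sketch below uses only Pydantic's model_json_schema() on the AnalysisResult model from earlier. How Pydantic AI wraps this schema for each provider is not shown and may differ.

```python
import json
from typing import Literal

from pydantic import BaseModel


class AnalysisResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    summary: str


# JSON Schema derived from the type annotations.
schema = AnalysisResult.model_json_schema()

# The Literal annotation becomes an enum constraint on the field.
sentiment_choices = schema["properties"]["sentiment"]["enum"]

# Fields without defaults are listed as required.
required_fields = schema["required"]

if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
```

Because the Literal type ends up as an enum in the schema, the model is told the allowed labels up front instead of discovering them through a failed validation.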
Dependency Injection — Passing DB and HTTP Clients to Tools in a Type-Safe Way
If you've used FastAPI's Depends pattern, this will feel immediately familiar. Declare a dependency container with deps_type, and inside agent tool functions you can pull it out type-safely via ctx.deps. It lets you inject DB connections, HTTP clients, config values, and more explicitly — without global state or environment variables.
```python
from dataclasses import dataclass
from typing import Any, Protocol

from httpx import AsyncClient  # pip install httpx
from pydantic_ai import Agent, RunContext


class DatabaseConn(Protocol):
    """Stand-in for your project's DB connection class (e.g. myapp.db.DatabaseConn)."""

    async def query(self, sql: str, *args: Any) -> list[dict]: ...


@dataclass
class AppDeps:
    db: DatabaseConn
    http_client: AsyncClient
    user_id: int


agent = Agent('anthropic:claude-3-7-sonnet-latest', deps_type=AppDeps)


@agent.tool
async def fetch_user_orders(ctx: RunContext[AppDeps]) -> list[dict]:
    # ctx.deps.db and ctx.deps.user_id are both fully type-inferred
    return await ctx.deps.db.query(
        'SELECT * FROM orders WHERE user_id = $1', ctx.deps.user_id
    )
```

RunContext[T]: The context object passed to tool functions. Declare the deps type as the generic parameter T and both IDE autocomplete and mypy checks work on ctx.deps access.
The real advantage of this structure shows up in testing. You can put mock objects into AppDeps and write unit tests without any real API cost. More on this in the Testing Without Cost section below.
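The reason explicit dependencies make testing cheap can be seen without any framework at all. The following is a plain-Python sketch of the same idea: the function receives its dependencies as an argument, so a test can hand it an in-memory fake. OrderStore, FakeOrderStore, and run_demo are hypothetical names for illustration, and the function takes the deps object directly rather than a RunContext.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol


class OrderStore(Protocol):
    """The only thing the business logic needs from a database."""

    async def query(self, sql: str, *args: Any) -> list[dict[str, Any]]: ...


class FakeOrderStore:
    """In-memory fake: ignores the SQL text and filters on the first parameter."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows

    async def query(self, sql: str, *args: Any) -> list[dict[str, Any]]:
        user_id = args[0]
        return [row for row in self.rows if row["user_id"] == user_id]


@dataclass
class AppDeps:
    db: OrderStore
    user_id: int


async def fetch_user_orders(deps: AppDeps) -> list[dict[str, Any]]:
    # No global connection: everything comes in through deps.
    return await deps.db.query(
        "SELECT * FROM orders WHERE user_id = $1", deps.user_id
    )


def run_demo() -> list[dict[str, Any]]:
    rows = [{"id": 1, "user_id": 7}, {"id": 2, "user_id": 8}]
    deps = AppDeps(db=FakeOrderStore(rows), user_id=7)
    return asyncio.run(fetch_user_orders(deps))


if __name__ == "__main__":
    print(run_demo())
```

Swapping FakeOrderStore for a real connection changes nothing in fetch_user_orders, which is the property the deps_type pattern is meant to preserve.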
Model-Agnostic — Swap Vendors in One Line
According to the official documentation, over 25 LLM providers are supported — including OpenAI, Anthropic, Gemini, DeepSeek, Mistral, and Ollama — and you can switch providers by changing a single model string without touching any business logic.
```python
# Development: local Ollama (no API cost)
agent = Agent('ollama:llama3.1')

# Production: OpenAI
agent = Agent('openai:gpt-4o')

# Cost-cutting experiment: DeepSeek
agent = Agent('deepseek:deepseek-chat')
```

Model-Agnostic: A design that avoids lock-in to any specific LLM vendor, allowing you to respond flexibly to vendor price changes or service outages.
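One way to take advantage of the single-string switch is to read the model name from configuration instead of hard-coding it. The sketch below assumes a hypothetical LLM_MODEL environment variable and a local Ollama default. The "provider:model" format check is a convention of this sketch, not something the library requires.

```python
import os
from typing import Mapping, Optional

# Default used when nothing is configured (assumed: local development).
DEFAULT_MODEL = "ollama:llama3.1"


def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model string from LLM_MODEL, falling back to the default."""
    source = os.environ if env is None else env
    name = source.get("LLM_MODEL", DEFAULT_MODEL)

    # Fail early on strings that do not look like "provider:model".
    provider, separator, model = name.partition(":")
    if not (provider and separator and model):
        raise ValueError(f"expected 'provider:model', got {name!r}")
    return name


if __name__ == "__main__":
    # The result would be passed to the agent, e.g. Agent(resolve_model_name()).
    print(resolve_model_name())
```

With this in place, moving from development to production is a deployment setting rather than a code change.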
Real-World Application
Example 1: Banking Customer Support Agent
This is a classic case in the financial domain where LLM output must be consumed programmatically. You can't have block_card fluctuating between True, "true", and 1. Our team started without output_type, using plain dictionaries, and when we migrated later we ended up touching far more code than we expected. That experience made it clear: defining a Pydantic model from the start is the much better path.
```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext


class DatabaseConn(Protocol):
    """Stand-in for your project's DB connection class (e.g. myapp.db.DatabaseConn)."""

    async def get_balance(self, customer_id: int) -> float: ...
    async def get_transactions(self, customer_id: int, limit: int) -> list[dict]: ...


@dataclass
class SupportDeps:
    customer_id: int
    db: DatabaseConn


class SupportResult(BaseModel):
    support_advice: str
    block_card: bool
    risk_level: int = Field(ge=1, le=10)  # Pydantic validation: must be between 1 and 10


agent = Agent(
    'openai:gpt-4o',
    deps_type=SupportDeps,
    output_type=SupportResult,
    system_prompt='You are a bank customer support agent. Evaluate the risk of fraudulent transactions.',
)


@agent.tool
async def get_customer_balance(ctx: RunContext[SupportDeps]) -> float:
    return await ctx.deps.db.get_balance(ctx.deps.customer_id)


@agent.tool
async def get_recent_transactions(ctx: RunContext[SupportDeps]) -> list[dict]:
    return await ctx.deps.db.get_transactions(ctx.deps.customer_id, limit=10)


async def main():
    # db_conn and card_service are placeholders for objects your application provides
    result = await agent.run(
        'I lost my card and there are unknown charges from overseas',
        deps=SupportDeps(customer_id=123, db=db_conn),
    )
    # Fully type-safe access: IDE autocomplete, mypy passes
    if result.output.block_card:
        await card_service.block(customer_id=123)
    print(f"Risk level: {result.output.risk_level}/10")  # guaranteed int


asyncio.run(main())
```

| Point | Description |
|---|---|
| block_card: bool | Common representations such as true, "true", or 1 are coerced and validated to a Python bool |
| risk_level: int = Field(ge=1, le=10) | Range validation is handled by Pydantic; out-of-range values trigger a retry |
| RunContext[SupportDeps] | Type inference works inside tool functions, no global state needed |
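The coercion and range behavior described in the table can be checked with Pydantic alone. This sketch validates a few hand-written payloads against SupportResult using Pydantic's default (lax) validation. The payloads and the is_rejected helper are made up for illustration, and no agent or LLM is involved.

```python
from pydantic import BaseModel, Field, ValidationError


class SupportResult(BaseModel):
    support_advice: str
    block_card: bool
    risk_level: int = Field(ge=1, le=10)


# The string "true" is coerced to the Python bool True.
coerced = SupportResult.model_validate(
    {"support_advice": "Block the card.", "block_card": "true", "risk_level": 7}
)

# The integer 1 is also accepted as True.
from_int = SupportResult.model_validate(
    {"support_advice": "Block the card.", "block_card": 1, "risk_level": 7}
)


def is_rejected(payload: dict) -> bool:
    """Return True when the payload fails SupportResult validation."""
    try:
        SupportResult.model_validate(payload)
    except ValidationError:
        return True
    return False


# risk_level above the le=10 bound is rejected.
out_of_range = is_rejected(
    {"support_advice": "x", "block_card": True, "risk_level": 11}
)

# A string that is not a recognized boolean is rejected rather than guessed.
not_a_bool = is_rejected(
    {"support_advice": "x", "block_card": "maybe", "risk_level": 5}
)
```

The last case matters as much as the first: ambiguous values fail loudly instead of silently turning into True or False.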
Example 2: Integration with a FastAPI Endpoint
This is a common real-world scenario: attaching LLM analysis to a FastAPI router cleanly, without any extra serialization code. It works because FastAPI's response_model and the agent's output_type share the same Pydantic model.
```python
from fastapi import FastAPI
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class ReviewRequest(BaseModel):
    text: str
    language: str = 'en'


class ReviewAnalysis(BaseModel):
    sentiment: str
    key_issues: list[str]
    recommended_action: str
    priority: int = Field(ge=1, le=5)


app = FastAPI()
agent = Agent('anthropic:claude-3-7-sonnet-latest', output_type=ReviewAnalysis)


@app.post('/analyze-review', response_model=ReviewAnalysis)
async def analyze_review(request: ReviewRequest) -> ReviewAnalysis:
    result = await agent.run(
        f"Please analyze the following {request.language} review: {request.text}"
    )
    return result.output  # return the already-validated Pydantic model directly
```

Because result.output is already a validated ReviewAnalysis object, FastAPI serializes it directly. The core of this pattern is that no separate conversion code is needed at the API boundary.
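To see what "no conversion code" amounts to, the sketch below serializes a ReviewAnalysis instance with Pydantic's own model_dump_json() and parses it back. FastAPI performs a comparable serialization step for you; its internals are not reproduced here, and the sample values are invented.

```python
import json

from pydantic import BaseModel, Field


class ReviewAnalysis(BaseModel):
    sentiment: str
    key_issues: list[str]
    recommended_action: str
    priority: int = Field(ge=1, le=5)


# Stand-in for result.output returned by the agent.
analysis = ReviewAnalysis(
    sentiment="negative",
    key_issues=["slow delivery"],
    recommended_action="Offer a shipping refund",
    priority=2,
)

# The model knows how to turn itself into a JSON response body.
payload = analysis.model_dump_json()

# A client using the same model can parse the body back without loss.
round_tripped = ReviewAnalysis.model_validate_json(payload)

if __name__ == "__main__":
    print(payload)
```

The same class therefore describes the LLM contract, the HTTP response, and the client-side parser, which is why the endpoint body stays so short.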
Testing Without Cost
Honestly, this is the feature I like most about Pydantic AI. Using TestModel, you can test agent logic without making any real LLM API calls.
TestModel does not call an LLM. Instead it generates placeholder values from the JSON schema of your tools and output model: typically a short string such as "a", a small integer, False for a boolean, and so on. According to the documentation, the generated data is not meaningful but should pass Pydantic validation in most cases. This lets you focus on testing "does the agent honor the correct type contract" rather than "what value will the LLM return."
```python
import pytest
from unittest.mock import AsyncMock

from pydantic_ai.models.test import TestModel

# agent, SupportDeps, and SupportResult are the objects defined in the banking example above
# requires pytest-asyncio: pip install pytest-asyncio


@pytest.mark.asyncio  # lets pytest run async test functions
async def test_support_result_schema_contract():
    mock_db = AsyncMock()
    mock_db.get_balance.return_value = 50000.0
    mock_db.get_transactions.return_value = [
        {"amount": 9999.99, "country": "NG", "merchant": "unknown"}
    ]

    with agent.override(model=TestModel()):
        result = await agent.run(
            'There is a suspicious large overseas charge',
            deps=SupportDeps(customer_id=456, db=mock_db),
        )

    # TestModel returns schema-derived placeholder values, not a realistic answer
    assert isinstance(result.output, SupportResult)    # validate type contract
    assert isinstance(result.output.block_card, bool)  # guarantee bool type
    # The output has passed Pydantic validation, so Field(ge=1, le=10) must hold
    assert 1 <= result.output.risk_level <= 10
```

You can catch type contract violations in CI without worrying about LLM API costs. When you need to validate actual LLM behavior, building a separate evaluation pipeline with pydantic-evals is also a solid approach.
Which Framework Should You Choose?
I promised an honest comparison with LangChain and LangGraph at the start, so here it is. This isn't an argument that "Pydantic AI is always better" — the right choice depends on your situation.
| Situation | Recommended Framework | Reason |
|---|---|---|
| Consuming LLM output programmatically (API responses, DB writes) | Pydantic AI | Type safety, mypy integration, FastAPI synergy |
| RAG pipelines, diverse document loaders needed | LangChain | Rich ecosystem of loaders, embeddings, and vector DBs |
| Complex multi-agent with many branches and loops | LangGraph | Mature graph-based state control |
| Role-based multi-agent collaboration (researcher, writer, reviewer) | CrewAI | Role abstractions, human-readable configuration |
| Only lightweight Structured Output needed | Instructor | Lighter than Pydantic AI, patches directly onto LLM libraries |
Mixing frameworks is a perfectly realistic choice in practice. An architecture where LangChain handles document chunking and embedding, with Pydantic AI processing the final LLM output, works naturally.
When you need pydantic-graph
Within Pydantic AI itself, you can consider switching to the pydantic-graph module when complex workflows are required. The following criteria serve as a rough guide:
- When there are two or more branch conditions and each branch needs to call different tools
- When you need to save state between steps and resume later
- When certain tool calls require Human-in-the-Loop approval
That said, it is less mature than LangGraph, so for complex graph workflows it is worth evaluating LangGraph in parallel.
pydantic-graph — An optional Pydantic AI module that supports complex multi-step workflows based on state machines. Similar in concept to LangGraph's graph-based approach, but currently less mature.
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Pre-runtime error detection | mypy/pyright catches type mismatches in agent logic at development time |
| Automatic retry and validation | When the LLM returns invalid JSON, it retries with error feedback; raises a clear exception on final failure |
| Cost-free testing | Unit tests are possible without real API costs using TestModel and deps mocking |
| Model independence | Supports 25+ LLMs; switch providers without changing business logic |
| FastAPI compatibility | Same team, same DI patterns, natural ecosystem integration |
| Pure Python | Native async/await support with no separate DSL |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Python only | TypeScript/Go teams cannot use it | Explore alternatives like Mastra for the JS ecosystem |
| Small community | ~16.5K GitHub stars vs. LangChain (100K+); fewer templates and examples | Official docs and example repos are well-organized, partially compensating |
| Observability requires a paid service | Production tracing depends on Pydantic Logfire (paid) | Can be replaced by configuring a direct OpenTelemetry connection |
| Complex multi-agent | Sophisticated branching/looping workflows require pydantic-graph, which is less mature than LangGraph | Evaluate LangGraph in parallel for complex graph workflows |
| Tool call inefficiency | In some scenarios, repeated tool calls increase token costs | Minimize call count in tool design; consider adding a caching layer |
OpenTelemetry — A vendor-agnostic observability standard and the foundation of Pydantic Logfire. Connecting to it directly lets you integrate with existing infrastructure like Datadog or Grafana.
The Most Common Real-World Mistakes
- Defining an agent without output_type: Our team started with string responses and migrated to Pydantic models later, touching far more code than expected. Declaring output_type from the start is much better. The only overhead is a few lines of Pydantic model definition.
- Managing dependencies as global state: Referencing a global DB connection instead of using deps_type makes test isolation impossible. The explicit injection pattern via deps is far better for maintainability.
- Trying to build an entire RAG pipeline with Pydantic AI: Pydantic AI specializes in the agent layer. LangChain has a richer ecosystem for RAG scenarios that require diverse document loaders and embedding pipelines. Mixing the two is a perfectly realistic option.
Closing Thoughts
We've walked through code examples demonstrating the three patterns — output_type → deps_type → TestModel — that you can try yourself. At the single-agent level, this combination alone satisfies most production type-safety requirements. One natural question remains: what exactly is this agent doing in production, and where are the token costs coming from?
If you're working primarily on Python backends and need to consume LLM output programmatically, you can get started right now.
- Install with pip install pydantic-ai → Try replacing one LLM response that you're currently handling as a dictionary with a Pydantic BaseModel. Adding just the output_type parameter will immediately change how IDE autocomplete behaves.
- Introduce deps_type → Refactor DB connections or HTTP clients from global state into explicit dependencies. Writing unit tests with TestModel() becomes natural as a result.
- Connect to one FastAPI endpoint → Sharing the same Pydantic model as response_model and output_type lets you immediately see the pattern of clean integration with no serialization code.
References
Official Documentation
- Pydantic AI Official Documentation
- Pydantic AI GitHub Repository
- Pydantic Logfire AI Observability Official Documentation
Comparative Analysis
- Pydantic AI — Thoughtworks Technology Radar
- PydanticAI v1: The Type-Safe Agent Framework Rewriting the Python Agent Stack — AgentMarketCap
- Pydantic AI vs LangChain 2026: Type-Safe or Flexible — Which Wins? — Kunal Ganglani
- LangChain vs PydanticAI for building an AI Agent — Medium
- The 2026 AI Agent Framework Decision Guide — DEV Community
Further Learning
- Pydantic AI: Build Type-Safe LLM Agents in Python — Real Python
- Pydantic AI Tutorial: How I Build Type-Safe AI Agents That Actually Work in Production — DEV Community
- Building AI Agents in Python with Pydantic AI — MachineLearningMastery
- Pydantic AI and MCP: Building Production-Grade AI Applications — Medium
- What is Pydantic AI? Type-Safe Agent Framework in 2026 — FutureAGI