DEV BAK - TECH BLOG

Type-Safe LLM Response Validation with Pydantic AI

If you've ever wired an LLM into production, you've probably hit this situation at least once. You carefully wrote a system prompt telling GPT to respond in JSON, and then a KeyError blows up at runtime. Or result["block_card"] is True on some days and "true" on others. Code that receives LLM output as a dictionary always carries this kind of time bomb.

Pydantic AI is a Python-based agent framework that forces LLM responses through Pydantic BaseModel parsing and validation, catching type mismatches and missing fields before they reach runtime. The Pydantic team released it in late 2024 and shipped the stable v1.0 in September 2025; it was listed as "Trial" on the Thoughtworks Technology Radar 2025. Its philosophy — "bring the DX that FastAPI gave web API development to GenAI agent development" — is woven throughout the codebase, so Python backend developers who've used FastAPI will find the learning curve gentler than expected.

By the end of this post you'll have practical techniques you can use immediately: making mypy pass on LLM integration code, testing agents in CI without real API costs, and attaching LLM analysis to FastAPI endpoints without extra serialization code. I'll also give you an honest take on when to choose Pydantic AI versus LangChain or LangGraph.

Table of Contents

  • Core Concepts — type-safe output, dependency injection, model-agnostic design
  • Real-World Application — banking agent, FastAPI integration
  • Testing Without Cost
  • Which Framework Should You Choose?
  • Pros and Cons
  • Closing Thoughts

Core Concepts

Type-Safe Output — Turning LLM Responses into Python Objects with output_type

Traditional agent frameworks return LLM responses as strings or loosely typed dictionaries. Pydantic AI is different. Declare a Pydantic model as output_type, and its JSON Schema is sent to the model along with the request (by default as the parameter schema of an output tool call); the response is then validated and returned as a typed Python object.

python
import asyncio
from typing import Literal
from pydantic import BaseModel, Field
from pydantic_ai import Agent

class AnalysisResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]  # literal type, not str
    confidence: float = Field(ge=0.0, le=1.0)  # enforce the 0.0 to 1.0 range
    summary: str
 
agent = Agent('openai:gpt-4o', output_type=AnalysisResult)
 
async def main():
    result = await agent.run('Analyze this review: The delivery was way too slow')
    print(result.output.confidence)  # guaranteed float, IDE autocomplete works
    print(result.output.sentiment)   # one of "positive" | "negative" | "neutral"
 
asyncio.run(main())

At first I wondered how well handing a JSON Schema to the model would actually work in practice; it turned out to be far more stable than I expected. When the LLM's reply fails validation (malformed JSON, a missing field, a wrong type), the validation error details are sent back to the LLM and a retry is triggered. The default is one retry (two attempts total), adjustable with Agent(retries=3). If validation still fails after the last retry, the run raises an UnexpectedModelBehavior exception instead of handing you unvalidated data.
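To make the retry behavior concrete, here is a minimal sketch of a validate-and-retry loop written with plain Pydantic. It is a conceptual illustration, not Pydantic AI's internal implementation: `run_with_retries` and `fake_model` are hypothetical names, and the fake model simply replays two canned replies so the example runs without an API key.

```python
from typing import Callable, Literal

from pydantic import BaseModel, Field, ValidationError


class AnalysisResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    summary: str


def run_with_retries(
    ask_model: Callable[[str], str], prompt: str, retries: int = 1
) -> AnalysisResult:
    """Validate the reply; on failure, resend the prompt with the error attached."""
    message = prompt
    last_error = None
    for _ in range(retries + 1):
        raw = ask_model(message)
        try:
            return AnalysisResult.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Feed the validation errors back so the model can correct itself
            message = f"{prompt}\n\nYour previous reply was invalid:\n{exc}\nPlease fix it."
    raise RuntimeError(f"output still invalid after {retries + 1} attempts") from last_error


# Hypothetical stand-in for an LLM: the first reply is wrong, the second is valid
replies = iter([
    '{"sentiment": "angry", "confidence": "high", "summary": "slow delivery"}',
    '{"sentiment": "negative", "confidence": 0.92, "summary": "slow delivery"}',
])
seen_prompts = []


def fake_model(message: str) -> str:
    seen_prompts.append(message)
    return next(replies)


result = run_with_retries(fake_model, "Analyze this review: The delivery was way too slow")
print(result.sentiment, result.confidence, len(seen_prompts))
```

The second prompt carries the validation errors from the first attempt, which is the same idea the framework applies for you behind output_type.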

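Note that structured output is implemented differently across LLM providers, and Pydantic AI abstracts that away. By default it passes the schema through each provider's tool-calling API (function calling on OpenAI, tool_use on Anthropic); provider-native structured output such as OpenAI's response_format={"type": "json_schema"} and a prompt-based mode are available as opt-in output modes. From the developer's perspective, all you declare is output_type.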

Structured Output — A technique for prompting an LLM to return a response conforming to a specific JSON schema rather than free text. Pydantic AI automates this process and handles retries with error feedback when validation fails.
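If you are curious what the model actually receives, you can print the JSON Schema that Pydantic derives from an output model. This sketch uses only Pydantic's own `model_json_schema()`; how Pydantic AI wraps that schema for each provider is not shown here.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class AnalysisResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    summary: str


# The schema lists allowed enum values, numeric bounds, and required fields
schema = AnalysisResult.model_json_schema()
print(json.dumps(schema, indent=2))
```

Every constraint you add to the model (a Literal, a Field bound) shows up in this schema, so tightening the model also tightens what the LLM is asked to produce.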


Dependency Injection — Passing DB and HTTP Clients to Tools in a Type-Safe Way

If you've used FastAPI's Depends pattern, this will feel immediately familiar. Declare a dependency container with deps_type, and inside agent tool functions you can pull it out type-safely via ctx.deps. It lets you inject DB connections, HTTP clients, config values, and more explicitly — without global state or environment variables.

python
from dataclasses import dataclass
from typing import Any, Protocol

from httpx import AsyncClient  # pip install httpx
from pydantic_ai import Agent, RunContext

# Stand-in interface for your project's actual DB connection class,
# e.g. `from myapp.db import DatabaseConn`
class DatabaseConn(Protocol):
    async def query(self, sql: str, *params: Any) -> list[dict]: ...

@dataclass
class AppDeps:
    db: DatabaseConn
    http_client: AsyncClient
    user_id: int

agent = Agent('anthropic:claude-3-7-sonnet-latest', deps_type=AppDeps)

@agent.tool
async def fetch_user_orders(ctx: RunContext[AppDeps]) -> list[dict]:
    """Return the current user's orders."""
    # ctx.deps.db and ctx.deps.user_id are both fully type-inferred
    return await ctx.deps.db.query(
        'SELECT * FROM orders WHERE user_id = $1', ctx.deps.user_id
    )

RunContext[T] — The context object passed to tool functions. Declare the deps type as a generic T and both IDE autocomplete and mypy checks work on ctx.deps access.

The real advantage of this structure shows up in testing. You can put mock objects into AppDeps and write unit tests without any real API cost. More on this in the Testing Without Cost section below.
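The reason explicit dependencies make testing easy can be shown without the framework at all. The sketch below uses only the standard library: `FakeDB` and `load_orders` are hypothetical names, and `load_orders` stands in for the body of a tool function that would normally read the same values from ctx.deps.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol


class DatabaseConn(Protocol):
    """The interface the tool logic depends on (structural typing)."""

    async def query(self, sql: str, *params: Any) -> list[dict]: ...


@dataclass
class AppDeps:
    db: DatabaseConn
    user_id: int


class FakeDB:
    """In-memory stand-in that records every query it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple]] = []

    async def query(self, sql: str, *params: Any) -> list[dict]:
        self.calls.append((sql, params))
        return [{"order_id": 1, "user_id": params[0]}]


async def load_orders(deps: AppDeps) -> list[dict]:
    # Same logic a tool would run with ctx.deps
    return await deps.db.query(
        "SELECT * FROM orders WHERE user_id = $1", deps.user_id
    )


fake_db = FakeDB()
orders = asyncio.run(load_orders(AppDeps(db=fake_db, user_id=42)))
print(orders, fake_db.calls)
```

Because the function only sees what is inside AppDeps, swapping the real connection for a fake requires no patching of globals.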


Model-Agnostic — Swap Vendors in One Line

According to the official documentation, over 25 LLM providers are supported — including OpenAI, Anthropic, Gemini, DeepSeek, Mistral, and Ollama — and you can switch providers by changing a single model string without touching any business logic.

python
# Development: local Ollama (no API cost)
agent = Agent('ollama:llama3.1')
 
# Production: OpenAI
agent = Agent('openai:gpt-4o')
 
# Cost-cutting experiment: DeepSeek
agent = Agent('deepseek:deepseek-chat')

Model-Agnostic — A design that avoids lock-in to any specific LLM vendor, allowing you to respond flexibly to vendor price changes or service outages.
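In practice the model string usually comes from configuration rather than being edited in code. A minimal sketch, assuming a hypothetical APP_ENV environment variable and a hypothetical `pick_model` helper; the model strings are the ones used above.

```python
import os
from typing import Optional

# Hypothetical mapping from deployment environment to model string
MODEL_BY_ENV = {
    "development": "ollama:llama3.1",
    "production": "openai:gpt-4o",
    "experiment": "deepseek:deepseek-chat",
}


def pick_model(env: Optional[str] = None) -> str:
    """Return the model string for the given (or current) environment."""
    env = env or os.environ.get("APP_ENV", "development")
    try:
        return MODEL_BY_ENV[env]
    except KeyError:
        raise ValueError(
            f"unknown APP_ENV {env!r}; expected one of {sorted(MODEL_BY_ENV)}"
        ) from None


model_name = pick_model("production")
print(model_name)
# agent = Agent(model_name, output_type=...)  # business logic stays unchanged
```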


Real-World Application

Example 1: Banking Customer Support Agent

This is a classic case in the financial domain where LLM output must be consumed programmatically. You can't have block_card fluctuating between True, "true", and 1. Our team started without output_type, using plain dictionaries, and when we migrated later we ended up touching far more code than we expected. That experience made it clear: defining a Pydantic model from the start is the much better path.

python
import asyncio
from dataclasses import dataclass
from typing import Protocol

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext

# Stand-in interfaces; replace with your project's real classes,
# e.g. `from myapp.db import DatabaseConn`
class DatabaseConn(Protocol):
    async def get_balance(self, customer_id: int) -> float: ...
    async def get_transactions(self, customer_id: int, limit: int) -> list[dict]: ...

class CardService(Protocol):
    async def block(self, customer_id: int) -> None: ...

@dataclass
class SupportDeps:
    customer_id: int
    db: DatabaseConn

class SupportResult(BaseModel):
    support_advice: str
    block_card: bool
    risk_level: int = Field(ge=1, le=10)  # Pydantic validation: must be between 1 and 10

agent = Agent(
    'openai:gpt-4o',
    deps_type=SupportDeps,
    output_type=SupportResult,
    system_prompt='You are a bank customer support agent. Evaluate the risk of fraudulent transactions.',
)

@agent.tool
async def get_customer_balance(ctx: RunContext[SupportDeps]) -> float:
    """Return the customer's current account balance."""
    return await ctx.deps.db.get_balance(ctx.deps.customer_id)

@agent.tool
async def get_recent_transactions(ctx: RunContext[SupportDeps]) -> list[dict]:
    """Return the customer's 10 most recent transactions."""
    return await ctx.deps.db.get_transactions(ctx.deps.customer_id, limit=10)

async def main(db_conn: DatabaseConn, card_service: CardService) -> None:
    customer_id = 123
    result = await agent.run(
        'I lost my card and there are unknown charges from overseas',
        deps=SupportDeps(customer_id=customer_id, db=db_conn),
    )

    # Fully type-safe access: IDE autocomplete works and mypy passes
    if result.output.block_card:
        await card_service.block(customer_id=customer_id)
    print(f"Risk level: {result.output.risk_level}/10")  # guaranteed int

# asyncio.run(main(db_conn, card_service))  # pass in your real DB connection and card service

| Point | Description |
| --- | --- |
| block_card: bool | Common representations such as true, "true", or 1 are coerced to a Python bool; anything else fails validation |
| risk_level: int = Field(ge=1, le=10) | Range validation is handled by Pydantic; out-of-range values trigger a retry |
| RunContext[SupportDeps] | ctx.deps is fully typed inside tool functions, so no global state is needed |
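The coercion and range checks in the table can be verified with Pydantic alone, no LLM involved. This sketch feeds the three block_card variants from the introduction through a model shaped like SupportResult, then shows what happens when risk_level is out of range.

```python
from pydantic import BaseModel, Field, ValidationError


class SupportResult(BaseModel):
    support_advice: str
    block_card: bool
    risk_level: int = Field(ge=1, le=10)


# True, "true", and 1 all end up as the Python bool True
coerced = []
for raw in (True, "true", 1):
    parsed = SupportResult.model_validate(
        {"support_advice": "Block the card", "block_card": raw, "risk_level": 8}
    )
    coerced.append(parsed.block_card)

# An out-of-range risk_level is rejected instead of silently passing through
try:
    SupportResult.model_validate(
        {"support_advice": "Block the card", "block_card": True, "risk_level": 11}
    )
    error_types = []
except ValidationError as exc:
    error_types = [err["type"] for err in exc.errors()]

print(coerced, error_types)
```

Inside an agent run, that ValidationError is what gets reported back to the model as retry feedback.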

Example 2: Integration with a FastAPI Endpoint

This is a common real-world scenario: attaching LLM analysis to a FastAPI router cleanly, without any extra serialization code. It works because FastAPI's response_model and the agent's output_type share the same Pydantic model.

python
from typing import Literal

from fastapi import FastAPI
from pydantic import BaseModel, Field
from pydantic_ai import Agent

class ReviewRequest(BaseModel):
    text: str
    language: str = 'en'

class ReviewAnalysis(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']  # constrained, as in the first example
    key_issues: list[str]
    recommended_action: str
    priority: int = Field(ge=1, le=5)  # 1 to 5; out-of-range values fail validation

app = FastAPI()
agent = Agent('anthropic:claude-3-7-sonnet-latest', output_type=ReviewAnalysis)
 
@app.post('/analyze-review', response_model=ReviewAnalysis)
async def analyze_review(request: ReviewRequest) -> ReviewAnalysis:
    result = await agent.run(
        f"Please analyze the following {request.language} review: {request.text}"
    )
    return result.output  # return the already-validated Pydantic model directly

Because result.output is already a validated ReviewAnalysis object, FastAPI serializes it directly. The core of this pattern is that no separate conversion code is needed at the API boundary.
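To see why no conversion layer is needed, it helps to look at what a validated model can already do on its own. The sketch below uses a model shaped like ReviewAnalysis with made-up field values and round-trips it through JSON using Pydantic's own methods; FastAPI's exact serialization path is not reproduced here.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    key_issues: list[str]
    recommended_action: str
    priority: int = Field(ge=1, le=5)


# Illustrative values standing in for result.output
analysis = ReviewAnalysis(
    sentiment="negative",
    key_issues=["slow delivery"],
    recommended_action="Offer a shipping refund",
    priority=4,
)

body = analysis.model_dump_json()  # JSON text ready to send over HTTP
round_tripped = ReviewAnalysis.model_validate_json(body)  # a client can parse it back
print(json.loads(body))
```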


Testing Without Cost

Honestly, this is the feature I like most about Pydantic AI. Using TestModel, you can test agent logic without making any real LLM API calls.

TestModel never calls an LLM. It generates placeholder data that conforms to the JSON Schema of your output model (for example a short string for str, a number within any declared bounds for int and float, False for bool) and, by default, calls every registered tool along the way. This lets you focus on testing "does the agent honor the correct type contract" rather than "what value will the LLM return."

python
import pytest
from unittest.mock import AsyncMock

from pydantic_ai import models
from pydantic_ai.models.test import TestModel

# from myapp.support import agent, SupportDeps, SupportResult  # the agent defined in Example 1
# requires pytest-asyncio: pip install pytest-asyncio

models.ALLOW_MODEL_REQUESTS = False  # fail fast if a test would reach a real LLM API

@pytest.mark.asyncio  # lets pytest run async test functions
async def test_support_result_schema_contract():
    mock_db = AsyncMock()
    mock_db.get_balance.return_value = 50000.0
    mock_db.get_transactions.return_value = [
        {"amount": 9999.99, "country": "NG", "merchant": "unknown"}
    ]

    with agent.override(model=TestModel()):
        result = await agent.run(
            'There is a suspicious large overseas charge',
            deps=SupportDeps(customer_id=456, db=mock_db)
        )

    # TestModel fills the output with schema-conforming placeholder values
    assert isinstance(result.output, SupportResult)    # validate type contract
    assert isinstance(result.output.block_card, bool)  # guarantee bool type
    assert 1 <= result.output.risk_level <= 10         # Field(ge=1, le=10) still holds
    mock_db.get_balance.assert_awaited()               # the tool ran against the mock DB

You can catch type contract violations in CI without worrying about LLM API costs. When you need to validate actual LLM behavior, building a separate evaluation pipeline with pydantic-evals is also a solid approach.


Which Framework Should You Choose?

I promised an honest comparison with LangChain and LangGraph at the start, so here it is. This isn't an argument that "Pydantic AI is always better" — the right choice depends on your situation.

| Situation | Recommended Framework | Reason |
| --- | --- | --- |
| Consuming LLM output programmatically (API responses, DB writes) | Pydantic AI | Type safety, mypy integration, FastAPI synergy |
| RAG pipelines, diverse document loaders needed | LangChain | Rich ecosystem of loaders, embeddings, and vector DBs |
| Complex multi-agent with many branches and loops | LangGraph | Mature graph-based state control |
| Role-based multi-agent collaboration (researcher, writer, reviewer) | CrewAI | Role abstractions, human-readable configuration |
| Only lightweight Structured Output needed | Instructor | Lighter than Pydantic AI, patches directly onto LLM libraries |

Mixing frameworks is a perfectly realistic choice in practice. An architecture where LangChain handles document chunking and embedding, with Pydantic AI processing the final LLM output, works naturally.

When you need pydantic-graph

Within Pydantic AI itself, you can consider switching to the pydantic-graph module when complex workflows are required. The following criteria serve as a rough guide:

  • When there are two or more branch conditions and each branch needs to call different tools
  • When you need to save state between steps and resume later
  • When certain tool calls require Human-in-the-Loop approval

That said, it is less mature than LangGraph, so for complex graph workflows it is worth evaluating LangGraph in parallel.

pydantic-graph — An optional Pydantic AI module that supports complex multi-step workflows based on state machines. Similar in concept to LangGraph's graph-based approach, but currently less mature.
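The three criteria above (branching, resumable state, human approval) are easier to judge with a concrete picture. The sketch below is a hand-rolled state machine using only the standard library; it does not use the pydantic-graph API, and every name in it (`WorkflowState`, `advance`, `run_until_paused`) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class WorkflowState:
    step: str = "assess"
    risk_level: int = 0
    approved: Optional[bool] = None  # None means no human decision yet
    log: list = field(default_factory=list)


def advance(state: WorkflowState) -> WorkflowState:
    """Run one step and record which step comes next."""
    if state.step == "assess":
        state.log.append("assessed")
        # Branch: high risk needs human approval, low risk goes straight to notify
        state.step = "await_approval" if state.risk_level >= 7 else "notify"
    elif state.step == "await_approval":
        if state.approved is None:
            return state  # paused until a human decides
        state.log.append("approved" if state.approved else "rejected")
        state.step = "block_card" if state.approved else "notify"
    elif state.step == "block_card":
        state.log.append("card blocked")
        state.step = "notify"
    elif state.step == "notify":
        state.log.append("customer notified")
        state.step = "done"
    return state


def run_until_paused(state: WorkflowState) -> WorkflowState:
    """Advance until the workflow finishes or stops making progress."""
    while state.step != "done":
        before = state.step
        advance(state)
        if state.step == before:
            break
    return state


paused = run_until_paused(WorkflowState(risk_level=9))
saved = json.dumps(asdict(paused))  # persist the state between steps

resumed = WorkflowState(**json.loads(saved))  # later: reload and resume
resumed.approved = True
final = run_until_paused(resumed)
print(paused.step, final.log)
```

Once your agent needs this much bookkeeping, a graph library starts to pay for itself; below that threshold a single agent with tools is usually simpler.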


Pros and Cons

Advantages

| Item | Details |
| --- | --- |
| Pre-runtime error detection | mypy/pyright catches type mismatches in agent logic at development time |
| Automatic retry and validation | When the LLM returns invalid output, it retries with error feedback; raises a clear exception on final failure |
| Cost-free testing | Unit tests are possible without real API costs using TestModel and deps mocking |
| Model independence | Supports 25+ LLM providers; switch providers without changing business logic |
| FastAPI compatibility | Built on the same Pydantic models FastAPI uses, with a similar DI pattern and natural ecosystem integration |
| Pure Python | Native async/await support with no separate DSL |

Disadvantages and Caveats

| Item | Details | Mitigation |
| --- | --- | --- |
| Python only | TypeScript/Go teams cannot use it | Explore alternatives like Mastra for the JS ecosystem |
| Small community | ~16.5K GitHub stars vs. LangChain (100K+); fewer templates and examples | Official docs and example repos are well-organized, partially compensating |
| Observability requires a paid service | Production tracing depends on Pydantic Logfire (paid) | Can be replaced by configuring a direct OpenTelemetry connection |
| Complex multi-agent | Sophisticated branching/looping workflows require pydantic-graph, which is less mature than LangGraph | Evaluate LangGraph in parallel for complex graph workflows |
| Tool call inefficiency | In some scenarios, repeated tool calls increase token costs | Minimize call count in tool design; consider adding a caching layer |
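The caching mitigation in the last row can be sketched with the standard library. This is a minimal time-based cache for async lookups, assuming results may be reused for a short window; `ttl_cache` and `get_balance` are hypothetical names, and the balance value is made up.

```python
import asyncio
import functools
import time


def ttl_cache(ttl_seconds: float):
    """Cache an async function's results per argument tuple for ttl_seconds."""

    def decorator(func):
        store = {}  # maps args -> (timestamp, value)

        @functools.wraps(func)
        async def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh enough: skip the real call
            value = await func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


call_count = {"n": 0}


@ttl_cache(ttl_seconds=60)
async def get_balance(customer_id: int) -> float:
    call_count["n"] += 1  # counts how often the backend is really hit
    return 50000.0


async def demo():
    first = await get_balance(123)
    second = await get_balance(123)  # served from the cache
    return first, second


balances = asyncio.run(demo())
print(balances, call_count["n"])
```

One cautious way to use this is to call the cached helper from inside a tool function rather than decorating the tool itself, so the tool's signature and docstring stay exactly as the framework sees them. Only cache data that is safe to serve slightly stale; account balances in a fraud flow may not qualify.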

OpenTelemetry — A vendor-agnostic observability standard and the foundation of Pydantic Logfire. Connecting to it directly lets you integrate with existing infrastructure like Datadog or Grafana.

The Most Common Real-World Mistakes

  1. Defining an agent without output_type — Our team started with string responses and migrated to Pydantic models later, touching far more code than expected. Declaring output_type from the start is much better. The only overhead is a few lines of Pydantic model definition.

  2. Managing dependencies as global state — Referencing a global DB connection instead of using deps_type makes test isolation impossible. The explicit injection pattern via deps is far better for maintainability.

  3. Trying to build an entire RAG pipeline with Pydantic AI — Pydantic AI specializes in the agent layer. LangChain has a richer ecosystem for RAG scenarios that require diverse document loaders and embedding pipelines. Mixing the two is a perfectly realistic option.


Closing Thoughts

We've walked through three patterns you can try yourself: output_type, deps_type, and TestModel. At the single-agent level, this combination alone satisfies most production type-safety requirements. One natural question remains: what exactly is this agent doing in production, and where are the token costs coming from? Answering it is a job for the observability tooling mentioned above (Logfire or a direct OpenTelemetry setup).

If you're working primarily on Python backends and need to consume LLM output programmatically, you can get started right now.

  1. Install with pip install pydantic-ai → Try replacing one LLM response that you're currently handling as a dictionary with a Pydantic BaseModel. Adding just the output_type parameter will immediately change how IDE autocomplete behaves.

  2. Introduce deps_type → Refactor DB connections or HTTP clients from global state into explicit dependencies. Writing unit tests with TestModel() becomes natural as a result.

  3. Connect to one FastAPI endpoint → Sharing the same Pydantic model as response_model and output_type lets you immediately see the pattern of clean integration with no serialization code.


References

Official Documentation

  • Pydantic AI Official Documentation
  • Pydantic AI GitHub Repository
  • Pydantic Logfire AI Observability Official Documentation

Comparative Analysis

  • Pydantic AI — Thoughtworks Technology Radar
  • PydanticAI v1: The Type-Safe Agent Framework Rewriting the Python Agent Stack — AgentMarketCap
  • Pydantic AI vs LangChain 2026: Type-Safe or Flexible — Which Wins? — Kunal Ganglani
  • LangChain vs PydanticAI for building an AI Agent — Medium
  • The 2026 AI Agent Framework Decision Guide — DEV Community

Further Learning

  • Pydantic AI: Build Type-Safe LLM Agents in Python — Real Python
  • Pydantic AI Tutorial: How I Build Type-Safe AI Agents That Actually Work in Production — DEV Community
  • Building AI Agents in Python with Pydantic AI — MachineLearningMastery
  • Pydantic AI and MCP: Building Production-Grade AI Applications — Medium
  • What is Pydantic AI? Type-Safe Agent Framework in 2026 — FutureAGI
#PydanticAI #Python #LLM #TypeSafety #FastAPI #DependencyInjection #StructuredOutput #AIAgents #LangChain #OpenTelemetry