How to Delegate Agent Session State and Error Recovery to Anthropic with Claude Managed Agents — Direct Implementation of /v1/agents·/v1/sessions vs. the Boundaries of Hosted Infrastructure
When you try to attach AI in the middle of a microservice, you quickly realize something: the surrounding code—how to recover when a session drops midway, how to implement retries when tool execution fails, where to run the sandbox environment—is far more complex than the model API calls themselves. I once lost three full days to a checkpoint logic bug while building my own agent loop. You've probably experienced it too: the business logic is all done, but the release keeps slipping because of agent infrastructure.
Claude Managed Agents, released by Anthropic as a public beta in April 2026, proposes a structure that lets you delegate this entire infrastructure burden. You define an agent with POST /v1/agents and start a long-running session with POST /v1/sessions—then Anthropic's infrastructure takes over session state management, automated error recovery, and sandbox operations. This article examines how that delegation actually works in practice, along with code, and where the line is between what you hand off and what you keep for yourself.
This is less about a "convenient wrapper API" and more about an architectural decision. Make the wrong call and you may end up with vendor lock-in or data sovereignty issues down the road. Let's look at the pros and cons honestly.
Core Concepts
What Actually Happens When You Implement an Agent Loop Yourself
If you implement an agent on your own, the flow looks like this:
Developer code
→ Model API call
→ Parse tool_use from response
→ Execute tool
→ Retry logic on failure (backoff, retry count, exception branching)
→ Save state to DB or Redis
→ Next loop iteration
To make this robust, you need to implement checkpoints, distributed locks, container crash handling, and timeout management. Honestly, this infrastructure often ends up more complex than the business logic itself. Managed Agents moves this entire loop to Anthropic's side.
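To give a sense of how much scaffolding the self-built loop needs, here is a minimal sketch of only the retry and checkpoint steps. The names `run_tool_with_retry` and `flaky_tool` are hypothetical, the in-memory dict stands in for a DB or Redis, and the sleep function is injectable so the example runs instantly. It leaves out locks, crash handling, and timeouts entirely.

```python
import time
from typing import Callable, Dict


def run_tool_with_retry(
    tool: Callable[[], str],
    checkpoint: Dict[str, str],
    step: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one tool call with exponential backoff and record the result in a checkpoint."""
    if step in checkpoint:
        # The step finished in an earlier run, so do not execute it again.
        return checkpoint[step]
    last_error = None
    for attempt in range(max_attempts):
        try:
            result = tool()
            checkpoint[step] = result  # stand-in for "save state to DB or Redis"
            return result
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s, ...
    raise RuntimeError(f"step {step!r} failed after {max_attempts} attempts") from last_error


# Demo: a hypothetical tool that fails twice before succeeding.
calls = {"n": 0}


def flaky_tool() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "rows=128"


delays = []  # records the backoff delays instead of actually sleeping
state: Dict[str, str] = {}
result = run_tool_with_retry(flaky_tool, state, "load_sales", sleep=delays.append)
print(result, delays, state)
```

Even this small piece has to decide which exceptions are retryable and where the checkpoint lives, and those are the decisions Managed Agents takes off your hands.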
Developer code
→ POST /v1/sessions (send task event)
→ Receive result stream
↕
Anthropic infrastructure (agent loop + error recovery + state management)
The Brain-Hands Split: What Can You Delegate and What Can You Keep?
This is the core design principle of the system, as described on the Anthropic Engineering blog, and it is the most important concept for understanding the scope of delegation.
| Role | Responsibility | Execution Location |
|---|---|---|
| Brain | Claude model inference, prompt caching, context compression, agent loop | Always Anthropic infrastructure |
| Hands | Shell command execution, file read/write, the sandbox where code runs | Default: Anthropic-managed cloud / Option: your own infrastructure (Self-Hosted Sandboxes) |
Self-Hosted Sandboxes is the feature that lets you move only the "hands" to your own infrastructure. You keep the model inference logic (brain) at Anthropic, but bring only the environment where code actually runs (hands) onto your internal servers. This is especially useful in environments like finance or healthcare where data cannot leave the premises.
How Session State Delegation Actually Works
Core principle: Session event logs are separated into external persistent storage, so the execution container (hands) is treated as stateless. Even if a container crashes, the harness takes over on another container based on the session log, so no work is lost.
From the developer's perspective, upon reconnection you just call client.beta.sessions.stream() again and you receive the full event log up to that point. You don't need to manually track how far the internal checkpointing has progressed.
harness: The execution environment component in Anthropic's infrastructure that actually runs and manages the agent loop. It is responsible for orchestrating tool calls, relaying errors, and maintaining the session log. When a network error or container crash occurs, the harness automatically picks up on another container and resumes the session.
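The idea of a stateless container that resumes from an external log can be illustrated with a small event-sourcing sketch. This is not Anthropic's implementation: `EventLog`, `Worker`, and the event shapes are hypothetical, and a Python list stands in for persistent storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventLog:
    """Append-only log that outlives any single worker (stand-in for durable storage)."""
    events: List[Dict[str, str]] = field(default_factory=list)

    def append(self, event: Dict[str, str]) -> None:
        self.events.append(event)


class Worker:
    """Stateless worker: everything it knows is rebuilt from the log at startup."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.completed = [e["step"] for e in log.events if e["type"] == "step_done"]

    def run(self, plan: List[str], crash_after: Optional[int] = None) -> None:
        done_this_run = 0
        for step in plan:
            if step in self.completed:
                continue  # already recorded by a previous worker
            if crash_after is not None and done_this_run == crash_after:
                raise RuntimeError("container crashed")  # simulated crash
            self.log.append({"type": "step_done", "step": step})
            self.completed.append(step)
            done_this_run += 1


plan = ["load", "clean", "aggregate", "report"]
log = EventLog()
try:
    Worker(log).run(plan, crash_after=2)  # first container dies after two steps
except RuntimeError:
    pass
Worker(log).run(plan)  # a fresh container picks up from the log
steps = [e["step"] for e in log.events]
print(steps)
```

The second worker never sees the first one's memory. It only reads the log, which is why no step is repeated or lost.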
How Error Recovery Is Delegated
When tool execution fails, the harness catches that failure and delivers it to Claude as a tool call response. Claude then decides on its own, based on the error context, whether to retry and what alternatives to try. Developers don't need to implement separate retry logic, backoff strategies, or state checkpoints.
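If you run the loop yourself against the Messages API, the comparable move is to turn an exception into a `tool_result` block flagged with `is_error`, so the model sees the failure and can choose what to do next. The sketch below shows only that conversion. `execute_tool` and the tool registry are hypothetical helpers, and the source does not specify how the managed harness formats errors internally.

```python
from typing import Any, Callable, Dict


def execute_tool(
    tools: Dict[str, Callable[..., Any]],
    tool_use: Dict[str, Any],
) -> Dict[str, Any]:
    """Run the requested tool and always return a tool_result block, even on failure."""
    name = tool_use["name"]
    args = tool_use.get("input", {})
    try:
        if name not in tools:
            raise ValueError(f"unknown tool: {name}")
        content = str(tools[name](**args))
        is_error = False
    except Exception as exc:
        # The failure becomes data for the model instead of crashing the loop.
        content = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": content,
        "is_error": is_error,
    }


def read_file(path: str) -> str:
    raise FileNotFoundError(f"{path} not found")


tools = {"read_file": read_file, "add": lambda a, b: a + b}
ok = execute_tool(tools, {"id": "toolu_1", "name": "add", "input": {"a": 2, "b": 3}})
failed = execute_tool(
    tools, {"id": "toolu_2", "name": "read_file", "input": {"path": "q1.csv"}}
)
print(ok)
print(failed)
```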
Practical Application
Example 1: Long-Running Background Tasks
A pattern for running data analysis tasks that take tens of minutes to hours in the background without a user connection. This is a method cited in Anthropic's official use cases.
import anthropic

client = anthropic.Anthropic()

# Step 1: Define the agent (once only; reusable and versioned)
agent = client.beta.agents.create(
    model="claude-opus-4-5",
    name="data-analyst",
    system="You are a data analysis agent. When given sales data, produce a structured report.",
    tools=[
        {"type": "computer_use_20250124"},
        {"type": "bash_20250124"},
    ],
    betas=["managed-agents-2026-04-01"],
)

# Step 2: Start a session. Anthropic handles all state management from here on
session = client.beta.sessions.create(
    agent_id=agent.id,
    betas=["managed-agents-2026-04-01"],
)

# Step 3: Send the task
client.beta.sessions.send_event(
    session_id=session.id,
    event={"type": "user", "content": "Analyze Q1 sales data and produce a summary report."},
)

# Step 4: Receive streaming results
# Even if the connection drops due to a network error, the harness maintains the session
# On reconnection, just call this again and the full event log is restored
for chunk in client.beta.sessions.stream(session_id=session.id):
    print(chunk)

| Code Line | Role |
|---|---|
| agents.create() | Registers agent configuration with Anthropic. Includes model, tools, and system prompt |
| sessions.create() | Starts a long-running session. State is maintained by Anthropic in external storage |
| sessions.send_event() | Injects a user task into the session |
| sessions.stream() | Streams results from the running session. Full log is restored even on reconnection |
All requests require betas=["managed-agents-2026-04-01"]. Leave it out and the request gets routed to the regular Messages API, causing genuinely baffling errors—the error messages won't match either, so you'll spend a long time finding the root cause.
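One way to make the flag hard to forget is to build request arguments through a single helper. This is a sketch under the article's own assumption that each call accepts a `betas` list. `with_beta` is a hypothetical helper, not part of the SDK, and the flag value is copied from the text above.

```python
from typing import Any, Dict, List

MANAGED_AGENTS_BETA = "managed-agents-2026-04-01"  # value taken from the article


def with_beta(**kwargs: Any) -> Dict[str, Any]:
    """Return kwargs with the Managed Agents beta flag present exactly once."""
    betas: List[str] = list(kwargs.get("betas", []))
    if MANAGED_AGENTS_BETA not in betas:
        betas.append(MANAGED_AGENTS_BETA)
    return {**kwargs, "betas": betas}


plain = with_beta(agent_id="agent_123")
merged = with_beta(agent_id="agent_123", betas=["some-other-beta"])
print(plain)
print(merged)

# Intended use (not executed here, requires API access):
# session = client.beta.sessions.create(**with_beta(agent_id=agent.id))
```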
Example 2: Enterprise Internal Network Integration (Self-Hosted Sandboxes)
This pattern assumes financial or healthcare environments with regulations against data leaving the premises. For general web service development, the pattern in Example 1 is sufficient.
A structure that keeps only the execution environment (hands) in-house while the inference logic (brain) stays at Anthropic.
[Anthropic Infrastructure]
Claude inference + agent loop + session state management
↕
MCP proxy (encrypted tunnel)
↕
[Customer Infrastructure]
Sandbox containers / internal DB / file system
Credentials stored in Vault (never exposed to Claude-generated code)
MCP Tunnels: Only an outbound connection needs to be opened from the customer infrastructure side. This means agents can communicate with internal MCP servers without changing firewall inbound rules. Supports end-to-end encryption and is currently in research preview.
To enable Self-Hosted Sandboxes, specify your own infrastructure as the execution environment when creating the agent.
# Example Self-Hosted Sandbox connection configuration
agent = client.beta.agents.create(
    model="claude-opus-4-5",
    name="internal-analyst",
    system="You are a data analysis agent with access to internal systems.",
    tools=[{"type": "bash_20250124"}],
    sandbox={
        "type": "self_hosted",
        "endpoint": "https://your-mcp-proxy.internal/v1",  # Internal MCP proxy endpoint
        "auth": {"type": "bearer", "token_env": "MCP_TOKEN"},
    },
    betas=["managed-agents-2026-04-01"],
)

The security-critical point of this structure is that credentials are only accessed through the MCP proxy and are never exposed to the code Claude generates.
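The isolation principle can be sketched in a few lines: the proxy owns the secret and attaches it to outbound requests, while the sandboxed code only receives a function it can call. `CredentialProxy`, `sandboxed_task`, and the fake transport are all hypothetical names for illustration. In a real deployment the proxy would run as a separate process, since Python objects in one process do not enforce this boundary.

```python
import os
from typing import Callable, Dict, List

Request = Dict[str, object]


class CredentialProxy:
    """Holds the secret and attaches it to outbound requests on behalf of sandboxed code."""

    def __init__(self, token_env: str, transport: Callable[[Request], str]) -> None:
        self._token = os.environ[token_env]  # read once, on the proxy side only
        self._transport = transport

    def call(self, path: str, body: str) -> str:
        request: Request = {
            "path": path,
            "body": body,
            "headers": {"Authorization": f"Bearer {self._token}"},
        }
        return self._transport(request)  # only the response goes back to the caller


def sandboxed_task(call: Callable[[str, str], str]) -> str:
    """Stand-in for generated code: it can issue calls but is never handed the token."""
    return call("/query", "SELECT count(*) FROM sales")


os.environ["MCP_TOKEN"] = "s3cr3t"  # demo value only
seen: List[Request] = []


def fake_transport(request: Request) -> str:
    seen.append(request)  # what the internal service would receive
    return "rows: 128"


proxy = CredentialProxy("MCP_TOKEN", fake_transport)
output = sandboxed_task(proxy.call)
print(output)
```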
Example 3: Separating User-Facing Real-Time Responses from Background Tasks
The most commonly used hybrid pattern in practice. It handles latency-sensitive user-facing interactions and heavy async tasks through separate paths.
# User-facing real-time responses: use the raw Messages API
# Chat interfaces where fast response speed matters: no session initialization overhead
def handle_user_chat(user_message: str):
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content

# Long-running async tasks: offload to Managed Agents
# Report generation, data pipelines, code generation, etc.
def dispatch_background_task(task_description: str, agent_id: str) -> str:
    session = client.beta.sessions.create(
        agent_id=agent_id,
        betas=["managed-agents-2026-04-01"],
    )
    client.beta.sessions.send_event(
        session_id=session.id,
        event={"type": "user", "content": task_description},
    )
    # Save session.id and retrieve results later like this:
    # client.beta.sessions.stream(session_id=session_id)
    return session.id

# Actual usage flow:
# 1. User chat → handle_user_chat() → immediate response
# 2. "Generate a report" → dispatch_background_task() → returns session_id
# 3. On completion, receive result via webhook or polling → notify user

When these two functions coexist within a single product, the key is which path makes the user wait. Use the Messages API for anything requiring an immediate response; use Managed Agents for heavy work that can wait. Keep just that rule in mind.
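Step 3 of the flow mentions polling, which could look roughly like the sketch below. `wait_for_completion` is a hypothetical helper, and the status strings ("running", "completed", "failed") are assumptions rather than documented values. The status lookup is passed in as a function so the example runs without API access.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[str], str],
    session_id: str,
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the session reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status(session_id)
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            return "timed_out"
        sleep(interval)


# Demo with a fake status source: two "running" polls, then "completed".
statuses = iter(["running", "running", "completed"])
naps = []  # records the requested sleeps instead of actually waiting
final = wait_for_completion(
    lambda _sid: next(statuses), "sess_demo", interval=1.0, sleep=naps.append
)
print(final, naps)
```

In a real service the returned status would trigger the user notification. A webhook, where available, avoids the polling interval altogether.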
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Development speed | No need to implement sandboxes, session state, error recovery, or credential management yourself. Time from prototype to production shrinks from weeks to days |
| Built-in optimizations | Prompt caching and context compression are built into the harness, automatically reducing token costs. Anthropic's official announcement claims (not independently verified) that p50 time to first token (TTFT) dropped by 60% and p95 by over 90% |
| Resilience | Automatically resumes based on session logs even after container crashes or network disconnections |
| Security isolation | Credentials are not exposed to the sandbox by design. MCP proxy-based isolation is provided by default |
| Performance improvement | Benchmark results show up to 10 points higher success rates for structured file generation tasks compared to standard prompt loops |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Vendor lock-in | Claude-only. You cannot use other models like GPT-4o, Gemini, or DeepSeek in the same pipeline | If multi-vendor flexibility is needed, design with a LangGraph abstraction layer on top to reduce the cost of swapping models |
| Cost opacity | A hybrid pricing model of token fees + session runtime fees. Prices may change since it's a beta period—check the official pricing page for the latest information | Set an upper limit on session time and explicitly terminate sessions per task unit |
| Beta instability | Beta headers are required on all endpoints. Anthropic may change how the harness behaves | Subscribe to Anthropic release notes and plan ahead for breaking changes |
| Feature immaturity | Multi-agent coordination and self-evaluation are both in research preview. Separate access requests are required | Safer to start at pilot scale for now and avoid relying on these for production-critical use |
| Data sovereignty | Session data is stored in Anthropic-managed databases. Even with Self-Hosted Sandboxes, the agent loop itself remains at Anthropic | If fully on-premises is required, self-implementing the agent loop is currently the only option |
| Observability limitations | External monitoring tools like Braintrust and Langfuse may have incomplete instrumentation for Managed Agents sessions in some cases | Storing session event logs directly as audit logs makes debugging much easier later |
Observability: The degree to which the internal state of a system can be measured and tracked from the outside. In the agent context, this refers to monitoring per-session token usage, tool call counts, error rates, and similar metrics.
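The mitigation of keeping your own copy of the session events can be as simple as appending them to a JSON Lines file. The sketch below assumes events have already been converted to plain JSON-serializable dicts, which SDK event objects may not be by default. `write_audit_log` and the sample event shapes are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, Iterable


def write_audit_log(session_id: str, events: Iterable[Dict[str, Any]], path: Path) -> int:
    """Append each event as one JSON line and return how many were written."""
    count = 0
    with path.open("a", encoding="utf-8") as fh:
        for event in events:
            record = {"session_id": session_id, "logged_at": time.time(), "event": event}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


sample_events = [
    {"type": "tool_use", "name": "bash"},
    {"type": "message", "text": "done"},
]
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "audit.jsonl"
    written = write_audit_log("sess_demo", sample_events, log_path)
    lines = log_path.read_text(encoding="utf-8").splitlines()

print(written, json.loads(lines[0])["event"])
```

Because each line is an independent JSON record, metrics such as tool call counts or error rates per session can be computed later with ordinary log tooling.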
The Most Common Mistakes in Practice
Things I've actually witnessed on teams.
- Applying Managed Agents to latency-sensitive user-facing responses. Session initialization overhead makes it unsuitable for real-time chat. Consider a hybrid structure that uses the raw Messages API or Agent SDK for user-facing interactions and only offloads to Managed Agents for background tasks.
- Omitting the betas header. Sending a request without betas=["managed-agents-2026-04-01"] routes it to the regular Messages API, causing strange errors. The error messages won't match either, so you end up wasting time finding the cause.
- Running long-running tasks without a cost ceiling on sessions. In a structure where session runtime fees apply, costs accumulate quickly if sessions run longer than expected. It's good practice to explicitly terminate sessions per task unit or set a timeout.
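For the last point, a client-side time budget around the event stream is one possible guard. This is a sketch, not a documented feature: `with_time_budget` is a hypothetical wrapper, and `on_exceeded` is where you would place whatever session termination call the current API provides. The clock is injectable so the demo is deterministic.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_time_budget(
    events: Iterable[T],
    max_seconds: float,
    on_exceeded: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield events until the budget is spent, then call on_exceeded once and stop."""
    started = clock()
    for event in events:
        if clock() - started > max_seconds:
            on_exceeded()  # e.g. terminate the session here
            return
        yield event


# Demo with a fake clock: events arrive at t=10, 20 and 40 against a 30s budget.
ticks = iter([0, 10, 20, 40])
terminated = []
received = list(
    with_time_budget(
        ["e1", "e2", "e3", "e4"],
        max_seconds=30,
        on_exceeded=lambda: terminated.append(True),
        clock=lambda: next(ticks),
    )
)
print(received, terminated)
```

Note that the check only runs when an event arrives, so a stream that stalls silently would need a separate timer or a server-side limit.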
Closing Thoughts
Managed Agents can be a practical alternative for teams that lack the capacity or time to maintain their own agent infrastructure—but it's important to understand the tradeoffs of Claude lock-in and data sovereignty limitations before choosing it.
The selection criteria are relatively clear. If your tasks are long-running background jobs lasting minutes to hours, you want to spend your time on feature development rather than infrastructure maintenance, and your tools are already exposed via the public internet or MCP servers, then Managed Agents is a good choice. Conversely, if you need to combine multiple LLMs, require fully on-premises deployment, or need latency-sensitive real-time responses, a self-built agent loop is the right fit.
Looking at the bigger picture, services like Managed Agents are part of an industry-wide trend toward "increasingly delegating agent infrastructure to the cloud"—much like the shift from managing servers to serverless. Thinking ahead now about how today's choices will affect technical debt two years from now—especially around vendor lock-in—will significantly reduce your switching costs later.
Three steps you can take right now:
- Get an Anthropic API key and install the SDK with pip install anthropic or npm install @anthropic-ai/sdk, then apply for beta access at the Managed Agents official documentation. Since documentation URLs and feature configurations may change during the beta period, be sure to check the official channels for the latest guidance.
- Copy the code from Example 1 above and run a simple analysis task to see firsthand how a session is created and what structure the event log comes in. If you've never built an agent before, this is the fastest onboarding path.
- If your team is currently implementing an agent loop from scratch, measure how much code is being consumed by session state storage and error recovery. If that volume exceeds your business logic, it's a good time to evaluate migrating to Managed Agents. For projects you haven't started yet, it's also a valid approach to prototype with Managed Agents first and switch to a self-built implementation when you start feeling the constraints.
References
- Claude Managed Agents overview | Claude API Docs
- Start a session | Claude API Docs
- Self-hosted sandboxes | Claude API Docs
- Scaling Managed Agents: Decoupling the brain from the hands | Anthropic Engineering
- New in Claude Managed Agents: self-hosted sandboxes and MCP tunnels | Claude Blog
- Claude Managed Agents: get to production 10x faster | Claude Blog
- Anthropic Introduces Managed Agents to Simplify AI Agent Deployment | InfoQ
- Anthropic's Claude Managed Agents gives enterprises a new one-stop shop but raises vendor 'lock-in' risk | VentureBeat
- Claude Managed Agents vs Claude Agent SDK | WaveSpeed Blog
- Deep Dive: How Anthropic's Claude Managed Agents Solve the AI Scaffolding Nightmare | Medium
- Anthropic Agent Lock-In: 9 Critical Enterprise Risks | Progressive Robot
- GitHub - anthropics/claude-agent-sdk-python