How to Delegate Agent Session State and Error Recovery to Anthropic with Claude Managed Agents: Direct Implementation vs. /v1/agents and /v1/sessions, and the Boundaries of Hosted Infrastructure

When you try to embed AI in the middle of a microservice, you quickly realize that the surrounding code is far more complex than the model API calls themselves: how to recover when a session drops midway, how to retry when tool execution fails, and where to run the sandbox environment. I once lost three full days to a checkpoint logic bug while building my own agent loop. You've probably experienced it too: the business logic is done, but the release keeps slipping because of agent infrastructure.

Claude Managed Agents, released by Anthropic as a public beta in April 2026, proposes a structure that lets you delegate this entire infrastructure burden. You define an agent with POST /v1/agents and start a long-running session with POST /v1/sessions—then Anthropic's infrastructure takes over session state management, automated error recovery, and sandbox operations. This article examines how that delegation actually works in practice, along with code, and where the line is between what you hand off and what you keep for yourself.

This is less about a "convenient wrapper API" and more about an architectural decision. Make the wrong call and you may end up with vendor lock-in or data sovereignty issues down the road. Let's look at the pros and cons honestly.

Core Concepts

What Actually Happens When You Implement an Agent Loop Yourself

If you implement an agent on your own, the flow looks like this:

Developer code
  → Model API call
  → Parse tool_use from response
  → Execute tool
  → Retry logic on failure (backoff, retry count, exception branching)
  → Save state to DB or Redis
  → Next loop iteration

To make this robust, you need to implement checkpoints, distributed locks, container crash handling, and timeout management. Honestly, this infrastructure often ends up more complex than the business logic itself. Managed Agents moves this entire loop to Anthropic's side.
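To make that list concrete, here is a minimal sketch of a self-built loop. It assumes a hypothetical `call_model` function that returns a dict with an optional `tool_use` entry, and an in-memory dict standing in for the DB or Redis checkpoint store. None of these names come from the Anthropic SDK; they are illustrative only.

```python
import time
from typing import Any, Callable


def run_agent_loop(
    call_model: Callable[[list], dict],
    tools: dict,
    task: str,
    checkpoint: dict,
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_steps: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical self-managed loop: call model, run tool, retry, checkpoint."""
    # Resume from the checkpoint if one exists, otherwise start a new history.
    history = checkpoint.setdefault("history", [{"role": "user", "content": task}])

    for _ in range(max_steps):
        reply = call_model(history)
        tool_use = reply.get("tool_use")
        if tool_use is None:
            return reply["text"]  # no tool requested: the model is finished

        result: Any = None
        error = None
        for attempt in range(max_retries):
            try:
                result = tools[tool_use["name"]](**tool_use["input"])
                error = None
                break
            except Exception as exc:  # exception branching would go here
                error = str(exc)
                if attempt < max_retries - 1:
                    sleep(base_delay * 2 ** attempt)  # exponential backoff

        history.append({"role": "assistant", "tool_use": tool_use})
        history.append({"role": "tool", "content": result, "error": error})
        # In a real system this would be a write to a DB or Redis after every step.
        checkpoint["history"] = history

    raise RuntimeError("agent loop exceeded max_steps")
```

Even this toy version leaves out distributed locks, container crash handling, and timeouts, which is where most of the real complexity sits.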

Developer code
  → POST /v1/sessions (send task event)
  → Receive result stream
         ↕
  Anthropic infrastructure (agent loop + error recovery + state management)

The Brain-Hands Split: What Can You Delegate and What Can You Keep?

This is the core design principle of the system, published on the Anthropic Engineering blog. It's the most important concept for understanding the scope of delegation.

Role	Responsibility	Execution Location
Brain	Claude model inference, prompt caching, context compression, agent loop	Always Anthropic infrastructure
Hands	Shell command execution, file read/write, the sandbox where code runs	Default: Anthropic-managed cloud / Option: your own infrastructure (Self-Hosted Sandboxes)

Self-Hosted Sandboxes is the feature that lets you move only the "hands" to your own infrastructure. You keep the model inference logic (brain) at Anthropic, but bring only the environment where code actually runs (hands) onto your internal servers. This is especially useful in environments like finance or healthcare where data cannot leave the premises.

How Session State Delegation Actually Works

Core principle: Session event logs are separated into external persistent storage, so the execution container (hands) is treated as stateless. Even if a container crashes, the harness takes over on another container based on the session log, so no work is lost.

From the developer's perspective, upon reconnection you just call client.beta.sessions.stream() again and you receive the full event log up to that point. You don't need to manually track how far the internal checkpointing has progressed.

Harness: The component in Anthropic's infrastructure that actually runs and manages the agent loop. It is responsible for orchestrating tool calls, relaying errors, and maintaining the session log. When a network error or container crash occurs, the harness automatically resumes the session on another container.
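The idea of a stateless container recovering from an external log can be illustrated with a small, self-contained sketch. This is not Anthropic's implementation; `EventLog`, `Worker`, and the event shape are hypothetical names chosen for this example. The point is only that a replacement worker rebuilds its position by replaying the log, so nothing the crashed worker recorded is lost.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventLog:
    """Stands in for external persistent storage (append-only)."""
    events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)


class Worker:
    """A stateless 'container': everything it knows is rebuilt from the log."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        # Replay the log to find which steps already completed.
        self.done = {e["step"] for e in log.events if e["type"] == "step_done"}

    def run(self, steps: List[str], crash_after: Optional[int] = None) -> None:
        executed = 0
        for step in steps:
            if step in self.done:
                continue  # finished before the crash, so skip it
            if crash_after is not None and executed == crash_after:
                raise RuntimeError("container crashed")
            self.log.append({"type": "step_done", "step": step})
            self.done.add(step)
            executed += 1
```

A first `Worker` that crashes partway through leaves its completed steps in the log; a second `Worker` built on the same `EventLog` continues from the next step.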

How Error Recovery Is Delegated

When tool execution fails, the harness catches that failure and delivers it to Claude as a tool call response. Claude then decides on its own, based on the error context, whether to retry and what alternatives to try. Developers don't need to implement separate retry logic, backoff strategies, or state checkpoints.
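In the Messages API, a failed tool call is reported back to the model as a `tool_result` block with `is_error` set to true. The sketch below shows that conversion in isolation. The wrapper name `run_tool` is hypothetical, and how the harness performs this step internally is an assumption, not something this article can confirm.

```python
from typing import Any, Callable


def run_tool(tool_use_id: str, fn: Callable[..., Any], arguments: dict) -> dict:
    """Run a tool and always return a tool_result block, even on failure."""
    try:
        output = fn(**arguments)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": str(output),
        }
    except Exception as exc:
        # The failure becomes data for the model instead of an exception
        # that the surrounding loop has to handle.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
```

Because the error text reaches the model as ordinary context, the decision to retry, change arguments, or try a different tool moves from hand-written branching into the model's next turn.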

Practical Application

Example 1: Long-Running Background Tasks

This pattern runs data analysis tasks that take tens of minutes to hours in the background, with no user connection held open. Anthropic's official use cases cite this approach.

python

import anthropic
 
client = anthropic.Anthropic()
 
# Step 1: Define the agent once; the definition is reusable and versioned
agent = client.beta.agents.create(
    model="claude-opus-4-5",
    name="data-analyst",
    system="You are a data analysis agent. When given sales data, produce a structured report.",
    tools=[
        {"type": "computer_use_20250124"},
        {"type": "bash_20250124"},
    ],
    betas=["managed-agents-2026-04-01"],
)
 
# Step 2: Start a session — Anthropic handles all state management from here on
session = client.beta.sessions.create(
    agent_id=agent.id,
    betas=["managed-agents-2026-04-01"],
)
 
# Step 3: Send the task
client.beta.sessions.send_event(
    session_id=session.id,
    event={"type": "user", "content": "Analyze Q1 sales data and produce a summary report."},
)
 
# Step 4: Receive streaming results
# Even if the connection drops due to a network error, the harness maintains the session
# On reconnection, just call this again and the full event log is restored
for chunk in client.beta.sessions.stream(session_id=session.id):
    print(chunk)

Call	Role
`agents.create()`	Registers agent configuration with Anthropic. Includes model, tools, and system prompt
`sessions.create()`	Starts a long-running session. State is maintained by Anthropic in external storage
`sessions.send_event()`	Injects a user task into the session
`sessions.stream()`	Streams results from the running session. Full log is restored even on reconnection

Every request requires betas=["managed-agents-2026-04-01"]. If you leave it out, the request is routed to the regular Messages API and fails with errors that do not point at the missing header, so tracking down the root cause can take a long time.
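The reconnection behavior described in Step 4 can be wrapped in a small helper. This is a sketch under stated assumptions: `open_stream` is a hypothetical zero-argument callable that wraps whatever streaming call the SDK exposes (for example, a lambda around the `sessions.stream` call used in this article), and each event is assumed to be a dict with a unique `id` field. Since a reconnect replays the full log, the helper skips events it has already yielded.

```python
import time
from typing import Callable, Iterable, Iterator, Set


def stream_with_reconnect(
    open_stream: Callable[[], Iterable[dict]],
    max_reconnects: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield each event once, reopening the stream after connection errors."""
    seen: Set[str] = set()
    reconnects = 0
    while True:
        try:
            for event in open_stream():
                if event["id"] in seen:
                    continue  # already delivered before the disconnect
                seen.add(event["id"])
                yield event
            return  # stream ended normally
        except ConnectionError:
            reconnects += 1
            if reconnects > max_reconnects:
                raise
            sleep(delay)
```

The exception type to catch depends on the client library in use; `ConnectionError` is a placeholder for whichever error your SDK raises on a dropped connection.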

Example 2: Enterprise Internal Network Integration (Self-Hosted Sandboxes)

This pattern assumes financial or healthcare environments with regulations against data leaving the premises. For general web service development, the pattern in Example 1 is sufficient.

This structure keeps only the execution environment (hands) in-house while the inference logic (brain) stays at Anthropic.

[Anthropic Infrastructure]
  Claude inference + agent loop + session state management
          ↕
    MCP proxy (encrypted tunnel)
          ↕
[Customer Infrastructure]
  Sandbox containers / internal DB / file system
  Credentials stored in Vault — never exposed to Claude-generated code

MCP Tunnels: Only an outbound connection needs to be opened from the customer infrastructure side. This means agents can communicate with internal MCP servers without any change to inbound firewall rules. The feature supports end-to-end encryption and is currently in research preview.

To enable Self-Hosted Sandboxes, specify your own infrastructure as the execution environment when creating the agent.

python

import anthropic

client = anthropic.Anthropic()

# Example Self-Hosted Sandbox connection configuration
agent = client.beta.agents.create(
    model="claude-opus-4-5",
    name="internal-analyst",
    system="You are a data analysis agent with access to internal systems.",
    tools=[{"type": "bash_20250124"}],
    sandbox={
        "type": "self_hosted",
        "endpoint": "https://your-mcp-proxy.internal/v1",  # Internal MCP proxy endpoint
        "auth": {"type": "bearer", "token_env": "MCP_TOKEN"},
    },
    betas=["managed-agents-2026-04-01"],
)

The security-critical point of this structure is that credentials are only accessed through the MCP proxy and are never exposed to the code Claude generates.
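The isolation property can be shown with a toy proxy. This is an illustrative sketch, not the MCP proxy itself: `CredentialProxy`, the request shape, and the `transport` callable are all hypothetical. Sandbox code names a logical service, the proxy attaches the secret outside the sandbox, and only the payload comes back.

```python
from typing import Callable, Dict


class CredentialProxy:
    """Hypothetical proxy: holds secrets and attaches them outside the sandbox."""

    def __init__(self, secrets: Dict[str, str], transport: Callable[[dict], dict]) -> None:
        self._secrets = secrets      # e.g. loaded from Vault; never sent to the sandbox
        self._transport = transport  # performs the real call to the internal service

    def forward(self, request: dict) -> dict:
        service = request["service"]
        if service not in self._secrets:
            raise PermissionError(f"no credential configured for {service!r}")
        # Build a new request so the sandbox-owned dict is never mutated.
        authed = {
            **request,
            "headers": {"Authorization": f"Bearer {self._secrets[service]}"},
        }
        response = self._transport(authed)
        # Return only the payload; headers that might echo the token are dropped.
        return {"status": response["status"], "body": response["body"]}
```

The design choice worth noting is that generated code never holds a token to leak: it can only ask for a named service, and the allow-list of services lives with the proxy.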

Example 3: Separating User-Facing Real-Time Responses from Background Tasks

This is the most commonly used hybrid pattern in practice. It handles latency-sensitive user-facing interactions and heavy async tasks through separate paths.

python

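import anthropic

client = anthropic.Anthropic()

# User-facing real-time responses: use the raw Messages API
# Chat interfaces where response speed matters: no session initialization overhead
def handle_user_chat(user_message: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
    )
    # response.content is a list of content blocks; join the text blocks into one string
    return "".join(block.text for block in response.content if block.type == "text")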
 
# Long-running async tasks — offload to Managed Agents
# Report generation, data pipelines, code generation, etc.
def dispatch_background_task(task_description: str, agent_id: str) -> str:
    session = client.beta.sessions.create(
        agent_id=agent_id,
        betas=["managed-agents-2026-04-01"],
    )
    client.beta.sessions.send_event(
        session_id=session.id,
        event={"type": "user", "content": task_description},
    )
    # Save session.id and retrieve results later like this:
    # client.beta.sessions.stream(session_id=session_id)
    return session.id
 
# Actual usage flow:
#   1. User chat → handle_user_chat() → immediate response
#   2. "Generate a report" → dispatch_background_task() → returns session_id
#   3. On completion, receive result via webhook or polling → notify user

When these two functions coexist within a single product, the deciding question is which path makes the user wait. Use the Messages API for anything that needs an immediate response, and use Managed Agents for heavy work that can wait. That one rule covers most cases.
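That rule can be written down as a small dispatcher. The sketch below is one possible shape, not a prescribed design: `Router`, `latency_budget_s`, and the caller-supplied `expected_seconds` estimate are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical dispatcher for the hybrid pattern."""
    chat: Callable[[str], str]        # synchronous path, e.g. handle_user_chat
    background: Callable[[str], str]  # async path, returns a session id
    latency_budget_s: float = 10.0    # longest wait a user should sit through

    def handle(self, message: str, expected_seconds: float) -> dict:
        if expected_seconds <= self.latency_budget_s:
            return {"mode": "sync", "reply": self.chat(message)}
        # Too slow to wait for: hand off and return a handle immediately.
        return {"mode": "async", "session_id": self.background(message)}
```

With the functions from Example 3, this could be wired up as `Router(chat=handle_user_chat, background=lambda m: dispatch_background_task(m, agent_id))`. How `expected_seconds` is estimated (request type, a fixed table, a classifier) is left to the application.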

Pros and Cons Analysis

Advantages

Item	Details
Development speed	No need to implement sandboxes, session state, error recovery, or credential management yourself. Time from prototype to production shrinks from weeks to days
Built-in optimizations	Prompt caching and context compression are built into the harness, which reduces token costs automatically. Anthropic's official announcement claims a 60% reduction in p50 TTFT (time to first token) and a reduction of over 90% in p95 TTFT; these figures have not been independently verified
Resilience	Automatically resumes based on session logs even after container crashes or network disconnections
Security isolation	Credentials are not exposed to the sandbox by design. MCP proxy-based isolation is provided by default
Performance improvement	Benchmark results show up to 10 points higher success rates for structured file generation tasks compared to standard prompt loops

Disadvantages and Caveats

Item	Details	Mitigation
Vendor lock-in	Claude-only. You cannot use other models like GPT-4o, Gemini, or DeepSeek in the same pipeline	If multi-vendor flexibility is needed, design with a LangGraph abstraction layer on top to reduce the cost of swapping models
Cost opacity	A hybrid pricing model of token fees + session runtime fees. Prices may change since it's a beta period—check the official pricing page for the latest information	Set an upper limit on session time and explicitly terminate sessions per task unit
Beta instability	Beta headers are required on all endpoints. Anthropic may change how the harness behaves	Subscribe to Anthropic release notes and plan ahead for breaking changes
Feature immaturity	Multi-agent coordination and self-evaluation are both in research preview. Separate access requests are required	Safer to start at pilot scale for now and avoid relying on these for production-critical use
Data sovereignty	Session data is stored in Anthropic-managed databases. Even with Self-Hosted Sandboxes, the agent loop itself remains at Anthropic	If fully on-premises is required, self-implementing the agent loop is currently the only option
Observability limitations	External monitoring tools like Braintrust and Langfuse may have incomplete instrumentation for Managed Agents sessions in some cases	Storing session event logs directly as audit logs makes debugging much easier later

Observability: The degree to which the internal state of a system can be measured and tracked from the outside. In the agent context, this refers to monitoring per-session token usage, tool call counts, error rates, and similar metrics.
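The audit-log mitigation from the table can be as simple as writing each event to a JSON Lines file and deriving metrics from it afterwards. This is a minimal sketch; the event type names `tool_use` and `tool_result` and the `is_error` field are assumptions about the event shape, so adjust them to what the session stream actually emits.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def write_audit_log(events: Iterable[dict], path: Path) -> None:
    """Append each session event as one JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event, ensure_ascii=False) + "\n")


def summarize(path: Path) -> dict:
    """Compute simple per-session metrics from the audit log."""
    counts: Counter = Counter()
    errors = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            counts[event["type"]] += 1
            errors += bool(event.get("is_error"))
    tool_calls = counts["tool_use"]
    return {
        "events": sum(counts.values()),
        "tool_calls": tool_calls,
        "tool_error_rate": errors / tool_calls if tool_calls else 0.0,
    }
```

Keeping the raw events rather than only the aggregates means new metrics can be computed later without re-running the session.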

The Most Common Mistakes in Practice

These are things I have actually seen happen on teams.

1. Applying Managed Agents to latency-sensitive user-facing responses. Session initialization overhead makes it unsuitable for real-time chat. Consider a hybrid structure that uses the raw Messages API or the Agent SDK for user-facing interactions and offloads only background tasks to Managed Agents.
2. Omitting the betas header. A request sent without betas=["managed-agents-2026-04-01"] is routed to the regular Messages API and fails with errors that do not mention the missing header, so you waste time finding the cause.
3. Running long-running tasks without a cost ceiling on sessions. Because session runtime fees apply, costs accumulate quickly when sessions run longer than expected. Terminate sessions explicitly per task unit or set a timeout.
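A client-side time budget is one way to enforce the cost ceiling. The sketch below wraps any event iterable and calls a caller-supplied `terminate` function once the budget is exceeded. Both `stream_with_budget` and `terminate` are hypothetical; the actual call for ending a session should be taken from the official documentation.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_budget(
    events: Iterable[dict],
    max_seconds: float,
    terminate: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield events until the time budget runs out, then terminate the session."""
    deadline = clock() + max_seconds
    for event in events:
        if clock() > deadline:
            terminate()  # stop the session so runtime fees stop accruing
            return
        yield event
```

One limitation of this shape: the deadline is only checked when an event arrives, so a session that goes silent will not be stopped by it. A separate timer or a server-side limit is needed to cover that case.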

Closing Thoughts

Managed Agents can be a practical alternative for teams that lack the capacity or time to maintain their own agent infrastructure—but it's important to understand the tradeoffs of Claude lock-in and data sovereignty limitations before choosing it.

The selection criteria are relatively clear. If your tasks are long-running background jobs lasting minutes to hours, you want to spend your time on feature development rather than infrastructure maintenance, and your tools are already exposed via the public internet or MCP servers, then Managed Agents is a good choice. Conversely, if you need to combine multiple LLMs, require fully on-premises deployment, or need latency-sensitive real-time responses, a self-built agent loop is the right fit.

Looking at the bigger picture, services like Managed Agents are part of an industry-wide trend toward "increasingly delegating agent infrastructure to the cloud"—much like the shift from managing servers to serverless. Thinking ahead now about how today's choices will affect technical debt two years from now—especially around vendor lock-in—will significantly reduce your switching costs later.
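One inexpensive way to keep switching costs down is to put a thin interface between product code and the agent runtime. The sketch below uses a plain Python `Protocol` rather than the LangGraph layer mentioned in the table above, simply to keep the example dependency-free. `AgentBackend`, `InMemoryBackend`, and `generate_report` are hypothetical names.

```python
from typing import Dict, Optional, Protocol


class AgentBackend(Protocol):
    """Hypothetical seam between product code and whichever agent runtime is used."""

    def start(self, task: str) -> str:
        """Begin a task and return an opaque handle."""
        ...

    def result(self, handle: str) -> Optional[str]:
        """Return the final output, or None if the task is still running."""
        ...


class InMemoryBackend:
    """Stand-in implementation; a Managed Agents or self-built backend would sit here."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def start(self, task: str) -> str:
        handle = f"task-{len(self._results) + 1}"
        self._results[handle] = f"done: {task}"
        return handle

    def result(self, handle: str) -> Optional[str]:
        return self._results.get(handle)


def generate_report(backend: AgentBackend, task: str) -> Optional[str]:
    # Product code depends only on the protocol, not on a vendor SDK.
    handle = backend.start(task)
    return backend.result(handle)
```

If every call site goes through an interface like this, replacing the hosted backend with a self-built loop later touches one class instead of the whole codebase.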

Three steps you can take right now:

1. Get an Anthropic API key and install the SDK with pip install anthropic or npm install @anthropic-ai/sdk, then apply for beta access through the official Managed Agents documentation. Documentation URLs and feature configurations may change during the beta period, so check the official channels for the latest guidance.
2. Copy the code from Example 1 above and run a simple analysis task to see firsthand how a session is created and how the event log is structured. If you've never built an agent before, this is the fastest onboarding path.
3. If your team is currently implementing an agent loop from scratch, measure how much of the code is devoted to session state storage and error recovery. If that volume exceeds your business logic, it's a good time to evaluate migrating to Managed Agents. For projects you haven't started yet, it's also valid to prototype with Managed Agents first and switch to a self-built implementation once you start feeling the constraints.



