Comparing Long-Term Memory for AI Agents: Mem0 vs Letta vs Zep — Three Philosophies and How to Choose
If you've ever built an LLM-based app, you've hit this wall. "How do I make it remember past conversations?" You might think you can just shove the entire conversation into the context window, but reality doesn't cooperate. Token costs explode, the LLM's attention drifts as conversations grow longer, and when a session ends, everything disappears. Long-term memory systems for agents emerged to solve this problem, and between 2025 and 2026 this space matured rapidly, with the options crystallizing into three distinct paths.
Mem0, Letta, and Zep differ in GitHub stars and community adoption, but most importantly in architectural philosophy. One is a memory layer you bolt onto your existing stack with minimal changes, one is an agent platform that manages memory autonomously like an OS, and one is a temporal knowledge graph that records how facts change over time. Drawing on hands-on experience, I want to walk through how these three philosophies actually shape technology choices. By the end of this article, you should be able to narrow down the right memory system for your service's character in about 30 minutes. We'll cover how each system works, runnable code examples, and common beginner mistakes, in that order.
Core Concepts
Why External Memory Is Needed
An LLM context window is like RAM. Turn off the power and it's gone, and capacity has limits. Even if GPT-4o supports 128k tokens, storing the full history of dozens of sessions makes costs unsustainable. Long-term memory systems solve this problem with external storage. They selectively store only important facts, and when a new conversation starts, they retrieve only the relevant memories and inject them into the context.
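To make the store-selectively, retrieve-selectively loop concrete, here is a minimal sketch in plain Python. It is not how any of the three products work internally: `ToyMemoryStore`, `tokenize`, and `build_prompt` are hypothetical names, and word overlap stands in for real semantic search.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyMemoryStore:
    """In-memory stand-in for an external long-term memory store (hypothetical)."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def add(self, user_id: str, fact: str) -> None:
        """Persist one short fact for a user, skipping exact duplicates."""
        facts = self._facts.setdefault(user_id, [])
        if fact not in facts:
            facts.append(fact)

    def search(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        """Return up to `limit` facts sharing at least one word with the query."""
        query_tokens = tokenize(query)
        scored = [
            (len(query_tokens & tokenize(fact)), fact)
            for fact in self._facts.get(user_id, [])
        ]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in relevant[:limit]]


def build_prompt(store: ToyMemoryStore, user_id: str, message: str) -> str:
    """Inject only the relevant memories into the system prompt."""
    base = "You are a helpful assistant."
    memories = store.search(user_id, message)
    if not memories:
        return base
    bullet_list = "\n".join(f"- {m}" for m in memories)
    return f"{base}\nKnown facts about the user:\n{bullet_list}"


store = ToyMemoryStore()
store.add("user_123", "User is on the premium plan")
store.add("user_123", "User prefers email over phone calls")
print(build_prompt(store, "user_123", "Can I upgrade my plan?"))
```

Only the plan-related fact reaches the prompt here, which is the whole point: the context window carries what the current question needs rather than the full history.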
Long-Term Memory: An infrastructure layer that enables AI agents to continuously remember and update user information, preferences, and context across sessions and time. It supplements the LLM's context window limitations with external storage, and since the memory system itself involves LLM calls, it's worth keeping in mind upfront that additional latency and cost will be incurred.
Mem0 — A Memory Layer You Bolt Onto Your Existing Stack
Mem0's approach is "minimal invasion." Instead of replacing the agent framework you already use (LangGraph, CrewAI, AutoGen), it attaches a memory layer on top as a bolt-on.
Bolt-on: An integration pattern where functionality is added externally without modifying the existing system. Like a plugin, it can be attached and detached, which makes it favorable for gradual adoption.
Internally, it uses a mixture of three storage types: vector, graph, and key-value. Roughly speaking, facts where natural language semantics matter (preferences, emotions, descriptions) go into the vector store; relationships between people, organizations, and objects go into the graph; and frequently referenced key-based data goes into key-value storage. When extracting facts from conversations, it uses an LLM to deduplicate, and when conflicts arise, it overwrites with the latest information.
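The routing and overwrite-on-conflict behavior described above can be sketched roughly as follows. This is a toy model under my own assumptions, not Mem0's implementation: the `Fact` labels ("semantic", "relation", "keyed") and the `ToyHybridStore` class are hypothetical, and an exact key match stands in for the LLM-based deduplication.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    kind: str   # hypothetical label: "semantic", "relation", or "keyed"
    key: str    # identity used for deduplication, e.g. "user.plan"
    value: str


@dataclass
class ToyHybridStore:
    """Three plain dicts standing in for vector, graph, and key-value storage."""

    vector: dict[str, str] = field(default_factory=dict)
    graph: dict[str, str] = field(default_factory=dict)
    key_value: dict[str, str] = field(default_factory=dict)

    def _bucket(self, kind: str) -> dict[str, str]:
        """Pick the storage type that matches the kind of fact."""
        return {
            "semantic": self.vector,
            "relation": self.graph,
            "keyed": self.key_value,
        }[kind]

    def upsert(self, fact: Fact) -> str:
        """Store a fact and report which action was taken."""
        bucket = self._bucket(fact.kind)
        previous = bucket.get(fact.key)
        if previous == fact.value:
            return "noop"               # duplicate: nothing to do
        bucket[fact.key] = fact.value   # conflict: latest information wins
        return "add" if previous is None else "update"


hybrid = ToyHybridStore()
print(hybrid.upsert(Fact("keyed", "user.plan", "basic")))
print(hybrid.upsert(Fact("keyed", "user.plan", "basic")))
print(hybrid.upsert(Fact("keyed", "user.plan", "premium")))
```

Note what gets lost in the last call: the old value is simply gone. That is fine for preferences, and it is exactly the behavior a temporal graph is designed to avoid.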
An April 2026 update introduced a single-pass layered extraction algorithm. Where the previous approach extracted and classified facts across multiple stages, this algorithm handles extraction and classification in a single LLM call. Reported benchmark results showed improvements of +29.6pp in temporal query accuracy and +23.1pp in multi-hop reasoning. Honestly, when I first saw those numbers I thought "can this actually be real?", but after trying it myself, the difference was tangible in a simple customer support scenario. The community numbers point the same way: 48,000+ GitHub stars and $24M in seed and Series A funding announced in 2025.
Letta — An Agent Platform That Manages Memory Like an OS
Letta (formerly MemGPT) originated from the MemGPT paper out of UC Berkeley. The core idea is to layer memory the way a computer operating system does.
- Core Memory: Essential information always loaded in context, like RAM (name, persona, key facts)
- Recall Memory: A disk cache that stores recent conversation history in searchable form
- Archival Memory: Cold storage for vast long-term knowledge
The most distinctive aspect is that the agent itself moves and edits data between these layers via function calls, managing memory autonomously without human intervention. I personally lost half a day getting the initial memory function design right: if you don't first define the criteria for what information belongs in Core versus what should go to Archival, the agent starts storing things in the wrong layer. That design cost is paid upfront, but once it's done, the payoff is that you can trace memory behavior transparently. Letta is fully open source (Apache 2.0), and in January 2026 a Conversations API was released that supports shared memory across parallel agents.
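To get a feel for why the Core versus Archival criteria matter, here is a toy model of the tiers. It is a sketch under stated assumptions, not Letta's code: `ToyTieredMemory` and its character-based limit are hypothetical, the method names loosely echo the memory functions from the MemGPT design, and the `always_needed` flag stands in for a decision the real agent makes itself.

```python
class ToyTieredMemory:
    """Toy three-tier memory with a bounded Core tier (hypothetical)."""

    def __init__(self, core_char_limit: int = 30) -> None:
        self.core_char_limit = core_char_limit
        self.core: list[str] = []      # always in context, so strictly bounded
        self.recall: list[str] = []    # searchable conversation history
        self.archival: list[str] = []  # unbounded long-term storage

    def core_size(self) -> int:
        """Total characters currently held in Core."""
        return sum(len(item) for item in self.core)

    def core_memory_append(self, text: str) -> bool:
        """Append to Core only if the size budget allows it."""
        if self.core_size() + len(text) > self.core_char_limit:
            return False
        self.core.append(text)
        return True

    def archival_memory_insert(self, text: str) -> None:
        """Store text in the unbounded Archival tier."""
        self.archival.append(text)

    def archival_memory_search(self, query: str) -> list[str]:
        """Case-insensitive substring search over Archival."""
        return [t for t in self.archival if query.lower() in t.lower()]

    def remember(self, text: str, always_needed: bool) -> str:
        """Route a fact to a tier and return the tier name."""
        self.recall.append(text)  # every message also lands in history
        if always_needed and self.core_memory_append(text):
            return "core"
        self.archival_memory_insert(text)
        return "archival"


mem = ToyTieredMemory(core_char_limit=30)
print(mem.remember("Name: Dana", always_needed=True))
print(mem.remember("Auth changed from JWT to OAuth2 in March", always_needed=True))
print(mem.archival_memory_search("oauth2"))
```

The second fact was flagged as always needed but still fell through to Archival because Core was out of budget. Without explicit criteria, that kind of silent spillover is where the "wrong layer" surprises come from.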
Zep — A Knowledge Graph That Remembers How Things Change Over Time
Zep's core engine, Graphiti, takes a fundamentally different approach. It assigns a validity window to each fact. For example, if the fact "the user's address is Gangnam-gu, Seoul" later changes to "Haeundae-gu, Busan," the old record isn't deleted — it's invalidated. This invalidation isn't rule-based; the LLM interprets the meaning of the new episode and determines whether it conflicts with existing facts. This means you can accurately answer questions like "what was the address as of March 2025?"
Temporal Knowledge Graph: A knowledge graph that preserves the change history of facts along with timestamps. Instead of deletion, it uses invalidation to track past and current states side by side. It runs on top of Neo4j, and each fact (stored as an edge between entities) records its validity period in valid_at and invalid_at properties.
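The validity-window idea is easy to see in a few lines of plain Python. This is a minimal sketch, not Graphiti's implementation: `TemporalFact` and `ToyTemporalStore` are hypothetical, conflicts are detected by an exact subject and attribute match rather than by an LLM, and facts are assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    attribute: str
    value: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means still valid


class ToyTemporalStore:
    """Keeps every fact and closes the old validity window on conflict."""

    def __init__(self) -> None:
        self.facts: list[TemporalFact] = []

    def assert_fact(self, subject: str, attribute: str, value: str,
                    valid_at: datetime) -> None:
        """Add a fact; invalidate (never delete) the one it supersedes."""
        for fact in self.facts:
            same_slot = fact.subject == subject and fact.attribute == attribute
            if same_slot and fact.invalid_at is None:
                fact.invalid_at = valid_at
        self.facts.append(TemporalFact(subject, attribute, value, valid_at))

    def as_of(self, subject: str, attribute: str,
              when: datetime) -> Optional[str]:
        """Return the value that was valid at the given point in time."""
        for fact in self.facts:
            started = fact.valid_at <= when
            not_ended = fact.invalid_at is None or when < fact.invalid_at
            if (fact.subject == subject and fact.attribute == attribute
                    and started and not_ended):
                return fact.value
        return None


temporal = ToyTemporalStore()
temporal.assert_fact("user", "address", "Gangnam-gu, Seoul", datetime(2024, 1, 15))
temporal.assert_fact("user", "address", "Haeundae-gu, Busan", datetime(2025, 6, 1))
print(temporal.as_of("user", "address", datetime(2025, 3, 1)))
print(temporal.as_of("user", "address", datetime(2025, 7, 1)))
```

Both records survive, so the March 2025 question and the "current address" question each get their own correct answer.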
Multi-hop Reasoning: The ability to reason toward an answer that requires connecting multiple relationships — for example, "the project managed by the team of B, who is A's supervisor." Simple vector search doesn't handle these chained relationships well; a graph structure is needed for effective traversal.
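The chained question from the definition above can be sketched as a walk over labeled edges. Everything here is illustrative: the `GRAPH` contents, the relation names, and the `follow` helper are made up for the example and are not part of any of the three systems.

```python
from typing import Optional

# Hypothetical edges stored as (source, relation) -> target
GRAPH: dict[tuple[str, str], str] = {
    ("A", "supervisor"): "B",
    ("B", "team"): "Platform Team",
    ("Platform Team", "manages"): "Billing Migration",
}


def follow(graph: dict[tuple[str, str], str], start: str,
           relations: list[str]) -> Optional[str]:
    """Walk one edge per relation; return None if the chain breaks."""
    node: Optional[str] = start
    for relation in relations:
        node = graph.get((node, relation))
        if node is None:
            return None
    return node


# "The project managed by the team of B, who is A's supervisor"
print(follow(GRAPH, "A", ["supervisor", "team", "manages"]))
```

A vector search would have to hope that one stored sentence happens to mention A and the project together. The graph answers by composition, which is why each hop can live in a separate memory.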
Publishing the Graphiti engine as open source under Apache 2.0 led to rapid community adoption. It achieved 94.8% accuracy on the DMR Benchmark (which measures how accurately facts can be retrieved from conversations), and with SOC 2 Type 2, HIPAA, and GDPR certifications, it has a strong presence in enterprise environments. Neo4j setup may feel like a barrier at first, but using Graphiti on its own is actually much lighter to get started with than you'd expect.
Practical Application
Example 1: Adding Memory to a Customer Support Chatbot with Mem0
Adding Mem0 to an existing OpenAI chatbot is simpler than you'd think. I kept thinking "can it really be this easy?" — and yes, it can. One caveat though: if you don't explicitly handle the case where relevant_memories is empty (like a new user's first conversation), you end up with a blank line in the system prompt, which produces slightly awkward responses. The code below shows how to handle that too.
import logging

from mem0 import Memory
from openai import OpenAI

logger = logging.getLogger(__name__)

# If you don't have Qdrant, you can spin it up with: docker run -p 6333:6333 qdrant/qdrant
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "customer_support",
            "host": "localhost",
            "port": 6333,
        },
    },
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o", "temperature": 0},
    },
}

memory = Memory.from_config(config)
client = OpenAI()


def chat_with_memory(user_id: str, user_message: str) -> str:
    found = memory.search(user_message, user_id=user_id)
    # Depending on the Mem0 version, search() returns either a list or a dict
    # shaped like {"results": [...]}, so normalize to a list here
    relevant_memories = found.get("results", []) if isinstance(found, dict) else found

    # First conversation with a new user: skip the memory section if the list is empty
    if relevant_memories:
        memory_context = "\n".join(m["memory"] for m in relevant_memories)
        memory_section = f"\nWhat I know about this user:\n{memory_context}"
    else:
        memory_section = ""

    system_prompt = f"You are a friendly customer support agent.{memory_section}"

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    assistant_message = response.choices[0].message.content

    # Automatically extract and store important facts from the conversation.
    # Storage failures are logged rather than raised so they don't block the reply itself;
    # in production, it's recommended to add retry logic and failure alerts here
    try:
        memory.add(
            [
                {"role": "user", "content": user_message},
                {"role": "assistant", "content": assistant_message},
            ],
            user_id=user_id,
        )
    except Exception:
        logger.exception("Failed to store memory for user %s", user_id)

    return assistant_message


# Usage example
print(chat_with_memory("user_123", "I'm a premium plan user and I'm having a billing issue"))
# In the next session, the context "premium user who had a billing issue" is automatically injected
print(chat_with_memory("user_123", "Was the issue I mentioned earlier resolved?"))

| Code Point | Description |
|---|---|
| memory.search() | Vector search scoped to the user ID, returning only relevant memories |
| memory_section branch | Prevents an empty context block from entering the prompt on a new user's first conversation |
| memory.add() | Pass a conversation pair and the LLM automatically extracts and stores important facts |
| user_id parameter | Memory isolation per user, which prevents memories from different users from mixing |
Example 2: Setting Up a Long-Running Coding Agent with Letta
For a simple customer support chatbot, Mem0 is sufficient. You choose Letta when an agent needs to work autonomously over multiple days. The key is the autonomy: a coding agent that remembers the "authentication method change" it learned today and still recalls it a week later, deciding on its own which memory layer to store it in.
# Note: create_client and ChatMemory belong to Letta's older Python client.
# Newer releases ship a separate SDK with a different interface, so check the docs for your installed version.
from letta import create_client
from letta.schemas.memory import ChatMemory

client = create_client()

# Set up initial information in Core Memory, the baseline always included in context.
# The human and persona design is the foundation of agent behavior, so it's worth taking time to get this right initially
agent = client.create_agent(
    name="coding-assistant",
    memory=ChatMemory(
        human="Developer, primarily uses Python/TypeScript, working on a FastAPI project",
        persona="Experienced senior developer assistant. Helps with code review and debugging.",
    ),
)

# The agent decides on its own whether to store in Core, Recall, or Archival
response = client.send_message(
    agent_id=agent.id,
    role="user",
    message="Remember that our project's API authentication changed from JWT to OAuth2",
)
print(response.messages[-1].text)

# Memory persists across new sessions days later: the same agent_id keeps state on the server
response2 = client.send_message(
    agent_id=agent.id,
    role="user",
    message="What were the things to watch out for when writing auth-related code?",
)
print(response2.messages[-1].text)
# Responds with awareness that the switch to OAuth2 happened

| Code Point | Description |
|---|---|
| ChatMemory(human, persona) | Core Memory initialization: baseline information always included in every context |
| agent_id reuse | Loads agent state stored on the server, maintaining continuity even after session restarts |
| Autonomous agent judgment | Which layer to store in is decided by the agent through internal function calls, traceable in logs |
Example 3: Managing Temporal Facts with Graphiti (Zep's Engine)
Graphiti shines in domains like finance or CRM where you need to clearly distinguish "previous state" from "current state." Mem0 and Letta overwrite or update with the latest information, but Graphiti invalidates old facts while preserving them. The code below must run inside an async function, so I've included the asyncio.run() wrapper.
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType


async def main():
    # If you don't have Neo4j: docker run -p 7687:7687 -e NEO4J_AUTH=neo4j/password neo4j
    # Connection details are passed positionally: URI, user, password
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")

    # Initialize indices and constraints (only needs to run once)
    await graphiti.build_indices_and_constraints()

    # Add a fact, stored as an episode with temporal information
    await graphiti.add_episode(
        name="Customer address change",
        episode_body="Customer Kim Cheolsoo's address changed from Gangnam-gu, Seoul to Haeundae-gu, Busan",
        source=EpisodeType.text,
        # The actual time the change occurred, separate from storage time (timezone-aware)
        reference_time=datetime(2025, 6, 1, tzinfo=timezone.utc),
        source_description="CRM system update",
    )

    # Temporal query: each returned fact carries its own validity window
    results = await graphiti.search("Kim Cheolsoo's address", num_results=5)
    for edge in results:
        print(f"Fact: {edge.fact}")
        print(f"Valid from: {edge.valid_at}")
        print(f"Valid until: {edge.invalid_at or 'still valid'}")
        print("---")

    # Illustrative output (assumes an earlier episode recorded the Gangnam-gu address
    # on 2024-01-15; exact wording and timestamps depend on the LLM's extraction):
    # Fact: Kim Cheolsoo's address is Gangnam-gu, Seoul
    # Valid from: 2024-01-15
    # Valid until: 2025-06-01
    # ---
    # Fact: Kim Cheolsoo's address is Haeundae-gu, Busan
    # Valid from: 2025-06-01
    # Valid until: still valid

    await graphiti.close()


asyncio.run(main())

| Code Point | Description |
|---|---|
| async def main() + asyncio.run() | The entire Graphiti API is async, so you cannot call await directly at the top level |
| reference_time | The time the fact actually occurred, separated from storage time for accurate temporal tracking |
| invalid_at | The time the fact was invalidated; None means still currently valid |
| add_episode() | Adds a new episode without deletion; the LLM evaluates conflicts with existing facts and automatically invalidates them |
What to Choose and When
The matrix below is the fastest way to narrow down your choice. Find the rows that match your situation and explore the system marked ✅ in those rows first.
| Situation | Mem0 | Letta | Zep / Graphiti |
|---|---|---|---|
| Want to keep existing LangChain/CrewAI stack | ✅ | ❌ | △ |
| Need a fast prototype, want to plug it in today | ✅ | △ | △ |
| Agent works autonomously over multiple days | △ | ✅ | △ |
| Open source self-hosting is required | △ | ✅ | ✅ (Graphiti) |
| Temporal state changes (address changes, contract renewals, etc.) are core | ❌ | △ | ✅ |
| HIPAA / SOC 2 / GDPR certification required | △ | ❌ | ✅ |
| Multi-hop relationship reasoning needed frequently | △ | △ | ✅ |
| Want fine-grained code-level control over memory behavior | ❌ | ✅ | △ |
△ = Possible but limited or requires additional configuration
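If you'd rather tally the matrix than eyeball it, it can be encoded as data. This is only a rough sketch: the row keys and the 2/1/0 scoring for ✅/△/❌ are my own assumptions for illustration, and a simple sum ignores the fact that a single hard requirement (compliance, say) can outweigh everything else.

```python
SCORES = {"yes": 2, "partial": 1, "no": 0}  # assumed weights for ✅ / △ / ❌
SYSTEMS = ("Mem0", "Letta", "Zep / Graphiti")

# One entry per matrix row, marks in the same order as SYSTEMS
MATRIX: dict[str, tuple[str, str, str]] = {
    "keep_existing_stack": ("yes", "no", "partial"),
    "fast_prototype": ("yes", "partial", "partial"),
    "autonomous_multi_day": ("partial", "yes", "partial"),
    "self_hosting_required": ("partial", "yes", "yes"),
    "temporal_changes_core": ("no", "partial", "yes"),
    "compliance_required": ("partial", "no", "yes"),
    "multi_hop_frequent": ("partial", "partial", "yes"),
    "fine_grained_control": ("no", "yes", "partial"),
}


def rank_systems(requirements: list[str]) -> list[tuple[str, int]]:
    """Sum the marks for the rows that apply and sort best-first."""
    totals = {system: 0 for system in SYSTEMS}
    for requirement in requirements:
        for system, mark in zip(SYSTEMS, MATRIX[requirement]):
            totals[system] += SCORES[mark]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(rank_systems(["temporal_changes_core", "multi_hop_frequent"]))
print(rank_systems(["keep_existing_stack", "fast_prototype"]))
```

Treat the ranking as a shortlist order for what to prototype first, not as a verdict.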
Pros and Cons
Strengths
| System | Core Strengths |
|---|---|
| Mem0 | Connects to existing frameworks without modification; p95 search latency 200ms, 91% token reduction; supports 20+ vector store backends; active community (GitHub 48k+) |
| Letta | Fully open source (Apache 2.0); achieves effectively infinite context through autonomous agent memory management; transactional consistency where writes are immediately readable; memory behavior is traceable via logs |
| Zep | Best-in-class temporal queries with temporal fact management; SOC 2 / HIPAA / GDPR certified; hybrid search (semantic + keyword + graph) achieves 94.8% on DMR Benchmark; Graphiti engine is an independent Apache 2.0 open source project |
Weaknesses and Caveats
One trap that's easy to miss: Mem0's graph memory is exclusive to the cloud Pro plan ($249/month). Self-hosting gives you only vector search — relational multi-hop reasoning is unavailable. If multi-hop reasoning is a core requirement, directly integrating Graphiti from the start may be the better choice.
Letta has high memory function design complexity. The mistake our team made early on was ignoring Core Memory's capacity limits and trying to push everything into Core. Since Core is always loaded into context, it has size constraints, and if you don't nail that design upfront, the agent starts storing data in unexpected layers. For small-scale prototypes, it's recommended to start with Letta Cloud to get a feel for it, then migrate to self-hosting.
Honestly, Zep's token consumption can be higher than expected. Because the LLM evaluates fact conflicts every time, it's safer to run a cost test first for high-frequency conversation services. If cost is a concern, you can also use only the Graphiti engine independently instead of the full Zep platform.
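A back-of-the-envelope estimate is often enough to decide whether a proper cost test is needed. The function below is a simple sketch, and every number in the example call (message volume, calls per message, tokens per call, price) is a placeholder to swap for your own measurements, not a figure from Zep or any provider.

```python
def monthly_llm_cost(
    messages_per_day: int,
    llm_calls_per_message: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Estimate the monthly LLM spend caused by memory-side calls alone."""
    total_tokens = messages_per_day * llm_calls_per_message * tokens_per_call * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder inputs: 10k messages/day, 2 memory-side LLM calls per message,
# 1,500 tokens per call, $2.50 per million tokens
estimate = monthly_llm_cost(10_000, 2, 1_500, 2.50)
print(f"${estimate:,.2f} per month")
```

The term to measure carefully is `llm_calls_per_message`, since that is where an approach that evaluates conflicts on every write differs most from one that only embeds and stores.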
| System | Weakness | Mitigation |
|---|---|---|
| Mem0 | Graph memory is cloud Pro only; self-hosting limited to pure vector search | Consider direct Graphiti integration if multi-hop reasoning is required |
| Letta | Self-hosting operational overhead; Core Memory capacity design required; initial learning curve | Start with Letta Cloud for small prototypes, then migrate |
| Zep | Token consumption can be high due to LLM-based fact evaluation; full platform is SaaS-centric | For cost-sensitive cases, use only the Graphiti engine independently |
The Most Common Mistakes in Practice
- Trying to store every piece of conversation content unconditionally. Our team made this mistake early on. Memory systems involve LLM calls, which means additional latency and cost. Designing criteria upfront for "what information to store and why" (importance thresholds, information type filters) is what saves money later.
- Choosing based solely on benchmark scores. LOCOMO (conversational memory accuracy) or LongMemEval (comprehensive long-term memory reasoning, accepted at ICLR 2025) scores don't fully represent real-world service quality. Alongside benchmarks, it's important to run your own evaluation with 10–20 scenarios tailored to your workload characteristics (proportion of temporal queries, multi-hop reasoning needs, session frequency).
- Deciding between self-hosting and managed SaaS based on cost alone. Data sovereignty, compliance requirements (HIPAA, GDPR), and your operations team's capabilities all need to be factored in. In particular, it's worth confirming upfront that Mem0's graph memory is cloud-only, since the feature gap with self-hosting is significant.
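For the first mistake, the storage criteria can start as small as a single predicate that runs before anything is written. This is a hypothetical sketch: `CandidateFact`, the allowed kinds, and the 0.6 threshold are illustrative choices, and the importance score is assumed to come from whatever upstream classifier or heuristic you already have.

```python
from dataclasses import dataclass


@dataclass
class CandidateFact:
    text: str
    kind: str          # e.g. "preference", "account", "small_talk"
    importance: float  # 0.0 to 1.0, assumed to come from an upstream scorer


ALLOWED_KINDS = {"preference", "account", "constraint"}  # information type filter
IMPORTANCE_THRESHOLD = 0.6                               # importance threshold


def should_store(fact: CandidateFact) -> bool:
    """Store only facts of an allowed kind that clear the importance bar."""
    return fact.kind in ALLOWED_KINDS and fact.importance >= IMPORTANCE_THRESHOLD


candidates = [
    CandidateFact("Prefers email contact", "preference", 0.8),
    CandidateFact("Said good morning", "small_talk", 0.9),
    CandidateFact("Mentioned plan in passing", "account", 0.3),
]
kept = [fact.text for fact in candidates if should_store(fact)]
print(kept)
```

Running a cheap filter like this first means the expensive extraction call only fires for content that has a chance of being kept.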
Closing Thoughts
Personally, if I were starting a new project, I'd probably prototype quickly with Mem0 first, then switch to Graphiti or Letta at the point where I answer "yes" to either "are temporal state changes important?" or "is the agent working autonomously over multiple days?" Mem0 for fast integration, Letta for autonomous agent control, Zep for enterprise environments where you need to track changes over time — the right choice depends on the nature of your service and your team's operational capabilities.
Three steps you can take right now:
- 30-minute prototyping with Mem0: run pip install mem0ai, then add two calls, memory.add() and memory.search(), to your existing chatbot code to quickly get a feel for long-term memory. The official documentation has a well-organized Getting Started example.
- Explore Letta or Graphiti: if the answer to "does the agent work autonomously over multiple days?" is yes, try Letta; if the answer to "are temporal state changes (address changes, contract renewals, etc.) core?" is yes, spin up Graphiti via Docker and run the example code above. If both answers are "no," digging deeper into Mem0 may be the more efficient path.
- Build your own evaluation set: beyond public benchmarks, write 10–20 real service scenarios yourself and measure retrieval accuracy and latency for each system. Letta also provides an open source evaluation framework, Letta Evals, for this purpose.
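A homegrown evaluation set doesn't need a framework to get started. The harness below is a minimal sketch that works with any function taking a query and returning a list of memory strings: `evaluate`, `toy_search`, and the scenarios are all hypothetical, and in practice you would wrap each system's real search call in place of `toy_search`.

```python
import time
from typing import Callable

Scenario = tuple[str, str]  # (query, substring expected in a retrieved memory)


def evaluate(search: Callable[[str], list[str]],
             scenarios: list[Scenario]) -> dict[str, float]:
    """Measure hit rate and worst-case latency over a list of scenarios."""
    hits = 0
    latencies: list[float] = []
    for query, expected in scenarios:
        started = time.perf_counter()
        results = search(query)
        latencies.append(time.perf_counter() - started)
        if any(expected.lower() in result.lower() for result in results):
            hits += 1
    return {
        "hit_rate": hits / len(scenarios),
        "max_latency_s": max(latencies),
    }


# Stand-in for a real memory system's search call
MEMORIES = ["User is on the premium plan", "User moved to Busan in June 2025"]


def toy_search(query: str) -> list[str]:
    """Return memories sharing at least one word with the query."""
    words = set(query.lower().split())
    return [m for m in MEMORIES if words & set(m.lower().split())]


scenarios = [
    ("which plan is the user on", "premium"),
    ("where does the user work", "employer"),
]
report = evaluate(toy_search, scenarios)
print(report)
```

Keeping the scenarios in one list makes it easy to run the same set against Mem0, Letta, and Graphiti and compare the reports side by side.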
References
- Mem0 Open Source Overview | docs.mem0.ai: Getting Started and official API reference
- Mem0 GitHub | mem0ai/mem0: Source code and issue tracker
- Mem0 Paper | arXiv:2504.19413: Original single-pass layered extraction algorithm paper
- State of AI Agent Memory 2026 | Mem0 Blog: Benchmark numbers and market overview
- Letta Official Site | letta.com: Letta Cloud and Evals framework
- Letta GitHub | letta-ai/letta: Full open source codebase (Apache 2.0)
- Letta Memory Management Docs | docs.letta.com: In-depth explanation of Core/Recall/Archival layers
- Agent Memory: How to Build Agents that Learn and Remember | Letta Blog: Agent memory design architecture
- Zep Official Site | getzep.com: Enterprise plans and certification information
- Zep Paper | arXiv:2501.13956: Original Temporal Knowledge Graph architecture paper
- Graphiti GitHub | getzep/graphiti: Apache 2.0 standalone open source engine
- Graphiti: Knowledge Graph Memory for an Agentic World | Neo4j Blog: Deep dive on Neo4j integration architecture
- Mem0 + AWS Reference Architecture | AWS Blog: Fully managed setup with ElastiCache + Neptune Analytics
- Mem0 vs Zep vs Letta Comparison | HydraDB: Feature comparison table for all three systems
- Agent Memory at Scale 2026 Comparison | AgentMarketCap: Market vendor landscape and comprehensive benchmarks
- Zep vs Mem0: Benchmarks and Pricing | Atlan: Detailed performance and cost comparison