How AI Agents Remember Across Sessions — A Deep Dive into Hermes's 3-Tier Memory Architecture
The first wall you hit when deploying AI agents in production is this: the context you carefully set up in today's session vanishes completely when you open a new session tomorrow. You have to re-explain "use pnpm as the package manager" every single time, and cross-session recall — like "you know that auth bug Alice mentioned last time" — is simply impossible. Honestly, at first I thought I could just cram everything into the prompt, but the context window (the amount of text an agent can process at once) is frustratingly short, and repetitive tasks just keep piling up.
Hermes Agent, released by NousResearch in February 2026, addresses this problem as a structural question: "how do we organize memory?" The core answer is a 3-tier memory architecture — a structure that lets the AI remember project rules across broken sessions, search conversations from weeks ago, and progressively solidify recurring workflows into skills. This post examines what each of those three tiers stores, when they activate, and how they're used in practice. If you've ever used an AI coding tool, you'll find something directly applicable here.
Core Concepts
Why "3 Tiers"? — Different Kinds of Memory
Humans don't manage memory as a single bucket either. Things you always need at hand (names, titles), things you look up when needed (last meeting notes), and ingrained habits (riding a bike) are all processed by different regions of the brain. Hermes's design is essentially a software translation of that same distinction.
Core principle: "Important facts always in memory, everything else searchable" — the goal is to maintain large-scale memory without polluting the context window.
The three tiers have clearly separated purposes, storage locations, and load timings.
| Tier | Name | Storage Location | Load Timing | Role |
|---|---|---|---|---|
| Tier 1 | Frozen System Prompt Memory | SOUL.md, MEMORY.md, USER.md | Auto-injected every session | Project rules, user info |
| Tier 2 | Episodic Archive | ~/.hermes/state.db (SQLite FTS5) | On explicit search | Full conversation history, retained indefinitely |
| Tier 3 | Procedural Memory (Skills) | ~/.hermes/skills/ | On trigger match | Reuse of successful workflows |
I initially ignored these three tiers and tried to cram everything into Tier 1. The result was predictable — capacity overflow, loading delays, and a system prompt full of useless information. There's a good reason the tiers have clearly separated roles.
Tier 1 — Frozen System Prompt Memory
These are files automatically injected into the system prompt (the initial configuration containing the agent's behavioral guidelines) every time a session starts. There are three components:
- SOUL.md: The agent's persona, values, and behavioral principles; an identity that rarely changes
- MEMORY.md: Project rules, important decisions, and caveats (approximately 2,200-character hard cap per official docs)
- USER.md: User preferences, communication style, and role information (approximately 1,375-character hard cap per official docs)
# MEMORY.md Example
- Project root: ~/workspace/my-app
- Package manager: pnpm (never use npm)
- Always run prisma migrate when changing DB schema
- Code reviews in PR units, no direct pushes to main
- Deployment environments: k8s staging (staging.myapp.internal), production (prod.myapp.io)
The capacity limit is quite tight: I tried cramming things in thinking "this should fit," and got cut off. The key is to be selective and keep only what would be most catastrophic for the agent not to know. Detailed content is much better off split into Tier 2 or separate documents.
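Since the caps are easy to overshoot, a quick size check before each edit can help. The following is a minimal sketch, not part of Hermes: `check_tier1_budget` is a hypothetical helper, the caps are the approximate figures quoted above, and it assumes the limit is counted in characters rather than bytes or tokens.

```python
from pathlib import Path
from typing import Dict, Tuple

# Approximate caps quoted in this post; adjust them if your version differs.
TIER1_CAPS = {"MEMORY.md": 2200, "USER.md": 1375}


def check_tier1_budget(directory: Path) -> Dict[str, Tuple[int, int]]:
    """Return {filename: (chars_used, chars_remaining)} for each capped file found."""
    report = {}
    for name, cap in TIER1_CAPS.items():
        path = directory / name
        if path.is_file():
            used = len(path.read_text(encoding="utf-8"))
            report[name] = (used, cap - used)  # negative remaining means over the cap
    return report
```

Point it at whichever directory holds your Tier 1 files, for example `check_tier1_budget(Path("path/to/memory/files"))`. A negative second value means the file is over budget and something should move to Tier 2.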
Tier 2 — Episodic Archive (SQLite FTS5)
This is the complete conversation history stored in ~/.hermes/state.db. There's no capacity limit, and it's not loaded into context by default. This is exactly why context can be conserved — when needed, you fire an FTS5 (Full-Text Search) query via session_search to pull only relevant excerpts.
FTS5 (Full-Text Search 5): A full-text search extension built into SQLite. It enables local text indexing and fast queries without a separate search engine, and also guarantees ACID transactions. Because it's keyword-based, it's most effective when searching with specific terms like "auth bug Alice." If you need semantic similarity search, you can consider a vector-search-based provider like Supermemory.
Below is a conceptual example of how a search works internally. It may not be identical to the actual internal schema of Hermes.
-- Conceptual example: FTS5-based conversation search
-- (table and column names are hypothetical)
SELECT session_id,
       snippet(conversations_fts, -1, '[', ']', '...', 16) AS excerpt,
       bm25(conversations_fts) AS relevance_score  -- lower values are better matches
FROM conversations_fts
WHERE conversations_fts MATCH 'Alice auth bug'     -- implicit AND of all three terms
ORDER BY relevance_score
LIMIT 5;
In practice, this tier shines brightest when revisiting a discussion from weeks ago. Ask something like "What was that authentication issue Alice mentioned last month?" and you get relevant session excerpts returned along with an LLM summary. I still remember the moment I first used it and thought, "oh, this actually works."
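To get a feel for how keyword-based FTS5 search behaves, here is a small self-contained sketch using Python's built-in `sqlite3` module. The table name, columns, and sample rows are all made up for illustration and are not Hermes's actual schema. It also assumes your Python's SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: session_id is stored but not indexed; content is searchable.
conn.execute(
    "CREATE VIRTUAL TABLE conversations_fts USING fts5(session_id UNINDEXED, content)"
)
conn.executemany(
    "INSERT INTO conversations_fts (session_id, content) VALUES (?, ?)",
    [
        ("s-001", "Alice reported an auth bug: refresh tokens expire too early"),
        ("s-002", "Discussed switching the package manager to pnpm"),
        ("s-003", "Bob fixed a layout bug on the login page"),
    ],
)


def search(conn, query, limit=5):
    """Return (session_id, excerpt) rows for an FTS5 query, best match first."""
    return conn.execute(
        """
        SELECT session_id, snippet(conversations_fts, 1, '[', ']', '...', 8)
        FROM conversations_fts
        WHERE conversations_fts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

In this toy data, `search(conn, "Alice auth bug")` matches only session `s-001` because all three terms must appear, while `search(conn, "authentication")` returns nothing: the default tokenizer does no stemming, so `auth` and `authentication` are different tokens. That is the practical reason specific keywords work best with this tier.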
Tier 3 — Procedural Memory (Skills)
This is the tier I find most distinctive about Hermes. Per the official docs, when a workflow completed with 5 or more tool calls (individual units of work the agent actually executes) is repeated 3–4 times, a background process automatically generates a markdown skill file.
# ~/.hermes/skills/deploy-to-staging/SKILL.md
---
name: deploy-to-staging
triggers: ["deploy", "push to staging", "deploy staging"]
---
1. pnpm build && pnpm test
2. docker build -t app:staging .
3. kubectl apply -f k8s/staging/
4. slack notify #deployments
After that, typing "push to staging" matches the trigger and reuses that skill.
The skill list uses lazy loading: only names and descriptions are loaded first (approximately 3K tokens), and the actual skill content is loaded only when a trigger matches. This means that even with dozens of accumulated skills, very little context is spent on skills that are not in use. The loaded skill list alone is enough for the agent to determine "this skill is what I should use here."
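The lazy-loading idea can be sketched in a few lines. This is an illustration of the pattern only, not Hermes's implementation: `SkillEntry` and `match_skill` are hypothetical names, and plain substring matching on trigger phrases is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class SkillEntry:
    """Lightweight index entry: metadata up front, file body read on demand."""
    name: str
    triggers: List[str]
    path: Path
    _body: Optional[str] = field(default=None, repr=False)

    def body(self) -> str:
        if self._body is None:  # first use: read the full skill file once
            self._body = self.path.read_text(encoding="utf-8")
        return self._body


def match_skill(index: List[SkillEntry], user_message: str) -> Optional[SkillEntry]:
    """Return the first skill whose trigger phrase appears in the message."""
    text = user_message.lower()
    for entry in index:
        if any(trigger.lower() in text for trigger in entry.triggers):
            return entry
    return None
```

Only the index entries live in memory until `body()` is called on a matched skill, which mirrors the cost profile described above: a small fixed overhead for the list, with the full file read only when needed.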
Practical Application
Example 1: Pinning Project Onboarding Context in Tier 1
This is a scenario for ensuring the agent always knows "what to watch out for in this project" when joining a new project or juggling multiple ones in parallel. This situation comes up quite frequently in practice.
# MEMORY.md — Real-World Project Example
## Project Basics
- Root: ~/workspace/payment-service
- Language: TypeScript strict mode, Node 20 LTS
- Package manager: pnpm (no npm/yarn)
## Database
- ORM: Prisma (schema changes require prisma migrate dev)
- DB: PostgreSQL 15 (local: localhost:5432/payment_dev)
## Deployment Rules
- No direct pushes to main, PR + 2-person review required
- Staging: use deploy:staging script
- Secrets: .env.local (never commit, source lives in Vault)
## Known Caveats
- payment_transactions table allows soft deletes only
- Stripe webhooks must handle idempotency keys
The section I pay most attention to in this file is "Known Caveats." Things you'd think "of course it knows that" are often precisely what the agent doesn't know.
| Element | Description |
|---|---|
| Section separation | Grouping by category helps the agent apply relevant rules more effectively |
| Known caveats | The key is being explicit rather than assuming "it'll obviously know this" |
| Capacity management | Only the essentials within the 2,200-character limit — detailed content goes to Tier 2 or separate docs |
Example 2: Hardening a Recurring Deployment Workflow into a Tier 3 Skill
After repeating a staging deployment 3–4 times a week, Hermes will automatically generate a skill, but you can also write one yourself from the start. For important production workflows especially, writing it yourself is actually recommended — auto-generated skills can cement the mistakes from the original workflow as-is.
# ~/.hermes/skills/full-deploy/SKILL.md
---
name: full-deploy
triggers: ["full deploy", "production deploy", "release"]
preconditions:
- "Run on main branch only"
- "Confirm CHANGELOG is updated"
---
## Pre-Deployment Checks
1. Check git status (no uncommitted changes)
2. pnpm test && pnpm lint
3. Verify latest version entry in CHANGELOG.md
## Build & Deploy
4. pnpm build:prod
# $VERSION is specified directly from the latest CHANGELOG.md entry (e.g., v1.4.2)
5. docker build -t payment-service:$VERSION .
6. docker push registry.myco.io/payment-service:$VERSION
# The direct kubectl call below may not be appropriate in GitOps environments (ArgoCD, Flux, etc.)
7. kubectl set image deployment/payment-service app=registry.myco.io/payment-service:$VERSION -n production
## Post-Deployment Validation
8. kubectl rollout status deployment/payment-service -n production
9. curl https://api.myco.io/health | grep '"status":"ok"'
10. slack notify #releases "payment-service $VERSION deployment complete"
The preconditions field often gets left out in practice. Specifying it here means the agent will first verify that the conditions are met before proceeding.
Example 3: Integrating an External Mem0 Provider (Advanced)
When the Tier 1 capacity limit feels constraining, or you want to manage user preferences across multiple projects, you can connect an external memory provider. Hermes officially supports 8 external providers including Honcho, Mem0, Hindsight, and Supermemory — each specialized in different directions: user behavior pattern modeling, long-term memory management, retrospection-based extraction, and vector semantic search, respectively.
Here's the simplest to configure among them — a Mem0 example.
# 30-second setup
hermes config set memory.provider mem0
hermes config set memory.mem0.api_key $MEM0_API_KEY
Mem0 automatically manages two memory scopes:
| Scope | Range | Usage Example |
|---|---|---|
| Session memory | Current conversation | "API design decisions discussed this session" |
| User memory | Persistent across all sessions | "This person prefers functional style and always enables TypeScript strict" |
Relevant memories are automatically retrieved and injected at the start of new sessions, dynamically supplementing the static files in Tier 1. Note that only one external provider can be active at a time — worth keeping in mind.
Pros and Cons
Advantages
| Item | Description |
|---|---|
| Unlimited episodic memory | SQLite FTS5 preserves all conversations indefinitely without consuming context capacity |
| Automatic self-improvement | Successful workflows are automatically converted into skills, eliminating the need to re-learn repetitive tasks |
| Zero infrastructure | SQLite-based, so it runs locally immediately without additional servers |
| Per-tier loading optimization | Tier 3 loads only names and descriptions (~3K tokens) and loads full content on demand |
| Pluggable extensibility | Functionality can be extended with external memory providers |
| Model-agnostic | Supports various LLM backends including Claude, GPT-4o, and Grok (2M context) |
Disadvantages and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Tier 1 capacity limits | MEMORY.md ~2,200 char, USER.md ~1,375 char hard caps (per official docs) | Select by importance, split details to Tier 2 |
| Judgment-based storage | Agent decides what to store autonomously, so contaminated memories can persist | Periodically review and clean up MEMORY.md |
| Context loss bug | Cases exist where Tier 2 search results overwrite disk files during large file processing (check official release notes for current version status) | Back up important files separately, consider external provider integration |
| Minimum context requirement | 64K+ context window recommended; poor fit for models with smaller context windows | Use large-context models such as Claude or GPT-4o |
| Single external provider activation | Only one external provider can be active at a time | Watch plugin ecosystem developments for multi-provider support |
| Inconsistent skill quality | Quality of auto-generated skills depends on the success rate of the original workflow | Manually review and refine important skills |
Among these downsides, the one that bothered me most was the contamination from judgment-based storage. Once the agent "remembers" a wrong rule as fact, it keeps repeating that pattern unintentionally. That's why I developed the habit of opening MEMORY.md directly about once a month.
The Most Common Mistakes in Practice
- Trying to put everything in MEMORY.md and hitting the capacity limit: The 2,200-character limit fills up faster than you'd think. It's best to select based on "would it be catastrophic if the agent didn't know this?" Detailed background explanations are better moved to Tier 2 or linked to a separate document.
- Trusting auto-generated skills without review: For critical workflows like production deployments, it's strongly recommended to open the auto-generated skill file and verify each step. If there were mistakes in the original workflow, they get hardened in as-is.
- Not backing up Tier 2 data: The entire conversation history is stored in a single ~/.hermes/state.db file. A disk problem or OS reinstall can wipe it all out, so for important projects consider periodic backups or external provider integration.
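For the backup point, copying a live SQLite file with a plain file copy can catch it mid-write. A safer route is SQLite's online backup API, which Python exposes as `Connection.backup`. The sketch below is a generic SQLite backup helper, not a Hermes feature; `backup_sqlite` and the backup directory are hypothetical names chosen for the example.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_sqlite(source: Path, backup_dir: Path) -> Path:
    """Copy a SQLite database to a timestamped file via the online backup API."""
    if not source.is_file():
        raise FileNotFoundError(source)  # avoid creating an empty database by accident
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = f"{datetime.now():%Y%m%d-%H%M%S}"
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    src = sqlite3.connect(source)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # copies a consistent snapshot of the source database
    finally:
        dst.close()
        src.close()
    return target
```

A call such as `backup_sqlite(Path.home() / ".hermes" / "state.db", Path.home() / "backups" / "hermes")` could then run from a scheduled job, with the target directory ideally living on a different disk.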
Closing Thoughts
What you place in which tier determines the agent's actual usefulness.
Hermes's 3-tier memory is an attempt to structure "how AI remembers." It's not a perfect architecture — there are real limitations: Tier 1 capacity constraints, the possibility of contamination from judgment-based storage, context loss bugs. But it's clearly a framework that offers a practical answer to the longstanding problem of maintaining context across sessions.
Three steps you can start right now:
- Start by writing MEMORY.md: List the 5 things that were "most inconvenient when the agent didn't know them" about your current project (package manager, DB migration rules, deployment caveats, etc.) within 2,200 characters.
- After a month of use, try leveraging Tier 2 search: Fire a question you've actually wondered about at work, like "what was that X issue we discussed last month?", via session_search, and you'll get a feel for the real value of the episodic archive.
- When a recurring workflow emerges, try writing a skill file yourself: Rather than waiting for auto-generation, write tasks you repeat 3+ times a week (deployments, tests, PR creation) as markdown skill files with YAML front matter under ~/.hermes/skills/ for immediate reuse.
References
- Hermes Agent Official Docs — Persistent Memory
- Hermes Agent Official Docs — Memory Providers
- Hermes Agent Memory System: How Persistent AI Memory Actually Works — Rost Glukhov
- How Hermes Agent Memory Works — 3-Layer System Explained
- Hermes Agent memory: SOUL.md, MEMORY.md and state.db — LumaDock
- How Hermes Agent Solves the Context Window Problem — Medium
- Hermes Agent Memory Providers: All 7 Options Compared — Vectorize.io
- Hermes Agent 5-Pillar Architecture — MindStudio
- Hermes Agent Masterclass — Daily Dose of Data Science
- NousResearch/hermes-agent GitHub — DeepWiki