How AI Agents Remember Across Sessions — A Deep Dive into Hermes's 3-Tier Memory Architecture

The first wall you hit when deploying AI agents in production is this: the context you carefully set up in today's session vanishes completely when you open a new session tomorrow. You have to re-explain "use pnpm as the package manager" every single time, and cross-session recall — like "you know that auth bug Alice mentioned last time" — is simply impossible. Honestly, at first I thought I could just cram everything into the prompt, but the context window (the amount of text an agent can process at once) is frustratingly short, and repetitive tasks just keep piling up.

Hermes Agent, released by NousResearch in February 2026, treats this problem as a structural question: "how should memory be organized?" Its answer is a 3-tier memory architecture, a structure that lets the agent remember project rules across sessions, search conversations from weeks ago, and progressively solidify recurring workflows into skills. This post examines what each tier stores, when it activates, and how it's used in practice. If you've ever used an AI coding tool, you'll find something directly applicable here.

Core Concepts

Why "3 Tiers"? — Different Kinds of Memory

Humans don't manage memory as a single bucket either. Things you always need at hand (names, titles), things you look up when needed (last meeting notes), and ingrained habits (riding a bike) are all processed by different regions of the brain. Hermes's design is essentially a software translation of that same distinction.

Core principle: "Important facts always in memory, everything else searchable" — the goal is to maintain large-scale memory without polluting the context window.

The three tiers have clearly separated purposes, storage locations, and load timings.

| Tier | Name | Storage Location | Load Timing | Role |
|---|---|---|---|---|
| Tier 1 | Frozen System Prompt Memory | `SOUL.md`, `MEMORY.md`, `USER.md` | Auto-injected every session | Project rules, user info |
| Tier 2 | Episodic Archive | `~/.hermes/state.db` (SQLite FTS5) | On explicit search | Full conversation history, retained indefinitely |
| Tier 3 | Procedural Memory (Skills) | `~/.hermes/skills/` | On trigger match | Reuse of successful workflows |

I initially ignored this separation and tried to cram everything into Tier 1. The result was predictable: capacity overflow, loading delays, and a system prompt full of useless information. There's a good reason the tiers have clearly separated roles.

Tier 1 — Frozen System Prompt Memory

These are files automatically injected into the system prompt (the initial configuration containing the agent's behavioral guidelines) every time a session starts. There are three components:

SOUL.md: The agent's persona, values, and behavioral principles — an identity that rarely changes
MEMORY.md: Project rules, important decisions, and caveats (approximately 2,200-character hard cap per official docs)
USER.md: User preferences, communication style, and role information (approximately 1,375-character hard cap per official docs)

```markdown
# MEMORY.md Example

- Project root: ~/workspace/my-app
- Package manager: pnpm (never use npm)
- Always run prisma migrate when changing DB schema
- All changes go through PR review; no direct pushes to main
- Deployment environments: k8s staging (staging.myapp.internal), production (prod.myapp.io)
```

The capacity limit is tight: I tried cramming in content I assumed would fit, and it got cut off. The trick is to be selective and keep only the facts whose absence would hurt the agent most. Detailed content is much better off split into Tier 2 or separate documents.
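Because the caps are easy to exceed without noticing, a small check can report how much room each Tier 1 file has left before you commit it. This is a minimal sketch, assuming the approximate caps quoted above (2,200 characters for MEMORY.md, 1,375 for USER.md) are counted as plain characters. How Hermes actually counts or truncates may differ, and `check_tier1_budget` and `report` are hypothetical helpers, not part of Hermes. The directory holding your Tier 1 files is also an assumption, so it is passed in as an argument.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Approximate caps quoted from the Hermes docs. Treat them as assumptions:
# your version may count differently (e.g., bytes or tokens instead of characters).
TIER1_CAPS = {"MEMORY.md": 2200, "USER.md": 1375}


def check_tier1_budget(text: str, cap: int) -> tuple[int, int]:
    """Return (characters used, characters remaining); negative remaining means over the cap."""
    used = len(text)
    return used, cap - used


def report(directory: Path) -> dict[str, tuple[int, int]]:
    """Check each Tier 1 file that exists in `directory`; missing files are skipped."""
    results: dict[str, tuple[int, int]] = {}
    for name, cap in TIER1_CAPS.items():
        path = directory / name
        if path.exists():
            results[name] = check_tier1_budget(path.read_text(encoding="utf-8"), cap)
    return results


if __name__ == "__main__":
    # Pass the directory holding your Tier 1 files; defaults to the current directory.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for name, (used, remaining) in report(target).items():
        status = "OK" if remaining >= 0 else "OVER CAP"
        print(f"{name}: {used} chars used, {remaining} remaining [{status}]")
```

Running it in a pre-commit hook turns "it got cut off" into a visible failure before the agent ever sees a truncated file.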

Tier 2 — Episodic Archive (SQLite FTS5)

This tier is the complete conversation history, stored in ~/.hermes/state.db. It has no capacity limit and is not loaded into context by default, which is exactly how context is conserved: when history is needed, an FTS5 (Full-Text Search) query through session_search pulls in only the relevant excerpts.

FTS5 (Full-Text Search 5): A full-text search extension that ships with SQLite. It provides local text indexing and fast queries without a separate search engine, and because the index lives inside the SQLite database, its updates get the same ACID transaction guarantees as any other table. Because it's keyword-based, it's most effective when searching with specific terms like "auth bug Alice." If you need semantic similarity search, consider a vector-search-based provider like Supermemory.

Below is a conceptual example of how a search works internally. It may not be identical to the actual internal schema of Hermes.

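```sql
-- Conceptual example: FTS5-based conversation search
SELECT session_id,
       snippet(conversations_fts, -1, '[', ']', '...', 16) AS excerpt,
       bm25(conversations_fts) AS score
FROM conversations_fts
WHERE conversations_fts MATCH 'Alice auth bug'
ORDER BY score  -- bm25() is lower for better matches, so ascending puts the best first
LIMIT 5;
```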
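To see FTS5 keyword search end to end, the sketch below builds a throwaway in-memory index with Python's standard `sqlite3` module and runs the same kind of query. The `conversations_fts` table, its columns, the sample rows, and the `search_sessions` function are all made up for illustration. They only mimic the role of Hermes's session_search and are not its schema or implementation. The sketch also assumes your SQLite build has FTS5 enabled, which is the case for most modern builds.

```python
import sqlite3

# Throwaway in-memory index. Table and column names are illustrative only.
# session_id is UNINDEXED: stored and returned, but not full-text searched.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE conversations_fts USING fts5(session_id UNINDEXED, body)"
)
conn.executemany(
    "INSERT INTO conversations_fts (session_id, body) VALUES (?, ?)",
    [
        ("s-101", "Alice reported an auth bug: refresh tokens expire too early"),
        ("s-102", "Agreed to use pnpm as the package manager for the monorepo"),
        ("s-103", "Bob asked how the staging deploy script picks the image tag"),
    ],
)


def search_sessions(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Hypothetical session search returning (session_id, excerpt) pairs.

    Several bare words in MATCH must all appear (implicit AND).
    bm25() returns lower scores for better matches, so ascending order puts
    the most relevant rows first; snippet() wraps matched terms in [ ].
    """
    return conn.execute(
        """
        SELECT session_id, snippet(conversations_fts, 1, '[', ']', '...', 8)
        FROM conversations_fts
        WHERE conversations_fts MATCH ?
        ORDER BY bm25(conversations_fts)
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


for session_id, excerpt in search_sessions("Alice auth bug"):
    print(session_id, excerpt)
```

Only s-101 contains all three words, which shows why specific keywords matter. A vaguer query such as "login problem" would find nothing here, because FTS5 matches tokens, not meaning.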

In practice, this tier shines brightest when "revisiting a discussion from weeks ago." "What was that authentication issue Alice mentioned last month?" — fire a query like this and you get relevant session excerpts returned along with an LLM summary. I still remember the moment I first used it and thought, "oh, this actually works."

Tier 3 — Procedural Memory (Skills)

This is the tier I find most distinctive about Hermes. Per the official docs, when a workflow completed with 5 or more tool calls (individual units of work the agent actually executes) is repeated 3–4 times, a background process automatically generates a markdown skill file.

File: ~/.hermes/skills/deploy-to-staging/SKILL.md

```markdown
---
name: deploy-to-staging
description: Build, test, and deploy the app to the staging cluster
triggers: ["deploy", "push to staging", "deploy staging"]
---
1. pnpm build && pnpm test
2. docker build -t app:staging .
3. kubectl apply -f k8s/staging/
4. slack notify #deployments
```
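The auto-generation rule from the docs (a workflow of 5+ tool calls repeated 3–4 times) can be pictured as a counter over workflow fingerprints. This is a hypothetical sketch, not Hermes's implementation. It assumes a workflow is identified by its exact ordered sequence of tool names, counts only successful runs, and uses a repeat threshold of 3, the low end of the documented range. `SkillPromoter` is an illustrative name.

```python
from __future__ import annotations

from collections import Counter

MIN_TOOL_CALLS = 5    # per the docs: only workflows with 5+ tool calls qualify
REPEAT_THRESHOLD = 3  # assumption: the low end of the documented 3-4 repeats


class SkillPromoter:
    """Hypothetical tracker that flags a workflow for skill generation once it repeats."""

    def __init__(self) -> None:
        self.counts: Counter[tuple[str, ...]] = Counter()
        self.promoted: set[tuple[str, ...]] = set()

    def record(self, tool_calls: list[str], succeeded: bool) -> bool:
        """Record one finished workflow; return True only the first time it crosses the threshold."""
        if not succeeded or len(tool_calls) < MIN_TOOL_CALLS:
            return False
        # Naive fingerprint: the exact ordered sequence of tool names.
        fingerprint = tuple(tool_calls)
        self.counts[fingerprint] += 1
        if self.counts[fingerprint] >= REPEAT_THRESHOLD and fingerprint not in self.promoted:
            self.promoted.add(fingerprint)
            return True
        return False


promoter = SkillPromoter()
deploy = ["pnpm build", "pnpm test", "docker build", "kubectl apply", "slack notify"]
print([promoter.record(deploy, succeeded=True) for _ in range(4)])  # [False, False, True, False]
```

An exact-sequence fingerprint is deliberately strict and will miss workflows that vary slightly between runs. A looser similarity measure would catch more of them, at the cost of more false positives.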

Once the skill exists, typing "push to staging" matches its trigger, and the agent reuses the skill instead of rebuilding the workflow from scratch.

The skill list is lazily loaded: only names and descriptions are loaded up front (approximately 3K tokens), and a skill's full content is loaded only when its trigger matches. Even with dozens of accumulated skills, very little context is spent, because the lightweight name-and-description list is enough for the agent to decide which skill fits the request.
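Lazy loading comes down to keeping two views of each skill. A cheap index entry (name, description, triggers) is always available, and the full body is read from disk only on a match. Below is a minimal sketch under these assumptions: skills live in `<root>/<name>/SKILL.md`, the front matter uses simple top-level `key: value` lines with triggers written as a JSON-style list, and triggers match as case-insensitive substrings. `SkillIndex` is a hypothetical class, not Hermes's loader. The front-matter parsing is deliberately naive (no YAML library) so the sketch stays dependency-free.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    """The cheap, always-loaded view of one skill."""
    name: str
    description: str
    triggers: list[str]
    path: Path


def read_front_matter(path: Path) -> dict[str, str]:
    """Naively parse top-level `key: value` lines between the leading '---' markers."""
    lines = path.read_text(encoding="utf-8").splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            meta[key.strip()] = value.strip()
    return meta


class SkillIndex:
    """Hypothetical lazy skill loader: index front matter now, read bodies on demand."""

    def __init__(self, root: Path) -> None:
        self.entries: list[SkillEntry] = []
        for skill_file in sorted(root.glob("*/SKILL.md")):
            meta = read_front_matter(skill_file)
            self.entries.append(
                SkillEntry(
                    name=meta.get("name", skill_file.parent.name),
                    description=meta.get("description", ""),
                    # Assumes triggers are written as a JSON-style list of strings.
                    triggers=json.loads(meta.get("triggers", "[]")),
                    path=skill_file,
                )
            )

    def summary(self) -> str:
        """One line per skill: the part that stays in context all the time."""
        return "\n".join(f"- {e.name}: {e.description}" for e in self.entries)

    def load_for(self, message: str) -> str | None:
        """Read the full SKILL.md of the first skill whose trigger appears in the message."""
        text = message.lower()
        for entry in self.entries:
            if any(trigger.lower() in text for trigger in entry.triggers):
                return entry.path.read_text(encoding="utf-8")
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp, "deploy-to-staging")
        skill_dir.mkdir()
        (skill_dir / "SKILL.md").write_text(
            "---\n"
            "name: deploy-to-staging\n"
            "description: Deploy the app to staging\n"
            'triggers: ["deploy", "push to staging"]\n'
            "---\n"
            "1. pnpm build && pnpm test\n",
            encoding="utf-8",
        )
        index = SkillIndex(Path(tmp))
        print(index.summary())  # - deploy-to-staging: Deploy the app to staging
        print(index.load_for("Please push to staging") is not None)  # True
```

Because the first match wins and matching is by substring, a broad trigger like "deploy" also fires on "production deploy". That is one reason to keep triggers specific when several skills cover related tasks.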

Practical Application

Example 1: Pinning Project Onboarding Context in Tier 1

This is a scenario for ensuring the agent always knows "what to watch out for in this project" when joining a new project or juggling multiple ones in parallel. This situation comes up quite frequently in practice.

```markdown
# MEMORY.md: Real-World Project Example

## Project Basics
- Root: ~/workspace/payment-service
- Language: TypeScript strict mode, Node 20 LTS
- Package manager: pnpm (no npm/yarn)

## Database
- ORM: Prisma (schema changes require prisma migrate dev)
- DB: PostgreSQL 15 (local: localhost:5432/payment_dev)

## Deployment Rules
- No direct pushes to main, PR + 2-person review required
- Staging: use deploy:staging script
- Secrets: .env.local (never commit, source lives in Vault)

## Known Caveats
- payment_transactions table allows soft deletes only
- Stripe webhooks must handle idempotency keys
```

The section I pay most attention to in this file is "Known Caveats." Things you assume the agent "of course knows" are often precisely what it doesn't know.

| Element | Description |
|---|---|
| Section separation | Grouping by category helps the agent apply relevant rules more effectively |
| Known caveats | The key is being explicit rather than assuming "it'll obviously know this" |
| Capacity management | Only the essentials within the 2,200-character limit; detailed content goes to Tier 2 or separate docs |

Example 2: Hardening a Recurring Deployment Workflow into a Tier 3 Skill

After repeating a staging deployment 3–4 times a week, Hermes will automatically generate a skill, but you can also write one yourself from the start. For important production workflows especially, writing it yourself is actually recommended — auto-generated skills can cement the mistakes from the original workflow as-is.

File: ~/.hermes/skills/full-deploy/SKILL.md

```markdown
---
name: full-deploy
description: Test, build, and roll out payment-service to production
triggers: ["full deploy", "production deploy", "release"]
preconditions:
  - "Run on main branch only"
  - "Confirm CHANGELOG is updated"
---

## Pre-Deployment Checks
1. Check git status (no uncommitted changes)
2. pnpm test && pnpm lint
3. Verify latest version entry in CHANGELOG.md

## Build & Deploy
4. pnpm build:prod
5. docker build -t registry.myco.io/payment-service:$VERSION .
   ($VERSION is taken directly from the latest CHANGELOG.md entry, e.g., v1.4.2)
6. docker push registry.myco.io/payment-service:$VERSION
7. kubectl set image deployment/payment-service app=registry.myco.io/payment-service:$VERSION -n production
   (A direct kubectl call may not be appropriate in GitOps environments such as ArgoCD or Flux.)

## Post-Deployment Validation
8. kubectl rollout status deployment/payment-service -n production
9. curl -fsS https://api.myco.io/health | grep -q '"status":"ok"'
10. slack notify #releases "payment-service $VERSION deployment complete"
```

The preconditions field is often left out in practice. Spelling the conditions out here prompts the agent to verify them before it starts the deployment steps.
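Preconditions like these can also be turned into a deterministic gate that the agent, or a CI job, runs before step 1. This is a minimal sketch under two assumptions. First, CHANGELOG.md follows a Keep a Changelog-style layout whose newest release heading looks like `## [1.4.2]` or `## v1.4.2`. Second, the inputs are the raw outputs of `git rev-parse --abbrev-ref HEAD` and `git status --porcelain`, passed in as strings so the logic stays testable. `check_preconditions` and `latest_version` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Assumed CHANGELOG format: release headings like "## [1.4.2] - ..." or "## v1.4.2".
# Headings without a version number, such as "## [Unreleased]", are skipped.
VERSION_HEADING = re.compile(r"^##\s*\[?v?(\d+\.\d+\.\d+)\]?", re.MULTILINE)


def latest_version(changelog: str) -> str | None:
    """Return the first (newest) version heading as 'vX.Y.Z', or None if there is none."""
    match = VERSION_HEADING.search(changelog)
    return f"v{match.group(1)}" if match else None


def check_preconditions(branch: str, porcelain_status: str, changelog: str) -> list[str]:
    """Return human-readable failures; an empty list means the skill may proceed.

    branch: output of `git rev-parse --abbrev-ref HEAD`
    porcelain_status: output of `git status --porcelain` (empty when clean)
    changelog: contents of CHANGELOG.md
    """
    failures = []
    if branch.strip() != "main":
        failures.append(f"not on main (current branch: {branch.strip()})")
    if porcelain_status.strip():
        failures.append("working tree has uncommitted changes")
    if latest_version(changelog) is None:
        failures.append("no version entry found in CHANGELOG.md")
    return failures


changelog = "# Changelog\n\n## [Unreleased]\n\n## [1.4.2]\n- Handle webhook retries\n"
print(latest_version(changelog))  # v1.4.2
print(check_preconditions("main\n", "", changelog))  # []
```

Finding a version entry is only a weak proxy for "CHANGELOG is updated." Comparing it against the last deployed tag would be a stricter check.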

Example 3: Integrating an External Mem0 Provider (Advanced)

When the Tier 1 capacity limit feels constraining, or you want to manage user preferences across multiple projects, you can connect an external memory provider. Hermes officially supports 8 external providers including Honcho, Mem0, Hindsight, and Supermemory — each specialized in different directions: user behavior pattern modeling, long-term memory management, retrospection-based extraction, and vector semantic search, respectively.

Here's the simplest to configure among them — a Mem0 example.

```bash
# 30-second setup
hermes config set memory.provider mem0
hermes config set memory.mem0.api_key "$MEM0_API_KEY"
```

Mem0 automatically manages two memory scopes:

| Scope | Range | Usage Example |
|---|---|---|
| Session memory | Current conversation | "API design decisions discussed this session" |
| User memory | Persistent across all sessions | "This person prefers functional style and always enables TypeScript strict" |

Relevant memories are automatically retrieved and injected at the start of each new session, dynamically supplementing the static Tier 1 files. Keep in mind that only one external provider can be active at a time.
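The two scopes and the session-start injection can be modeled with a small in-memory store. This is a hypothetical sketch of how scoped memories might be merged after the static Tier 1 text. It does not use Mem0's SDK or Hermes's provider API. `ScopedMemoryStore`, `build_session_preamble`, and the naive keyword-overlap relevance check are all illustrative assumptions standing in for real retrieval.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Memory:
    scope: str               # "session" or "user"
    session_id: str | None   # only set for session-scoped memories
    text: str


@dataclass
class ScopedMemoryStore:
    """Hypothetical two-scope store; not Mem0's SDK or Hermes's provider API."""

    memories: list[Memory] = field(default_factory=list)

    def add(self, text: str, scope: str, session_id: str | None = None) -> None:
        if scope not in ("session", "user"):
            raise ValueError(f"unknown scope: {scope}")
        if scope == "session" and session_id is None:
            raise ValueError("session memories need a session_id")
        self.memories.append(Memory(scope, session_id if scope == "session" else None, text))

    def recall(self, session_id: str, keywords: set[str]) -> list[str]:
        """User memories apply everywhere; session memories only inside their own session.

        Relevance is a naive overlap with lowercase `keywords`, standing in for real
        retrieval; an empty keyword set returns everything in scope.
        """
        hits = []
        for memory in self.memories:
            in_scope = memory.scope == "user" or memory.session_id == session_id
            words = set(memory.text.lower().split())
            if in_scope and (not keywords or words & keywords):
                hits.append(memory.text)
        return hits


def build_session_preamble(
    tier1: str, store: ScopedMemoryStore, session_id: str, keywords: set[str]
) -> str:
    """Static Tier 1 text first, then the dynamically recalled memories as bullets."""
    recalled = store.recall(session_id, keywords)
    return "\n".join([tier1] + [f"- {text}" for text in recalled])


store = ScopedMemoryStore()
store.add("Prefers functional style and always enables TypeScript strict", "user")
store.add("Chose cursor pagination for the orders API", "session", session_id="s-1")
# A brand-new session sees the user memory but not the old session's decision.
print(build_session_preamble("<contents of MEMORY.md>", store, "s-2", {"typescript"}))
```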

Pros and Cons

Advantages

| Item | Description |
|---|---|
| Unlimited episodic memory | SQLite FTS5 preserves all conversations indefinitely without consuming context capacity |
| Automatic self-improvement | Successful workflows are automatically converted into skills, eliminating the need to re-learn repetitive tasks |
| Zero infrastructure | SQLite-based, so it runs locally immediately without additional servers |
| Per-tier loading optimization | Tier 3 loads only names and descriptions (~3K tokens) and loads full content on demand |
| Pluggable extensibility | Functionality can be extended with external memory providers |
| Model-agnostic | Supports various LLM backends including Claude, GPT-4o, and Grok (2M context) |

Disadvantages and Caveats

| Item | Description | Mitigation |
|---|---|---|
| Tier 1 capacity limits | MEMORY.md ~2,200 char, USER.md ~1,375 char hard caps (per official docs) | Select by importance, split details to Tier 2 |
| Judgment-based storage | Agent decides what to store autonomously, so contaminated memories can persist | Periodically review and clean up MEMORY.md |
| Context loss bug | Cases exist where Tier 2 search results overwrite disk files during large file processing (check official release notes for current version status) | Back up important files separately, consider external provider integration |
| Minimum context requirement | 64K+ context window recommended; poor fit with smaller models | Use large models such as Claude or GPT-4o |
| Single external provider activation | Only one external provider can be active at a time | Watch plugin ecosystem developments for multi-provider support |
| Inconsistent skill quality | Quality of auto-generated skills depends on the success rate of the original workflow | Manually review and refine important skills |

Among these downsides, the one that bothered me most was the contamination from judgment-based storage. Once the agent "remembers" a wrong rule as fact, it keeps repeating that pattern unintentionally. That's why I developed the habit of opening MEMORY.md directly about once a month.

The Most Common Mistakes in Practice

Trying to put everything in MEMORY.md and hitting the capacity limit — The 2,200-character limit fills up faster than you'd think. It's best to select based on "would it be catastrophic if the agent didn't know this?" Detailed background explanations are better moved to Tier 2 or linked to a separate document.
Trusting auto-generated skills without review — For critical workflows like production deployments, it's strongly recommended to open the auto-generated skill file and verify each step. If there were mistakes in the original workflow, they get hardened in as-is.
Not backing up Tier 2 data — The entire conversation history is stored in a single ~/.hermes/state.db file. A disk problem or OS reinstall can wipe it all out, so for important projects consider periodic backups or external provider integration.
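For the backup itself, copying state.db with a plain file copy while Hermes is writing to it can capture an inconsistent snapshot, and it can miss committed changes that still sit in a separate WAL file. SQLite's online backup API avoids both problems, and Python exposes it as `sqlite3.Connection.backup`. This is a minimal sketch, assuming the default ~/.hermes/state.db location described above. `backup_state_db` and the `hermes-backups` folder with timestamped file names are illustrative choices, not anything Hermes provides.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime
from pathlib import Path


def backup_state_db(src: Path, dest_dir: Path) -> Path:
    """Snapshot a live SQLite database with the online backup API.

    Unlike copying the file, this goes through SQLite itself, so the copy is
    consistent even while another process writes, and committed changes that
    still sit in a WAL file are included.
    """
    if not src.exists():
        # sqlite3.connect() would otherwise silently create an empty database.
        raise FileNotFoundError(src)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"state-{datetime.now():%Y%m%d-%H%M%S}.db"
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()
    return dest


if __name__ == "__main__":
    # Default location described above; adjust if your install differs.
    state_db = Path.home() / ".hermes" / "state.db"
    if state_db.exists():
        print(backup_state_db(state_db, Path.home() / "hermes-backups"))
    else:
        print(f"No state.db found at {state_db}")
```

Scheduling this daily (cron, launchd, or a systemd timer) and pruning old snapshots covers the disk-failure and reinstall cases without touching Hermes itself.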

Closing Thoughts

What you place in which tier determines the agent's actual usefulness.

Hermes's 3-tier memory is an attempt to structure "how AI remembers." It's not a perfect architecture — there are real limitations: Tier 1 capacity constraints, the possibility of contamination from judgment-based storage, context loss bugs. But it's clearly a framework that offers a practical answer to the longstanding problem of maintaining context across sessions.

Three steps you can start right now:

Start by writing MEMORY.md: list the five things about your current project that caused the most friction when the agent didn't know them (package manager, DB migration rules, deployment caveats, etc.), keeping the file within 2,200 characters.
After a month of use, try Tier 2 search: put a question you've actually wondered about at work, such as "what was that X issue we discussed last month?", to session_search, and you'll get a feel for the real value of the episodic archive.
When a recurring workflow emerges, write a skill file yourself: rather than waiting for auto-generation, capture tasks you repeat 3+ times a week (deployments, tests, PR creation) as SKILL.md files (markdown with YAML front matter) under ~/.hermes/skills/ for immediate reuse.

#AIAgents#MemoryArchitecture#SQLite#FTS5#LLM#ContextWindow#LazyLoading#VectorSearch#WorkflowAutomation#Mem0

DEV BAK - Tech Blog
