What Memory OS Really Means for Hermes Agent Users
Memory OS is a new 6-layer local memory stack for Hermes Agent. Here's the reality check for developers and prosumers.
Memory OS Just Dropped. Here Is What Changes.
Developer ClaudioDrews released Memory OS, a community-built, MIT-licensed memory stack that bolts six retrieval layers onto Hermes Agent. It is not a plugin you toggle on. It is a full local infrastructure that runs beside Hermes, not inside it. If you use Hermes for anything serious, this project demands your attention.
Hermes Agent already remembers across sessions. Nous Research shipped it with curated memory files and full-text session search. Solid foundation. But the Memory OS README frames built-in memory as too shallow for real work. The project layers vector search, structured facts, trust scoring, and an auto-curated wiki on top. All of it runs on your machine.
MarkTechPost covered the release. The architecture is worth understanding even if you are not ready to spin up Docker today.
Six Layers, All Running Locally
Memory OS stacks six layers. Hermes already provides the bottom two. The project keeps those and builds four more above them. The full stack runs on Docker, Qdrant, Redis, and Python 3.11+. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama.
Layer by Layer
- Layer 1 (Workspace): MEMORY.md, USER.md, and CREATIVE.md injected into the system prompt each turn.
- Layer 2 (Sessions): state.db, a SQLite database with FTS5 full-text search across conversation history.
- Layer 3 (Structured Facts): Durable facts in memory_store.db using SQLite, HRR, FTS5, and trust scoring. A feedback loop adjusts trust scores over time with entity resolution.
- Layer 4 (Fabric): A heavily forked Icarus Plugin with LLM-powered session extraction and 16 tools including fabric_recall, fabric_write, and fabric_brief.
- Layer 5 (Vector Database): Qdrant with 4096d Cosine vectors plus BM25 sparse search.
- Layer 6 (LLM Wiki): An auto-curated vault of concepts, entities, and comparisons, continuously ingested back into Qdrant.
That's a lot of moving parts. But the design logic is clear. Each layer solves a different recall problem: files catch static preferences, sessions catch recent context, facts catch durable truths, fabric catches cross-session patterns, vectors catch semantic similarity, and the wiki catches emergent knowledge.
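To make the Sessions layer concrete, here is a minimal sketch of FTS5 full-text search over conversation history using Python's standard sqlite3 module. The table name `messages`, its columns, and both helper functions are illustrative assumptions, not the actual schema of state.db, and the sketch assumes your SQLite build includes FTS5.

```python
import sqlite3


def build_session_index(rows):
    """Create an in-memory FTS5 table and load (session_id, content) rows.

    The table and column names are hypothetical, not the state.db schema.
    """
    conn = sqlite3.connect(":memory:")
    # session_id is stored but not tokenized; only content is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)", rows
    )
    return conn


def search_sessions(conn, query, limit=5):
    """Return (session_id, content) rows matching the query, best match first."""
    cursor = conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    index = build_session_index([
        ("s1", "We deployed Qdrant with Docker yesterday"),
        ("s2", "Lunch plans for Friday"),
        ("s3", "Qdrant index rebuild failed"),
    ])
    for session_id, content in search_sessions(index, "qdrant"):
        print(session_id, content)
```

Full-text search like this is exact-term matching, which is why the stack pairs it with vector search for queries that share meaning but not words.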
How Retrieval Actually Works
Memory OS doesn't stuff the context window with everything it can find. It gates every source behind a relevance threshold, and on pre_llm_call it runs surgical recall across Fabric, Qdrant, Sessions, and Facts at once. Per-session deduplication stops the same context from being injected twice.

A social-closer filter skips trivial messages, so a plain "thanks" won't clog your context. That's a small detail, but it signals the design philosophy: token efficiency is the stated goal, not maximum recall at any cost.
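The gating described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Memory OS code: the `Candidate` and `SessionRecall` names, the 0.6 threshold, and the list of closer phrases are all hypothetical, and real scores would come from the retrieval layers.

```python
import re
from dataclasses import dataclass, field

# Hypothetical closer phrases; the project's real filter may differ.
SOCIAL_CLOSERS = {"thanks", "thank you", "ok", "okay", "got it", "cool", "bye"}


@dataclass
class Candidate:
    source: str   # e.g. "fabric", "qdrant", "sessions", "facts"
    text: str
    score: float  # relevance score assumed to be in [0, 1]


@dataclass
class SessionRecall:
    threshold: float = 0.6                   # assumed value for illustration
    seen: set = field(default_factory=set)   # texts already injected this session

    def is_social_closer(self, message):
        """True when the message is only a trivial closer such as 'Thanks!'."""
        normalized = re.sub(r"[^\w\s]", "", message).strip().lower()
        return normalized in SOCIAL_CLOSERS

    def select(self, message, candidates):
        """Return candidates that clear the threshold and were not sent before."""
        if self.is_social_closer(message):
            return []
        selected = []
        for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
            key = cand.text.strip().lower()
            if cand.score < self.threshold or key in self.seen:
                continue
            self.seen.add(key)
            selected.append(cand)
        return selected
```

A caller would create one `SessionRecall` per session and pass each user message plus the merged candidates from all sources to `select`, so anything below the threshold or already injected never reaches the prompt.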
On post_llm_call and on_session_end, the system automatically extracts and captures new learnings. The agent builds its own memory over time, with no manual curation from you.
Token Efficiency, Not Window Stuffing
Most memory systems throw everything at the model and hope something sticks. Memory OS does the opposite. It filters first, then recalls. If a source does not clear the relevance threshold, it never reaches the LLM. That keeps your context lean and your inference costs down.
The Fallback That Keeps It Alive
Layer 5's retrieval uses a four-level fallback cascade. Hybrid search runs first. If that returns nothing, it tries dense vectors. Then lexical search. Then SQLite as the last resort. This design keeps recall working even when Qdrant struggles with a query.
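A fallback cascade of this shape is simple to express. The sketch below is a generic illustration, not the project's implementation: the `cascade` function and the stub retrievers are hypothetical, and treating a backend exception as an empty result is an added assumption (the source only says the next level runs when the previous one returns nothing).

```python
from typing import Callable, List, Sequence, Tuple

Retriever = Callable[[str], Sequence[str]]


def cascade(query: str, retrievers: List[Tuple[str, Retriever]]) -> Tuple[str, List[str]]:
    """Try each (name, retriever) in order and return the first non-empty result."""
    for name, retrieve in retrievers:
        try:
            hits = list(retrieve(query))
        except Exception:
            # Assumption: a failing backend counts as "no results" and we fall through.
            hits = []
        if hits:
            return name, hits
    return "none", []


if __name__ == "__main__":
    def unavailable(query):
        raise ConnectionError("vector store unreachable")

    # Stub retrievers standing in for hybrid, dense, lexical, and SQLite search.
    levels = [
        ("hybrid", lambda q: []),
        ("dense", unavailable),
        ("lexical", lambda q: ["note mentioning " + q]),
        ("sqlite", lambda q: ["last-resort row"]),
    ]
    print(cascade("docker", levels))
```

Ordering the levels from most to least precise means the cheaper, cruder searches only run when the better ones come back empty.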
A weekly decay scanner ages out stale entries. Semantic dedup merges near-identical memories when cosine similarity exceeds 0.92. These are not flashy features. They are housekeeping. But they are the difference between a memory system that stays useful for months and one that bloats into noise after a week.
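The dedup rule can be illustrated with plain Python. This is a simplified sketch under assumptions: the function names are hypothetical, vectors are tiny toy lists rather than 4096-dimensional embeddings, and "merging" is reduced to keeping the first memory of each near-duplicate group. Only the 0.92 cosine threshold comes from the project description.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedup_memories(memories, threshold=0.92):
    """Keep a memory only if it is not a near-duplicate of one already kept.

    memories: list of (text, vector) pairs, oldest first.
    """
    kept = []
    for text, vector in memories:
        is_duplicate = any(
            cosine_similarity(vector, kept_vector) > threshold
            for _, kept_vector in kept
        )
        if not is_duplicate:
            kept.append((text, vector))
    return kept


if __name__ == "__main__":
    sample = [
        ("User prefers dark mode", [1.0, 0.0]),
        ("The user likes dark mode", [0.99, 0.05]),   # near-duplicate of the first
        ("Project uses Redis for queues", [0.0, 1.0]),
    ]
    for text, _ in dedup_memories(sample):
        print(text)
```

This pairwise loop is quadratic in the number of memories, which is fine for a sketch; a real system would more likely ask the vector database for nearest neighbors instead.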
Local-First, And Deliberately So
Memory OS positions itself against cloud memory services like mem0, Zep, and Letta. Its pitch is straightforward. Memory infrastructure should run on your own machine. Your memory data stays local. There is no memory subscription. LLM calls still go to whichever provider you choose.
Memory OS isn't an official provider. Hermes already supports eight external memory providers, including mem0 and Honcho. Memory OS is not one of them; it is a separate community-built stack layered directly on Hermes. But for teams with data-residency rules, a local memory store matters.
"Just open-sourced Memory OS: a complete hierarchical persistent memory architecture for the Hermes Agent. 6 layers, fully local: Structured facts + trust scoring with feedback loop, Hybrid vector search (Qdrant + BM25), Self-curating LLM Wiki, Semantic…" Claudio Drews (@ClaudioDrews25), May 31, 2026
What You Give Up
Real talk. This project is brand new with few commits. The forked Icarus Plugin is explicitly not upstream-compatible. Setup is heavy. Docker, Qdrant, Redis, and an ARQ Worker are all required. There are no published benchmarks on recall quality, latency, or token savings.
ClaudioDrews built something ambitious. But you are signing up for a system with more infrastructure dependencies than most solo developers want to manage. The architecture is sound. The proof is thin.
The Verdict
Memory OS is a blueprint as much as it is a tool. The six-layer design, the gated retrieval, the fallback cascade, and the local-first stance are ideas that will influence how agent memory gets built, even if this specific repo never hits 1.0. If you run Hermes Agent and you are hitting the ceiling of its built-in memory, watch this project. Better yet, read the README. The architecture alone is worth your time.
Frequently Asked Questions
What is Memory OS and how does it relate to Hermes Agent?
Memory OS is a community-built, MIT-licensed memory stack that bolts six retrieval layers onto Hermes Agent. It is a full local infrastructure that runs beside Hermes, not inside it, and is not a plugin you can toggle on.
Why did the developer create Memory OS, according to the article?
The Memory OS README frames Hermes Agent's built-in memory as too shallow for real work. The project layers additional retrieval capabilities like vector search, structured facts, trust scoring, and an auto-curated wiki on top to solve deeper recall problems.
How does Memory OS ensure token efficiency during retrieval?
Memory OS gates every source behind a relevance threshold and on pre_llm_call runs surgical recall from Fabric, Qdrant, Sessions, and Facts. It also uses per-session deduplication and a social-closer filter to skip trivial messages, keeping context lean and inference costs down.
Who released Memory OS and what are some of its key architectural layers?
Developer ClaudioDrews released Memory OS. Its six layers include Workspace (MEMORY.md etc.), Sessions (SQLite with FTS5), Structured Facts with trust scoring, Fabric (forked Icarus Plugin), Vector Database (Qdrant), and LLM Wiki (auto-curated vault).
When was Memory OS announced and what are its main infrastructure requirements?
Memory OS was announced on May 31, 2026, as per a tweet by ClaudioDrews. The infrastructure requirements include Docker, Qdrant, Redis, and an ARQ Worker, making setup heavy compared to typical plugins.













