Doramagic Project Pack · Human Manual
mnemo
Local, on-demand memory for AI coding agents (MCP). Strictly offline; built for 10+ agents on a 16GB machine.
Overview, Principles & System Architecture
Related topics: Memory Domain Model & MCP Tool Surface, Recall Pipeline, Embedders & LLM Kit Integration, Deployment, Configuration, Storage & Client Wiring
Purpose and Scope
mnemo is a local memory store for AI coding agents. The CLI entry point is built around the tagline *"local memory for AI coding agents. Store and search typed memories locally."* (src/mnemo/adapters/cli/app.py:11-12).
The system is organized so that writes never invoke an LLM: memories are stored as plain typed records. Reads can opt in to an LLM via the recall tool, which was added in the 0.3.0 release. As stated by the MCP remember tool: *"No LLM runs on write."* (src/mnemo/adapters/mcp/server.py). When recall is exercised, an LLM may synthesize a grounded answer, but it must reply "No relevant memories found." rather than draw on outside knowledge.
The codebase is split between two libraries:
| Package | Role |
|---|---|
| `mnemo` | Domain, application, infrastructure, CLI and MCP adapters |
| `llmkit` | Model loading, residency, and capability ports (Embedder, Reranker, Generator, Nli) |
llmkit exposes only narrow capability ports so the rest of mnemo never imports an engine (src/llmkit/ports/embedder.py, src/llmkit/ports/nli.py).
Architectural Layers
mnemo follows a hexagonal (ports-and-adapters) layering. The composition root at src/mnemo/infrastructure/composition.py wires concrete adapters into use-case constructors and produces the container that the CLI and MCP server call.
flowchart TB
subgraph Adapters
CLI["CLI (Typer)<br/>src/mnemo/adapters/cli/app.py"]
MCP["MCP Server<br/>src/mnemo/adapters/mcp/server.py"]
HASH["HashEmbedder<br/>src/mnemo/adapters/embedding/hash_embedder.py"]
end
subgraph Infrastructure
COMP["Composition root<br/>src/mnemo/infrastructure/composition.py"]
end
subgraph Application
UC["Use cases<br/>(remember, search, browse, recall, project mgmt)"]
PIPE["Recall pipeline<br/>(gather → optional rerank → assemble → optional generate)"]
end
subgraph Domain
MEM["Memory entity<br/>src/mnemo/domain/memory.py"]
PROJ["Project entity"]
end
subgraph llmkit
EMB["Embedder port"]
RER["Reranker port"]
GEN["Generator port"]
end
CLI --> COMP
MCP --> COMP
COMP --> UC
UC --> PIPE
UC --> MEM
PIPE --> EMB
PIPE --> RER
PIPE --> GEN
HASH -.implements.-> EMB
Domain layer
The Memory aggregate is the only write-side entity that matters. It is built through Memory.create(...) so invariants (non-empty content, typed enum, scope/project resolution) are enforced at the factory rather than scattered through the application layer (src/mnemo/domain/memory.py). The dataclass carries identity (id), a content hash for dedup, supersession links, and scope/project/type discriminators used for filtering.
Application layer
Use cases are protocol-based; for example CreateProjectUseCase, UpdateProjectUseCase, DeleteProjectUseCase, ListProjectsUseCase, and BrowseMemoryUseCase are defined as Protocol classes in src/mnemo/application/use_cases/interfaces/. Their concrete implementations live next to them as *UseCaseImpl classes — see UpdateProjectUseCaseImpl in src/mnemo/application/use_cases/update_project.py.
A Retrieval value object is the contract between use cases and the store. It carries the structured SearchCriteria, a page size, and a query representation: text feeds the lexical (FTS) leg, vector the dense leg. Construction rejects the illegal state where exactly one of text/vector is set — *"a search carries BOTH; a filter-only browse (recency order) carries NEITHER. Exactly one without the other is rejected at construction"* (src/mnemo/application/retrieval.py).
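The both-or-neither rule can be illustrated with a small frozen dataclass. This is a minimal sketch, not mnemo's actual class: the field names `criteria`, `limit`, `text`, and `vector` follow the description above, while the `SearchCriteria` stand-in, the `is_search` helper, and the error message are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SearchCriteria:
    """Hypothetical stand-in for the structured filters (not mnemo's real class)."""
    project: Optional[str] = None
    type: Optional[str] = None


@dataclass(frozen=True)
class Retrieval:
    criteria: SearchCriteria
    limit: int
    text: Optional[str] = None                 # feeds the lexical (FTS) leg
    vector: Optional[Sequence[float]] = None   # feeds the dense leg

    def __post_init__(self) -> None:
        # A search carries both legs; a filter-only browse carries neither.
        if (self.text is None) != (self.vector is None):
            raise ValueError("text and vector must be set together or not at all")

    @property
    def is_search(self) -> bool:
        # Hypothetical helper: True when both legs are present.
        return self.text is not None
```

Rejecting the half-signal state in `__post_init__` means no downstream code has to handle a request with only one leg populated.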
Infrastructure and adapters
build_container() in src/mnemo/infrastructure/composition.py is the only place where concrete classes are selected. It picks the embedder by config (hash for offline use, tests, and skeleton setups; pplx for the default ONNX model) and constructs optional Reranker and Generator adapters for the recall pipeline. Setting the reranker or generator to off skips that stage; with both off, recall is structured-only.
The Recall Pipeline
RecallProjectUseCaseImpl.execute(...) is the single entry point that runs the read pipeline (src/mnemo/application/use_cases/recall_project.py). The pipeline is built by build_recall_pipeline(...) and is composed of stages declared in mnemo.application.recall:
- Gather: the embedder encodes the query and the repository returns the top-k hybrid hits.
- Rerank *(optional)*: a `Reranker` from `llmkit` reorders the top-k.
- Assemble: a pure, model-free stage that groups the gathered memories by `type` into `RecallSection`s and produces a `RecallBundle` (src/mnemo/application/recall/assemble_stage.py, src/mnemo/application/recall/bundle.py). The bundle is published on the `RECALL` slot.
- Generate *(optional)*: a `Generator` from `llmkit` writes the prose summary; otherwise the structured grouping (and reranked order) is the answer.
The CLI prints the bundle as JSON: *"recall ... the bundle prints as JSON."* (src/mnemo/adapters/cli/app.py).
Principles
A few design principles follow directly from the source:
- No LLM on write. Memories are typed records; an LLM is opt-in only on the read path (src/mnemo/adapters/mcp/server.py).
- Engine-agnostic application code. Application types import only `llmkit.ports.*` and value types; concrete engines (ONNX, llama.cpp/GGUF) are confined to the composition root and the `llmkit` runtime (src/llmkit/ports/embedder.py, src/llmkit/config.py).
- Deterministic offline mode. `HashEmbedder` is described in its docstring as a *"Deterministic, dependency-free embedder."* It is explicitly *"NOT semantic"* and is meant for offline use, tests, and skeleton setups, so the core can be tested without model downloads (src/mnemo/adapters/embedding/hash_embedder.py).
- Two-leg hybrid ranking. A `Retrieval` must carry both `text` and `vector` for a search, or neither for a browse; half-signal requests are rejected at construction (src/mnemo/application/retrieval.py).
- Grounded answers. When a generator is configured, it is constrained to the gathered memories; if none apply it must say so rather than invent (src/mnemo/application/use_cases/recall_project.py).
See Also
- Memory Model & Types (`Memory`, `MemoryType`, `Scope`, `topic_key`, supersession).
- Projects & Scoping (project gate, near-match candidates, global vs. project scope).
- Recall Pipeline Deep Dive (gather/assemble/rerank/generate stages and the `RECALL` slot).
- Hybrid Search: FTS + Vectors (`Retrieval`, `SearchCriteria`, and the lexical/dense leg contract).
- Embedders & llmkit (`HashEmbedder`, the `pplx` ONNX embedder, and engine residency).
Source: https://github.com/arttttt/mnemo / Human Manual
Memory Domain Model & MCP Tool Surface
Related topics: Overview, Principles & System Architecture, Recall Pipeline, Embedders & LLM Kit Integration
Overview
mnemo is a local, on-demand memory layer for AI coding agents (Claude Code, Cursor, Windsurf, any MCP client). It remembers decisions, bugs, progress, and rules across sessions without any cloud calls — embeddings and LLMs run only on the host machine. The system is built around a single shared service process that ten or more agents can talk to concurrently via the Model Context Protocol (MCP), with a thin CLI on top.
This page documents the two foundations that everything else rests on: the memory domain model (what a memory *is*) and the MCP tool surface (what an agent can *ask* the service to do). Together they define the contract between agents and the persistent store.
Source: README.md
The `Memory` Domain Model
Entity & Invariants
The Memory is a single focused unit of text. Its fields cover identity, content, classification, scoping, provenance, lifecycle, and cryptographic dedupe hashing.
Source: src/mnemo/domain/memory.py
| Field | Type | Purpose |
|---|---|---|
| `id` | `str` | Stable identifier generated by `new_id()`. |
| `content` | `str` | The memory text; must be non-empty (enforced by `Memory.create`). |
| `type` | `MemoryType` | Classification (e.g. working-notes, the default). |
| `scope` | `Scope` | Either project or global. |
| `project` | `str \| None` | Project slug; forced to `GLOBAL_PROJECT` when scope is global. |
| `related_files` | `list[str]` | File paths the memory references. |
| `tags` | `list[str]` | Free-form tags for filtering. |
| `topic_key` | `str \| None` | Stable key so a memory *evolves* instead of being duplicated. |
| `session_id` | `str \| None` | Originating agent session. |
| `status` | `str` | Lifecycle marker (active, etc.). |
| `supersedes` | `str \| None` | Pointer to the prior memory this one replaces. |
| `hash` | `str` | Content hash for dedupe. |
| `created_at` / `updated_at` | `str` | ISO timestamps from `now()`. |
Construction is funneled through Memory.create(...), which validates non-empty content, coerces scope and type, and stamps created_at/updated_at. The hash field is computed from content so duplicate writes can be detected cheaply. Source: src/mnemo/domain/memory.py
Type, Scope, and Project
MemoryType is an enum of categories an agent can attach to a memory (working notes, decisions, bugs, etc.). The default is working-notes. Source: src/mnemo/domain/constants.py
Scope is binary: project (belongs to a single project) or global (applies everywhere). When Scope.GLOBAL is chosen, the project is forced to the sentinel __global__ so a global memory is never mistaken for a project memory. Source: src/mnemo/domain/memory.py
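The factory behaviour described above (non-empty content, scope coercion, the forced global project, and a content hash) can be sketched as follows. This is an illustrative reduction, not the real `Memory` class: it keeps only a handful of fields, treats `type` as a plain string, and assumes SHA-256 for the content hash, `uuid4` for the id, and whitespace stripping before the emptiness check. None of those choices are confirmed by the source.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

GLOBAL_PROJECT = "__global__"  # sentinel value as quoted in the text above


class Scope(str, Enum):
    PROJECT = "project"
    GLOBAL = "global"


@dataclass(frozen=True)
class Memory:
    id: str
    content: str
    type: str
    scope: Scope
    project: Optional[str]
    hash: str
    created_at: str

    @classmethod
    def create(cls, content: str, *, type: str = "working-notes",
               scope: str = "project", project: Optional[str] = None) -> "Memory":
        text = content.strip()  # assumption: surrounding whitespace is ignored
        if not text:
            raise ValueError("memory content is empty; store text worth remembering")
        resolved = Scope(scope)  # coerces the string; unknown scopes raise ValueError
        if resolved is Scope.GLOBAL:
            project = GLOBAL_PROJECT  # a global memory never keeps a project slug
        return cls(
            id=uuid.uuid4().hex,                                    # assumed id scheme
            content=text,
            type=type,
            scope=resolved,
            project=project,
            hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),  # assumed hash
            created_at=datetime.now(timezone.utc).isoformat(),
        )
```

Because the dataclass is frozen and every instance goes through `create`, callers cannot end up holding a global memory that still points at a project.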
Policy Caps
Two caps protect the store from bloat and keep retrieval from being diluted:
- `DEFAULT_MAX_MEMORY_TOKENS = 512`: the policy cap on memory length. Source: src/mnemo/domain/constants.py
- `DEFAULT_RECALL_LIMIT = 15`: how many memories `recall` retrieves to ground an answer. Kept small because recall synthesizes a focused answer, not a digest. Source: src/mnemo/domain/constants.py
The effective limit on a memory's size is the stricter of the policy cap and the embedder's input window. Source: src/mnemo/domain/constants.py
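The size rule above amounts to taking a minimum. A minimal sketch, assuming token counts are plain integers; the function names and the error message are hypothetical:

```python
DEFAULT_MAX_MEMORY_TOKENS = 512


def effective_max_tokens(policy_cap: int, embedder_max_input: int) -> int:
    """Return the stricter of the policy cap and the embedder's input window."""
    return min(policy_cap, embedder_max_input)


def check_length(token_count: int, policy_cap: int, embedder_max_input: int) -> None:
    """Raise if a memory is longer than the effective limit allows."""
    limit = effective_max_tokens(policy_cap, embedder_max_input)
    if token_count > limit:
        raise ValueError(f"memory is {token_count} tokens; limit is {limit}")
```

With an embedder window of 256 tokens, the default policy cap of 512 would never be reached; the window becomes the binding constraint.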
Proposed Memory (Write Path Internals)
The pipeline doesn't write Memory directly; it produces a ProposedMemory — the merged, summarized, or insight record that an executor later materializes into a real Memory. Its fields mirror the write path: content, type, project, scope, related files, tags. Source: src/mnemo/application/pipeline/proposed_memory.py
MCP Tool Surface
The MCP server is the primary agent-facing entry point. Each tool maps to one use case in the application layer. The composition root wires them together at startup. Source: src/mnemo/infrastructure/composition.py
| Tool | Use Case | Purpose |
|---|---|---|
| `remember` | `RememberMemoryUseCaseImpl` | Store a single memory; validates, embeds, inserts. No LLM on the write path. |
| `search` | `SearchMemoryUseCaseImpl` | Query by meaning within a scope; project gate checks permission. |
| `browse` | (MCP browse tool) | List by filter (type, tags, files), newest first; no query, no relevance score. |
| `recall` | `RecallProjectUseCaseImpl` | Answer a question from a project's memories; opt-in LLM read tool. |
| `delete` | `DeleteMemoryUseCaseImpl` | Delete specific memory ids. |
| `clear` / `purge` | `DeleteMemoryUseCaseImpl` | Whole-project clear and full reset (memories + project registry). |
| project tools | Create / Update / Delete / List | First-class project registry added in 0.3.0. |
Sources: src/mnemo/application/use_cases/remember_memory.py, src/mnemo/application/use_cases/search_memory.py, src/mnemo/application/use_cases/delete_memory.py, src/mnemo/application/use_cases/recall_project.py, src/mnemo/adapters/mcp/server.py
The `browse` Tool (Query-less Read)
browse is the newest read shape — added in 0.3.0 alongside recall. It accepts scope, project, type, tags, related_files, created_after, and limit (1–100), and returns memories ordered by recency with no relevance score. It is the right choice for category retrieval ("all type=decision in this project") where a semantic query would only bias the order. The MCP server emits {id, type, scope, project, content, related_files, created_at} per hit. Source: src/mnemo/adapters/mcp/server.py
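A query-less read of this kind is a filter followed by a recency sort. The sketch below works on an in-memory list rather than mnemo's store. The `Row` type is hypothetical, tags are assumed to match only when all requested tags are present, and timestamps are assumed to be ISO strings in one consistent format so that string order equals time order.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical stored record; only the fields this sketch needs."""
    id: str
    type: str
    created_at: str
    tags: List[str] = field(default_factory=list)


def browse(rows: Iterable[Row], *, type: Optional[str] = None,
           tags: Optional[List[str]] = None,
           created_after: Optional[str] = None, limit: int = 20) -> List[Row]:
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    hits = [
        r for r in rows
        if (type is None or r.type == type)
        and (not tags or set(tags) <= set(r.tags))   # assumption: all tags must match
        and (created_after is None or r.created_at > created_after)
    ]
    # Newest first; no relevance score is computed at any point.
    hits.sort(key=lambda r: r.created_at, reverse=True)
    return hits[:limit]
```

Since no query is embedded, the result order depends only on `created_at`, which is what makes this shape suitable for category retrieval.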
The `recall` Tool (Opt-in LLM Read)
recall is the one opt-in LLM read tool. It runs a pipeline once: the gather step embeds the query and retrieves the most relevant memories, an optional reranker re-orders them, and an optional generator (Gemma 4 E2B-it QAT GGUF) synthesizes a concise, grounded answer. The generator never uses outside knowledge and refuses with "No relevant memories found." when none apply. With the generator off, recall returns the structured grouping. Source: src/mnemo/application/use_cases/recall_project.py
The pipeline has two explicit stages visible in the codebase: GatherStage (retrieve query-relevant active memories, project-scoped plus globally-scoped, capped at the request limit) and AssembleStage (group gathered memories by type into RecallSections, pure, no model). Source: src/mnemo/application/recall/gather_stage.py, src/mnemo/application/recall/assemble_stage.py
The output is a RecallBundle: project, an ordered tuple of RecallSection(type, memories), and an optional summary filled only when the generator ran. The structured grouping carries the structure on its own when no summary is present. Source: src/mnemo/application/recall/bundle.py
CLI Mirror
Each MCP tool has a CLI counterpart so the service is operable without an agent. The recall CLI command accepts project, query, and --limit. MNEMO_GENERATOR selects the generator that synthesizes the summary, and MNEMO_RERANKER selects the reranker that orders memories by the query. Missing extras raise an actionable RuntimeError asking the user to install mnemo[recall] or set the model to off. The bundle prints as JSON; model timing and RAM usage go to the logs, controlled by MNEMO_LOG_LEVEL. Source: src/mnemo/adapters/cli/app.py
Architecture at a Glance
flowchart LR
Agent["AI Coding Agent<br/>(MCP client)"] -->|MCP tools| Server["MCP Server<br/>(adapters/mcp/server.py)"]
CLI["CLI<br/>(adapters/cli/app.py)"] --> Container
Server --> Container["Composition Root<br/>(infrastructure/composition.py)"]
Container --> R["Remember"]
Container --> S["Search"]
Container --> B["Browse"]
Container --> Rec["Recall<br/>(pipeline)"]
Container --> D["Delete / Clear / Purge"]
Container --> P["Project Registry"]
R --> Memory[("Memory<br/>domain/memory.py")]
S --> Memory
B --> Memory
Rec --> Memory
D --> Memory
P --> Memory
The composition root selects the embedder (hash for offline/tests, pplx as the default pplx-embed-v1-0.6b int8 ONNX model), builds the reranker and generator from llmkit ports, and assembles every use case with its dependencies. Source: src/mnemo/infrastructure/composition.py, src/llmkit/types.py
Configuration
All configuration is read from MNEMO_* environment variables at startup. Values are parsed at the config boundary with named, range-checked errors so a typo or out-of-range value surfaces immediately rather than as an opaque crash later. Integers use _int_env; floats use _float_env (with an exclusive_min mode for values that must be strictly greater than zero, e.g. idle-check intervals). Source: src/mnemo/infrastructure/config.py
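Boundary parsing with named, range-checked errors might look like the sketch below. The real helpers are `_int_env` and `_float_env`, but their signatures are not shown in the source, so the parameter names here (`minimum`, `maximum`, the injectable `env` mapping) and the error wording are assumptions. Only the `exclusive_min` idea is taken from the text.

```python
import os
from typing import Mapping, Optional


def int_env(name: str, default: int, *, minimum: int, maximum: int,
            env: Optional[Mapping[str, str]] = None) -> int:
    """Parse an integer setting, naming the variable in any error."""
    raw = (os.environ if env is None else env).get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if not minimum <= value <= maximum:
        raise ValueError(f"{name} must be in [{minimum}, {maximum}], got {value}")
    return value


def float_env(name: str, default: float, *, minimum: float = 0.0,
              exclusive_min: bool = False,
              env: Optional[Mapping[str, str]] = None) -> float:
    """Parse a float setting; exclusive_min=True rejects the minimum itself."""
    raw = (os.environ if env is None else env).get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number, got {raw!r}") from None
    too_small = value <= minimum if exclusive_min else value < minimum
    if too_small:
        bound = ">" if exclusive_min else ">="
        raise ValueError(f"{name} must be {bound} {minimum}, got {value}")
    return value
```

Passing the environment as a mapping keeps the parsers testable without mutating `os.environ`. Including the variable name in every message is what turns a typo into an immediately actionable startup error.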
Key knobs that affect the tool surface:
| Env Var | Effect |
|---|---|
| `MNEMO_EMBEDDER` | `hash` (offline) or `pplx` (default; ONNX int8). |
| `MNEMO_GENERATOR` | Generator model id for recall; `off` disables LLM synthesis. |
| `MNEMO_RERANKER` | Reranker id; orders gathered memories by the query. |
| `MNEMO_RERANK_TOP_K` | Cap on reranked candidates. |
| `MNEMO_GENERATOR_MAX_TOKENS` | Token budget for the synthesized answer. |
| `MNEMO_MAX_MEMORY_TOKENS` | Policy cap on a memory's length (default 512). |
| `MNEMO_RECALL_LIMIT` | Default memories retrieved by recall (default 15). |
| `MNEMO_LOG_LEVEL` | Log level; model timing and RAM usage are reported in the logs. |
Sources: src/mnemo/infrastructure/composition.py, src/mnemo/infrastructure/config.py, src/mnemo/domain/constants.py
Common Failure Modes
- Empty content: `Memory.create` raises `ValueError("memory content is empty; store text worth remembering")`. The CLI's `recall` command catches `ValueError` and surfaces it as `typer.BadParameter`. Source: src/mnemo/domain/memory.py, src/mnemo/adapters/cli/app.py
- Missing recall extras: When `MNEMO_GENERATOR` or `MNEMO_RERANKER` points at an adapter whose dependency isn't installed, the CLI catches `RuntimeError` and prints the actionable install message instead of a traceback, exiting with code 1. Source: src/mnemo/adapters/cli/app.py
- Bad config: A non-integer or out-of-range `MNEMO_*` value raises a named `ValueError` at startup, before any agent request is served. Source: src/mnemo/infrastructure/config.py
- Over-window memory: The `HashEmbedder` rejects inputs that exceed `max_input`; tests pass a small value to exercise this path. Source: src/mnemo/adapters/embedding/hash_embedder.py
See Also
- README — high-level positioning and operational characteristics.
- Recall pipeline: `GatherStage` → optional `RerankStage` → `AssembleStage` → optional generator stage; output is `RecallBundle`.
- Project registry: first-class projects added in 0.3.0, with FK cascade deleting memories on project removal.
- Embedders: `hash` (offline, lexical only) vs `pplx` (default, semantic, ONNX int8).
Sources: src/mnemo/application/use_cases/remember_memory.py, src/mnemo/application/use_cases/search_memory.py, src/mnemo/application/use_cases/delete_memory.py, src/mnemo/application/use_cases/recall_project.py, src/mnemo/adapters/mcp/server.py
Recall Pipeline, Embedders & LLM Kit Integration
Related topics: Memory Domain Model & MCP Tool Surface, Deployment, Configuration, Storage & Client Wiring
Overview
The recall subsystem is mnemo's opt-in LLM read tool: instead of returning raw hits, it answers a question against a project's stored memories. The pipeline lives under src/mnemo/application/recall/ and is built by build_recall_pipeline at composition time. Refinements (reranker, generator) are wired in only when their adapters are configured — "each refinement earns its place when its model is configured" (builder.py). The same use case backs both the recall MCP tool (server.py) and the mnemo recall CLI command (app.py).
Pipeline Architecture
The recall pipeline is a Pipeline[RecallRequest, RecallBundle] assembled from up to four stages. Each stage declares requires and provides slots over a shared PipelineContext.
| Stage | Key | Inputs (slots) | Output | Optional? |
|---|---|---|---|---|
| Gather | gather | RECALL_REQUEST | GATHERED (tuple of Memory) | No — always present |
| Rerank | rerank | GATHERED | GATHERED (re-ordered) | Yes, when a Reranker is configured |
| Assemble | assemble | GATHERED, RECALL_REQUEST | RECALL (RecallBundle) | No — always present |
| Synthesize | synthesize | RECALL, RECALL_REQUEST | RECALL (enriched with summary) | Yes, when a Generator is configured |
Source: builder.py, gather_stage.py, assemble_stage.py, synthesize_stage.py.
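The slot discipline in the table can be sketched with a tiny pipeline in which each stage names what it requires and what it provides, and optional stages are appended only when their model is supplied. Everything here is a simplified stand-in: the `Stage` dataclass, `build_pipeline`, and `run_pipeline` are hypothetical names, the gather step returns fixed placeholder data, and the bundle is a plain dict rather than a `RecallBundle`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Stage:
    key: str
    requires: FrozenSet[str]
    provides: str
    run: Callable[[Dict[str, object]], object]


def build_pipeline(reranker: Optional[Callable] = None,
                   generator: Optional[Callable] = None) -> List[Stage]:
    stages = [Stage("gather", frozenset({"RECALL_REQUEST"}), "GATHERED",
                    lambda ctx: ["m2", "m1"])]  # placeholder for hybrid retrieval
    if reranker is not None:  # the refinement is added only when configured
        stages.append(Stage("rerank", frozenset({"GATHERED"}), "GATHERED",
                            lambda ctx: reranker(ctx["GATHERED"])))
    stages.append(Stage("assemble", frozenset({"GATHERED", "RECALL_REQUEST"}), "RECALL",
                        lambda ctx: {"memories": ctx["GATHERED"], "summary": None}))
    if generator is not None:
        stages.append(Stage("synthesize", frozenset({"RECALL", "RECALL_REQUEST"}), "RECALL",
                            lambda ctx: {**ctx["RECALL"], "summary": generator(ctx["RECALL"])}))
    return stages


def run_pipeline(stages: List[Stage], request: object) -> object:
    ctx: Dict[str, object] = {"RECALL_REQUEST": request}
    for stage in stages:
        missing = stage.requires - set(ctx)
        if missing:
            raise RuntimeError(f"{stage.key} is missing slots: {sorted(missing)}")
        ctx[stage.provides] = stage.run(ctx)  # a stage may overwrite a slot it refines
    return ctx["RECALL"]
```

Because rerank and synthesize write back to the slot they read, the mandatory stages never need to know whether a refinement ran.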
flowchart LR
REQ[RecallRequest] --> G[gather]
G -->|GATHERED| R{reranker?}
R -->|yes| RR[rerank]
R -->|no| A[assemble]
RR --> A
A -->|RecallBundle| S{generator?}
S -->|yes| SY[synthesize]
S -->|no| OUT[Bundle: sections only]
SY --> OUT2[Bundle: sections + summary]
Stage Details
- Gather embeds the query with the configured `TextEmbedder`, then runs the same hybrid retrieval (dense + lexical) used by `search`. It is project-scoped, capped at `request.limit`, and includes globally-scoped memories. Source: `gather_stage.py`.
- Assemble is pure (no model): it groups `GATHERED` memories into `RecallSection` instances keyed by `MemoryType.value`, producing a `RecallBundle` whose `total` is the sum of section sizes. Source: `assemble_stage.py`, `bundle.py`.
- Synthesize is a no-op on an empty bundle; otherwise it builds the prompt via `build_synthesis_prompt`, calls `generator.generate(prompt, max_tokens=...)`, and returns the bundle with `summary` filled. Source: `synthesize_stage.py`.
The output type, RecallBundle, is a frozen dataclass carrying project, an ordered tuple[RecallSection, ...], and the optional summary: str | None. Source: bundle.py.
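The bundle shape and the grouping step can be sketched as below. The `RecallSection` and `RecallBundle` field names follow the description above; the minimal `Mem` record, the `assemble` function name, and the choice to order sections by first appearance in the gathered tuple are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Mem:
    """Hypothetical minimal memory: id, type value, content."""
    id: str
    type: str
    content: str


@dataclass(frozen=True)
class RecallSection:
    type: str
    memories: Tuple[Mem, ...]


@dataclass(frozen=True)
class RecallBundle:
    project: str
    sections: Tuple[RecallSection, ...]
    summary: Optional[str] = None  # filled only when a generator ran

    @property
    def total(self) -> int:
        return sum(len(s.memories) for s in self.sections)


def assemble(project: str, gathered: Tuple[Mem, ...]) -> RecallBundle:
    """Group gathered memories by type without calling any model."""
    groups: Dict[str, List[Mem]] = {}
    for mem in gathered:
        # Dicts preserve insertion order, so the ranked order survives in each group.
        groups.setdefault(mem.type, []).append(mem)
    sections = tuple(RecallSection(t, tuple(ms)) for t, ms in groups.items())
    return RecallBundle(project, sections)
```

An empty input yields a bundle with no sections and a `total` of zero, which is the case the synthesize stage treats as a no-op.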
Embedders
TextEmbedder is a port consumed by GatherStage. Two implementations are wired in composition.py:
- `pplx` (default): `pplx-embed-v1-0.6b` int8 ONNX, served CPU-side via `llmkit`. Source: `composition.py`.
- `hash`: a dependency-free, deterministic bag-of-tokens hashing embedder. "Lexical only: it captures token overlap, not meaning." It exposes `dim`, `max_input`, `count_tokens`, and `encode`; the offline default has effectively unlimited input, while tests pass a small `max_input` to exercise the over-window reject. Source: `hash_embedder.py`.
Selection is driven by config.embedder; the container lazily imports the chosen adapter so unused backends stay out of the import graph. Source: composition.py.
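A bag-of-tokens hashing embedder of the kind described for `hash` can be sketched in a few lines. This is not the project's implementation: the regex tokenizer, the BLAKE2b digest, the default `dim` of 256, the large default `max_input`, and the L2 normalization are all assumptions. Only the member names `dim`, `max_input`, `count_tokens`, and `encode` come from the text.

```python
import hashlib
import math
import re
from typing import List

_TOKEN = re.compile(r"\w+")


class HashEmbedder:
    def __init__(self, dim: int = 256, max_input: int = 1_000_000) -> None:
        self.dim = dim
        self.max_input = max_input

    def count_tokens(self, text: str) -> int:
        return len(_TOKEN.findall(text.lower()))

    def encode(self, text: str) -> List[float]:
        tokens = _TOKEN.findall(text.lower())
        if len(tokens) > self.max_input:
            raise ValueError(f"input has {len(tokens)} tokens; max_input is {self.max_input}")
        vec = [0.0] * self.dim
        for tok in tokens:
            # A stable digest (unlike the built-in hash()) keeps vectors identical across runs.
            digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=8).digest()
            vec[int.from_bytes(digest, "big") % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec
```

Two texts score as similar only when they share tokens, which is why such an embedder is useful for offline tests but says nothing about meaning.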
LLM Kit Integration
Recall offloads the optional model steps to llmkit through narrow ports. Two ports are imported by the recall builder:
- `llmkit.ports.reranker.Reranker`: injected when a reranker is configured; reorders `GATHERED` by query relevance before `AssembleStage`. Source: `builder.py`.
- `llmkit.ports.generator.Generator`: injected when a generator is configured; produces the prose `summary`. Source: `synthesize_stage.py`.
recall_project.py accepts both ports via its constructor, defaulting rerank_top_k=20 and generator_max_tokens=512, and exposes the execute(*, project, query, limit) entry point that returns a RecallBundle. Source: use_cases/recall_project.py. The Protocol surface for tests/clients is RecallProjectUseCase in interfaces/recall_project.py.
The prompt itself is the prompt-level refusal guard: it tells the model to answer using ONLY the supplied memories and to reply exactly No relevant memories found. when none apply. The prompt lays memories out grouped by type and restates the question after them so attention re-anchors on it. Source: synthesis_prompt.py. Adjacent llmkit ports include Nli for natural-language inference over a text pair (llmkit/ports/nli.py).
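The layout described above (instructions first, memories grouped by type, question restated last) can be sketched as a plain string builder. The wording of the instructions, the bracketed section headers, and the function name `build_prompt_sketch` are assumptions; the actual text in synthesis_prompt.py is not reproduced here. Only the refusal sentence is taken from the source.

```python
from typing import Dict, List

REFUSAL = "No relevant memories found."


def build_prompt_sketch(question: str, sections: Dict[str, List[str]]) -> str:
    """Lay out memories by type and restate the question after them."""
    lines = [
        "Answer the question using ONLY the memories below.",
        f"If none of them apply, reply exactly: {REFUSAL}",
        "",
    ]
    for mem_type, memories in sections.items():
        lines.append(f"[{mem_type}]")
        lines.extend(f"- {text}" for text in memories)
        lines.append("")
    # The question comes last so the model re-anchors on it after reading the memories.
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

A guard of this kind lives entirely in the prompt, so it constrains the model by instruction rather than by any check on the generated text.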
Configuration, Surfaces, and Failure Modes
The container builds the pipeline once at startup, threading config.rerank_top_k and config.generator_max_tokens into RecallProjectUseCaseImpl. Setting a model to off simply leaves its stage out of the pipeline. When a model is configured but the optional mnemo[recall] extras are missing, the adapter raises a RuntimeError that tells the user to install mnemo[recall] or set the model to off; the CLI surfaces it as a single message and exits with code 1 rather than dumping a traceback. Source: composition.py, app.py.
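One way to get both behaviours (an absent stage for "off", an actionable error for a missing dependency) is a small builder that imports the backing module lazily. This is a sketch under assumptions: `build_optional`, its parameters, and the exact message are hypothetical and do not mirror `_build_reranker` or `_build_generator`.

```python
import importlib
from typing import Callable, Optional


def build_optional(model: str, module_name: str,
                   factory: Callable[[str], object]) -> Optional[object]:
    """Return None for "off"; otherwise build the adapter or fail with a clear hint."""
    if model == "off":
        return None
    try:
        importlib.import_module(module_name)  # lazy: imported only when configured
    except ImportError:
        raise RuntimeError(
            f"{model!r} needs the optional recall extras: "
            "install mnemo[recall] or set the model to off"
        ) from None
    return factory(model)
```

Returning `None` for "off" lets the pipeline builder decide whether to include a stage with a simple `is not None` check.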
Two user-facing surfaces call into the same use case:
- MCP: `recall(project, query)` returns `{project, summary, sources: [{id, type}, ...]}`; only IDs and types are returned, which keeps the response small in the caller's context. Source: `server.py`.
- CLI: `mnemo recall <project> <query> --limit/-l N` prints the bundle as JSON; the limit defaults to `DEFAULT_RECALL_LIMIT` and is bounded to `1..200`. Source: `app.py`.
The 0.3.0 release ("recall") introduces this feature: a question can now be answered from a project's memories (opt-in LLM read), with Gemma 4 E2B-it (official QAT GGUF) as the synthesizer. It replies "No relevant memories found." when nothing applies and never uses outside knowledge.
See Also
- Mnemo, Memory Domain Model: `Memory`, `MemoryType`, and `Scope` definitions (domain/memory.py).
- Mnemo, Pipeline Infrastructure: `Pipeline`, `PipelineContext`, and the `Slot` pattern consumed by recall stages.
- Mnemo, MCP Server: tool-level surface that exposes `recall` to coding agents (adapters/mcp/server.py).
Source: https://github.com/arttttt/mnemo / Human Manual
Deployment, Configuration, Storage & Client Wiring
Related topics: Overview, Principles & System Architecture, Recall Pipeline, Embedders & LLM Kit Integration
Overview
mnemo is a local-first memory service for AI coding agents. The runtime is composed at a single seam — a composition root that turns a parsed configuration into fully wired use cases — and is exposed through three surfaces: a Typer CLI, an MCP server, and a set of client wirings that inject the MCP connector into a host agent's configuration. This page covers how the application is configured, where state lives, and how the connector reaches the agents that consume it.
Source: src/mnemo/infrastructure/composition.py Source: src/mnemo/adapters/cli/app.py
Configuration
The Composition Root
A single build_container() function is the seam that turns a parsed Config into a fully wired use-case container. Every long-lived dependency — the repository, the embedder, the project gate, and the use cases themselves — is constructed here and exposed for the CLI and MCP server to call. The recall use case is conditionally augmented with an optional reranker and generator built from the same configuration.
recall=RecallProjectUseCaseImpl(
    repository,
    embedder,
    reranker=_build_reranker(config),
    generator=_build_generator(config),
    rerank_top_k=config.rerank_top_k,
    generator_max_tokens=config.generator_max_tokens,
),
Source: src/mnemo/infrastructure/composition.py
Embedder Selection
The embedder is chosen by name from the configuration. The pplx value routes through llmkit.build.build_embedder (perplexity-ai/pplx-embed-v1-0.6b int8 ONNX, the default), and a hash option is available as a dependency-free lexical fallback. The HashEmbedder is deterministic, captures only token overlap, and is described in its module docstring as "NOT semantic — offline/tests/skeleton."
Source: src/mnemo/infrastructure/composition.py Source: src/mnemo/adapters/embedding/hash_embedder.py
Model Loading Policy
A ModelConfig (in the bundled llmkit library) is a small value object that names a model source and a residency policy. The source is either an OnnxSource (encoder-only, used for the embedder) or a GgufSource (used by llama.cpp for the generator); residency controls load/unload. The choice is made in code — a consumer reads its own config and passes it in — rather than driven by the environment directly.
Source: src/llmkit/config.py
Storage Model
The Memory Entity
Memory is the persistence record, built through Memory.create() so that invariants are checked at construction. Required fields include content, type (a MemoryType), and scope (a Scope). When scope is GLOBAL, the project is forced to the GLOBAL_PROJECT sentinel — global memories are projectless by design and apply to every project.
Source: src/mnemo/domain/memory.py
Projects and the Gate
Projects are registered entities with a description (later used for tier-2 semantic near-match). Update is gated: an unregistered project raises UnknownProject with near-match candidates, mirroring the gate behaviour used on every project-scoped read or write.
Source: src/mnemo/application/use_cases/update_project.py Source: src/mnemo/adapters/cli/app.py
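The gate's shape, an exception that carries near-match candidates, can be sketched as follows. Note the swap: this sketch uses `difflib` string similarity on slugs, whereas the text above mentions a semantic near-match over project descriptions, so the matching technique here is an assumption chosen to keep the example dependency-free. The `require_project` name, the `n=3` limit, and the `0.6` cutoff are likewise hypothetical.

```python
import difflib
from typing import Iterable, List


class UnknownProject(Exception):
    def __init__(self, slug: str, candidates: List[str]) -> None:
        super().__init__(f"unknown project {slug!r}; did you mean: {candidates}")
        self.slug = slug
        self.candidates = candidates


def require_project(slug: str, registered: Iterable[str]) -> str:
    """Return the slug if registered; otherwise raise with near-match candidates."""
    known = list(registered)
    if slug in known:
        return slug
    near = difflib.get_close_matches(slug, known, n=3, cutoff=0.6)
    raise UnknownProject(slug, near)
```

Attaching the candidates to the exception lets each surface (CLI or MCP) decide how to present the suggestion.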
Recall Output
The recall path produces a RecallBundle carrying the project slug, a tuple of RecallSections grouped by memory type, and an optional summary field. The summary is filled only when a generator stage ran; otherwise the structured grouping carries the answer on its own. The synthesis stage is a no-op on an empty bundle — there is nothing to summarize.
Source: src/mnemo/application/recall/bundle.py Source: src/mnemo/application/recall/synthesize_stage.py
Client Wiring
The Installer Port
Wiring mnemo into a host agent is modeled as a ClientInstaller protocol. Every client integration — each agent has its own mcp add CLI or its own JSON config schema — implements four operations:
| Member | Purpose |
|---|---|
| `name` (property) | The client's slug used on the command line (e.g. 'cursor') |
| `detect()` | Returns True if the client appears installed on this machine |
| `describe()` | A one-line description of what wiring would do (for --dry-run / prompts) |
| `install()` | Wire the connector into the client (idempotent) |
The shared shape lets mnemo support many agents with their own integration mechanics while keeping the top-level command surface uniform.
Source: src/mnemo/adapters/setup/client_installer.py
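The four members in the table map naturally onto a `typing.Protocol`, and a uniform driver loop can then treat every client the same way. The sketch below is illustrative only: the return type of `install()`, the `FakeInstaller` class, and the `wire_all` driver with its report strings are assumptions, not code from client_installer.py.

```python
from typing import Dict, Iterable, List, Protocol


class ClientInstaller(Protocol):
    @property
    def name(self) -> str: ...
    def detect(self) -> bool: ...
    def describe(self) -> str: ...
    def install(self) -> None: ...


class FakeInstaller:
    """In-memory stand-in for a real client; the dict plays the role of its config."""

    def __init__(self, name: str, present: bool, config: Dict[str, str]) -> None:
        self._name, self._present, self.config = name, present, config

    @property
    def name(self) -> str:
        return self._name

    def detect(self) -> bool:
        return self._present

    def describe(self) -> str:
        return f"add an mnemo entry to the {self._name} config"

    def install(self) -> None:
        self.config["mnemo"] = "wired"  # writing the same key twice is idempotent


def wire_all(installers: Iterable[ClientInstaller], *, dry_run: bool = False) -> List[str]:
    report = []
    for inst in installers:
        if not inst.detect():
            report.append(f"{inst.name}: not detected, skipped")
        elif dry_run:
            report.append(f"{inst.name}: would {inst.describe()}")
        else:
            inst.install()
            report.append(f"{inst.name}: wired")
    return report
```

The driver never inspects how a client stores its configuration, which is the point of hiding each integration behind the same four members.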
JSON Config Utility
For clients that store wiring as a JSON file, a small utility pair reads, merges, and writes without disturbing unrelated keys. A missing or empty file yields an empty dict; writes mkdir -p the parent directory and emit indented UTF-8 JSON with a trailing newline.
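import json
from pathlib import Path

def load_json(path: Path) -> dict:
    # A missing or empty file is treated as an empty config.
    if not path.exists() or path.stat().st_size == 0:
        return {}
    return json.loads(path.read_text())

def save_json(path: Path, data: dict) -> None:
    # Create the parent directory if needed, then write indented JSON plus a newline.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n")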
Source: src/mnemo/adapters/setup/json_config.py
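A read-merge-write cycle built on this pair might look like the following. The two helpers are repeated so the example runs on its own; `upsert_server`, the `mcpServers` key, the `example-command` value, and `demo` are hypothetical and stand in for whatever schema a given client actually uses.

```python
import json
import tempfile
from pathlib import Path


def load_json(path: Path) -> dict:
    if not path.exists() or path.stat().st_size == 0:
        return {}
    return json.loads(path.read_text())


def save_json(path: Path, data: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n")


def upsert_server(path: Path, name: str, entry: dict) -> None:
    """Add or replace one server entry, leaving every other key as it was."""
    data = load_json(path)
    data.setdefault("mcpServers", {})[name] = entry  # assumed key; varies by client
    save_json(path, data)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "client" / "config.json"
        save_json(cfg, {"theme": "dark"})                            # pre-existing setting
        upsert_server(cfg, "mnemo", {"command": "example-command"})
        upsert_server(cfg, "mnemo", {"command": "example-command"})  # second run changes nothing
        return load_json(cfg)
```

Merging into the loaded dict, rather than writing a fresh one, is what keeps unrelated keys such as `theme` intact across repeated installs.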
Runtime Entry Points
flowchart LR
A[Config] --> B[build_container]
B --> C[Embedder]
B --> D[Repository]
B --> E[Project Gate]
C --> F[Use Cases]
D --> F
E --> F
F --> G[CLI / MCP Server]
H[ClientInstaller] --> I[Host Agent Config]
F --> H
CLI
The CLI is a typer.Typer application registered with no_args_is_help=True. Visible commands include store (save a memory) and recall (the CLI view of the recall MCP tool). recall is the only command that may invoke the optional generator; the source notes that model timing and RAM go to the logs (controlled by MNEMO_LOG_LEVEL) and the bundle prints as JSON.
Source: src/mnemo/adapters/cli/app.py
MCP Server
The MCP server exposes store and search tools (and others) with Pydantic-annotated parameters. The store tool returns a dict with {id, status}, where status is one of created, duplicate (identical content already stored), or superseded (a topic_key upsert). No LLM runs on write.
Source: src/mnemo/adapters/mcp/server.py
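The three statuses can be illustrated with a toy in-memory store. This is not mnemo's repository: the SHA-256 content hash, the sequential ids, and the order of the checks (duplicate detection before the `topic_key` upsert) are assumptions made to keep the sketch small.

```python
import hashlib
from typing import Dict, Optional


class InMemoryStore:
    """Toy store illustrating created / duplicate / superseded."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}    # content hash -> memory id
        self._by_topic: Dict[str, str] = {}   # topic_key -> current memory id
        self._next = 0

    def store(self, content: str, topic_key: Optional[str] = None) -> Dict[str, str]:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._by_hash:
            # Identical content is already stored: report it, write nothing.
            return {"id": self._by_hash[digest], "status": "duplicate"}
        self._next += 1
        new_id = f"m{self._next}"
        self._by_hash[digest] = new_id
        status = "created"
        if topic_key is not None:
            if topic_key in self._by_topic:
                status = "superseded"  # the new memory replaces the prior one
            self._by_topic[topic_key] = new_id
        return {"id": new_id, "status": status}
```

No model is involved at any point: the status is decided from a hash lookup and a key lookup, consistent with the rule that no LLM runs on write.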
Common Failure Modes
- Generator/reranker extras missing: the optional stages raise an actionable `RuntimeError` telling the user to install `mnemo[recall]` or set the model to `"off"`. The CLI surfaces the message and exits with code 1 rather than printing a traceback. Source: src/mnemo/adapters/cli/app.py
- Invalid project: write/update use cases raise `UnknownProject` with near-match candidates to help the caller correct the slug. Source: src/mnemo/application/use_cases/update_project.py
- Empty memory content: rejected at `Memory.create` with a clear `ValueError` rather than persisting an empty record. Source: src/mnemo/domain/memory.py
See Also
- Recall pipeline stages
- Embedder and model integration
- Project gate semantics
- MCP tool reference
Source: https://github.com/arttttt/mnemo / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.
1. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | https://github.com/arttttt/mnemo
2. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/arttttt/mnemo
3. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/arttttt/mnemo
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/arttttt/mnemo
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/arttttt/mnemo
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/arttttt/mnemo
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/arttttt/mnemo
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using mnemo with real data or production workflows.
- 0.3.0 — recall - github / github_release
- 0.2.0 — on-demand lifecycle - github / github_release
- Configuration risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence