Doramagic Project Pack · Human Manual

mnemo

Local, on-demand memory for AI coding agents (MCP). Strictly offline; built for 10+ agents on a 16GB machine.

Overview, Principles & System Architecture

Related topics: Memory Domain Model & MCP Tool Surface, Recall Pipeline, Embedders & LLM Kit Integration, Deployment, Configuration, Storage & Client Wiring

Purpose and Scope

mnemo is a local memory store for AI coding agents. The CLI entry point is built around the tagline *"local memory for AI coding agents. Store and search typed memories locally."* (src/mnemo/adapters/cli/app.py:11-12).

The system is organized so that writes never invoke an LLM: memories are stored as plain typed records. Reads can opt in to an LLM via the recall tool, which was added in the 0.3.0 release. As stated by the MCP remember tool: *"No LLM runs on write."* (src/mnemo/adapters/mcp/server.py). When recall is exercised, an LLM may synthesize a grounded answer but must reply "No relevant memories found." rather than drawing on outside knowledge.

The codebase is split between two libraries:

| Package | Role |
| --- | --- |
| mnemo | Domain, application, infrastructure, CLI and MCP adapters |
| llmkit | Model loading, residency, and capability ports (Embedder, Reranker, Generator, Nli) |

llmkit exposes only narrow capability ports so the rest of mnemo never imports an engine (src/llmkit/ports/embedder.py, src/llmkit/ports/nli.py).

Architectural Layers

mnemo follows a hexagonal (ports-and-adapters) layering. The composition root at src/mnemo/infrastructure/composition.py wires concrete adapters into use-case constructors and produces the container that the CLI and MCP server call.

flowchart TB
    subgraph Adapters
        CLI["CLI (Typer)<br/>src/mnemo/adapters/cli/app.py"]
        MCP["MCP Server<br/>src/mnemo/adapters/mcp/server.py"]
        HASH["HashEmbedder<br/>src/mnemo/adapters/embedding/hash_embedder.py"]
    end
    subgraph Infrastructure
        COMP["Composition root<br/>src/mnemo/infrastructure/composition.py"]
    end
    subgraph Application
        UC["Use cases<br/>(remember, search, browse, recall, project mgmt)"]
        PIPE["Recall pipeline<br/>(gather → optional rerank → assemble → optional generate)"]
    end
    subgraph Domain
        MEM["Memory entity<br/>src/mnemo/domain/memory.py"]
        PROJ["Project entity"]
    end
    subgraph llmkit
        EMB["Embedder port"]
        RER["Reranker port"]
        GEN["Generator port"]
    end
    CLI --> COMP
    MCP --> COMP
    COMP --> UC
    UC --> PIPE
    UC --> MEM
    PIPE --> EMB
    PIPE --> RER
    PIPE --> GEN
    HASH -.implements.-> EMB

Domain layer

The Memory aggregate is the only write-side entity that matters. It is built through Memory.create(...) so invariants (non-empty content, typed enum, scope/project resolution) are enforced at the factory rather than scattered through the application layer (src/mnemo/domain/memory.py). The dataclass carries identity (id), a content hash for dedup, supersession links, and scope/project/type discriminators used for filtering.

Application layer

Use cases are protocol-based; for example CreateProjectUseCase, UpdateProjectUseCase, DeleteProjectUseCase, ListProjectsUseCase, and BrowseMemoryUseCase are defined as Protocol classes in src/mnemo/application/use_cases/interfaces/. Their concrete implementations live next to them as *UseCaseImpl classes — see UpdateProjectUseCaseImpl in src/mnemo/application/use_cases/update_project.py.

A Retrieval value object is the contract between use cases and the store. It carries the structured SearchCriteria, a page size, and a query representation: text feeds the lexical (FTS) leg, vector the dense leg. Construction rejects the illegal state where exactly one of text/vector is set — *"a search carries BOTH; a filter-only browse (recency order) carries NEITHER. Exactly one without the other is rejected at construction"* (src/mnemo/application/retrieval.py).
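The both-or-neither rule can be illustrated with a small frozen dataclass. This is a minimal sketch rather than the project's actual class: the field names on SearchCriteria, the limit field, and the is_browse helper are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SearchCriteria:
    """Hypothetical stand-in for the structured filters; the fields are assumed."""
    project: str | None = None
    types: tuple[str, ...] = ()


@dataclass(frozen=True)
class Retrieval:
    criteria: SearchCriteria
    limit: int
    text: str | None = None                  # feeds the lexical (FTS) leg
    vector: tuple[float, ...] | None = None  # feeds the dense leg

    def __post_init__(self) -> None:
        # Legal states: both set (search) or neither set (browse).
        if (self.text is None) != (self.vector is None):
            raise ValueError("a search carries both text and vector; a browse carries neither")

    @property
    def is_browse(self) -> bool:
        return self.text is None and self.vector is None


search = Retrieval(SearchCriteria(project="demo"), limit=10, text="why sqlite", vector=(0.1, 0.2))
browse = Retrieval(SearchCriteria(project="demo"), limit=10)

try:
    Retrieval(SearchCriteria(), limit=10, text="half a signal")
    half_signal_rejected = False
except ValueError:
    half_signal_rejected = True
```

Checking the invariant in the constructor means the store never has to decide what a half-specified query should do.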

Infrastructure and adapters

build_container() in src/mnemo/infrastructure/composition.py is the only place where concrete classes are selected. It picks the embedder by config (hash for offline use, tests, and the skeleton; pplx for the default ONNX model) and constructs optional Reranker and Generator adapters for the recall pipeline. Setting the reranker or generator to off drops that stage; with the generator off, recall returns the structured grouping only.
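One way to picture the "select by name, or off" behaviour is a small registry lookup. The helper below is a hypothetical sketch: select_adapter, the registry shape, and the demo builder are all invented for illustration and are not taken from composition.py.

```python
from typing import Callable, Dict, Optional


def select_adapter(name: str, builders: Dict[str, Callable[[], object]]) -> Optional[object]:
    """Return None for "off"; otherwise call the builder registered under `name`."""
    if name == "off":
        return None
    try:
        builder = builders[name]
    except KeyError:
        known = ", ".join(sorted(builders))
        raise ValueError(f"unknown adapter {name!r}; expected one of: off, {known}") from None
    return builder()  # construction (and any heavy import) happens only here


built = []


def build_demo_reranker() -> object:
    built.append("demo")  # stands in for a heavy import plus a model load
    return "demo-reranker"


registry = {"demo": build_demo_reranker}
reranker_off = select_adapter("off", registry)
reranker_on = select_adapter("demo", registry)
```

Because the builder is only called for the selected name, an adapter that is switched off never pays its import or load cost.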

The Recall Pipeline

RecallProjectUseCaseImpl.execute(...) is the single entry point that runs the read pipeline (src/mnemo/application/use_cases/recall_project.py). The pipeline is built by build_recall_pipeline(...) and is composed of stages declared in mnemo.application.recall:

  • Gather: the embedder encodes the query, the repository returns the top-k hybrid hits.
  • Rerank *(optional)*: a Reranker from llmkit reorders the top-k before assembly.
  • Assemble: a pure, model-free stage that groups the gathered memories by type into RecallSections and produces a RecallBundle (src/mnemo/application/recall/assemble_stage.py, src/mnemo/application/recall/bundle.py). The bundle is published on the RECALL slot.
  • Generate *(optional)*: a Generator from llmkit writes the prose summary; otherwise the structured grouping (and reranked order) is the answer.

The CLI prints the bundle as JSON: *"recall ... the bundle prints as JSON."* (src/mnemo/adapters/cli/app.py).

Principles

A few design principles follow directly from the source:

  1. No LLM on write. Memories are typed records; an LLM is opt-in only on the read path (src/mnemo/adapters/mcp/server.py).
  2. Engine-agnostic application code. Application types import only llmkit.ports.* and value types; concrete engines (ONNX, llama.cpp/GGUF) are confined to the composition root and the llmkit runtime (src/llmkit/ports/embedder.py, src/llmkit/config.py).
  3. Deterministic offline mode. HashEmbedder is a *"Deterministic, dependency-free embedder. NOT semantic — offline/tests/skeleton."* so the core can be tested without model downloads (src/mnemo/adapters/embedding/hash_embedder.py).
  4. Two-leg hybrid ranking. A Retrieval must carry both text and vector for a search, or neither for a browse — half-signal requests are rejected at construction (src/mnemo/application/retrieval.py).
  5. Grounded answers. When a generator is configured, it is constrained to the gathered memories; if none apply it must say so rather than invent (src/mnemo/application/use_cases/recall_project.py).

See Also

  • Memory Model & Types — Memory, MemoryType, Scope, topic_key, supersession.
  • Projects & Scoping — project gate, near-match candidates, global vs. project scope.
  • Recall Pipeline Deep Dive — gather/assemble/rerank/generate stages and the RECALL slot.
  • Hybrid Search: FTS + Vectors — Retrieval, SearchCriteria, and the lexical/dense leg contract.
  • Embedders & llmkit — HashEmbedder, the pplx ONNX embedder, and engine residency.

Source: https://github.com/arttttt/mnemo / Human Manual

Memory Domain Model & MCP Tool Surface

Related topics: Overview, Principles & System Architecture, Recall Pipeline, Embedders & LLM Kit Integration

Overview

mnemo is a local, on-demand memory layer for AI coding agents (Claude Code, Cursor, Windsurf, any MCP client). It remembers decisions, bugs, progress, and rules across sessions without any cloud calls — embeddings and LLMs run only on the host machine. The system is built around a single shared service process that ten or more agents can talk to concurrently via the Model Context Protocol (MCP), with a thin CLI on top.

This page documents the two foundations that everything else rests on: the memory domain model (what a memory *is*) and the MCP tool surface (what an agent can *ask* the service to do). Together they define the contract between agents and the persistent store.

Source: README.md

The `Memory` Domain Model

Entity & Invariants

The Memory is a single focused unit of text. Its fields cover identity, content, classification, scoping, provenance, lifecycle, and cryptographic dedupe hashing.

Source: src/mnemo/domain/memory.py

| Field | Type | Purpose |
| --- | --- | --- |
| id | str | Stable identifier generated by new_id(). |
| content | str | The memory text; must be non-empty (enforced by Memory.create). |
| type | MemoryType | Classification (e.g. working-notes, the default). |
| scope | Scope | Either project or global. |
| project | `str \| None` | Project slug; forced to GLOBAL_PROJECT when scope is global. |
| related_files | list[str] | File paths the memory references. |
| tags | list[str] | Free-form tags for filtering. |
| topic_key | `str \| None` | Stable key so a memory *evolves* instead of being duplicated. |
| session_id | `str \| None` | Originating agent session. |
| status | str | Lifecycle marker (active, etc.). |
| supersedes | `str \| None` | Pointer to the prior memory this one replaces. |
| hash | str | Content hash for dedupe. |
| created_at / updated_at | str | ISO timestamps from now(). |

Construction is funneled through Memory.create(...), which validates non-empty content, coerces scope and type, and stamps created_at/updated_at. The hash field is computed from content so duplicate writes can be detected cheaply. Source: src/mnemo/domain/memory.py
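A reduced sketch of such a factory is shown below. It keeps only a few of the fields from the table, and several details are assumptions made for illustration: the use of SHA-256 for the content hash, uuid4 for the id, stripping whitespace before the emptiness check, and the exact create signature.

```python
from __future__ import annotations

import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

GLOBAL_PROJECT = "__global__"


class Scope(str, Enum):
    PROJECT = "project"
    GLOBAL = "global"


@dataclass(frozen=True)
class Memory:
    id: str
    content: str
    scope: Scope
    project: str | None
    hash: str
    created_at: str
    updated_at: str

    @classmethod
    def create(cls, content: str, scope: str | Scope = Scope.PROJECT,
               project: str | None = None) -> "Memory":
        text = content.strip()  # assumed normalisation
        if not text:
            raise ValueError("memory content is empty; store text worth remembering")
        scope = Scope(scope)  # coerces "global"/"project" strings to the enum
        if scope is Scope.GLOBAL:
            project = GLOBAL_PROJECT  # a global memory never keeps a project slug
        stamp = datetime.now(timezone.utc).isoformat()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()  # assumed hash function
        return cls(uuid.uuid4().hex, text, scope, project, digest, stamp, stamp)


note = Memory.create("Use SQLite FTS5 for the lexical leg", scope="global", project="demo")
same = Memory.create("Use SQLite FTS5 for the lexical leg")
```

Two records with the same content get different ids but the same hash, which is what makes a cheap duplicate check possible.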

Type, Scope, and Project

MemoryType is an enum of categories an agent can attach to a memory (working notes, decisions, bugs, etc.). The default is working-notes. Source: src/mnemo/domain/constants.py

Scope is binary: project (belongs to a single project) or global (applies everywhere). When Scope.GLOBAL is chosen, the project is forced to the sentinel __global__ so a global memory is never mistaken for a project memory. Source: src/mnemo/domain/memory.py

Policy Caps

Two caps protect the store from bloat and retrieval from dilution:

  • DEFAULT_MAX_MEMORY_TOKENS = 512 — the policy cap on memory length. Source: src/mnemo/domain/constants.py
  • DEFAULT_RECALL_LIMIT = 15 — how many memories recall retrieves to ground an answer. Kept small because recall synthesizes a focused answer, not a digest. Source: src/mnemo/domain/constants.py

The effective limit on a memory's size is the stricter of the policy cap and the embedder's input window. Source: src/mnemo/domain/constants.py
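As a rough illustration, the size rule reduces to taking a minimum. The function name and the convention that None means "no embedder window" are assumptions for this sketch.

```python
from __future__ import annotations

DEFAULT_MAX_MEMORY_TOKENS = 512


def effective_max_tokens(embedder_max_input: int | None,
                         policy_cap: int = DEFAULT_MAX_MEMORY_TOKENS) -> int:
    """Stricter of the policy cap and the embedder window (None means no window)."""
    if embedder_max_input is None:
        return policy_cap
    return min(policy_cap, embedder_max_input)


small_window = effective_max_tokens(256)
large_window = effective_max_tokens(8192)
no_window = effective_max_tokens(None)
```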

Proposed Memory (Write Path Internals)

The pipeline doesn't write Memory directly; it produces a ProposedMemory — the merged, summarized, or insight record that an executor later materializes into a real Memory. Its fields mirror the write path: content, type, project, scope, related files, tags. Source: src/mnemo/application/pipeline/proposed_memory.py

MCP Tool Surface

The MCP server is the primary agent-facing entry point. Each tool maps to one use case in the application layer. The composition root wires them together at startup. Source: src/mnemo/infrastructure/composition.py

| Tool | Use Case | Purpose |
| --- | --- | --- |
| remember | RememberMemoryUseCaseImpl | Store a single memory; validates, embeds, inserts. No LLM on the write path. |
| search | SearchMemoryUseCaseImpl | Query by meaning within a scope; the project gate checks that the project is registered. |
| browse | (MCP browse tool) | List by filter (type, tags, files), newest first; no query, no relevance score. |
| recall | RecallProjectUseCaseImpl | Answer a question from a project's memories; opt-in LLM read tool. |
| delete | DeleteMemoryUseCaseImpl | Delete specific memory ids. |
| clear / purge | DeleteMemoryUseCaseImpl | Whole-project clear and full reset (memories + project registry). |
| project tools | Create / Update / Delete / List | First-class project registry added in 0.3.0. |

Sources: src/mnemo/application/use_cases/remember_memory.py, src/mnemo/application/use_cases/search_memory.py, src/mnemo/application/use_cases/delete_memory.py, src/mnemo/application/use_cases/recall_project.py, src/mnemo/adapters/mcp/server.py

The `browse` Tool (Query-less Read)

browse is the newest read shape — added in 0.3.0 alongside recall. It accepts scope, project, type, tags, related_files, created_after, and limit (1–100), and returns memories ordered by recency with no relevance score. It is the right choice for category retrieval ("all type=decision in this project") where a semantic query would only bias the order. The MCP server emits {id, type, scope, project, content, related_files, created_at} per hit. Source: src/mnemo/adapters/mcp/server.py
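The filter-then-sort-by-recency behaviour might look like the following sketch over an in-memory list. The Row type, the default limit of 20, and the rule that every requested tag must be present are assumptions; the real tool reads from the store.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Row:
    id: str
    type: str
    tags: tuple[str, ...]
    created_at: str  # ISO 8601; all rows assumed to share one timezone format


def browse(rows: Iterable[Row], *, memory_type: str | None = None,
           tags: tuple[str, ...] = (), created_after: str | None = None,
           limit: int = 20) -> list[Row]:
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    hits = [
        row for row in rows
        if (memory_type is None or row.type == memory_type)
        and set(tags) <= set(row.tags)  # assumed: all requested tags must match
        and (created_after is None or row.created_at > created_after)
    ]
    # No query and no relevance score: recency is the only ordering.
    hits.sort(key=lambda row: row.created_at, reverse=True)
    return hits[:limit]


rows = [
    Row("a", "decision", ("db",), "2024-01-01T00:00:00+00:00"),
    Row("b", "bug", ("db",), "2024-02-01T00:00:00+00:00"),
    Row("c", "decision", ("db", "perf"), "2024-03-01T00:00:00+00:00"),
]
decisions = browse(rows, memory_type="decision")
latest = browse(rows, created_after="2024-01-15T00:00:00+00:00", limit=1)
```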

The `recall` Tool (Opt-in LLM Read)

recall is the one opt-in LLM read tool. It runs a pipeline once: the embedder retrieves the memories most relevant to the query (the relevance step), an optional reranker re-orders them, and an optional generator (Gemma 4 E2B-it QAT GGUF) synthesizes a concise, grounded answer. The generator never uses outside knowledge and refuses with "No relevant memories found." when none apply. With the generator off, recall returns the structured grouping. Source: src/mnemo/application/use_cases/recall_project.py

Two stages are always present in the pipeline: GatherStage (retrieve query-relevant active memories, project-scoped plus globally-scoped, capped at the request limit) and AssembleStage (group gathered memories by type into RecallSections, pure, no model). Source: src/mnemo/application/recall/gather_stage.py, src/mnemo/application/recall/assemble_stage.py

The output is a RecallBundle: project, an ordered tuple of RecallSection(type, memories), and an optional summary filled only when the generator ran. The structured grouping carries the structure on its own when no summary is present. Source: src/mnemo/application/recall/bundle.py

CLI Mirror

Each MCP tool has a CLI counterpart so the service is operable without an agent. The recall CLI command accepts project, query, and --limit. MNEMO_GENERATOR selects the model that synthesizes the summary, and MNEMO_RERANKER can order the gathered memories by the query. Missing extras raise an actionable RuntimeError asking the user to install mnemo[recall] or set the model to off. The bundle prints as JSON; model timing and RAM go to the logs, controlled by MNEMO_LOG_LEVEL. Source: src/mnemo/adapters/cli/app.py

Architecture at a Glance

flowchart LR
    Agent["AI Coding Agent<br/>(MCP client)"] -->|MCP tools| Server["MCP Server<br/>(adapters/mcp/server.py)"]
    CLI["CLI<br/>(adapters/cli/app.py)"] --> Container
    Server --> Container["Composition Root<br/>(infrastructure/composition.py)"]
    Container --> R["Remember"]
    Container --> S["Search"]
    Container --> B["Browse"]
    Container --> Rec["Recall<br/>(pipeline)"]
    Container --> D["Delete / Clear / Purge"]
    Container --> P["Project Registry"]
    R --> Memory[("Memory<br/>domain/memory.py")]
    S --> Memory
    B --> Memory
    Rec --> Memory
    D --> Memory
    P --> Memory

The composition root selects the embedder (hash for offline/tests, pplx as the default pplx-embed-v1-0.6b int8 ONNX model), builds the reranker and generator from llmkit ports, and assembles every use case with its dependencies. Source: src/mnemo/infrastructure/composition.py, src/llmkit/types.py

Configuration

All configuration is read from MNEMO_* environment variables at startup. Values are parsed at the config boundary with named, range-checked errors so a typo or out-of-range value surfaces immediately rather than as an opaque crash later. Integers use _int_env; floats use _float_env (with an exclusive_min mode for values that must be strictly greater than zero, e.g. idle-check intervals). Source: src/mnemo/infrastructure/config.py
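The parsing style described here can be sketched as two small helpers. They mirror the roles of _int_env and _float_env, but the signatures, the error wording, the env parameter, and the EXAMPLE_IDLE_CHECK_SECONDS variable name are assumptions made for this example.

```python
import math
import os
from typing import Mapping


def int_env(name: str, default: int, *, min_value: int, max_value: int,
            env: Mapping[str, str] = os.environ) -> int:
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if not min_value <= value <= max_value:
        raise ValueError(f"{name} must be between {min_value} and {max_value}, got {value}")
    return value


def float_env(name: str, default: float, *, min_value: float = 0.0,
              exclusive_min: bool = False, env: Mapping[str, str] = os.environ) -> float:
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number, got {raw!r}") from None
    # exclusive_min rejects the bound itself, e.g. an interval of exactly zero.
    too_small = value <= min_value if exclusive_min else value < min_value
    if not math.isfinite(value) or too_small:
        bound = ">" if exclusive_min else ">="
        raise ValueError(f"{name} must be a finite number {bound} {min_value}, got {raw!r}")
    return value


env = {"MNEMO_RECALL_LIMIT": "25", "EXAMPLE_IDLE_CHECK_SECONDS": "0"}
recall_limit = int_env("MNEMO_RECALL_LIMIT", 15, min_value=1, max_value=200, env=env)
unset_limit = int_env("MNEMO_UNSET_EXAMPLE", 15, min_value=1, max_value=200, env=env)
try:
    float_env("EXAMPLE_IDLE_CHECK_SECONDS", 30.0, exclusive_min=True, env=env)
    zero_interval_error = ""
except ValueError as exc:
    zero_interval_error = str(exc)
```

Putting the variable name in every message is what lets a typo surface as a readable startup error.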

Key knobs that affect the tool surface:

| Env Var | Effect |
| --- | --- |
| MNEMO_EMBEDDER | hash (offline) or pplx (default; ONNX int8). |
| MNEMO_GENERATOR | Generator model id for recall; off disables LLM synthesis. |
| MNEMO_RERANKER | Reranker id; orders gathered memories by the query. |
| MNEMO_RERANK_TOP_K | Cap on reranked candidates. |
| MNEMO_GENERATOR_MAX_TOKENS | Token budget for the synthesized answer. |
| MNEMO_MAX_MEMORY_TOKENS | Policy cap on a memory's length (default 512). |
| MNEMO_RECALL_LIMIT | Default memories retrieved by recall (default 15). |
| MNEMO_LOG_LEVEL | Log level; model timing and RAM figures go to the logs. |

Sources: src/mnemo/infrastructure/composition.py, src/mnemo/infrastructure/config.py, src/mnemo/domain/constants.py

Common Failure Modes

  • Empty content: Memory.create raises ValueError("memory content is empty; store text worth remembering"). The CLI's recall command catches ValueError and surfaces it as typer.BadParameter. Source: src/mnemo/domain/memory.py, src/mnemo/adapters/cli/app.py
  • Missing recall extras: When MNEMO_GENERATOR or MNEMO_RERANKER points at an adapter whose dependency isn't installed, the CLI catches RuntimeError and prints the actionable install message instead of a traceback, exiting with code 1. Source: src/mnemo/adapters/cli/app.py
  • Bad config: A non-integer or out-of-range MNEMO_* value raises a named ValueError at startup, before any agent request is served. Source: src/mnemo/infrastructure/config.py
  • Over-window memory: The HashEmbedder rejects inputs that exceed max_input; tests pass a small value to exercise this path. Source: src/mnemo/adapters/embedding/hash_embedder.py

See Also

  • README: high-level positioning and operational characteristics.
  • Recall pipeline: GatherStage → optional RerankStage → AssembleStage → optional generator stage; output is RecallBundle.
  • Project registry: first-class projects added in 0.3.0, with FK cascade deleting memories on project removal.
  • Embedders: hash (offline, lexical only) vs pplx (default, semantic, ONNX int8).

Recall Pipeline, Embedders & LLM Kit Integration

Related topics: Memory Domain Model & MCP Tool Surface, Deployment, Configuration, Storage & Client Wiring

Overview

The recall subsystem is mnemo's opt-in LLM read tool: instead of returning raw hits, it answers a question against a project's stored memories. The pipeline lives under src/mnemo/application/recall/ and is built by build_recall_pipeline at composition time. Refinements (reranker, generator) are wired in only when their adapters are configured — "each refinement earns its place when its model is configured" (builder.py). The same use case backs both the recall MCP tool (server.py) and the mnemo recall CLI command (app.py).

Pipeline Architecture

The recall pipeline is a Pipeline[RecallRequest, RecallBundle] assembled from up to four stages. Each stage declares requires and provides slots over a shared PipelineContext.

| Stage | Key | Inputs (slots) | Output | Optional? |
| --- | --- | --- | --- | --- |
| Gather | gather | RECALL_REQUEST | GATHERED (tuple of Memory) | No (always present) |
| Rerank | rerank | GATHERED | GATHERED (re-ordered) | Yes, when a Reranker is configured |
| Assemble | assemble | GATHERED, RECALL_REQUEST | RECALL (RecallBundle) | No (always present) |
| Synthesize | synthesize | RECALL, RECALL_REQUEST | RECALL (enriched with summary) | Yes, when a Generator is configured |

Source: builder.py, gather_stage.py, assemble_stage.py, synthesize_stage.py.
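The requires/provides contract in the table can be sketched with a tiny slot-checked runner. This is an illustrative model only: the Stage dataclass, run_pipeline, and the dictionary-based context are assumptions, and the lambdas stand in for the real stages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Context = Dict[str, object]


@dataclass(frozen=True)
class Stage:
    key: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]
    run: Callable[[Context], Context]


def run_pipeline(stages: List[Stage], context: Context) -> Context:
    for stage in stages:
        missing = stage.requires - set(context)
        if missing:
            raise RuntimeError(f"stage {stage.key!r} is missing slots: {sorted(missing)}")
        produced = stage.run(context)
        if set(produced) != set(stage.provides):
            raise RuntimeError(f"stage {stage.key!r} must provide exactly {sorted(stage.provides)}")
        context = {**context, **produced}  # copy, so the caller's dict is untouched
    return context


gather = Stage("gather", frozenset({"RECALL_REQUEST"}), frozenset({"GATHERED"}),
               lambda ctx: {"GATHERED": ("m2", "m1")})
rerank = Stage("rerank", frozenset({"GATHERED"}), frozenset({"GATHERED"}),
               lambda ctx: {"GATHERED": tuple(sorted(ctx["GATHERED"]))})
assemble = Stage("assemble", frozenset({"GATHERED", "RECALL_REQUEST"}), frozenset({"RECALL"}),
                 lambda ctx: {"RECALL": {"memories": ctx["GATHERED"]}})

request = {"RECALL_REQUEST": "why sqlite?"}
plain = run_pipeline([gather, assemble], request)
reranked = run_pipeline([gather, rerank, assemble], request)
```

Because rerank both reads and rewrites GATHERED, it can be added or dropped without the assemble stage noticing.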

flowchart LR
    REQ[RecallRequest] --> G[gather]
    G -->|GATHERED| R{reranker?}
    R -->|yes| RR[rerank]
    R -->|no| A[assemble]
    RR --> A
    A -->|RecallBundle| S{generator?}
    S -->|yes| SY[synthesize]
    S -->|no| OUT[Bundle: sections only]
    SY --> OUT2[Bundle: sections + summary]

Stage Details

  • Gather embeds the query with the configured TextEmbedder, then runs the same hybrid retrieval (dense + lexical) used by search. It is project-scoped, capped at request.limit, and includes globally-scoped memories. Source: gather_stage.py.
  • Assemble is pure (no model) — it groups GATHERED memories into RecallSection instances keyed by MemoryType.value, producing a RecallBundle whose total is the sum of section sizes. Source: assemble_stage.py, bundle.py.
  • Synthesize is a no-op on an empty bundle and otherwise builds the prompt via build_synthesis_prompt and calls generator.generate(prompt, max_tokens=...), then returns the bundle with summary filled. Source: synthesize_stage.py.

The output type, RecallBundle, is a frozen dataclass carrying project, an ordered tuple[RecallSection, ...], and the optional summary: str | None. Source: bundle.py.
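A simplified picture of the bundle and the grouping step follows. The Mem type is a hypothetical stand-in for Memory, and ordering sections by first-seen type is an assumption; the source only states that sections are ordered.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mem:
    """Hypothetical minimal memory: just what grouping needs."""
    id: str
    type: str
    content: str


@dataclass(frozen=True)
class RecallSection:
    type: str
    memories: tuple[Mem, ...]


@dataclass(frozen=True)
class RecallBundle:
    project: str
    sections: tuple[RecallSection, ...]
    summary: str | None = None  # filled only when a generator ran

    @property
    def total(self) -> int:
        return sum(len(section.memories) for section in self.sections)


def assemble(project: str, gathered: tuple[Mem, ...]) -> RecallBundle:
    grouped: dict[str, list[Mem]] = {}
    for mem in gathered:
        grouped.setdefault(mem.type, []).append(mem)  # dict keeps first-seen type order
    sections = tuple(RecallSection(t, tuple(ms)) for t, ms in grouped.items())
    return RecallBundle(project, sections)


gathered = (
    Mem("m1", "decision", "Use SQLite"),
    Mem("m2", "bug", "FTS index was stale after restore"),
    Mem("m3", "decision", "Enable WAL mode"),
)
bundle = assemble("demo", gathered)
```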

Embedders

TextEmbedder is a port consumed by GatherStage. Two implementations are wired in composition.py:

  • pplx (default) — pplx-embed-v1-0.6b int8 ONNX, served CPU-side via llmkit. Source: composition.py.
  • hash — a dependency-free, deterministic bag-of-tokens hashing embedder. "Lexical only: it captures token overlap, not meaning." It exposes dim, max_input, count_tokens, and encode; the offline default has effectively unlimited input, while tests pass a small max_input to exercise the over-window reject. Source: hash_embedder.py.

Selection is driven by config.embedder; the container lazily imports the chosen adapter so unused backends stay out of the import graph. Source: composition.py.
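To make "lexical only" concrete, here is a sketch of a bag-of-tokens hashing embedder with the same four members (dim, max_input, count_tokens, encode). The tokenizer, the BLAKE2b bucket hash, the default dimension, and the L2 normalisation are assumptions; the project's HashEmbedder may differ in all of them.

```python
import hashlib
import math
import re
from typing import List, Optional


class SketchHashEmbedder:
    def __init__(self, dim: int = 64, max_input: Optional[int] = None) -> None:
        self.dim = dim
        self.max_input = max_input  # None means no input window

    @staticmethod
    def _tokens(text: str) -> List[str]:
        return re.findall(r"\w+", text.lower())

    def count_tokens(self, text: str) -> int:
        return len(self._tokens(text))

    def encode(self, text: str) -> List[float]:
        tokens = self._tokens(text)
        if self.max_input is not None and len(tokens) > self.max_input:
            raise ValueError(f"input has {len(tokens)} tokens; the window is {self.max_input}")
        vector = [0.0] * self.dim
        for token in tokens:
            # A stable digest keeps results identical across processes,
            # unlike the built-in hash(), which is salted per run.
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            vector[int.from_bytes(digest, "big") % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else vector


def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


embedder = SketchHashEmbedder(dim=64)
reordered = cosine(embedder.encode("fix the login bug"), embedder.encode("bug login fix the"))

try:
    SketchHashEmbedder(max_input=2).encode("one two three")
    over_window_rejected = False
except ValueError:
    over_window_rejected = True
```

Word order does not change the vector, which is why this kind of embedder captures token overlap but not meaning.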

LLM Kit Integration

Recall offloads the optional model steps to llmkit through narrow ports. Two ports are imported by the recall builder:

  • llmkit.ports.reranker.Reranker — injected when a reranker is configured; reorders GATHERED by query relevance before AssembleStage. Source: builder.py.
  • llmkit.ports.generator.Generator — injected when a generator is configured; produces the prose summary. Source: synthesize_stage.py.

RecallProjectUseCaseImpl (recall_project.py) accepts both ports via its constructor, defaulting rerank_top_k=20 and generator_max_tokens=512, and exposes the execute(*, project, query, limit) entry point that returns a RecallBundle. Source: use_cases/recall_project.py. The Protocol surface for tests/clients is RecallProjectUseCase in interfaces/recall_project.py.

The prompt itself is the prompt-level refusal guard: it tells the model to answer using ONLY the supplied memories and to reply exactly "No relevant memories found." when none apply. The prompt lays memories out grouped by type and restates the question after them so attention re-anchors on it. Source: synthesis_prompt.py. Adjacent llmkit ports include Nli for natural-language inference over a text pair (llmkit/ports/nli.py).
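The layout described above (instructions, memories grouped by type, then the question restated last) could be sketched as follows. The function name, signature, and exact wording are hypothetical; only the refusal string and the ordering come from the description.

```python
from typing import List, Tuple

REFUSAL = "No relevant memories found."


def sketch_synthesis_prompt(question: str, sections: List[Tuple[str, List[str]]]) -> str:
    lines = [
        "Answer the question using ONLY the memories below.",
        f"If none of them apply, reply exactly: {REFUSAL}",
        "",
    ]
    for memory_type, contents in sections:
        lines.append(f"[{memory_type}]")
        lines.extend(f"- {content}" for content in contents)
        lines.append("")
    # The question comes last so it is the most recent thing the model reads.
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = sketch_synthesis_prompt(
    "Why did we pick SQLite?",
    [("decision", ["Use SQLite: single file, no server"]), ("bug", ["FTS index stale after restore"])],
)
```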

Configuration, Surfaces, and Failure Modes

The container builds the pipeline once at startup, threading config.rerank_top_k and config.generator_max_tokens into RecallProjectUseCaseImpl. When a configured reranker or generator needs the optional mnemo[recall] extras and they are missing, the adapter raises a RuntimeError that asks the user to install mnemo[recall] or set the model to off; the CLI surfaces it as a single message and exits 1 rather than dumping a traceback. Source: composition.py, app.py.

Two user-facing surfaces call into the same use case:

  • MCP: recall(project, query) returns {project, summary, sources: [{id, type}, ...]}; only IDs and types are returned, which keeps the answer light on caller context. Source: server.py.
  • CLI: mnemo recall <project> <query> --limit/-l N prints the bundle as JSON; the limit defaults to DEFAULT_RECALL_LIMIT (range 1..200). Source: app.py.
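The MCP response shape listed above can be pictured as a flattening of the grouped sections into id/type pairs. The helper name and the tuple-based section input are simplifications invented for this sketch.

```python
from typing import Dict, List, Optional, Tuple

Section = Tuple[str, List[str]]  # (memory type, memory ids); a simplified stand-in


def to_recall_payload(project: str, sections: List[Section],
                      summary: Optional[str]) -> Dict[str, object]:
    # Only ids and types are emitted; memory content stays out of the response.
    sources = [
        {"id": memory_id, "type": memory_type}
        for memory_type, memory_ids in sections
        for memory_id in memory_ids
    ]
    return {"project": project, "summary": summary, "sources": sources}


payload = to_recall_payload(
    "demo",
    [("decision", ["m1", "m2"]), ("bug", ["m3"])],
    "SQLite was chosen for local storage.",
)
```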

The 0.3.0 — recall release introduces this feature: a project's memories can now be answered as a question (opt-in LLM read), with Gemma 4 E2B-it (official QAT GGUF) as the synthesizer — it replies "No relevant memories found." when nothing applies and never uses outside knowledge.

Deployment, Configuration, Storage & Client Wiring

Related topics: Overview, Principles & System Architecture, Recall Pipeline, Embedders & LLM Kit Integration

Overview

mnemo is a local-first memory service for AI coding agents. The runtime is composed at a single seam — a composition root that turns a parsed configuration into fully wired use cases — and is exposed through three surfaces: a Typer CLI, an MCP server, and a set of client wirings that inject the MCP connector into a host agent's configuration. This page covers how the application is configured, where state lives, and how the connector reaches the agents that consume it.

Sources: src/mnemo/infrastructure/composition.py, src/mnemo/adapters/cli/app.py

Configuration

The Composition Root

A single build_container() function is the seam that turns a parsed Config into a fully wired use-case container. Every long-lived dependency — the repository, the embedder, the project gate, and the use cases themselves — is constructed here and exposed for the CLI and MCP server to call. The recall use case is conditionally augmented with an optional reranker and generator built from the same configuration.

recall=RecallProjectUseCaseImpl(
    repository,
    embedder,
    reranker=_build_reranker(config),
    generator=_build_generator(config),
    rerank_top_k=config.rerank_top_k,
    generator_max_tokens=config.generator_max_tokens,
),

Source: src/mnemo/infrastructure/composition.py

Embedder Selection

The embedder is chosen by name from the configuration. The pplx value routes through llmkit.build.build_embedder (perplexity-ai/pplx-embed-v1-0.6b int8 ONNX, the default), and a hash option is available as a dependency-free lexical fallback. The HashEmbedder is deterministic, captures only token overlap, and is described in its module docstring as "NOT semantic — offline/tests/skeleton."

Sources: src/mnemo/infrastructure/composition.py, src/mnemo/adapters/embedding/hash_embedder.py

Model Loading Policy

A ModelConfig (in the bundled llmkit library) is a small value object that names a model source and a residency policy. The source is either an OnnxSource (encoder-only, used for the embedder) or a GgufSource (used by llama.cpp for the generator); residency controls load/unload. The choice is made in code — a consumer reads its own config and passes it in — rather than driven by the environment directly.

Source: src/llmkit/config.py

Storage Model

The Memory Entity

Memory is the persistence record, built through Memory.create() so that invariants are checked at construction. Required fields include content, type (a MemoryType), and scope (a Scope). When scope is GLOBAL, the project is forced to the GLOBAL_PROJECT sentinel — global memories are projectless by design and apply to every project.

Source: src/mnemo/domain/memory.py

Projects and the Gate

Projects are registered entities with a description (later used for tier-2 semantic near-match). Update is gated: an unregistered project raises UnknownProject with near-match candidates, mirroring the gate behaviour used on every project-scoped read or write.
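A gate of this kind might be sketched as below. The near-match step here uses difflib string similarity as a stand-in, which is an assumption: the source mentions a semantic tier based on project descriptions, and that tier is not modelled. The require_project name and the exception's fields are also hypothetical.

```python
import difflib
from typing import Iterable, List


class UnknownProject(LookupError):
    def __init__(self, slug: str, candidates: List[str]) -> None:
        hint = ", ".join(candidates) if candidates else "no close matches"
        super().__init__(f"unknown project {slug!r}; did you mean: {hint}")
        self.slug = slug
        self.candidates = candidates


def require_project(slug: str, registered: Iterable[str], *, max_candidates: int = 3) -> str:
    known = sorted(registered)
    if slug in known:
        return slug
    # String similarity only; a description-based semantic match is out of scope here.
    candidates = difflib.get_close_matches(slug, known, n=max_candidates, cutoff=0.6)
    raise UnknownProject(slug, candidates)


projects = {"mnemo", "webapp"}
accepted = require_project("mnemo", projects)
try:
    require_project("mnem", projects)
    suggestions = []
except UnknownProject as exc:
    suggestions = exc.candidates
```

Returning candidates with the error lets an agent correct a mistyped slug in one retry instead of silently creating a new project.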

Sources: src/mnemo/application/use_cases/update_project.py, src/mnemo/adapters/cli/app.py

Recall Output

The recall path produces a RecallBundle carrying the project slug, a tuple of RecallSections grouped by memory type, and an optional summary field. The summary is filled only when a generator stage ran; otherwise the structured grouping carries the answer on its own. The synthesis stage is a no-op on an empty bundle — there is nothing to summarize.

Sources: src/mnemo/application/recall/bundle.py, src/mnemo/application/recall/synthesize_stage.py

Client Wiring

The Installer Port

Wiring mnemo into a host agent is modeled as a ClientInstaller protocol. Every client integration — each agent has its own mcp add CLI or its own JSON config schema — implements four operations:

| Member | Purpose |
| --- | --- |
| name (property) | The client's slug used on the command line (e.g. 'cursor') |
| detect() | Returns True if the client appears installed on this machine |
| describe() | A one-line description of what wiring would do (for --dry-run / prompts) |
| install() | Wire the connector into the client (idempotent) |

The shared shape lets mnemo support many agents with their own integration mechanics while keeping the top-level command surface uniform.

Source: src/mnemo/adapters/setup/client_installer.py
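The four members can be expressed as a typing.Protocol, with one loop driving every client. In this sketch the return type of install(), the FakeInstaller class, and the wire helper are assumptions made for illustration; they are not taken from client_installer.py.

```python
from typing import List, Protocol


class ClientInstaller(Protocol):
    @property
    def name(self) -> str: ...
    def detect(self) -> bool: ...
    def describe(self) -> str: ...
    def install(self) -> None: ...


class FakeInstaller:
    """Hypothetical in-memory client used only to exercise the protocol."""

    def __init__(self, name: str, present: bool) -> None:
        self._name = name
        self._present = present
        self.wired = False

    @property
    def name(self) -> str:
        return self._name

    def detect(self) -> bool:
        return self._present

    def describe(self) -> str:
        return f"register the mnemo connector with {self._name}"

    def install(self) -> None:
        self.wired = True  # setting the flag twice is harmless, so this is idempotent


def wire(installers: List[ClientInstaller], *, dry_run: bool = False) -> List[str]:
    report = []
    for installer in installers:
        if not installer.detect():
            report.append(f"{installer.name}: not detected, skipped")
        elif dry_run:
            report.append(f"{installer.name}: would {installer.describe()}")
        else:
            installer.install()
            report.append(f"{installer.name}: wired")
    return report


cursor = FakeInstaller("cursor", present=True)
absent = FakeInstaller("windsurf", present=False)
preview = wire([cursor, absent], dry_run=True)
wired_before_apply = cursor.wired
applied = wire([cursor, absent])
```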

JSON Config Utility

For clients that store wiring as a JSON file, a small utility pair reads, merges, and writes without disturbing unrelated keys. A missing or empty file yields an empty dict; writes mkdir -p the parent directory and emit indented UTF-8 JSON with a trailing newline.

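import json
from pathlib import Path

def load_json(path: Path) -> dict:
    # A missing or empty file is treated as an empty config.
    if not path.exists() or path.stat().st_size == 0:
        return {}
    return json.loads(path.read_text())

def save_json(path: Path, data: dict) -> None:
    # Create the parent directory if needed, then write indented JSON with a trailing newline.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n")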

Source: src/mnemo/adapters/setup/json_config.py
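A merge built on the same read/write pattern might look like this. The merge_entry helper, the "mcpServers" section name, and the placeholder entry are assumptions for the sketch; each real client defines its own schema.

```python
import json
import tempfile
from pathlib import Path


def merge_entry(path: Path, section: str, key: str, entry: dict) -> dict:
    """Set data[section][key] = entry, leaving every other key as it was."""
    if path.exists() and path.stat().st_size > 0:
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {}
    data.setdefault(section, {})[key] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return data


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "client" / "settings.json"
    config.parent.mkdir(parents=True)
    config.write_text('{"theme": "dark"}\n', encoding="utf-8")

    entry = {"command": "example-connector"}  # placeholder, not a real mnemo command
    merge_entry(config, "mcpServers", "mnemo", entry)
    merge_entry(config, "mcpServers", "mnemo", entry)  # second run changes nothing
    merged = json.loads(config.read_text(encoding="utf-8"))
    ends_with_newline = config.read_text(encoding="utf-8").endswith("\n")
```

Writing only under one section key is what keeps the operation idempotent and safe for settings the user edited by hand.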

Runtime Entry Points

flowchart LR
    A[Config] --> B[build_container]
    B --> C[Embedder]
    B --> D[Repository]
    B --> E[Project Gate]
    C --> F[Use Cases]
    D --> F
    E --> F
    F --> G[CLI / MCP Server]
    H[ClientInstaller] --> I[Host Agent Config]
    F --> H

CLI

The CLI is a typer.Typer application registered with no_args_is_help=True. Visible commands include store (save a memory) and recall (the CLI view of the recall MCP tool). recall is the only command that may invoke the optional generator; the source notes that model timing and RAM go to the logs (controlled by MNEMO_LOG_LEVEL) and the bundle prints as JSON.

Source: src/mnemo/adapters/cli/app.py

MCP Server

The MCP server exposes store and search tools (and others) with Pydantic-annotated parameters. The store tool returns a dict with {id, status}, where status is one of created, duplicate (identical content already stored), or superseded (a topic_key upsert). No LLM runs on write.
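The three statuses can be illustrated with a toy in-memory store. Everything here is a sketch under assumptions: SHA-256 as the content hash, checking for duplicates before the topic_key upsert, marking the replaced row as "superseded", and ignoring project scoping.

```python
import hashlib
from typing import Dict, Optional


class InMemoryStore:
    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def store(self, content: str, topic_key: Optional[str] = None) -> dict:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        active = [row for row in self._rows.values() if row["status"] == "active"]

        # Identical content already stored: report it, write nothing.
        for row in active:
            if row["hash"] == digest:
                return {"id": row["id"], "status": "duplicate"}

        # Same topic_key with new content: the new row replaces the old one.
        prior = None
        if topic_key is not None:
            prior = next((row for row in active if row["topic_key"] == topic_key), None)

        new_id = f"m{len(self._rows) + 1}"
        self._rows[new_id] = {
            "id": new_id, "hash": digest, "topic_key": topic_key,
            "status": "active", "supersedes": prior["id"] if prior else None,
        }
        if prior is not None:
            prior["status"] = "superseded"
            return {"id": new_id, "status": "superseded"}
        return {"id": new_id, "status": "created"}


store = InMemoryStore()
first = store.store("Use SQLite for storage", topic_key="storage")
again = store.store("Use SQLite for storage", topic_key="storage")
evolved = store.store("Use SQLite with WAL mode", topic_key="storage")
```

None of the three branches needs a model, which is consistent with the no-LLM-on-write rule.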

Source: src/mnemo/adapters/mcp/server.py

Common Failure Modes

  • Generator/reranker extras missing — the optional stages raise an actionable RuntimeError telling the user to install mnemo[recall] or set the model to "off". The CLI surfaces the message and exits with code 1 rather than printing a traceback. Source: src/mnemo/adapters/cli/app.py
  • Invalid project — write/update use cases raise UnknownProject with near-match candidates to help the caller correct the slug. Source: src/mnemo/application/use_cases/update_project.py
  • Empty memory content — rejected at Memory.create with a clear ValueError rather than persisting an empty record. Source: src/mnemo/domain/memory.py

See Also

  • Recall pipeline stages
  • Embedder and model integration
  • Project gate semantics
  • MCP tool reference

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 7 structured pitfall items, none of them high or blocking. Top priority: Configuration risk (requires verification).

1. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.host_targets | https://github.com/arttttt/mnemo

2. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/arttttt/mnemo

3. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/arttttt/mnemo

4. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/arttttt/mnemo

5. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/arttttt/mnemo

6. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/arttttt/mnemo

7. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/arttttt/mnemo

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 3 (count of project-level external discussion links exposed on this manual page).

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using mnemo with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence