mnem Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

mnem

Introduction to mnem

Related topics: System Architecture, Installation Guide


mnem is a Rust-based knowledge management system designed for AI agents. It provides a structured approach to storing, retrieving, and managing information using a DAG-based (Directed Acyclic Graph) storage architecture with content-addressed data structures.

Overview

mnem serves as a personal knowledge graph for AI agents, enabling them to:

Store structured information with nodes, edges, and properties in a version-controlled repository
Ingest various document formats including Markdown, PDF, plain text, code files, and conversation logs
Retrieve relevant context using vector search, sparse ranking, and token-budget packing
Track changes through a commit-based operation log with cryptographic signatures
Support branching for experimental or temporary state management

Sources: crates/mnem-core/src/lib.rs:1-30

Architecture

mnem is organized as a monorepo with multiple Rust crates:

graph TD
    subgraph "mnem Repository Structure"
        CLI["mnem-cli<br/>Command Line Interface"]
        HTTP["mnem-http<br/>HTTP API Server"]
        INGEST["mnem-ingest<br/>Document Ingestion"]
        CORE["mnem-core<br/>Core Data Model & Retrieval"]
    end
    
    CLI --> CORE
    HTTP --> CORE
    INGEST --> CORE
    
    INGEST --> |"parse/chunk/extract"| RAW[("Raw Source<br/>.md .pdf .txt .json")]
    CORE --> |"store/retrieve"| GRAPH[("Knowledge<br/>Graph")]

Crate Responsibilities

Crate	Purpose
`mnem-core`	Core data models (`Node`, `Edge`, `Commit`, `Operation`), DAG-CBOR codec, prolly trees, vector/sparse indexing, agent-facing retrieval
`mnem-ingest`	Document parsing, chunking strategies, entity extraction (rule-based, KeyBERT, or LLM)
`mnem-cli`	Terminal interface for all operations
`mnem-http`	REST API for remote agent access

Sources: crates/mnem-core/src/lib.rs:15-25

Core Data Model

Nodes

Nodes are the fundamental unit of information storage. Each node contains:

graph LR
    subgraph "Node Structure"
        NTYPE["ntype<br/>Node Type Label"]
        CTX["context_sentence<br/>Positional Cue"]
        SUM["summary<br/>LLM-facing Text"]
        PROPS["props<br/>Property Map"]
        CONTENT["content<br/>Opaque Payload"]
    end

Field	Type	Description
`ntype`	`String`	Semantic label (e.g., `Fact`, `Doc`, `Person`)
`context_sentence`	`Option<String>`	LLM-generated placement cue for contextual retrieval
`summary`	`Option<String>`	Primary text for embedding and retrieval
`props`	`BTreeMap<String, Ipld>`	Structured key-value metadata
`content`	`Option<Bytes>`	Opaque payload (document body, file data)

The context_sentence field implements Anthropic's 2024 contextual retrieval approach, storing an LLM-generated one-sentence placement cue that captures positional and relational context. Sources: crates/mnem-core/src/objects/node.rs:30-75

Edges

Edges represent relationships between nodes. They are typed links with source and target references.

Operations and Commits

Concept	Description
`Operation`	A single atomic change to the repository state
`Commit`	A snapshot referencing a sequence of operations
`View`	Current head state with references and tombstones

Operations include metadata for provenance:

pub struct Operation {
    pub author: String,
    pub agent_id: Option<String>,
    pub task_id: Option<String>,
    pub host: Option<String>,
    pub time: u64,
    pub description: String,
    pub signature: Option<Signature>,
}

Sources: crates/mnem-core/src/objects/operation.rs:15-35

Document Ingestion Pipeline

The ingestion system (mnem-ingest) handles the transformation of raw documents into chunked, indexed nodes:

flowchart LR
    RAW["Raw Source<br/>.md .pdf .txt"] --> PARSE["Parse"]
    PARSE --> SECTION["Sections"]
    SECTION --> CHUNK["Chunk"]
    CHUNK --> EXTRACT["Extract Entities<br/>Relations"]
    EXTRACT --> NODE["Nodes + Edges"]
    NODE --> STORE["Commit to Store"]

Sources: crates/mnem-ingest/src/lib.rs:20-45

Supported Source Types

Source	Extensions	Strategy	Default Chunker
Markdown	`.md`, `.markdown`	CommonMark + GFM	`Paragraph`
Text	`.txt`, unknown	Plain text	`SentenceRecursive`
PDF	`.pdf`	Text layer extraction	`SentenceRecursive`
Conversation	`.json`, `.jsonl`	Chat export formats	`Session`
Code	`.rs`, `.py`, `.js`, `.ts`, `.go`, `.java`, `.c`, `.cpp`, `.rb`, `.cs`	Tree-sitter parsing	`Structural`

Sources: crates/mnem-ingest/src/pipeline.rs:45-60
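
The table above can be read as a lookup from file extension to source type and default chunker. The following is a minimal sketch of that lookup, not the code in mnem-ingest: the `SourceKind` and `Chunker` enums and the `classify` function are hypothetical names, and lowercasing the extension before matching is an assumption.

```rust
use std::path::Path;

/// Source types from the table above (illustrative enum, not mnem-ingest's API).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum SourceKind { Markdown, Text, Pdf, Conversation, Code }

/// Default chunkers from the table above (illustrative enum).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Chunker { Paragraph, SentenceRecursive, Session, Structural }

/// Map a path to its source type and default chunker.
/// Unknown or missing extensions fall back to plain text, as the table states.
pub fn classify(path: &str) -> (SourceKind, Chunker) {
    // Assumption: extensions are matched case-insensitively.
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "md" | "markdown" => (SourceKind::Markdown, Chunker::Paragraph),
        "pdf" => (SourceKind::Pdf, Chunker::SentenceRecursive),
        "json" | "jsonl" => (SourceKind::Conversation, Chunker::Session),
        "rs" | "py" | "js" | "ts" | "go" | "java" | "c" | "cpp" | "rb" | "cs" => {
            (SourceKind::Code, Chunker::Structural)
        }
        _ => (SourceKind::Text, Chunker::SentenceRecursive),
    }
}
```

Under these assumptions, `classify("src/main.rs")` selects the structural chunker, while a file with no extension such as `LICENSE` is treated as plain text.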

Chunker Strategies

Five chunking strategies are available:

Strategy	Description	Use Case
`Paragraph`	Splits on double-newlines	Markdown documents
`Recursive`	Token-budgeted word-window sliding	Backwards compatibility
`SentenceRecursive`	Sentence-aware token packing using Unicode boundaries	Prose (Text, PDF)
`Session`	Groups messages up to `max_messages`	Conversation logs
`Structural`	One chunk per section	Code (function/class level)

The SentenceRecursive chunker is the preferred strategy for prose as it prevents cutting mid-sentence and produces more uniform chunk sizes. Token counts are estimated via whitespace split for speed and determinism. Sources: crates/mnem-ingest/src/chunk.rs:1-45
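
To make the idea of sentence-aware token packing concrete, here is a rough sketch under simplifying assumptions. It splits sentences on `.`, `!`, and `?` rather than on Unicode sentence boundaries, and the function names are hypothetical; only the whitespace-based token estimate is taken from the description above.

```rust
/// Estimate tokens by whitespace split, as described above.
pub fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Naive sentence splitter: a sentence ends after '.', '!' or '?'.
/// (Assumption: the real chunker uses Unicode sentence boundaries instead.)
pub fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let sentence = current.trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
            current.clear();
        }
    }
    let tail = current.trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}

/// Greedily pack whole sentences into chunks of at most `max_tokens`.
/// A single sentence longer than the budget becomes its own oversized chunk.
pub fn pack_sentences(text: &str, max_tokens: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut used = 0;
    for sentence in split_sentences(text) {
        let cost = estimate_tokens(&sentence);
        // Close the current chunk before it would overflow the budget.
        if used > 0 && used + cost > max_tokens {
            chunks.push(std::mem::take(&mut current));
            used = 0;
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(&sentence);
        used += cost;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because chunks are only closed between sentences, no chunk ends mid-sentence; the trade-off is that a very long sentence can exceed the budget.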

Entity Extraction

mnem supports multiple extraction providers:

Provider	Method
`rule` (default)	Capitalized phrase heuristic
`keybert`	Statistical keyword extraction (requires feature flag)
`ollama`	LLM-based extraction (requires feature flag)
`none`	Suppress entity extraction

Sources: crates/mnem-cli/src/commands/ingest.rs:25-40
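
As an illustration of what a capitalized-phrase heuristic might look like, the sketch below groups runs of consecutive capitalized words into candidate entities. It is an assumption about the general technique, not the `rule` provider's actual implementation, and `capitalized_phrases` is a hypothetical name.

```rust
/// Collect runs of consecutive capitalized words as candidate entities.
/// A run ends at a lowercase word or at clause punctuation.
pub fn capitalized_phrases(text: &str) -> Vec<String> {
    let mut phrases = Vec::new();
    let mut run: Vec<&str> = Vec::new();
    for raw in text.split_whitespace() {
        // Strip surrounding punctuation before testing the first letter.
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        let capitalized = word.chars().next().map_or(false, |c| c.is_uppercase());
        if capitalized {
            run.push(word);
        }
        let ends_clause = raw
            .chars()
            .last()
            .map_or(false, |c| matches!(c, '.' | ',' | ';' | ':' | '!' | '?'));
        if (!capitalized || ends_clause) && !run.is_empty() {
            phrases.push(run.join(" "));
            run.clear();
        }
    }
    if !run.is_empty() {
        phrases.push(run.join(" "));
    }
    phrases
}
```

A heuristic like this is cheap and deterministic but also picks up sentence-initial words, which is one reason statistical or LLM-based providers exist as alternatives.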

Retrieval System

The retrieval layer composes multiple ranking strategies to deliver relevant context to agents under a token budget:

graph TD
    QUERY["Query"] --> VEC["Vector Search"]
    QUERY --> SPARSE["Sparse Ranking"]
    VEC --> RERANK["Rerank"]
    SPARSE --> RERANK
    RERANK --> PACK["Token Budget Packing"]
    PACK --> RESULT["Context for Agent"]

The retriever renders nodes in a YAML-like format:

ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>

ntype and id are always present
context appears before summary (per Anthropic's contextual-retrieval recipe)
summary is clipped at 8192 chars by default
Scalar props are emitted in BTreeMap order; non-scalar props are skipped

Sources: crates/mnem-core/src/retrieve/mod.rs:1-50
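
The final step of the pipeline, token-budget packing, can be sketched as a greedy selection over ranked candidates. This is an illustrative sketch only: the `Candidate` type and `pack_under_budget` function are hypothetical, and estimating tokens by whitespace split at this stage is an assumption borrowed from the chunker description.

```rust
/// A ranked candidate: rendered text plus a relevance score (higher is better).
/// Hypothetical type for illustration.
pub struct Candidate {
    pub text: String,
    pub score: f32,
}

/// Assumption: tokens are estimated by whitespace split.
pub fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedy packing: visit candidates best-first and keep each one that still fits.
pub fn pack_under_budget(mut candidates: Vec<Candidate>, budget: usize) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0;
    let mut packed = Vec::new();
    for candidate in candidates {
        let cost = estimate_tokens(&candidate.text);
        if used + cost <= budget {
            used += cost;
            packed.push(candidate);
        }
    }
    packed
}
```

Skipping a candidate that does not fit, rather than stopping at it, lets smaller lower-ranked results fill the remaining budget.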

CLI Interface

The mnem CLI provides commands for repository management:

Command	Description
`mnem ingest <path>`	Parse and commit documents to the graph
`mnem tag`	Manage versioned references (create, list, delete)
`mnem branch`	Create and manage branches

Ingest Command Options

Option	Default	Description
`--chunker`	`auto`	Strategy: `auto`, `paragraph`, `recursive`, `sentence_recursive`, `session`, `structural`
`--max-tokens`	`512`	Target tokens per chunk
`--overlap`	`32`	Overlap tokens between chunks
`--recursive`	false	Walk directory trees
`--extractor`	`none`	Entity extraction provider
`--ner-provider`	`rule`	NER method: `rule`, `none`

Sources: crates/mnem-cli/src/commands/ingest.rs:50-70
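
To show how `--max-tokens` and `--overlap` interact, here is a small sketch of a sliding word window of the kind the `recursive` strategy is described as using. The function name is hypothetical, and treating one whitespace-separated word as one token is a simplifying assumption.

```rust
/// Slide a window of `max_tokens` words over the text. Each window repeats
/// the last `overlap` words of the previous one.
pub fn word_windows(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_tokens > 0 && overlap < max_tokens,
        "overlap must be smaller than max_tokens"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With the defaults of 512 and 32, each new window starts 480 words after the previous one, so a 1000-word text yields three chunks in this sketch.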

HTTP API

The HTTP server exposes REST endpoints for remote agent access:

Endpoint	Method	Description
`/v1/ingest`	POST	Ingest documents with JSON or multipart payload
`/v1/branches`	GET	List all branches
`/v1/branches`	POST	Create a new branch

Ingest Request Parameters

Parameter	Type	Description
`chunker`	String	Strategy: `auto`, `paragraph`, `recursive`, `session`
`max_tokens`	u32	Target tokens per chunk
`overlap`	u32	Overlap tokens between chunks
`author`	String	Required commit author
`message`	String	Optional commit message
`extractor`	String	Extraction provider
`ner_provider`	String	NER method override

Sources: crates/mnem-http/src/handlers_ingest.rs:15-45

Key Design Principles

No unsafe code: The entire mnem-core crate enforces #![forbid(unsafe_code)]. Sources: crates/mnem-core/src/lib.rs:30

Canonical encoding: Every object type preserves the byte-exact round-trip property (decode(encode(x)) == x).

Deterministic retrieval: Node props use BTreeMap for consistent iteration order

Extensible architecture: Sidecar support for external tools (docling, unstructured) via feature flags

Branch support: Tags and branches enable experimental state management without losing history

Sources: crates/mnem-core/src/lib.rs:1-30

Installation Guide

Overview

The mnem project is a Git-like version control system designed for AI agent knowledge management. It provides versioned storage, retrieval, and synchronization of structured knowledge nodes. This guide covers the supported installation methods, system requirements, and configuration steps needed to get mnem running on your platform.

The project is organized as a Rust monorepo with multiple crates and language bindings. Installation options include native binaries via multiple package managers, Python packages, Docker containers, and prebuilt releases. Sources: crates/mnem-core/src/lib.rs:1-20

System Requirements

Supported Platforms

mnem supports the following platforms and architectures:

Platform	Architecture	Notes
Linux	x86_64, aarch64	Full support
macOS	arm64 (Apple Silicon), x86_64	Rosetta 2 compatible
Windows	x86_64	Full support

Sources: py-packages/mnem-cli/README.md:1-20

Runtime Dependencies

Component	Requirement	Purpose
Rust toolchain	1.70+ (stable)	Building from source
Python	3.9+	Python bindings (`mnem-py`)
Node.js	18+	npm package
Docker	20.10+	Container deployment

Installation Methods

CLI Installation via pip

The simplest method to install the mnem CLI is through Python's package manager:

pip install mnem-cli
mnem --version

On first run, mnem automatically downloads the correct prebuilt binary for your platform from the GitHub release assets and caches it in ~/.mnem_cli/. Subsequent calls run the cached binary directly. Sources: py-packages/mnem-cli/README.md:1-15

CLI Installation via Cargo

For users with the Rust toolchain installed, install from crates.io:

cargo install --locked mnem-cli --features bundled-embedder

The --features bundled-embedder flag compiles the embedder dependency into the binary, making it self-contained without external embedding services. Sources: crates/mnem-cli/src/main.rs:1-50

CLI Installation via npm

Node.js users can install globally via npm:

npm install -g mnem-cli

Sources: py-packages/mnem-cli/README.md:1-20

Prebuilt Binaries

Download prebuilt binaries directly from the GitHub Releases page. Binaries are available for all supported platforms in the release assets.

After downloading, make the binary executable:

chmod +x mnem-*-x86_64-unknown-linux-gnu
./mnem-*-x86_64-unknown-linux-gnu --version

Docker Installation

Container-based deployment is available via Docker. The project includes both a Dockerfile and docker-compose.yml for containerized deployments.

To build the Docker image:

docker build -t mnem:latest .

For orchestrated deployments using docker-compose:

docker-compose up -d

Sources: Dockerfile, docker-compose.yml

Python Bindings

For programmatic access from Python applications, install the Python bindings package:

pip install mnem-py

The package is imported as pymnem (import pymnem) and lets Python applications interact with mnem repositories. Sources: py-packages/mnem-cli/README.md:1-30

Build from Source

Prerequisites

Rust 1.70 or later (stable toolchain)
Cargo (included with Rust)
Git

Build Steps

# Clone the repository
git clone https://github.com/Uranid/mnem.git
cd mnem

# Build the CLI
cargo build --release --bin mnem

# Build all crates
cargo build --release

Feature Flags

The project supports several feature flags to customize the build:

Feature	Description
`bundled-embedder`	Embedder for local vector storage
`keybert`	Statistical keyphrase extraction
`ollama`	LLM-based extraction via Ollama
`sidecar-docling`	PDF extraction via docling CLI
`sidecar-unstructured`	PDF extraction via unstructured

Sources: crates/mnem-ingest/src/lib.rs:1-60

Initial Configuration

Repository Initialization

After installation, initialize a new mnem repository:

mnem init

This creates the .mnem/ directory with the repository database (repo.redb). Sources: crates/mnem-cli/src/main.rs:1-80

Configuration File

The CLI reads configuration from .mnem/config.toml in the repository root. Configuration includes:

[user]
name = "Your Name"
email = "you@example.com"
agent_id = "agent-identifier"

[llm]
provider = "ollama"  # or "openai", "anthropic"
model = "llama3.2"
base_url = "http://localhost:11434"
timeout_secs = 120

The author string for commits follows the format name <email> when both are present. If only one is available, it uses that value alone. When neither is configured, it falls back to the agent_id or defaults to "mnem-cli". Sources: crates/mnem-cli/src/config.rs:1-50
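
The fallback order described above can be sketched as a single match. The `UserConfig` struct and `author_string` function are hypothetical names chosen for illustration; how the real CLI treats empty strings is not covered here.

```rust
/// Hypothetical mirror of the [user] table in .mnem/config.toml.
pub struct UserConfig {
    pub name: Option<String>,
    pub email: Option<String>,
    pub agent_id: Option<String>,
}

/// Resolve the commit author string using the fallback order described above.
pub fn author_string(user: &UserConfig) -> String {
    match (user.name.as_deref(), user.email.as_deref()) {
        // Both present: "name <email>".
        (Some(name), Some(email)) => format!("{name} <{email}>"),
        // Only one present: use it alone.
        (Some(one), None) | (None, Some(one)) => one.to_string(),
        // Neither present: agent_id, then the literal default.
        (None, None) => user
            .agent_id
            .clone()
            .unwrap_or_else(|| "mnem-cli".to_string()),
    }
}
```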

Repository Path Resolution

The CLI automatically searches for the .mnem/ directory by walking up from the current working directory, similar to Git's behavior. You can override this with the -R / --repo flag:

mnem -R ~/notes status

Sources: crates/mnem-cli/src/main.rs:1-80
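
The upward search can be sketched with the standard library alone. This is an illustration of the lookup described above, not the CLI's code; `find_repo_root` and `resolve_repo` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Walk up from `start` and return the first directory that contains `.mnem/`.
pub fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start` itself first, then each parent
        .find(|dir| dir.join(".mnem").is_dir())
        .map(Path::to_path_buf)
}

/// An explicit path (as given with -R / --repo) takes precedence over discovery.
pub fn resolve_repo(explicit: Option<&Path>, cwd: &Path) -> Option<PathBuf> {
    match explicit {
        Some(path) => Some(path.to_path_buf()),
        None => find_repo_root(cwd),
    }
}
```

Because `ancestors()` starts with the directory itself, a repository in the current directory is found before one further up the tree.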

HTTP Server Deployment

Starting the Server

The HTTP server provides REST API access to mnem repositories:

mnem serve --port 8080

API Endpoints

Endpoint	Method	Purpose
`/v1/ingest`	POST	Ingest documents
`/v1/branches`	GET	List branches
`/v1/branches`	POST	Create branch
`/v1/retrieve`	POST	Query knowledge

Ingest Configuration

The ingest endpoint accepts JSON payloads with the following parameters:

Parameter	Type	Required	Default	Description
`content`	String	Yes	-	Content to ingest
`chunker`	String	No	`auto`	Chunking strategy
`max_tokens`	u32	No	512	Target tokens per chunk
`overlap`	u32	No	32	Overlap tokens
`author`	String	Yes	-	Commit author
`message`	String	No	`"mnem http ingest"`	Commit message
`extractor`	String	No	`"none"`	Entity extractor
`ner_provider`	String	No	`"rule"`	NER provider

Sources: crates/mnem-http/src/handlers_ingest.rs:1-50

Verification

Verify Installation

After installation, verify the CLI is working:

mnem --version
mnem status

First-Run Wizard

On first run with no repository present, mnem launches a first-run wizard to help configure the basic settings. Returning users see the mnem status output directly. Sources: crates/mnem-cli/src/main.rs:1-80

Alternative Installation Summary

Method	Command	Notes
pip	`pip install mnem-cli`	Auto-downloads binary
cargo	`cargo install --locked mnem-cli --features bundled-embedder`	Self-contained binary
npm	`npm install -g mnem-cli`	Node.js integration
Docker	`docker-compose up -d`	Containerized deployment
Binary	Download from Releases	Manual installation

Sources: py-packages/mnem-cli/README.md:1-20

Next Steps

After installation, consult these related guides:

Quick Start - Create your first repository and add content
Configuration Reference - Complete configuration options
Ingest Guide - Document ingestion and chunking strategies
Retrieve Guide - Query your knowledge base

Sources: py-packages/mnem-cli/README.md:1-20

System Architecture

Related topics: Core Components, Storage Backend


Overview

mnem is a content-addressed, CRDT-based (Conflict-free Replicated Data Types) knowledge management system designed for agentic workflows. The system provides immutable content-addressed storage with a secondary vector index for retrieval-augmented generation (RAG) applications.

The architecture follows a modular design with distinct crates handling different concerns:

Crate	Purpose
`mnem-core`	Core data types, CRDT operations, storage, indexing, retrieval
`mnem-ingest`	Source parsing, chunking, and entity extraction
`mnem-cli`	Command-line interface
`mnem-http`	HTTP API server

Sources: crates/mnem-core/src/lib.rs:1-30

Core Components

Related topics: System Architecture, Hybrid Retrieval System


The mnem system is built around a set of core components that work together to provide a versioned, graph-based knowledge management system. The core is implemented entirely in Rust with #![forbid(unsafe_code)], ensuring memory safety throughout the codebase. Every object type preserves byte-exact canonical-encoding round-trip properties (decode(encode(x)) == x and encode(decode(b)) == b). Sources: crates/mnem-core/src/lib.rs

System Architecture Overview

mnem implements a content-addressed graph database with prolly trees for efficient storage and retrieval. The architecture separates concerns between data structures (objects), storage (store), repository management (repo), and retrieval (retrieve).

graph TD
    subgraph "mnem-core"
        OBJ[objects: Node, Edge, Commit, View]
        PRO[prolly: TreeChunk, Builder, Cursor]
        STORE[store: Blockstore, OpHeadsStore]
        REPO[repo: ReadonlyRepo, Transaction]
        IDX[index: Query, BruteForceVectorIndex]
        RET[retrieve: Retriever]
        CODEC[codec: DAG-CBOR, DAG-JSON]
    end
    
    subgraph "mnem-ingest"
        ING[Ingester Pipeline]
        CHUNK[Chunking Strategies]
        PARSE[Parsers: MD, PDF, Code, JSON]
    end
    
    subgraph "External Interfaces"
        CLI[mnem-cli]
        HTTP[mnem-http]
        MCP[MCP Server]
    end
    
    ING -->|adds nodes/edges| REPO
    REPO -->|reads/writes| STORE
    RET -->|queries| IDX
    IDX -->|indexes| OBJ
    CODEC -->|encodes/decodes| OBJ
    CLI --> REPO
    HTTP --> REPO

Data Objects

The fundamental building blocks of the mnem knowledge graph are the core object types defined in crates/mnem-core/src/objects/. Each object is serializable via DAG-CBOR for canonical encoding. Sources: crates/mnem-core/src/lib.rs

Node

The Node is the primary unit of knowledge storage. It represents a single fact, entity, or chunk of content within the graph.

// Simplified structure from crates/mnem-core/src/objects/node.rs
pub struct Node {
    pub id: NodeId,                                    // Unique identifier
    pub ntype: String,                                 // Node type label (e.g., "Fact", "Doc")
    pub summary: Option<String>,                       // LLM-facing retrieval text
    pub props: BTreeMap<String, Ipld>,                 // Property map
    pub content: Option<Bytes>,                        // Optional opaque payload
    pub context_sentence: Option<String>,              // Positional chunk prefix
    pub ext: Option<BTreeMap<String, Ipld>>,           // Forward-compat extension map
}

Field	Type	Description
`id`	`NodeId`	Unique content-addressed identifier
`ntype`	`String`	Free-form type label for the node
`summary`	`Option<String>`	Text summary for LLM retrieval under token budget
`props`	`BTreeMap<String, Ipld>`	Structured metadata with any DAG-CBOR value
`content`	`Option<Bytes>`	Opaque payload (document body, file data)
`context_sentence`	`Option<String>`	LLM-generated placement cue per Anthropic's contextual retrieval recipe
`ext`	`Option<BTreeMap>`	Forward-compat extension map preserving unknown fields

The summary field is designed for LLM consumption: it is the field agents read when assembling context under a token budget. It is distinct from props (structured) and content (opaque payload). Sources: crates/mnem-core/src/objects/node.rs

The context_sentence field implements Anthropic's 2024 Contextual Retrieval approach, which reports a 49% to 67% reduction in retrieval failures when the context is present. mnem stores it on the node so the render path can surface it back to the agent for faithful source attribution. Sources: crates/mnem-core/src/objects/node.rs

Edge

Edges connect nodes and represent relationships between entities.

// From crates/mnem-core/src/objects/edge.rs
pub struct Edge {
    pub src: NodeId,           // Source node ID
    pub rel: String,           // Relation label (e.g., "works_at", "extracted_from")
    pub dst: NodeId,           // Destination node ID
    pub props: BTreeMap<String, Ipld>,  // Optional edge properties
}

Edges are used to create graph relationships such as works_at, lives_in, traveling_with, has_preference, and extracted_from. The mnem-cli integration guidelines recommend the compound mnem_commit_relation tool when both endpoints are entities: it resolves or creates both nodes and adds the edge in one call. Sources: crates/mnem-cli/src/integrate.rs

Commit

The Commit object represents a point-in-time snapshot of the repository state.

// From crates/mnem-core/src/objects/commit.rs
pub struct Commit {
    pub message: String,           // Commit message
    pub author: Author,            // Author information
    pub timestamp: Timestamp,       // Commit timestamp
    pub root: NodeId,              // Root of the node tree
    pub ops: Vec<Operation>,        // Operations applied in this commit
}

View

The View contains repository metadata including branch references and commit heads.

// Referenced in crates/mnem-http/src/handlers.rs
pub struct View {
    pub heads: Vec<Cid>,           // Current head commit CIDs
    pub refs: BTreeMap<String, RefTarget>,  // Named references
}

The View exposes branch information via the HTTP API with the schema mnem.v1.branches. Sources: crates/mnem-http/src/handlers.rs

Operation

Operations represent individual changes applied to the repository. They are collected within commits to provide a complete audit trail.

Repository Layer

The repository layer provides the main interface for interacting with the knowledge graph. It is defined across several modules in crates/mnem-core/src/repo/. Sources: crates/mnem-core/src/repo/mod.rs

ReadonlyRepo

ReadonlyRepo provides a read-only view into the repository state.

// Simplified from crates/mnem-core/src/repo/mod.rs
pub trait ReadonlyRepo {
    fn view(&self) -> &View;
    fn blockstore(&self) -> &dyn Blockstore;
}

Transaction

Transaction enables write operations to the repository. All changes are staged until explicitly committed.

// From crates/mnem-core/src/repo/transaction.rs
pub struct Transaction {
    // Internal state managing pending operations
}

impl Transaction {
    pub fn add_node(&mut self, node: Node) -> Result<NodeId, Error>;
    pub fn add_edge(&mut self, edge: Edge) -> Result<EdgeId, Error>;
    pub fn commit(self, author: Author, message: String) -> Result<ReadonlyRepo, Error>;
}

The ingest method on the Ingester pipeline uses Transaction to add nodes and edges:

Parse, chunk, extract, and write into tx. Does not commit. bytes is the raw source payload; kind says how to parse it. Returns an IngestResult with counts and elapsed time. The commit_cid field is left None - callers who want a CID should call tx.commit(...) afterwards and stash the returned ReadonlyRepo's head commit CID. Sources: crates/mnem-ingest/src/pipeline.rs

Merge and Conflict Detection

The merge system handles combining divergent repository states.

// From crates/mnem-core/src/repo/merge.rs
pub fn detect_conflicts(
    repo: &ReadonlyRepo,
    left: Cid,
    right: Cid,
    lca: Option<Cid>,
) -> Result<MergeConflicts, Error>;

Conflict detection supports an explicit ConflictPolicy for customizing merge behavior. The detector loads tombstone sets via the Views attached to each commit's operation. Sources: crates/mnem-core/src/repo/conflict.rs

graph LR
    A[Commit A] -->|diverged| B[Common Ancestor]
    C[Commit B] -->|diverged| B
    B --> D[Detect Conflicts]
    D --> E{MergeConflicts?}
    E -->|Yes| F[Surface conflicts to caller]
    E -->|No| G[Auto-merge possible]
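
As a rough illustration of three-way conflict detection, the sketch below compares two snapshots against their common ancestor and reports keys that both sides changed in different ways. It is a generic sketch over plain maps, not the logic in merge.rs: the `Snapshot` alias and this `detect_conflicts` signature are hypothetical, and a missing key stands in for a deletion.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical snapshot: key -> value at one commit. A missing key means
/// the entry does not exist (or was deleted) at that commit.
pub type Snapshot = BTreeMap<String, String>;

/// Keys that both sides changed, relative to `base`, to different outcomes.
pub fn detect_conflicts(base: &Snapshot, left: &Snapshot, right: &Snapshot) -> BTreeSet<String> {
    let keys: BTreeSet<&String> = base
        .keys()
        .chain(left.keys())
        .chain(right.keys())
        .collect();
    keys.into_iter()
        .filter(|k| {
            let (b, l, r) = (base.get(*k), left.get(*k), right.get(*k));
            // Conflict only if each side diverged from base and they disagree.
            l != b && r != b && l != r
        })
        .cloned()
        .collect()
}
```

A change made on only one side, or the same change made on both, is not reported, which corresponds to the auto-merge branch of the diagram.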

Prolly Trees

mnem uses prolly trees (probabilistic B-trees) for efficient storage and lookup of the node graph. This is implemented in crates/mnem-core/src/prolly/. Sources: crates/mnem-core/src/prolly/tree.rs

graph TD
    subgraph "Prolly Tree Structure"
        ROOT[Root Node / TreeChunk] --> LEFT[Left Child TreeChunk]
        ROOT --> RIGHT[Right Child TreeChunk]
        LEFT --> LL[Leaf TreeChunk]
        LEFT --> LR[Leaf TreeChunk]
        RIGHT --> RL[Leaf TreeChunk]
        RIGHT --> RR[Leaf TreeChunk]
    end
    
    style ROOT fill:#e1f5fe
    style LL fill:#f3e5f5
    style LR fill:#f3e5f5
    style RL fill:#f3e5f5
    style RR fill:#f3e5f5

The prolly tree implementation includes:

Component	Purpose
`TreeChunk`	Immutable chunk containing sorted entries
`Builder`	Constructs new trees from operations
`Cursor`	Navigates tree structure for lookups
`diff`	Computes differences between trees
`merge`	Merges divergent tree versions

Prolly trees provide logarithmic-time lookups and efficient diffing for collaborative editing scenarios. Sources: crates/mnem-core/src/lib.rs
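
The property that makes diffing cheap is that chunk boundaries are derived from the content itself. The sketch below shows the idea at a single level only: a key closes a chunk when its hash falls into a 1-in-`target` bucket. The hash choice (FNV-1a), the boundary rule, and the function names are assumptions for illustration, not mnem's implementation.

```rust
/// FNV-1a, used here only because it is small and deterministic.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Split sorted keys into chunks. A key ends its chunk when its hash is a
/// multiple of `target`, so boundaries depend on content, not on position.
pub fn chunk_keys(sorted_keys: &[&str], target: u64) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    for key in sorted_keys {
        current.push(key.to_string());
        if fnv1a(key.as_bytes()) % target == 0 {
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because each boundary depends only on the key at that position, inserting one key changes at most the chunk it lands in; every other chunk is byte-identical and keeps the same content address, which is what allows two trees to be diffed by comparing chunk identifiers.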

Storage Layer

The storage layer abstracts over different backend implementations.

Blockstore

// From crates/mnem-core/src/lib.rs
pub trait Blockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error>;
    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error>;
}

OpHeadsStore

// From crates/mnem-core/src/lib.rs
pub trait OpHeadsStore {
    fn get_heads(&self) -> Result<Vec<Cid>, Error>;
    fn set_heads(&mut self, heads: &[Cid]) -> Result<(), Error>;
}

The codebase includes in-memory reference implementations of both traits for testing and development. Sources: crates/mnem-core/src/lib.rs
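
An in-memory implementation of the Blockstore trait might look like the sketch below. The `Cid` and `Error` types here are simplified stand-ins so the example compiles on its own, and `MemoryBlockstore` is a hypothetical name; the reference implementation in the codebase may differ.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for the real content identifier type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Cid(pub u64);

/// Stand-in for the real error type.
#[derive(Debug)]
pub struct Error(pub String);

pub trait Blockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error>;
    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error>;
}

/// Hypothetical in-memory store. `put` takes `&self`, so the map sits behind
/// a Mutex for interior mutability.
#[derive(Default)]
pub struct MemoryBlockstore {
    blocks: Mutex<HashMap<Cid, Vec<u8>>>,
}

impl Blockstore for MemoryBlockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error> {
        let blocks = self.blocks.lock().map_err(|e| Error(e.to_string()))?;
        Ok(blocks.get(cid).cloned())
    }

    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error> {
        let mut blocks = self.blocks.lock().map_err(|e| Error(e.to_string()))?;
        blocks.insert(cid.clone(), data.to_vec());
        Ok(())
    }
}
```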

Index System

Secondary indexes enable efficient querying of the knowledge graph.

Query

The primary query interface for searching nodes and edges.

BruteForceVectorIndex

A vector index implementation for semantic search capabilities. This works in conjunction with the retrieve module to provide dense + sparse retrieval lanes that capture positional and relational context. Sources: crates/mnem-core/src/lib.rs
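
A brute-force index simply scores every stored vector against the query. The sketch below uses cosine similarity, which is an assumption about the metric; `BruteForceIndex` and its methods are hypothetical and do not reproduce the API of BruteForceVectorIndex.

```rust
/// Hypothetical brute-force index: a flat list of (id, embedding) pairs.
pub struct BruteForceIndex {
    entries: Vec<(String, Vec<f32>)>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

impl BruteForceIndex {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    pub fn insert(&mut self, id: &str, vector: Vec<f32>) {
        self.entries.push((id.to_string(), vector));
    }

    /// Score every entry against the query and return the best `k`.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .entries
            .iter()
            .map(|(id, v)| (id.clone(), cosine(query, v)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A linear scan costs one comparison per stored vector, which is simple and exact but grows with the size of the repository.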

Retrieval System

The retrieve module provides the agent-facing interface for context assembly.

// From crates/mnem-core/src/retrieve/mod.rs
pub struct Retriever { /* ... */ }

The retriever composes:

Filters - Pre-filter nodes by type, properties, or time range
Vector ranking - Dense embeddings from the configured embedder
Sparse ranking - BM25-style keyword matching
Token-budget packing - Assembles context within LLM token limits

Node Rendering

Nodes are rendered to a compact, deterministic YAML-like format suitable for LLM consumption:

ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>

ntype and id are always present
context is emitted if node.context_sentence is Some (sits BEFORE summary per Anthropic's contextual-retrieval recipe)
summary is emitted if node.summary is Some, clipped at DEFAULT_RENDER_SUMMARY_CAP_CHARS (8192) chars
Scalar props (String, Integer, Float, Bool) are emitted in BTreeMap order
Non-scalar props (Link, Map, List, Bytes, Null) are skipped
Opaque content bytes are never rendered Sources: crates/mnem-core/src/retrieve/mod.rs
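
The rendering rules above can be sketched as follows. The `Value` enum is a reduced stand-in for `Ipld`, the `Node` struct is trimmed to the rendered fields, and the exact output format (for example, whether values are quoted or escaped) is an assumption.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for Ipld: four scalar variants and a few non-scalar ones.
pub enum Value {
    String(String),
    Integer(i64),
    Float(f64),
    Bool(bool),
    Bytes(Vec<u8>),
    List(Vec<Value>),
    Null,
}

/// Trimmed-down node holding only the fields that rendering touches.
pub struct Node {
    pub id: String,
    pub ntype: String,
    pub context_sentence: Option<String>,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Value>,
}

/// Mirrors the documented default cap of 8192 characters.
pub const SUMMARY_CAP_CHARS: usize = 8192;

pub fn render(node: &Node) -> String {
    // ntype and id are always present.
    let mut out = format!("ntype: {}\nid: {}\n", node.ntype, node.id);
    // context comes before summary.
    if let Some(ctx) = &node.context_sentence {
        out.push_str(&format!("context: {ctx}\n"));
    }
    if let Some(summary) = &node.summary {
        let clipped: String = summary.chars().take(SUMMARY_CAP_CHARS).collect();
        out.push_str(&format!("summary: {clipped}\n"));
    }
    // BTreeMap iteration is sorted by key, so output order is deterministic.
    for (key, value) in &node.props {
        let scalar = match value {
            Value::String(s) => s.clone(),
            Value::Integer(i) => i.to_string(),
            Value::Float(f) => f.to_string(),
            Value::Bool(b) => b.to_string(),
            _ => continue, // non-scalar props are skipped
        };
        out.push_str(&format!("{key}: {scalar}\n"));
    }
    out
}
```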

Identification System

mnem uses phantom-typed identifiers for type safety:

Type	Description
`NodeId`	Identifies a node in the graph
`EdgeId`	Identifies an edge
`ChangeId`	Identifies a change operation
`OperationId`	Identifies an operation
`Link<T>`	Phantom-typed link to any type

All CIDs are content-addressed, ensuring that the same content always produces the same identifier. Sources: crates/mnem-core/src/lib.rs
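
Phantom typing lets the compiler tell a node identifier from an edge identifier even though both wrap the same underlying value. The sketch below shows the general pattern with `PhantomData`; the marker types, the `u64` payload, and the idea that `NodeId` and `EdgeId` are aliases of `Link<T>` are assumptions for illustration.

```rust
use std::marker::PhantomData;

/// Marker types: never instantiated, used only as type-level tags.
pub struct NodeTag;
pub struct EdgeTag;

/// A link whose target type exists only at compile time.
pub struct Link<T> {
    cid: u64, // stand-in for a real CID
    _target: PhantomData<T>,
}

impl<T> Link<T> {
    pub fn new(cid: u64) -> Self {
        Self { cid, _target: PhantomData }
    }

    pub fn cid(&self) -> u64 {
        self.cid
    }
}

// Manual impls so that Link<T> is Copy without requiring T: Copy.
impl<T> Clone for Link<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Link<T> {}

pub type NodeId = Link<NodeTag>;
pub type EdgeId = Link<EdgeTag>;

/// Accepts only node identifiers; passing an EdgeId is a compile error.
pub fn describe_node(id: NodeId) -> String {
    format!("node@{}", id.cid())
}
```

The tag adds no runtime cost, since `PhantomData<T>` is zero-sized.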

Codec System

The codec system provides canonical encoding and decoding:

// Simplified signatures from crates/mnem-core/src/lib.rs (bodies omitted)
pub mod codec {
    pub fn encode<T: Encode>(value: &T) -> Vec<u8>;
    pub fn decode<T: Decode>(bytes: &[u8]) -> Result<T, Error>;
}

DAG-CBOR - Primary serialization format with canonical encoding guarantees
DAG-JSON - Debug export format for human inspection

Every object type preserves the byte-exact canonical-encoding round-trip property. Sources: crates/mnem-core/src/lib.rs

Signing System

The sign module provides Ed25519 signing and revocation-list verification for trust and integrity:

// From crates/mnem-core/src/lib.rs
pub mod sign {
    // Ed25519 signing operations
    // Revocation-list verification
}

Chunking Integration

Chunking is handled primarily by the mnem-ingest crate, but the core objects are designed to work with chunked content. The Chunk type, used throughout the system, carries the chunk text and its token estimate:

// Referenced from crates/mnem-ingest/src/chunk.rs
pub struct Chunk {
    pub content: String,
    pub tokens_estimate: usize,  // Fast whitespace-split estimation
}

Chunks preserve source order: section 0's chunks come before section 1's. Empty sections are skipped silently. Sources: crates/mnem-ingest/src/chunk.rs

Summary

The mnem core components form a layered architecture:

Layer	Components	Responsibility
Objects	Node, Edge, Commit, View, Operation	Core data structures
Storage	Blockstore, OpHeadsStore	Persistence abstraction
Trees	ProllyTree, TreeChunk, Builder	Efficient ordered storage
Repository	ReadonlyRepo, Transaction	Access control and mutation
Index	Query, VectorIndex	Secondary access paths
Retrieval	Retriever	Agent-facing context assembly
Codec	DAG-CBOR, DAG-JSON	Canonical serialization
Crypto	Ed25519 signing	Integrity and trust

This architecture enables mnem to serve as a versioned, collaborative knowledge graph with strong consistency guarantees and efficient retrieval capabilities for LLM integration.

Source: https://github.com/Uranid/mnem / Human Manual

Hybrid Retrieval System

Overview

The Hybrid Retrieval System in mnem is an agent-facing retrieval subsystem that composes multiple ranking strategies—vector (dense), sparse, and graph-based expansion—into a unified token-budgeted context assembly pipeline. It is designed for LLM consumption, enabling autonomous agents to fetch relevant nodes from the repository under strict token budgets.

The system lives in crates/mnem-core/src/retrieve/ and is exposed via HTTP API (crates/mnem-http/src/handlers.rs) and CLI (crates/mnem-cli/).

Sources: crates/mnem-core/src/lib.rs:18-23

Architecture

graph TD
    subgraph "Retrieval Entry Points"
        HTTP[HTTP API: POST /v1/retrieve]
        CLI[CLI: mnem retrieve]
    end
    
    subgraph "Hybrid Retrieval Core"
        RT[Retriever]
        HF[Hybrid Fuser]
        VQ[Vector Query]
        SQ[Sparse Query]
        GQ[Graph Expansion]
        TB[Token Budget Packer]
    end
    
    subgraph "Indexes"
        VI[Vector Index]
        SI[Sparse Index]
        GI[Graph Index]
    end
    
    HTTP --> RT
    CLI --> RT
    RT --> HF
    HF --> VQ
    HF --> SQ
    HF --> GQ
    VQ --> VI
    SQ --> SI
    GQ --> GI
    HF --> TB --> Output[LLM Context]

Core Components

Retriever

The Retriever struct is the main facade for retrieval operations. It orchestrates query planning, index selection, and result fusion.

Key Responsibilities:

Accept a query string and configuration parameters
Dispatch parallel queries to vector, sparse, and graph indexes
Fuse ranked results using configurable strategies
Pack results into token budgets suitable for LLM context windows

Sources: crates/mnem-core/src/retrieve/mod.rs:1-50
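
The source does not state which fusion strategies are available. One widely used option for combining a dense and a sparse ranking is reciprocal rank fusion (RRF), sketched below as an example of what result fusion can look like. The function name and the constant `k = 60` are illustrative choices, not mnem's configuration.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each list contributes 1 / (k + rank) to an id's
/// score, with rank starting at 1. Ids appearing high in several lists win.
pub fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Sort by score descending, then by id so ties are deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

RRF uses only rank positions, so it needs no score normalization between lanes whose raw scores are on different scales.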

Node Rendering

Before results reach the LLM, nodes are rendered to a compact, deterministic YAML-like text representation:

ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>
...

Rendering Rules:

Field	Condition	Notes
`ntype`	Always	Node type identifier
`id`	Always	UUID
`context`	If `node.context_sentence` is `Some`	Position cue, emitted BEFORE summary
`summary`	If `node.summary` is `Some`	Clipped at 8192 chars by default
Scalar props	Always	Strings, integers, floats, booleans in BTreeMap order
Non-scalar props	Skipped	Links, Maps, Lists, Bytes, Null

Sources: crates/mnem-core/src/retrieve/mod.rs:60-95
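
The rendering rules above can be sketched as a small function. This is a minimal illustration under stated assumptions: `Prop` and `RenderNode` are hypothetical stand-ins for the real Ipld values and Node type, and the clip constant simply reuses the 8192-character default from the table.

```rust
use std::collections::BTreeMap;

/// Property values, reduced to the cases relevant for rendering.
/// (Hypothetical stand-in for the Ipld values stored in `props`.)
#[derive(Debug, Clone)]
pub enum Prop {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    List(Vec<Prop>), // non-scalar: skipped by the renderer
    Null,            // non-scalar: skipped by the renderer
}

/// Hypothetical, trimmed-down node used only for this sketch.
pub struct RenderNode {
    pub ntype: String,
    pub id: String,
    pub context_sentence: Option<String>,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Prop>,
}

const SUMMARY_CLIP_CHARS: usize = 8192;

pub fn render(node: &RenderNode) -> String {
    let mut out = String::new();
    out.push_str(&format!("ntype: {}\n", node.ntype));
    out.push_str(&format!("id: {}\n", node.id));
    // The context cue is emitted before the summary.
    if let Some(ctx) = &node.context_sentence {
        out.push_str(&format!("context: {}\n", ctx));
    }
    if let Some(summary) = &node.summary {
        // Clip on character (not byte) boundaries.
        let clipped: String = summary.chars().take(SUMMARY_CLIP_CHARS).collect();
        out.push_str(&format!("summary: {}\n", clipped));
    }
    // BTreeMap iterates in key order, which keeps the output deterministic.
    for (key, value) in &node.props {
        let rendered = match value {
            Prop::Str(s) => s.clone(),
            Prop::Int(i) => i.to_string(),
            Prop::Float(f) => f.to_string(),
            Prop::Bool(b) => b.to_string(),
            Prop::List(_) | Prop::Null => continue,
        };
        out.push_str(&format!("{}: {}\n", key, rendered));
    }
    out
}
```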

Context Sentence (Anthropic Contextual Retrieval)

mnem implements Anthropic's 2024 Contextual Retrieval recipe. Each node may carry an optional context_sentence field—an LLM-generated one-sentence placement cue.

"This paragraph is from Section 3 of a legal contract between Alice and Bob's employer..."

The ingest pipeline prepends this to summary before embedding so both dense and sparse lanes capture positional and relational context.

Sources: crates/mnem-core/src/objects/node.rs:95-115

Retrieval Configuration

CLI Configuration Keys

Key	Type	Default	Description
`retrieve.limit`	`usize`	—	Maximum results to return
`retrieve.budget`	`u32`	—	Token budget for result packing
`retrieve.vector_cap`	`usize`	—	Vector index candidate cap
`retrieve.graph_expand`	`usize`	—	Graph neighbor expansion count
`retrieve.graph_depth`	`usize`	—	Graph traversal depth
`retrieve.graph_decay`	`u32`	—	Decay factor for graph scores
`retrieve.rerank_top_k`	`usize`	—	Top-K for re-ranking
`retrieve.hyde_max_tokens`	`usize`	—	Max tokens for HyDE hypothesis
`rerank.model`	`String`	—	Re-ranker model identifier
`rerank.base_url`	`String`	—	Re-ranker service base URL

Sources: crates/mnem-cli/src/config.rs:1-100

HTTP API Parameters

The POST /v1/retrieve endpoint accepts the following JSON body:

Field	Type	Default	Description
`query`	`String`	Required	Search query
`limit`	`usize`	20	Result limit (clamped to `MAX_RETRIEVE_LIMIT`)
`vector_cap`	`usize`	—	Vector candidate cap (clamped to `MAX_VECTOR_CAP`)
`rerank_top_k`	`usize`	—	Re-rank candidate count (clamped to `MAX_RERANK_TOP_K`)
`hyde`	`bool`	false	Enable HyDE extractive summarization
`summarize`	`bool`	false	Enable centroid + MMR summarization
`summarize_k`	`usize`	3	Summary sentences count

Clamping Constants:

MAX_RETRIEVE_LIMIT — Prevents unbounded result sets
MAX_VECTOR_CAP — Bounds vector search candidates
MAX_RERANK_TOP_K — Limits re-ranking computation

Sources: crates/mnem-http/src/handlers.rs:200-280
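
Clamping of this kind usually amounts to applying a default and then taking a minimum. The sketch below assumes a constant value: the manual names `MAX_RETRIEVE_LIMIT` but does not state it, so the number here is a placeholder, and `effective_limit` is a hypothetical helper.

```rust
// Placeholder value: the manual names this constant but not its value.
pub const MAX_RETRIEVE_LIMIT: usize = 1000;
// Default taken from the parameter table above.
pub const DEFAULT_LIMIT: usize = 20;

/// Apply the default when the field is absent, then clamp to the maximum.
pub fn effective_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LIMIT).min(MAX_RETRIEVE_LIMIT)
}
```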

Chunking Strategies

The retrieval system operates on pre-chunked content. The ingest pipeline supports five chunking strategies, selectable per source kind:

Strategy	Source Kind	Configuration	Behavior
`Paragraph`	Markdown	None	Splits on double-newline boundaries
`SentenceRecursive`	Text	`max_tokens`, `overlap`	Sentence-aware token-budgeted packing using Unicode UAX #29 boundaries
`SentenceRecursive`	PDF	`max_tokens=512`, `overlap=64`	Same as above with larger defaults
`Session`	Conversation	`max_messages=10`	Groups messages until role returns to `user` or max reached
`Structural`	Code	None	One chunk per section (function/class body from tree-sitter parser)
`Recursive`	(legacy)	`max_tokens`, `overlap`	Token-budgeted sliding window over words

Sources: crates/mnem-ingest/src/chunk.rs:1-100

Auto-Chunking

The auto_chunker(kind, heuristics) function selects optimal strategies:

match kind {
    SourceKind::Markdown => ChunkerKind::Paragraph,
    SourceKind::Text => ChunkerKind::SentenceRecursive { max_tokens: 256, overlap: 32 },
    SourceKind::Pdf => ChunkerKind::SentenceRecursive { max_tokens: 512, overlap: 64 },
    SourceKind::Conversation => ChunkerKind::Session { max_messages: 10 },
    SourceKind::Code(_) => ChunkerKind::Structural,
}

Sources: crates/mnem-ingest/src/chunk.rs:40-65

Source Kind Taxonomy

Kind	Extensions	Parser	Index Type
`Markdown`	`.md`, `.markdown`	`parse_markdown`	Hybrid
`Pdf`	`.pdf`	Sidecar (docling/unstructured)	Hybrid
`Conversation`	`.json`, `.jsonl`	Session parser	Session
`Text`	Other/unspecified	Raw text	Hybrid
`Code(Rust)`	`.rs`	Tree-sitter	Structural
`Code(Python)`	`.py`, `.pyi`	Tree-sitter	Structural
`Code(JavaScript)`	`.js`, `.mjs`, `.cjs`	Tree-sitter	Structural
`Code(TypeScript)`	`.ts`, `.tsx`, `.mts`, `.cts`	Tree-sitter	Structural
`Code(Go)`	`.go`	Tree-sitter	Structural
`Code(Java)`	`.java`	Tree-sitter	Structural
`Code(C)`	`.c`, `.h`	Tree-sitter	Structural
`Code(Cpp)`	`.cpp`, `.cc`, `.cxx`, `.hpp`	Tree-sitter	Structural
`Code(Ruby)`	`.rb`, `.gemspec`, `.rake`, `.erb`	Tree-sitter	Structural
`Code(CSharp)`	`.cs`, `.csx`	Tree-sitter	Structural

Sources: crates/mnem-ingest/src/types.rs:1-80

Retrieval Flow

sequenceDiagram
    participant Client
    participant Retriever
    participant VectorIndex
    participant SparseIndex
    participant GraphIndex
    participant Fuser
    participant TokenBudgetPacker
    participant LLM

    Client->>Retriever: query + config
    Retriever->>VectorIndex: vector_search(query)
    Retriever->>SparseIndex: sparse_search(query)
    Retriever->>GraphIndex: graph_expand(seed_nodes)
    VectorIndex-->>Fuser: ranked_candidates
    SparseIndex-->>Fuser: ranked_candidates
    GraphIndex-->>Fuser: ranked_candidates
    Fuser->>Fuser: reciprocal_rank_fusion
    Fuser->>TokenBudgetPacker: fused_results
    alt summarize=true
        TokenBudgetPacker->>TokenBudgetPacker: centroid_MMR_extraction
    end
    TokenBudgetPacker-->>LLM: token_budgeted_context
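
The fusion step in the diagram is labeled reciprocal_rank_fusion. A minimal sketch of standard reciprocal rank fusion follows, in which each document scores the sum of 1 / (k + rank) across the lists it appears in. The function signature and the use of string IDs are assumptions for illustration; the constant mnem actually uses is not stated in this manual (60 is the value commonly used in the literature).

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over several ranked ID lists.
/// score(d) = sum over lists of 1 / (k + rank_in_list), with 1-based ranks.
pub fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Descending score; ties broken by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the dense, sparse, and graph lanes do not need comparable score scales.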

HyDE (Hypothetical Document Embeddings)

When hyde=true, the system generates extractive summaries of the top-M candidate nodes before final ranking. This adapts the HyDE (Hypothetical Document Embeddings) pattern, using extractive summaries as the hypotheses rather than LLM-generated documents:

Initial candidates are retrieved
Extractive summarization produces hypotheses
Hypotheses are re-embedded and ranked
Final top-K are packed into the context budget

Sources: crates/mnem-http/src/handlers.rs:250-270

Branch Name Validation

The HTTP API validates branch names before creating commit references during ingest operations:

Invalid characters and sequences: space, tab, newline, null, ~, ^, :, ?, *, [, \, @{, .., //
Invalid patterns: leading /, trailing /, trailing ., trailing .lock

Sources: crates/mnem-http/src/handlers.rs:180-210
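
The rules above translate directly into a predicate. The following is a sketch, not the handler's actual code: `is_valid_branch_name` is a hypothetical name, and rejecting the empty string is an added assumption that the list above does not state.

```rust
/// Returns true if `name` passes the branch-name rules listed above.
pub fn is_valid_branch_name(name: &str) -> bool {
    // Assumption: an empty name is rejected (not stated in the rule list).
    if name.is_empty() {
        return false;
    }
    // Single characters that may not appear anywhere in the name.
    const BAD_CHARS: &[char] = &[' ', '\t', '\n', '\0', '~', '^', ':', '?', '*', '[', '\\'];
    // Multi-character sequences that may not appear anywhere in the name.
    const BAD_SEQS: &[&str] = &["@{", "..", "//"];

    if name.chars().any(|c| BAD_CHARS.contains(&c)) {
        return false;
    }
    if BAD_SEQS.iter().any(|s| name.contains(s)) {
        return false;
    }
    // Positional rules: no leading or trailing slash, no trailing dot or ".lock".
    if name.starts_with('/') || name.ends_with('/') {
        return false;
    }
    if name.ends_with('.') || name.ends_with(".lock") {
        return false;
    }
    true
}
```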

Extractor Integration

The retrieval system works in conjunction with the entity extraction pipeline. Extractors produce entity spans and relation spans that populate the graph index:

Extractor	Provider	Features
`RuleExtractor`	Default (NER)	Capitalized phrase heuristic, verb-window regex relations
`KeyBertAdapter`	Statistical	Requires `keybert` feature flag
`LLM`	Ollama	Requires `ollama` feature flag

Sources: crates/mnem-ingest/src/extract.rs:1-100

Configuration Example

[retrieve]
limit = 20
budget = 4096
vector_cap = 100
graph_expand = 5
graph_depth = 2
graph_decay = 80
rerank_top_k = 10
hyde_max_tokens = 256

[rerank]
model = "cross-encoder/ms-marco-MiniLM-L-6-v2"
base_url = "http://localhost:8080"

[ner]
provider = "rule"  # or "none"

Sources: crates/mnem-cli/src/config.rs:50-120

Embedding Providers

Embedding Providers is a pluggable subsystem in the mnem monorepo that abstracts the generation of vector embeddings for text content. It lives in the crates/mnem-embed-providers crate and is consumed by mnem-cli, mnem-http, and mnem-mcp to support dense vector indexing and semantic retrieval.

Architecture Overview

The provider system follows a strategy pattern with runtime-configurable backends. Each provider implements the same Embedder trait, returning Vec<f32> vectors regardless of the underlying implementation (HTTP API, local model, ONNX runtime).

graph TD
    A["mnem-cli / mnem-http / mnem-mcp"] --> B["mnem-embed-providers"]
    B --> C["ProviderConfig"]
    C --> D["OpenAI Provider"]
    C --> E["Ollama Provider"]
    C --> F["ONNX Provider"]
    D --> G["REST API / OpenAI Compatible"]
    E --> H["Local Ollama Server"]
    F --> I["Local ONNX Runtime"]
    
    J["config.toml / ENV vars"] --> B

Sources: crates/mnem-cli/src/commands/mod.rs:1-50

Supported Providers

Provider	Backend Type	Model Selection	Configuration
OpenAI	Remote REST API	Via `model` field	`base_url`, `api_key`, `timeout_secs`
Ollama	Local REST API	Via `model` field	`base_url` (default: `http://localhost:11434`), `timeout_secs`
ONNX	Local ONNX Runtime	Bundled `all-MiniLM-L6-v2`	No network required

Sources: crates/mnem-cli/src/config.rs:1-80

OpenAI Provider

Sends text to OpenAI's embedding API or any OpenAI-compatible endpoint. Requires:

base_url: API endpoint (default: https://api.openai.com/v1)
api_key: Authentication token
model: Embedding model identifier

Ollama Provider

Connects to a local Ollama server for running open-source embedding models. Default endpoint is http://localhost:11434. The provider sets a 120-second timeout by default.

ONNX Provider

Runs inference entirely offline using the ONNX Runtime with the all-MiniLM-L6-v2 model. This is the bundled default when mnem is compiled with the bundled-embedder feature, providing zero-configuration embeddings for single-machine deployments.

Sources: crates/mnem-mcp/src/tools/embed.rs:1-60

Configuration Resolution

Embedding providers are configured through a precedence chain that varies slightly between consumer applications.

mnem-cli Precedence

Priority	Source	Fields
1	Environment variables	`MNEM_EMBED_PROVIDER`, `MNEM_EMBED_MODEL`, `MNEM_EMBED_API_KEY_ENV`, `MNEM_EMBED_BASE_URL`, `MNEM_EMBED_DIM`
2	`~/.mnem/config.toml`	`[embed]` section
3	`<repo>/config.toml`	`[embed]` section
4	Bundled ONNX fallback	When compiled with `bundled-embedder` feature

Sources: crates/mnem-cli/src/config.rs:80-120
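
A precedence chain like this is often expressed as a sequence of optional sources where the first one present wins. The sketch below assumes whole-section precedence (it does not merge individual fields across sources, which the real resolver may do), and `EmbedSettings` and `resolve_embed` are hypothetical names.

```rust
/// Hypothetical, simplified view of a resolved [embed] configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct EmbedSettings {
    pub provider: String,
    pub model: String,
}

/// First available source wins, in the priority order of the table above.
pub fn resolve_embed(
    env: Option<EmbedSettings>,         // 1. MNEM_EMBED_* variables
    global_toml: Option<EmbedSettings>, // 2. ~/.mnem/config.toml
    repo_toml: Option<EmbedSettings>,   // 3. <repo>/config.toml
    bundled: Option<EmbedSettings>,     // 4. bundled ONNX fallback, if compiled in
) -> Option<EmbedSettings> {
    env.or(global_toml).or(repo_toml).or(bundled)
}
```

Returning `None` when every source is absent matches the behavior described later, where nodes are created without vectors.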

mnem-http Precedence

Priority	Source	Behavior
1	`POST /v1/embed` request body	Per-request model override
2	`<data_dir>/config.toml`	Server-wide `[embed]` section

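The HTTP server loads the embed configuration at startup and tolerates failures. A missing or unreadable config.toml is skipped silently, and a malformed [embed] section logs a warning but does not prevent the server from starting; in both cases auto-embed simply remains disabled.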

fn load_embed_config(data_dir: &Path) -> Option<mnem_embed_providers::ProviderConfig> {
    #[derive(serde::Deserialize)]
    struct MiniCfg {
        embed: Option<mnem_embed_providers::ProviderConfig>,
    }
    let path = data_dir.join("config.toml");
    let s = std::fs::read_to_string(&path).ok()?;
    match toml::from_str::<MiniCfg>(&s) {
        Ok(parsed) => parsed.embed,
        Err(e) => {
            tracing::warn!(path = %path.display(), error = %e,
                "config.toml [embed] parse failed; auto-embed disabled"
            );
            None
        }
    }
}

Sources: crates/mnem-http/src/lib.rs:1-50

mnem-mcp Precedence

The MCP server uses a simplified three-tier chain without the global ~/.mnem/config.toml lookup (design point: per-repo isolation):

MNEM_EMBED_* environment variables
<repo>/config.toml [embed] section
Bundled ONNX fallback (only when bundled-embedder feature is compiled)

Sources: crates/mnem-mcp/src/tools/embed.rs:20-40

ProviderConfig Schema

The configuration is parsed from TOML into a discriminated union:

pub enum ProviderConfig {
    Openai(OpenaiConfig),
    Ollama(OllamaConfig),
    Onnx(OnnxConfig),
}

Each variant carries only the parameters relevant to that provider, keeping the configuration minimal.

Error Handling

All embedding operations return an EmbedError on failure. Transport failures are mapped into variants that carry actionable diagnostics:

graph LR
    A["ureq::Error"] --> B{"EmbedError"}
    B --> C["RateLimited"]
    B --> D["BadRequest<br/>status + body"]
    B --> E["Server<br/>status + body"]
    B --> F["Network<br/>transport message"]
    B --> G["Decode<br/>JSON parse failure"]

Sources: crates/mnem-embed-providers/src/http.rs:1-50
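
One plausible way to classify HTTP responses into these variants is by status code. The sketch below is an assumption throughout: the variant field shapes are guessed from the diagram labels, the status thresholds are a conventional choice rather than mnem's documented behavior, and `classify_status` is a hypothetical helper.

```rust
/// Variant names follow the diagram; field shapes are assumptions.
#[derive(Debug, PartialEq)]
pub enum EmbedError {
    RateLimited,
    BadRequest { status: u16, body: String },
    Server { status: u16, body: String },
    Network(String),
    Decode(String),
}

/// Map an HTTP status to an error, or None for a 2xx success.
pub fn classify_status(status: u16, body: &str) -> Option<EmbedError> {
    match status {
        200..=299 => None,
        // 429 Too Many Requests is checked before the general 4xx arm.
        429 => Some(EmbedError::RateLimited),
        400..=499 => Some(EmbedError::BadRequest {
            status,
            body: body.to_string(),
        }),
        // Everything else (5xx and unexpected codes) is treated as a server fault.
        _ => Some(EmbedError::Server {
            status,
            body: body.to_string(),
        }),
    }
}
```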

Error Display for Users

When embedding fails, mnem-cli formats the error into a short, actionable one-liner suitable for eprintln!:

Provider	Common Cause	Suggestion
OpenAI	Invalid API key	Check `MNEM_EMBED_API_KEY_ENV`
Ollama	Server not running	Verify `ollama serve` is active
ONNX	Missing model file	Ensure `all-MiniLM-L6-v2` is bundled

The format_embed_failure function accepts a context parameter ("embedding" for writes, "query embedding" for retrieval) to tailor suggestions.

Sources: crates/mnem-cli/src/commands/mod.rs:50-100

Integration with Node Storage

Embedding vectors are stored on Node objects for use during semantic retrieval:

pub struct Node {
    pub id: NodeId,
    pub label: String,
    pub summary: Option<String>,           // LLM-facing retrieval text
    pub content: Option<Bytes>,           // Opaque payload
    pub context_sentence: Option<String>,  // Anthropic contextual retrieval prefix
    pub props: BTreeMap<String, Ipld>,    // Structured properties
}

The summary field is the primary text indexed by the dense embedder. The context_sentence (per Anthropic's 2024 Contextual Retrieval recipe) is prepended to summary before embedding to capture positional context; Anthropic reports that the technique reduces retrieval failures by 49% to 67%.

Sources: crates/mnem-core/src/objects/node.rs:1-50

Bundled Embedder Feature

The bundled-embedder Cargo feature compiles in an ONNX provider with all-MiniLM-L6-v2. When enabled:

mnem embed works out-of-the-box without external services
The MCP mnem_retrieve tool has a tier-3 fallback when no explicit vector provider is configured
Ideal for air-gapped environments or local-first workflows

When the feature is not enabled, a missing embedder configuration produces a warning during ingest; nodes are created without vectors, and mnem reindex is suggested as the recovery path.

Summary

Embedding Providers abstracts vector generation behind a common interface, supporting three backends with distinct deployment profiles:

OpenAI: Cloud-hosted, highest quality, requires API credentials
Ollama: Self-hosted, flexible model selection, local compute
ONNX: Offline-capable, bundled model, zero-configuration

Configuration flows from environment variables through TOML files, with graceful fallback behavior that never prevents core operations from functioning.

Sources: crates/mnem-cli/src/commands/mod.rs:1-50

Storage Backend

The storage backend is a critical subsystem in mnem that provides persistent storage for the content-addressable object graph. It abstracts storage operations behind well-defined traits, enabling pluggable storage implementations while maintaining a consistent API for the core data layer.

Architecture Overview

The storage backend follows a trait-based abstraction pattern where mnem-core defines the storage interfaces and concrete implementations are provided by backend crates. This separation allows the core logic to remain independent of specific storage technologies.

graph TD
    subgraph "Application Layer"
        CLI[mnem-cli]
        HTTP[mnem-http]
    end
    
    subgraph "mnem-core"
        Repo[Repository]
        Transaction[Transaction]
        Objects[Node / Edge / Commit]
    end
    
    subgraph "Storage Traits"
        Blockstore[BlockStore Trait]
        OpHeadsStore[OpHeadsStore Trait]
        KnnEdgesStore[KnnEdgesStore Trait]
    end
    
    subgraph "Backend Implementations"
        RedbBackend[mnem-backend-redb]
    end
    
    CLI --> Repo
    HTTP --> Repo
    Repo --> Transaction
    Transaction --> Blockstore
    Transaction --> OpHeadsStore
    Repo --> KnnEdgesStore
    Blockstore --> RedbBackend
    OpHeadsStore --> RedbBackend
    KnnEdgesStore --> RedbBackend

Core Storage Traits

The storage layer is built on three fundamental traits that define the contract between the core library and storage implementations.

BlockStore Trait

The BlockStore trait provides low-level operations for storing and retrieving binary data blocks identified by Content Identifiers (CIDs).

// crates/mnem-core/src/store/blockstore.rs
pub trait Blockstore: Send + Sync {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn put(&self, block: &[u8]) -> Result<Cid>;
    fn put_many<I>(&self, blocks: I) -> Result<Vec<Cid>>
    where
        I: IntoIterator<Item = Vec<u8>>,
        I::IntoIter: Send + Sync;
}

Method	Purpose	Return Type
`get(cid)`	Retrieve a block by its CID	`Result<Option<Vec<u8>>>`
`put(block)`	Store a single block, returning its CID	`Result<Cid>`
`put_many(blocks)`	Batch insert multiple blocks	`Result<Vec<Cid>>`

The trait implements the CAR (Content Addressable Archive) storage pattern where data integrity is verified through content hashing. Sources: crates/mnem-core/src/store/blockstore.rs:1-20
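
The content-addressing idea behind `get` and `put` can be shown with an in-memory store. This is a toy sketch: `FakeCid` and `MemBlockstore` are hypothetical, and the standard library's non-cryptographic `DefaultHasher` stands in for the SHA-256 multihash that a real CID would carry.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Stand-in for a real CID: a 64-bit hash of the block bytes.
/// (A real CID wraps a cryptographic multihash; this one does not.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FakeCid(u64);

/// Toy in-memory block store keyed by content hash.
#[derive(Default)]
pub struct MemBlockstore {
    blocks: Mutex<HashMap<FakeCid, Vec<u8>>>,
}

impl MemBlockstore {
    /// Store a block and return the key derived from its content.
    pub fn put(&self, block: &[u8]) -> FakeCid {
        let mut hasher = DefaultHasher::new();
        block.hash(&mut hasher);
        let cid = FakeCid(hasher.finish());
        // Identical content maps to the same key, so repeated puts are idempotent.
        self.blocks.lock().unwrap().insert(cid, block.to_vec());
        cid
    }

    /// Look a block up by its content-derived key.
    pub fn get(&self, cid: &FakeCid) -> Option<Vec<u8>> {
        self.blocks.lock().unwrap().get(cid).cloned()
    }
}
```

Because the key is a function of the bytes, storing the same block twice deduplicates automatically, and a reader can recompute the hash to detect corruption.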

OpHeadsStore Trait

The OpHeadsStore trait manages operation heads, that is, references to the latest operations in the operation log. It supports both single-head and multi-head scenarios with conflict detection.

// crates/mnem-core/src/store/op_heads.rs
pub trait OpHeadsStore: Send + Sync {
    fn get_heads(&self) -> Result<Vec<Cid>>;
    fn put_head(&self, op: &Op) -> Result<()>;
    fn put_heads(&self, ops: &[Op]) -> Result<()>;
    fn merge_heads(&self, merged: Vec<Cid>) -> Result<()>;
}

Method	Purpose
`get_heads()`	Retrieve all current operation head CIDs
`put_head(op)`	Atomically update the single head
`put_heads(ops)`	Set multiple operation heads
`merge_heads(merged)`	Replace heads with merged result after conflict resolution

Sources: crates/mnem-core/src/store/op_heads.rs:1-50

KnnEdgesStore Trait

The KnnEdgesStore trait provides specialized storage for k-nearest-neighbor graph edges, enabling efficient vector similarity searches.

// Backend interface for KNN edge storage
pub trait KnnEdgesStore: Send + Sync {
    fn insert(&self, source_id: NodeId, embedding: &[f32]) -> Result<()>;
    fn search(&self, query: &[f32], k: usize) -> Result<Vec<(NodeId, f32)>>;
}

Sources: crates/mnem-backend-redb/src/knn_edges_store.rs:1-30
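
To make the `insert`/`search` contract concrete, here is an exact brute-force nearest-neighbor index using cosine similarity. It is deliberately not HNSW: it scans every stored vector, which is only practical for small collections. `BruteForceKnn` and the `u64` IDs are hypothetical, and the choice of cosine similarity is an assumption.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact (non-approximate) k-nearest-neighbor index over stored embeddings.
#[derive(Default)]
pub struct BruteForceKnn {
    items: Vec<(u64, Vec<f32>)>,
}

impl BruteForceKnn {
    pub fn insert(&mut self, id: u64, embedding: &[f32]) {
        self.items.push((id, embedding.to_vec()));
    }

    /// Return the k most similar IDs with their scores, best first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut scored: Vec<(u64, f32)> = self
            .items
            .iter()
            .map(|(id, v)| (*id, cosine(query, v)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```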

Redb Backend Implementation

The mnem-backend-redb crate provides the reference implementation using the Redb embedded database, a fast, lightweight key-value store written in Rust.

Module Structure

mnem-backend-redb/
├── src/
│   ├── lib.rs           # Main entry point and configuration
│   ├── blockstore.rs    # BlockStore implementation
│   └── knn_edges_store.rs # KNN edge storage with HNSW

Initialization

The backend initializes by opening or creating a Redb database file:

// crates/mnem-backend-redb/src/lib.rs
pub struct Backend {
    db: redb::Database,
    path: PathBuf,
}

impl Backend {
    pub fn open(path: &Path) -> Result<Self> {
        let db = redb::Database::create(path)?;
        Ok(Self { db, path: path.to_path_buf() })
    }
}

Sources: crates/mnem-backend-redb/src/lib.rs:1-50

BlockStore Implementation

The Redb blockstore implementation wraps the database with CAR-compatible semantics:

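// crates/mnem-backend-redb/src/blockstore.rs (simplified sketch)
// Assumes BLOCKS_TABLE is a redb TableDefinition<&[u8], &[u8]>, that the crate's
// Error converts from redb errors, and that Code comes from multihash-codetable.
impl Blockstore for RedbBlockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>> {
        let key = cid.to_bytes();
        let txn = self.db.begin_read()?;
        let table = txn.open_table(BLOCKS_TABLE)?;
        Ok(table.get(key.as_slice())?.map(|v| v.value().to_vec()))
    }

    fn put(&self, block: &[u8]) -> Result<Cid> {
        // Derive the CID from the block content (SHA-256, DAG-CBOR codec).
        let hash = Code::Sha2_256.digest(block);
        let cid = Cid::new_v1(DAG_CBOR, hash);
        // Store the block with the CID bytes as the key.
        let txn = self.db.begin_write()?;
        {
            let mut table = txn.open_table(BLOCKS_TABLE)?;
            table.insert(cid.to_bytes().as_slice(), block)?;
        }
        txn.commit()?;
        Ok(cid)
    }
    // put_many omitted for brevity.
}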

Sources: crates/mnem-backend-redb/src/blockstore.rs:1-100

Storage Format

The redb backend organizes data into multiple tables:

Table Name	Key Type	Value Type	Purpose
`blocks`	CID bytes	Raw block data	Content-addressed storage
`op_heads`	Fixed key	CID bytes	Operation head references
`knn_edges`	NodeId	Serialized edges	Vector similarity graph

Transaction Model

mnem implements a transactional write model through the Transaction type in the repository layer. Transactions provide ACID-like semantics for graph modifications.

graph LR
    A[Begin Transaction] --> B[Add Nodes]
    B --> C[Add Edges]
    C --> D[Commit]
    D --> E[Update OpHeads]
    E --> F[Success]
    
    D --> G[Abort]
    G --> H[Rollback]

Write Operations

Transactions support atomic batch operations:

// Conceptual transaction interface
impl Transaction {
    pub fn add_node(&mut self, node: Node) -> Result<NodeId>;
    pub fn add_edge(&mut self, source: NodeId, target: NodeId, relation: &str) -> Result<()>;
    pub fn commit(self) -> Result<Commit>;
}

Sources: crates/mnem-core/src/repo/mod.rs:1-100

Commit Structure

Each commit creates an immutable snapshot of the repository state:

// crates/mnem-core/src/objects/commit.rs
pub struct Commit {
    pub operation: Operation,
    pub parent: Option<Cid>,
    pub author: Author,
    pub message: String,
    pub timestamp: DateTime<Utc>,
}

Sources: crates/mnem-core/src/objects/commit.rs:1-50

Data Persistence Flow

sequenceDiagram
    participant App as Application
    participant Tx as Transaction
    participant BS as BlockStore
    participant OHS as OpHeadsStore
    participant Redb as Redb DB

    App->>Tx: begin()
    Tx->>Tx: add_node(node)
    Tx->>BS: put(block)
    BS->>Redb: write(cid, data)
    Tx->>Tx: add_edge(src, dst)
    Tx->>BS: put(block)
    Tx->>Tx: commit()
    Tx->>BS: put(commit_block)
    Tx->>OHS: put_head(new_op)
    OHS->>Redb: update_heads()
    Redb-->>Tx: success
    Tx-->>App: commit_cid

Configuration

Storage backend behavior is configured through the Config structure:

// crates/mnem-cli/src/config.rs
pub struct Config {
    pub store: Option<StoreConfig>,
    pub data_dir: PathBuf,
    // ...
}

pub struct StoreConfig {
    pub path: PathBuf,
    pub flush_interval_ms: Option<u64>,
}

Sources: crates/mnem-cli/src/config.rs:1-100

Configuration Options

Option	Type	Default	Description
`path`	`PathBuf`	`data/`	Base directory for storage files
`flush_interval_ms`	`u64`	`1000`	Periodic flush interval in milliseconds

Object Types and Serialization

Node Storage

Nodes are serialized using DAG-CBOR and stored as blocks:

// crates/mnem-core/src/objects/node.rs
pub struct Node {
    pub id: NodeId,
    pub ntype: NodeType,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Ipld>,
    pub content: Option<Bytes>,
    pub context_sentence: Option<String>,
}

Sources: crates/mnem-core/src/objects/node.rs:1-100

Edge Storage

Edges link nodes with labeled relationships:

// crates/mnem-core/src/objects/edge.rs
pub struct Edge {
    pub source: NodeId,
    pub target: NodeId,
    pub relation: String,
    pub confidence: Option<f32>,
}

Sources: crates/mnem-core/src/objects/edge.rs:1-50

Error Handling

Storage operations return the Error type defined in the core crate:

// crates/mnem-core/src/store/mod.rs
pub enum Error {
    #[error("block not found: {0}")]
    BlockNotFound(Cid),
    #[error("serialization failed: {0}")]
    SerializationFailed(String),
    #[error("database error: {0}")]
    DatabaseError(String),
}

Sources: crates/mnem-core/src/store/mod.rs:1-100

Error Recovery

Error Type	Recovery Strategy
`BlockNotFound`	Indicates data corruption; repository repair required
`SerializationFailed`	Check data integrity; may indicate schema mismatch
`DatabaseError`	Retry operation; check disk space and permissions

Indexes and Secondary Storage

Vector Index

The KNN edges store maintains a vector index for similarity search operations:

graph TD
    A[Query Vector] --> B[KnnEdgesStore]
    B --> C[HNSW Index]
    C --> D[Approximate KNN Search]
    D --> E[Top-K Results]

The implementation uses HNSW (Hierarchical Navigable Small World) algorithm for efficient approximate nearest neighbor search. Sources: crates/mnem-backend-redb/src/knn_edges_store.rs:1-100

Query Interface

The retrieve module composes vector search with graph traversal:

// crates/mnem-core/src/retrieve/mod.rs
pub struct Retriever {
    blockstore: Arc<dyn Blockstore>,
    knn_edges: Arc<dyn KnnEdgesStore>,
    // ...
}

Repository Model - Transaction and commit management
Object Schema - Node, Edge, and Commit structures
Retrieval System - Query and retrieval workflows
Configuration Guide - Storage configuration options

Sources: crates/mnem-core/src/store/op_heads.rs:1-50

Ingestion Pipeline

The Ingestion Pipeline is the core system in mnem responsible for transforming external source documents (Markdown, PDFs, code files, conversations) into structured graph nodes within the repository. It handles parsing, chunking, entity extraction, and writing to the graph transaction—all without committing, allowing callers to control transaction boundaries.

Overview

The pipeline orchestrates a multi-stage process:

Detection — Determine source kind from file extension or explicit configuration
Parsing — Convert raw bytes into a list of Section objects
Chunking — Split sections into semantically meaningful Chunk objects
Extraction — Optionally identify entities and relations via rule-based or LLM providers
Writing — Add nodes and edges to a borrowed Transaction

graph TD
    A[Raw Bytes] --> B[SourceKind Detection]
    B --> C[Parser Selection]
    C --> D[Parse to Sections]
    D --> E[Chunker Strategy]
    E --> F[Extract Entities & Relations]
    F --> G[Transaction Write]
    G --> H[IngestResult]
    
    C -->|md| C1[Markdown Parser]
    C -->|pdf| C2[PDF Parser]
    C -->|code| C3[Tree-sitter Parser]
    C -->|json/jsonl| C4[Conversation Parser]
    C -->|text| C5[Plain Text]

Source Kind Detection

The Ingester automatically detects the source kind based on file extension. This determines both the parser and the default chunking strategy.

Extension(s)	SourceKind	Default Chunker
`.md`, `.markdown`	`Markdown`	`Paragraph`
`.txt`	`Text`	`SentenceRecursive` (256 tokens, 32 overlap)
`.pdf`	`Pdf`	`SentenceRecursive` (512 tokens, 64 overlap)
`.json`, `.jsonl`	`Conversation`	`Session` (max 10 messages)
`.rs`	`Code(Rust)`	`Structural`
`.py`, `.pyi`	`Code(Python)`	`Structural`
`.js`, `.mjs`, `.cjs`	`Code(JavaScript)`	`Structural`
`.ts`, `.tsx`, `.mts`, `.cts`	`Code(TypeScript)`	`Structural`
`.go`	`Code(Go)`	`Structural`
`.java`	`Code(Java)`	`Structural`
`.c`, `.h`	`Code(C)`	`Structural`
`.cpp`, `.cc`, `.cxx`, `.hpp`	`Code(Cpp)`	`Structural`
`.rb`, `.gemspec`, `.rake`	`Code(Ruby)`	`Structural`
`.cs`, `.csx`	`Code(CSharp)`	`Structural`
Unknown or no extension	`Text`	`SentenceRecursive`

Sources: pipeline.rs:source_kind_from_ext() types.rs:SourceKind types.rs:CodeLanguage::from_extension()
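
The table above maps naturally onto a single match on the file extension. The sketch below redefines simplified `SourceKind` and `CodeLanguage` enums so that it stands alone; `source_kind_from_path` is a hypothetical name, and lowercasing the extension before matching is an assumption about case handling.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeLanguage {
    Rust, Python, JavaScript, TypeScript, Go, Java, C, Cpp, Ruby, CSharp,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Markdown,
    Text,
    Pdf,
    Conversation,
    Code(CodeLanguage),
}

/// Detect the source kind from a path's extension; unknown or missing
/// extensions fall back to plain text.
pub fn source_kind_from_path(path: &str) -> SourceKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase()); // assumption: case-insensitive match
    match ext.as_deref() {
        Some("md" | "markdown") => SourceKind::Markdown,
        Some("pdf") => SourceKind::Pdf,
        Some("json" | "jsonl") => SourceKind::Conversation,
        Some("rs") => SourceKind::Code(CodeLanguage::Rust),
        Some("py" | "pyi") => SourceKind::Code(CodeLanguage::Python),
        Some("js" | "mjs" | "cjs") => SourceKind::Code(CodeLanguage::JavaScript),
        Some("ts" | "tsx" | "mts" | "cts") => SourceKind::Code(CodeLanguage::TypeScript),
        Some("go") => SourceKind::Code(CodeLanguage::Go),
        Some("java") => SourceKind::Code(CodeLanguage::Java),
        Some("c" | "h") => SourceKind::Code(CodeLanguage::C),
        Some("cpp" | "cc" | "cxx" | "hpp") => SourceKind::Code(CodeLanguage::Cpp),
        Some("rb" | "gemspec" | "rake") => SourceKind::Code(CodeLanguage::Ruby),
        Some("cs" | "csx") => SourceKind::Code(CodeLanguage::CSharp),
        _ => SourceKind::Text,
    }
}
```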

Supported File Formats

Markdown (`.md`, `.markdown`)

Parsed using CommonMark + GitHub Flavored Markdown (GFM) support. The parser extracts headings with depth information, creating section boundaries that respect document structure. Each heading becomes a section boundary.

PDF (`.pdf`)

Pure-Rust text-layer extraction using pdf-extract. One section per page is created, with heading set to "Page {n}" at depth 1. PDFs with fewer than 100 text characters per page are flagged as potentially scanned. Malformed PDFs return Error::ParseFailed. Sources: pdf.rs:MIN_TEXT_PER_PAGE pdf.rs:parse_pdf()

Code Files

Parsed using tree-sitter for supported languages (Rust, Python, JavaScript, TypeScript, Go, Java, C, Cpp, Ruby, CSharp). The parser extracts function and class bodies as sections, preserving structural boundaries. Sources: code.rs

Conversations (`.json`, `.jsonl`)

Supports chat exports from ChatGPT, Claude, and generic conversation formats. Messages are extracted with role (user/assistant/system), content, and timestamps when available. Sources: conversation.rs

Plain Text (`.txt` and others)

Falls back to plain text parsing for unknown extensions, including files without extensions like README.

Chunker Strategies

The ChunkerKind enum defines five chunking strategies. Callers can override the auto-selected strategy via CLI or API.

Strategy Selection

graph TD
    A[SourceKind] --> B[auto_chunker]
    B -->|Markdown| C[Paragraph]
    B -->|Text| D[SentenceRecursive<br/>256 tokens, 32 overlap]
    B -->|Pdf| E[SentenceRecursive<br/>512 tokens, 64 overlap]
    B -->|Conversation| F[Session<br/>max 10 messages]
    B -->|Code| G[Structural]

Paragraph Chunker

Splits each section's body on double-newline boundaries. Fast and deterministic, ideal for Markdown where authoring structure already matches desired chunk boundaries. Sources: chunk.rs:ChunkerKind::Paragraph
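
As a rough illustration of double-newline splitting, the following sketch trims each piece and drops empty ones. `paragraph_chunks` is a hypothetical helper; it assumes `\n` line endings and does not handle `\r\n` input.

```rust
/// Split a section body into paragraphs on blank-line boundaries.
pub fn paragraph_chunks(body: &str) -> Vec<String> {
    body.split("\n\n")
        .map(str::trim)
        // Runs of three or more newlines produce empty pieces; drop them.
        .filter(|p| !p.is_empty())
        .map(String::from)
        .collect()
}
```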

Recursive Chunker

Token-budgeted word-window sliding window with configurable overlap. Kept for backwards compatibility. Sources: chunk.rs:ChunkerKind::Recursive
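
A sliding word window with overlap can be sketched as follows. The function name is hypothetical, tokens are approximated by whitespace-separated words (as in the Token Estimation section), and the panic on `overlap >= max_tokens` is a choice made for this sketch rather than documented behavior.

```rust
/// Slide a window of `max_tokens` words across the text, stepping by
/// `max_tokens - overlap` so that consecutive chunks share `overlap` words.
pub fn word_window_chunks(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_tokens > 0 && overlap < max_tokens,
        "overlap must be smaller than max_tokens"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        // Stop once the window has reached the end of the text.
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}
```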

SentenceRecursive Chunker

Sentence-aware token-budgeted packing using Unicode sentence boundaries (UAX #29). Preferred for prose:

Chunks never cut mid-sentence
Overlap measured at sentence granularity
Average chunk size is more uniform

Default for Text (256 tokens, 32 overlap) and Pdf (512 tokens, 64 overlap) source kinds. Sources: chunk.rs:ChunkerKind::SentenceRecursive chunk.rs:auto_chunker()

Session Chunker

Groups contiguous conversation messages into session chunks. Boundaries fire on:

Role returning to user, OR
Reaching max_messages (default: 10)

Preserves turn ordering. Default for Conversation source kind. Sources: chunk.rs:ChunkerKind::Session
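
The two boundary rules can be sketched as a single pass over the messages. `Message` and `session_chunks` are hypothetical, and the sketch interprets "role returning to user" as a user message that follows a non-user message; the real chunker may define the boundary differently.

```rust
/// Hypothetical minimal message type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Group messages into session chunks, preserving their order.
pub fn session_chunks(messages: &[Message], max_messages: usize) -> Vec<Vec<Message>> {
    assert!(max_messages > 0, "max_messages must be at least 1");
    let mut chunks = Vec::new();
    let mut current: Vec<Message> = Vec::new();
    for msg in messages {
        // Boundary 1: the role returns to "user" after a non-user message.
        let back_to_user =
            msg.role == "user" && current.last().map_or(false, |m| m.role != "user");
        // Boundary 2: the current chunk is already full.
        let full = current.len() >= max_messages;
        if back_to_user || full {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(msg.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```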

Structural Chunker

One chunk per section. Used for code sources where each section is already a function or class body extracted by the tree-sitter parser. Sources: chunk.rs:ChunkerKind::Structural

Entity Extraction

The pipeline optionally extracts entities and relations using configured extractors.

RuleExtractor (Default)

Delegates entity detection to a NerProvider (default: capitalized-phrase heuristic) and proximity-based relation detection via verb-window regex. Supported relation patterns include: joined, founded, acquired, owns, hired, etc. Sources: extract.rs:RuleExtractor extract.rs:verb_window
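
A capitalized-phrase heuristic can be as simple as collecting runs of consecutive capitalized words. The sketch below is a guess at the general idea, not the RuleExtractor's actual logic: `capitalized_phrases` is a hypothetical name, and the function ignores known weaknesses such as sentence-initial capitals and punctuation between adjacent names.

```rust
/// Collect runs of consecutive capitalized words as candidate entity phrases.
pub fn capitalized_phrases(text: &str) -> Vec<String> {
    let mut phrases = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for raw in text.split_whitespace() {
        // Strip surrounding punctuation such as commas and periods.
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        let is_capitalized = word.chars().next().map_or(false, char::is_uppercase);
        if is_capitalized {
            current.push(word);
        } else if !current.is_empty() {
            // A lowercase word or number ends the current phrase.
            phrases.push(current.join(" "));
            current.clear();
        }
    }
    if !current.is_empty() {
        phrases.push(current.join(" "));
    }
    phrases
}
```

A relation pass could then look for a known verb (for example "joined" or "acquired") in the window of text between two detected phrases.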

Optional: OllamaExtractor

Schema-constrained NER via a local Ollama server (gated behind ollama feature). Each returned span is verified against the section text, and hallucinated spans that do not appear there are rejected. Failures degrade gracefully to empty results, keeping the rule-based baseline as the load-bearing path. Sources: lib.rs:extract_llm

Optional: KeyBertAdapter

Statistical entity extraction adapter driven by the server's configured embedder (gated behind keybert feature). Sources: lib.rs:extract_keybert

Pipeline API

Ingester Configuration

pub struct IngestConfig {
    pub chunker: ChunkerKind,
    pub extractor: ExtractorKind,
    pub ner_provider: Option<NerProviderKind>,
    pub include_text: bool,
}

Sources: pipeline.rs:IngestConfig lib.rs:IngestConfig

Core Method

pub fn ingest(
    &self,
    tx: &mut Transaction,
    bytes: &[u8],
    kind: SourceKind,
) -> Result<IngestResult, Error>

Returns an IngestResult with counts and elapsed time. The commit_cid field is left None—callers who want a CID should call tx.commit(...) afterwards.

Errors:

Error::ParseFailed — parser rejects the input
Error::UnsupportedSource — source kind not covered
Error::Commit — upstream codec/blockstore failures from Transaction::add_node/add_edge

Sources: pipeline.rs:Ingester::ingest()

CLI Integration

The mnem ingest command provides CLI access to the pipeline.

mnem ingest notes.md
mnem ingest --text "The quick brown fox"
mnem ingest --chunker recursive --max-tokens 1024 book.pdf
mnem ingest --recursive docs/

CLI Options

Flag	Description	Default
`--chunker`	Strategy selection	`auto`
`--max-tokens`	Target tokens per chunk	512
`--overlap`	Overlap tokens (recursive)	32
`--recursive`	Walk directory trees	false
`--ntype`	Root Doc node label	`Doc`
`-m`, `--message`	Commit message	Auto-generated

Sources: commands/ingest.rs:Args

Output Nodes

The pipeline writes three node types and one edge type to the graph:

Doc node — Root node representing the ingested document
Chunk nodes — Smaller content pieces with summary, content, context_sentence, and props fields
Entity nodes — Extracted entities with span information
Relation edges — Connections between entities based on relation extraction

Contextual Retrieval

Each chunk optionally stores a context_sentence, an LLM-generated one-sentence placement cue (e.g., "This paragraph is from Section 3 of a legal contract..."). This is prepended to the summary before embedding, following Anthropic's 2024 Contextual Retrieval recipe, which reports a 49% to 67% reduction in retrieval failures. Sources: node.rs:Node.context_sentence

Sidecar Support

For PDFs with poor text-layer extraction, the pipeline supports escalation to external tools:

docling (gated behind sidecar-docling feature)
unstructured-ingest (gated behind sidecar-unstructured feature)

Sidecars are invoked when built-in PDF extraction quality is insufficient.

Sources: lib.rs:sidecar

Token Estimation

Token counts are estimated via whitespace split (tokens_estimate field on Chunk). This is intentionally fast and deterministic. Accuracy on par with the cl100k tokenizer is a documented future improvement.

Sources: chunk.rs:token estimation comment
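
A whitespace-split estimate can be sketched in a single function. The free function below is an assumption for illustration; the source only states that the value is stored in the `tokens_estimate` field on `Chunk`.

```rust
/// Estimates the token count by counting whitespace-separated words.
/// Illustrative sketch: fast and deterministic, but not tokenizer-accurate.
pub fn tokens_estimate(text: &str) -> usize {
    // split_whitespace collapses runs of spaces, tabs, and newlines.
    text.split_whitespace().count()
}
```

For example, `tokens_estimate("The quick brown fox")` returns 4. Subword tokenizers usually produce more tokens than this for the same text, so budgets based on the estimate are approximate.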

Error Handling

Error Type	Cause	Recovery
`ParseFailed`	Malformed input, encryption	Return error, don't create nodes
`UnsupportedSource`	Unknown source kind	Return error
`Commit`	Blockstore failure	Return error
Sidecar errors	Missing binary, CLI failure	Return `Error::Sidecar`
LLM extraction failure	Timeout, schema mismatch	Degrade to empty Vec

Sources: pipeline.rs:ingest errors lib.rs:Error types
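
The table distinguishes hard failures, which are returned to the caller, from LLM extraction failures, which degrade to an empty result. The sketch below shows that split with stand-in types: the variant names follow the table, while the `String` payloads, the `Entity` struct, and both function names are assumptions for this example.

```rust
/// Stand-in for the pipeline's error type. Variant names follow the table
/// above; the String payloads are an assumption made for this sketch.
#[derive(Debug, PartialEq)]
pub enum IngestError {
    ParseFailed(String),
    UnsupportedSource(String),
    Commit(String),
    Sidecar(String),
}

/// Hypothetical entity record, reduced to a single field.
#[derive(Debug, PartialEq)]
pub struct Entity {
    pub text: String,
}

/// Soft failure: an LLM extraction error degrades to "no entities".
pub fn entities_or_empty(result: Result<Vec<Entity>, String>) -> Vec<Entity> {
    result.unwrap_or_default()
}

/// Hard failures are reported to the caller instead of being swallowed.
pub fn report(err: &IngestError) -> String {
    match err {
        IngestError::ParseFailed(why) => format!("parse failed, no nodes created: {why}"),
        IngestError::UnsupportedSource(kind) => format!("unsupported source kind: {kind}"),
        IngestError::Commit(why) => format!("blockstore failure: {why}"),
        IngestError::Sidecar(why) => format!("sidecar failed: {why}"),
    }
}
```

Keeping the degrade path as a plain `Vec` means the rest of the pipeline does not need a special case for a failed extractor; it simply sees zero entities.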

Sources: pipeline.rs:source_kind_from_ext() types.rs:SourceKind types.rs:CodeLanguage::from_extension()

CLI Commands Reference

This page documents all command-line interface commands available in mnem-cli, the primary user-facing tool for interacting with mnem repositories.

Overview

The mnem CLI provides a unified interface for managing a local knowledge graph repository. It supports operations including repository initialization, content ingestion, node/edge manipulation, branching, tagging, retrieval, and third-party tool integration.

graph TD
    A[mnem CLI] --> B[Repository Operations]
    A --> C[Content Ingestion]
    A --> D[Graph Manipulation]
    A --> E[Version Control]
    A --> F[Retrieval]
    A --> G[Integration]
    
    B --> B1[init]
    B --> B2[status]
    
    C --> C1[ingest]
    
    D --> D1[add node]
    D --> D2[add edge]
    
    E --> E1[log]
    E --> E2[show]
    E --> E3[refs]
    E --> E4[tag]
    E --> E5[branches]
    
    F --> F1[retrieve]
    
    G --> G1[integrate]

Sources: crates/mnem-cli/src/main.rs:40-90

Global Options

The following options are available for all commands:

Option	Short	Description
`--repo <PATH>`	`-R`	Path to the repository directory (`.mnem/`). Defaults to walking up from the current directory, like `git` does.

Sources: crates/mnem-cli/src/main.rs:33-36

Repository Operations

init

Initializes a new mnem repository.

mnem init [OPTIONS]

Option	Description
`--path <PATH>`	Custom repository path
`--name <NAME>`	Repository name
`--author <NAME>`	Default author name
`--email <EMAIL>`	Default author email

status

Prints current op-head, head commit, ref summary, and label counts.

mnem status [OPTIONS]

# Examples:
mnem status                    # current op + head commit + ref count
mnem -R ~/notes status         # explicit repo path

Content Ingestion

ingest

Parses external source files into the graph, creating Doc + Chunk + Entity nodes.

mnem ingest <PATH> [OPTIONS]

#### Supported Source Types

Extension	Source Kind	Chunker Strategy
`.md`, `.markdown`	Markdown	Paragraph
`.txt`	Plain Text	SentenceRecursive (256 tokens, 32 overlap)
`.pdf`	PDF	SentenceRecursive (512 tokens, 64 overlap)
`.json`, `.jsonl`	Conversation	Session (10 messages max)
`.rs`	Rust Code	Structural
`.py`, `.pyi`	Python Code	Structural
`.js`, `.mjs`, `.cjs`	JavaScript Code	Structural
`.ts`, `.tsx`, `.mts`, `.cts`	TypeScript Code	Structural
`.go`	Go Code	Structural
`.java`	Java Code	Structural
`.c`, `.h`	C Code	Structural
`.cpp`, `.cc`, `.cxx`, `.hpp`, `.hxx`	C++ Code	Structural
`.rb`, `.gemspec`, `.rake`, `.erb`	Ruby Code	Structural
`.cs`, `.csx`	C# Code	Structural
Other	Text	SentenceRecursive

Sources: crates/mnem-cli/src/commands/ingest.rs:1-50
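
The extension table can be read as a single lookup. The sketch below mirrors it; the names `SourceKind`, `CodeLanguage`, and `source_kind_from_ext` appear in the manual's source references, but the variants and the signature shown here are assumptions.

```rust
/// Illustrative variants; the real enums may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeLanguage { Rust, Python, JavaScript, TypeScript, Go, Java, C, Cpp, Ruby, CSharp }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind { Markdown, Text, Pdf, Conversation, Code(CodeLanguage) }

/// Maps a file extension (with or without a leading dot) to a source kind.
pub fn source_kind_from_ext(ext: &str) -> SourceKind {
    use CodeLanguage::*;
    let ext = ext.trim_start_matches('.').to_ascii_lowercase();
    match ext.as_str() {
        "md" | "markdown" => SourceKind::Markdown,
        "txt" => SourceKind::Text,
        "pdf" => SourceKind::Pdf,
        "json" | "jsonl" => SourceKind::Conversation,
        "rs" => SourceKind::Code(Rust),
        "py" | "pyi" => SourceKind::Code(Python),
        "js" | "mjs" | "cjs" => SourceKind::Code(JavaScript),
        "ts" | "tsx" | "mts" | "cts" => SourceKind::Code(TypeScript),
        "go" => SourceKind::Code(Go),
        "java" => SourceKind::Code(Java),
        "c" | "h" => SourceKind::Code(C),
        "cpp" | "cc" | "cxx" | "hpp" | "hxx" => SourceKind::Code(Cpp),
        "rb" | "gemspec" | "rake" | "erb" => SourceKind::Code(Ruby),
        "cs" | "csx" => SourceKind::Code(CSharp),
        // Anything unrecognized is treated as plain text, per the "Other" row.
        _ => SourceKind::Text,
    }
}
```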

#### ingest Options

Option	Description	Default
`<PATH>`	File or directory to ingest	Required (unless `--text`)
`--text`	Inline text to ingest	-
`--ntype <LABEL>`	Root Doc node label	`Doc`
`--chunker <STRATEGY>`	Chunker strategy	`auto`
`--max-tokens <N>`	Target tokens per chunk	`512`
`--overlap <N>`	Overlap tokens between chunks	`32`
`--recursive`	Walk directory trees	`false`
`-m`, `--message <MSG>`	Commit message	Auto-generated

#### Chunker Strategies

Strategy	Description
`auto`	Picks strategy based on source kind
`paragraph`	Splits on double-newline (Markdown)
`recursive`	Token-budgeted sliding window
`sentence_recursive`	Sentence-aware token packing
`session`	Groups conversation messages
`structural`	One chunk per section (code)

Sources: crates/mnem-cli/src/commands/ingest.rs:60-80
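
To make the `recursive` row concrete, here is a minimal sketch of a token-budgeted sliding window with overlap. It counts tokens by whitespace split and is not taken from mnem's chunker; the function name and the exact windowing rule are assumptions.

```rust
/// Splits text into windows of at most `max_tokens` whitespace tokens,
/// where consecutive windows share `overlap` tokens.
/// Hypothetical sketch; mnem's own chunker may behave differently.
pub fn sliding_window_chunks(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(max_tokens > 0, "max_tokens must be positive");
    assert!(overlap < max_tokens, "overlap must be smaller than max_tokens");

    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < tokens.len() {
        let end = (start + max_tokens).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `max_tokens = 3` and `overlap = 1`, the text `a b c d e f g` becomes `a b c`, `c d e`, `e f g`: each chunk repeats the last token of the previous one.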

Graph Manipulation

add node

Creates a new node in the graph.

mnem add node [OPTIONS]

Option	Description
`-s`, `--summary <TEXT>`	Node summary
`--label <LABEL>`	Node type label
`--prop <KEY=VALUE>`	Property (can be repeated)
`--context-sentence <TEXT>`	Positional context for retrieval

# Examples:
mnem add node -s "Alice lives in Berlin"
mnem add node --label Person --prop name=Alice --prop city=Berlin -s "Alice is a climber"

add edge

Creates a directed edge between two nodes.

mnem add edge [OPTIONS]

Option	Description
`--from <UUID>`	Source node UUID
`--to <UUID>`	Target node UUID
`--label <LABEL>`	Edge type label
`--prop <KEY=VALUE>`	Property (can be repeated)

# Examples:
mnem add edge --from <src-uuid> --to <dst-uuid> --label knows

Sources: crates/mnem-cli/src/main.rs:65-80

Version Control

log

Walks the op-log backwards from the current head.

mnem log [OPTIONS]

Option	Description
`--limit <N>`	Maximum number of operations to show
`--format <FORMAT>`	Output format (`short`, `full`, `json`)

show

Shows the full detail of one operation.

mnem show <OPERATION_ID>

# Examples:
mnem show 01HZ...

refs

Manages symbolic references to commits.

mnem refs <SUBCOMMAND>

#### refs Subcommands

Subcommand	Description
`list`	List every ref in the current view
`set <name> <target>`	Set ref to point at a target CID
`delete <name>`	Delete a ref

# Examples:
mnem refs list
mnem refs set feature_branch 01HXYZ...
mnem refs delete old_branch

Sources: crates/mnem-cli/src/commands/refs.rs:1-45

tag

Manages named tags that point to commits.

mnem tag <SUBCOMMAND>

#### tag Subcommands

Subcommand	Description
`list`	List every `refs/tags/<name>` ref with their target CIDs
`create <name>`	Create a new tag
`delete <name>`	Delete a tag

#### tag create Options

Option	Description
`<name>`	Tag name (stored as `refs/tags/<name>`)
`target`	Optional commit CID, ref name, branch shortname, or `HEAD`
`--from <CID>`	Commit CID / ref / branch to point the tag at

# Examples:
mnem tag list
mnem tag create v0.9
mnem tag create release-2024 --from 01HZ...
mnem tag delete v0.9

Sources: crates/mnem-cli/src/commands/tag.rs:1-60

branches

Manages named branches in the repository.

mnem branches [OPTIONS]

Option	Description
`--list`	List all branches
`--create <NAME>`	Create a new branch
`--delete <NAME>`	Delete a branch
`--switch <NAME>`	Switch to a branch

#### Branch Output Format

{
  "schema": "mnem.v1.branches",
  "branches": [
    {"name": "main", "head": "<commit-cid>", "is_current": true},
    ...
  ]
}

Retrieval

retrieve

Searches the graph for nodes matching a query.

mnem retrieve [OPTIONS] <QUERY>

Option	Description	Default
`--top-k <N>`	Number of results to return	`10`
`--max-tokens <N>`	Maximum tokens in response	`4096`
`--include <FIELD>`	Fields to include (summary, context, props)	All
`--format <FORMAT>`	Output format (`text`, `json`)	`text`

# Examples:
mnem retrieve "query"
mnem retrieve --top-k 5 --max-tokens 2048 "machine learning"
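
How `--top-k` and `--max-tokens` might interact can be sketched as greedy packing: rank by score, keep the top k, then admit results while they fit the budget. The `Hit` struct, the function name, and the choice to skip (rather than stop at) an oversized hit are all assumptions; the source does not specify the packing rule.

```rust
/// Hypothetical retrieval hit, reduced to what packing needs.
#[derive(Debug, Clone)]
pub struct Hit {
    pub summary: String,
    pub score: f32,
}

/// Greedy token-budget packing under stated assumptions:
/// sort by score, keep the best `top_k`, skip hits that would exceed `max_tokens`.
pub fn pack_results(mut hits: Vec<Hit>, top_k: usize, max_tokens: usize) -> Vec<Hit> {
    // Highest score first; total_cmp gives a total order over f32.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut used = 0;
    let mut packed = Vec::new();
    for hit in hits.into_iter().take(top_k) {
        // Whitespace-split estimate, matching the ingest-side token estimate.
        let cost = hit.summary.split_whitespace().count();
        if used + cost > max_tokens {
            continue; // does not fit; a smaller, lower-ranked hit still might
        }
        used += cost;
        packed.push(hit);
    }
    packed
}
```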

Integration

integrate

Integrates mnem system prompts with third-party AI tools.

mnem integrate <HOST> [OPTIONS]

#### Supported Hosts

Host	System Prompt Path
`claude-code`	`~/.claude/CLAUDE.md`
`gemini-cli`	`~/.gemini/GEMINI.md`
`cursor`	`~/.cursor/rules/mnem.mdc`
`continue`	`~/.continue/config.json`
`zed`	`~/.config/zed/settings.json` (Linux) or `~/Library/Application Support/Zed/settings.json` (macOS)

#### integrate Options

Option	Description
`--install`	Install system prompt to host
`--uninstall`	Remove system prompt from host
`--status`	Show integration status

graph LR
    A[mnem integrate] --> B{Host Selection}
    B --> C[Claude Code]
    B --> D[Cursor]
    B --> E[Continue]
    B --> F[Zed]
    B --> G[Gemini CLI]
    
    C --> H[Markdown Marker]
    D --> H
    E --> I[JSON Field: systemMessage]
    F --> J[JSON Field: assistant.system_prompt]
    G --> H

Sources: crates/mnem-cli/src/integrate.rs:1-60

Configuration

The CLI loads configuration from ~/.config/mnem/config.toml or .mnem/config.toml in the repository root.

Setting	Description
`user.name`	Author name for commits
`user.email`	Author email for commits
`user.agent_id`	Agent identifier fallback
`llm.provider`	LLM provider (`ollama`, `openai`, `anthropic`)
`llm.model`	Model name
`llm.base_url`	API base URL (default: `http://localhost:11434`)
`llm.timeout_secs`	Request timeout (default: `120`)

#### Author String Format

The author string for commits follows this precedence:

name <email> if both present
name if only name present
email if only email present
agent_id if only that present
mnem-cli as fallback

Sources: crates/mnem-cli/src/config.rs:1-80
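
The precedence list above maps directly onto a pattern match. The sketch below is illustrative; the function name and the use of `Option<&str>` for each setting are assumptions, not the signature in config.rs.

```rust
/// Resolves the commit author string using the documented precedence.
/// Hypothetical helper mirroring the list above.
pub fn author_string(
    name: Option<&str>,
    email: Option<&str>,
    agent_id: Option<&str>,
) -> String {
    match (name, email, agent_id) {
        (Some(n), Some(e), _) => format!("{n} <{e}>"), // name <email>
        (Some(n), None, _) => n.to_string(),           // name only
        (None, Some(e), _) => e.to_string(),           // email only
        (None, None, Some(a)) => a.to_string(),        // agent_id only
        (None, None, None) => "mnem-cli".to_string(),  // fallback
    }
}
```

Note that `agent_id` is consulted only when neither name nor email is set, which is why it is ignored in the first three arms.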

Command Pipeline

The following diagram shows how commands interact with the repository:

graph TD
    subgraph "CLI Layer"
        A[mnem CLI] --> B[Commands]
        B --> C[Ingest]
        B --> D[Add]
        B --> E[Retrieve]
        B --> F[Refs/Tags]
    end
    
    subgraph "Core Layer"
        C --> G[Ingester Pipeline]
        G --> H[Parser]
        H --> I[Chunker]
        I --> J[Extractor]
        J --> K[Transaction]
        
        D --> K
        F --> K
        E --> L[Retriever]
    end
    
    subgraph "Storage Layer"
        K --> M[Transaction]
        M --> N[Blockstore]
        M --> O[OpHeadsStore]
        L --> P[VectorIndex]
        P --> N
    end

Exit Codes

Code	Meaning
`0`	Success
`1`	General error
`2`	Invalid arguments
`3`	Repository not found
`4`	Object not found
`5`	Conflict detected

Sources: crates/mnem-cli/src/main.rs:40-90
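
For callers that script around the CLI, the exit-code table can be captured as an enum. The type and variant names below are assumptions chosen for readability; only the numeric codes and their meanings come from the table.

```rust
/// Exit codes from the table above, with hypothetical variant names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitStatus {
    Success = 0,
    GeneralError = 1,
    InvalidArguments = 2,
    RepositoryNotFound = 3,
    ObjectNotFound = 4,
    Conflict = 5,
}

impl ExitStatus {
    /// Numeric code to pass to the process exit call.
    pub fn code(self) -> i32 {
        self as i32
    }

    /// Interprets a code returned by a child process, if it is a known one.
    pub fn from_code(code: i32) -> Option<Self> {
        match code {
            0 => Some(Self::Success),
            1 => Some(Self::GeneralError),
            2 => Some(Self::InvalidArguments),
            3 => Some(Self::RepositoryNotFound),
            4 => Some(Self::ObjectNotFound),
            5 => Some(Self::Conflict),
            _ => None,
        }
    }
}
```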

GraphRAG Implementation

Related topics: Hybrid Retrieval System, Core Components

GraphRAG (Graph-based Retrieval Augmented Generation) is a hybrid retrieval approach that combines vector similarity search with graph-structured knowledge representation. In mnem, the GraphRAG implementation provides community-based entity extraction, confidence scoring, and intelligent graph traversal for enhanced context retrieval.

Overview

The mnem GraphRAG system operates as a layered architecture that:

Extracts entities and relationships from ingested documents
Builds a knowledge graph with typed edges and communities
Enables community-aware retrieval that goes beyond simple vector similarity
Provides confidence-calibrated results suitable for agentic workflows

The implementation lives in crates/mnem-graphrag/ and integrates with the core retrieval pipeline in crates/mnem-core/src/retrieve/.

Core Components

Module Structure

Module	Purpose
`lib.rs`	Main entry point and public API exports
`community.rs`	Community detection and hierarchy management
`calibration.rs`	Confidence score calibration utilities
`confidence.rs`	Confidence scoring algorithms
`summarize.rs`	Community and entity summarization

Community Detection (`community.rs`)

Community detection partitions the knowledge graph into semantically coherent clusters. The implementation supports hierarchical community structures where:

Leaf communities contain tightly interconnected entities
Parent communities aggregate related sub-communities
Cross-community edges connect related concepts across boundaries

Communities are used during retrieval to:

Expand candidate sets by including related entities within the same community
Filter results to the most relevant community cluster
Enable "zoom-in" and "zoom-out" traversal patterns

Sources: crates/mnem-graphrag/src/community.rs

Confidence Scoring (`confidence.rs`)

Every extracted entity and relationship receives a confidence score based on:

Extraction evidence: Frequency and clarity of mentions in source documents
Graph connectivity: Number and strength of edges connecting to other entities
Source reliability: Document-level trust signals from the ingest pipeline

Confidence scores are normalized to a [0.0, 1.0] range and drive downstream filtering decisions.

Sources: crates/mnem-graphrag/src/confidence.rs
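
One simple way to combine the three signals into a normalized score is a clamped weighted sum. The sketch below assumes each signal is already scaled to [0.0, 1.0]; the `Signals` struct and the 0.5 / 0.3 / 0.2 weights are illustrative assumptions, not values from confidence.rs.

```rust
/// Hypothetical per-entity inputs, each expected in [0.0, 1.0].
pub struct Signals {
    pub extraction_evidence: f32,
    pub graph_connectivity: f32,
    pub source_reliability: f32,
}

/// Weighted combination clamped to [0.0, 1.0].
/// The weights are placeholders chosen for this example.
pub fn confidence(s: &Signals) -> f32 {
    const W_EVIDENCE: f32 = 0.5;
    const W_CONNECTIVITY: f32 = 0.3;
    const W_RELIABILITY: f32 = 0.2;

    // Clamp each input first so one out-of-range signal cannot dominate.
    let raw = W_EVIDENCE * s.extraction_evidence.clamp(0.0, 1.0)
        + W_CONNECTIVITY * s.graph_connectivity.clamp(0.0, 1.0)
        + W_RELIABILITY * s.source_reliability.clamp(0.0, 1.0);

    // Guard against floating-point drift slightly above 1.0.
    raw.clamp(0.0, 1.0)
}
```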

Calibration (`calibration.rs`)

Calibration ensures that confidence scores accurately reflect true extraction quality. The module provides:

Score distribution analysis: Histogram-based validation of score distributions
Threshold tuning: Per-use-case threshold adjustment for precision/recall tradeoffs
Calibration curves: Tools for evaluating score reliability

Sources: crates/mnem-graphrag/src/calibration.rs
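
Score distribution analysis and threshold tuning can be illustrated with two small helpers: a fixed-width histogram over [0.0, 1.0] and the fraction of scores that survive a threshold. Both function names are hypothetical and the code is a sketch, not the calibration module's API.

```rust
/// Counts scores per equal-width bin over [0.0, 1.0].
/// Out-of-range and NaN scores are ignored.
pub fn score_histogram(scores: &[f32], bins: usize) -> Vec<usize> {
    assert!(bins > 0, "at least one bin is required");
    let mut counts = vec![0usize; bins];
    for &s in scores {
        if !(0.0..=1.0).contains(&s) {
            continue; // also skips NaN, which compares false
        }
        // A score of exactly 1.0 belongs to the last bin.
        let idx = ((s * bins as f32) as usize).min(bins - 1);
        counts[idx] += 1;
    }
    counts
}

/// Fraction of scores at or above `threshold`; 0.0 for an empty slice.
pub fn kept_fraction(scores: &[f32], threshold: f32) -> f32 {
    if scores.is_empty() {
        return 0.0;
    }
    let kept = scores.iter().filter(|&&s| s >= threshold).count();
    kept as f32 / scores.len() as f32
}
```

A histogram that piles up in the top bin suggests the scores are not discriminating well, and `kept_fraction` gives a quick view of how aggressive a candidate threshold would be.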

Summarization (`summarize.rs`)

The summarization module generates concise descriptions for:

Individual entities: One-sentence summaries capturing core identity
Relationships: Edge labels and descriptions explaining connections
Communities: Multi-sentence overviews of community purpose and membership

Summaries are stored as context_sentence on Node objects, enabling the contextual retrieval pattern from Anthropic's Contextual Retrieval recipe.

Sources: crates/mnem-graphrag/src/summarize.rs

Retrieval Integration

Community Filter (`community_filter.rs`)

The retrieval pipeline integrates GraphRAG through the community filter stage. When enabled, the retriever:

Identifies the community containing the top-scoring candidate
Expands the candidate set to include other high-confidence entities in that community
Re-ranks the expanded set using the configured reranker

graph TD
    A[Query Embedding] --> B[Vector Search]
    B --> C[Initial Candidates]
    C --> D[Community Detection]
    D --> E[Community Expansion]
    E --> F[Reranker]
    F --> G[Final Results]

Sources: crates/mnem-core/src/retrieve/community_filter.rs
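
The first two stages can be sketched as an additive expansion over plain maps. Everything here is an assumption for illustration: node and community ids are `u32`, membership and confidence live in `HashMap`s, and the function name is hypothetical. Reranking is left to the configured reranker and is not shown.

```rust
use std::collections::{HashMap, HashSet};

/// Adds high-confidence members of the top candidate's community.
/// Existing candidates are never dropped; new ids are appended in sorted order.
pub fn expand_with_community(
    candidates: &[(u32, f32)],        // (node id, retrieval score)
    community_of: &HashMap<u32, u32>, // node id -> community id
    confidence: &HashMap<u32, f32>,   // node id -> confidence score
    min_confidence: f32,
) -> Vec<u32> {
    let mut out: Vec<u32> = candidates.iter().map(|&(id, _)| id).collect();

    // Stage 1: find the community of the top-scoring candidate.
    let Some(&(top_id, _)) = candidates.iter().max_by(|a, b| a.1.total_cmp(&b.1)) else {
        return out;
    };
    let Some(&top_community) = community_of.get(&top_id) else {
        return out;
    };

    // Stage 2: pull in other members that clear the confidence floor.
    let seen: HashSet<u32> = out.iter().copied().collect();
    let mut added: Vec<u32> = community_of
        .iter()
        .filter(|&(id, community)| *community == top_community && !seen.contains(id))
        .filter(|&(id, _)| confidence.get(id).copied().unwrap_or(0.0) >= min_confidence)
        .map(|(id, _)| *id)
        .collect();
    added.sort_unstable(); // HashMap iteration order is unspecified

    out.extend(added);
    out
}
```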

Retriever Configuration

The Retriever struct in crates/mnem-core/src/retrieve/retriever.rs exposes GraphRAG-related options:

Parameter	Type	Default	Description
`graph_expand`	`Option<usize>`	`None`	Expansion radius for community-based retrieval
`graph_decay`	`Option<f32>`	`None`	Decay factor for graph traversal weights
`graph_depth`	`Option<usize>`	`None`	Maximum traversal depth
`community_filter_enabled`	`bool`	`false`	Enable community-based filtering
`ppr_size_gate`	`Option<usize>`	`None`	PPR personalization size threshold

PPR-Based Expansion

For larger graphs, mnem supports Personalized PageRank (PPR) based expansion using the adjacency index:

adjacency_index: Option<Arc<dyn AdjacencyIndex + Send + Sync>>

When the adjacency index is available, PPR mode provides:

Personalized scoring based on seed nodes
Cohesive community member inclusion

When the index is unavailable, retrieval falls back to the historical decay walk.

Sources: crates/mnem-core/src/retrieve/retriever.rs:10-50
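
For readers unfamiliar with Personalized PageRank, the sketch below runs the standard power iteration over a plain adjacency list. It is a textbook formulation, not mnem's implementation: the function name, the `alpha` restart probability, and the handling of dangling nodes (their mass returns to the seeds) are assumptions, and it does not use the `AdjacencyIndex` trait.

```rust
/// Personalized PageRank by power iteration.
/// `adj[u]` lists the out-neighbors of node `u`; every index must be < adj.len().
/// `alpha` is the probability of restarting at a seed on each step.
pub fn personalized_pagerank(
    adj: &[Vec<usize>],
    seeds: &[usize],
    alpha: f64,
    iters: usize,
) -> Vec<f64> {
    let n = adj.len();
    let mut restart = vec![0.0; n];

    // Restart distribution: uniform over the valid seed nodes.
    let valid: Vec<usize> = seeds.iter().copied().filter(|&s| s < n).collect();
    if valid.is_empty() {
        return restart; // no seeds, no personalization
    }
    for &s in &valid {
        restart[s] += 1.0 / valid.len() as f64;
    }

    let mut rank = restart.clone();
    for _ in 0..iters {
        // Every step, a fraction alpha of the mass jumps back to the seeds.
        let mut next: Vec<f64> = restart.iter().map(|p| alpha * p).collect();
        for (u, neighbors) in adj.iter().enumerate() {
            let mass = (1.0 - alpha) * rank[u];
            if neighbors.is_empty() {
                // Dangling node: send its mass back to the seeds.
                for (v, p) in restart.iter().enumerate() {
                    next[v] += mass * p;
                }
            } else {
                let share = mass / neighbors.len() as f64;
                for &v in neighbors {
                    next[v] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

On a three-node cycle seeded at node 0, scores fall off with distance from the seed while the total stays at 1, which is the property that makes PPR useful for pulling in nearby, cohesive nodes.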

Data Flow

Ingest Pipeline to GraphRAG

graph LR
    A[Source File] --> B[Parser]
    B --> C[Chunker]
    C --> D[Entity Extractor]
    D --> E[Graph Builder]
    E --> F[Community Detection]
    F --> G[Confidence Scoring]
    G --> H[Committed Nodes/Edges]

The ingest pipeline in crates/mnem-ingest/src/pipeline.rs coordinates:

Parsing: Detect source type and extract raw content
Chunking: Split into manageable units using auto-selected chunker
Extraction: Rule-based or LLM-powered entity extraction
Graph building: Create nodes and edges in the transaction
Commit: Persist to the IPLD-based object store

Sources: crates/mnem-ingest/src/pipeline.rs

Chunk Strategy by Source Type

Source Kind	Chunker	Tokens	Overlap
Markdown	Paragraph	-	-
Text	SentenceRecursive	256	32
PDF	SentenceRecursive	512	64
Conversation	Session	10 messages	-
Code	Structural	-	-

Sources: crates/mnem-ingest/src/chunk.rs
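
The `auto` selection implied by this table can be written as one match. `ChunkerKind` and `SourceKind` are names used elsewhere in this manual, but the variants, their fields, and the `auto_chunker` function are assumptions made for the sketch.

```rust
/// Simplified source kinds for this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind { Markdown, Text, Pdf, Conversation, Code }

/// Illustrative chunker variants carrying the defaults from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkerKind {
    Paragraph,
    SentenceRecursive { max_tokens: usize, overlap: usize },
    Session { max_messages: usize },
    Structural,
}

/// Picks the default chunker for a source kind, following the table above.
pub fn auto_chunker(kind: SourceKind) -> ChunkerKind {
    match kind {
        SourceKind::Markdown => ChunkerKind::Paragraph,
        SourceKind::Text => ChunkerKind::SentenceRecursive { max_tokens: 256, overlap: 32 },
        SourceKind::Pdf => ChunkerKind::SentenceRecursive { max_tokens: 512, overlap: 64 },
        SourceKind::Conversation => ChunkerKind::Session { max_messages: 10 },
        SourceKind::Code => ChunkerKind::Structural,
    }
}
```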

Configuration

TOML Configuration

[retrieve]
limit = 20              # Maximum results
budget = 8192           # Token budget
vector_cap = 10         # Vector search candidates
graph_expand = 5        # Community expansion size
graph_depth = 3         # Traversal depth
rerank_top_k = 5        # Final reranking pool

[community]
enabled = true          # Enable community filtering
min_community_size = 3  # Minimum entities per community

CLI Configuration

# Ingest with community extraction
mnem ingest --extractor keybert docs/

# Retrieve with community expansion
mnem retrieve "query" --graph-expand 10

# Configure via config command
mnem config set retrieve.graph_expand 5

Sources: crates/mnem-cli/src/config.rs

API Reference

Core Types

#### Community

pub struct Community {
    pub id: CommunityId,
    pub parent: Option<CommunityId>,
    pub members: Vec<EntityId>,
    pub summary: Option<String>,
    pub depth: u32,
}

#### Entity

pub struct Entity {
    pub id: EntityId,
    pub ntype: String,
    pub summary: Option<String>,
    pub context_sentence: Option<String>,
    pub confidence: f32,
    pub community: Option<CommunityId>,
}

Public API (`lib.rs`)

Function	Signature	Description
`detect_communities`	`(graph: &Graph) -> Vec<Community>`	Run community detection
`score_entity`	`(entity: &Entity, graph: &Graph) -> f32`	Calculate confidence
`calibrate_scores`	`(scores: Vec<f32>) -> Vec<f32>`	Apply calibration
`summarize_community`	`(community: &Community) -> String`	Generate summary
`expand_from_seed`	`(seed: &[NodeId], depth: usize) -> Vec<NodeId>`	Graph expansion
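
As a rough picture of depth-bounded expansion, the sketch below performs a breadth-first walk from the seeds and weights each reached node by `decay` raised to its hop count. It deliberately differs from the signature in the table: the adjacency map and `decay` parameters, the `u32` node ids, and the weighted return type are assumptions added so the example is self-contained.

```rust
use std::collections::{HashMap, VecDeque};

/// Breadth-first expansion up to `depth` hops from the seeds.
/// Returns (node id, weight) in visit order, with weight = decay^hops.
pub fn expand_from_seed(
    adj: &HashMap<u32, Vec<u32>>,
    seeds: &[u32],
    depth: usize,
    decay: f32,
) -> Vec<(u32, f32)> {
    let mut hops: HashMap<u32, usize> = HashMap::new();
    let mut queue = VecDeque::new();
    let mut order = Vec::new();

    // Seeds sit at hop 0 with full weight.
    for &s in seeds {
        if hops.insert(s, 0).is_none() {
            queue.push_back(s);
            order.push((s, 1.0));
        }
    }

    while let Some(node) = queue.pop_front() {
        let h = hops[&node];
        if h == depth {
            continue; // do not walk past the depth limit
        }
        if let Some(neighbors) = adj.get(&node) {
            for &next in neighbors {
                if !hops.contains_key(&next) {
                    hops.insert(next, h + 1);
                    order.push((next, decay.powi((h + 1) as i32)));
                    queue.push_back(next);
                }
            }
        }
    }
    order
}
```

With `decay = 0.5` and `depth = 2`, direct neighbors of a seed receive weight 0.5, their neighbors 0.25, and anything three hops away is not returned.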

Architecture Diagram

graph TD
    subgraph "Ingest Layer"
        I1[Markdown]
        I2[PDF]
        I3[Code]
        I4[Conversation]
    end
    
    subgraph "Extract Layer"
        E1[Rule Extractor]
        E2[LLM Extractor]
        E3[KeyBERT Adapter]
    end
    
    subgraph "Graph Layer"
        G1[Node Builder]
        G2[Edge Builder]
        G3[Community Detector]
    end
    
    subgraph "Score Layer"
        S1[Confidence Scorer]
        S2[Calibrator]
    end
    
    subgraph "Retrieve Layer"
        R1[Vector Index]
        R2[Community Filter]
        R3[Reranker]
    end
    
    I1 --> E1
    I2 --> E2
    I3 --> E1
    I4 --> E3
    
    E1 --> G1
    E2 --> G1
    E3 --> G1
    
    G1 --> G2
    G2 --> G3
    
    G3 --> S1
    S1 --> S2
    
    S2 --> R1
    R1 --> R2
    R2 --> R3

Experimental Features

E1: Community Expander

Experiment E1 enables community expansion during retrieval:

When cfg.enabled is false (default): the stage is a no-op
When enabled: the communities of the top-N seeds pull in additional cohesive members
Additive only: the stage never drops existing candidates
Matrix v4 showed a 29 pp R@10 regression with the old drop-filter semantics

E2: PPR Graph Expansion

Experiment E2 introduces Personalized PageRank for graph expansion:

Uses optional AdjacencyIndex for efficient neighborhood queries
Falls back to historical decay walk when index unavailable
Maintains byte-identical retrieval for default configuration

Sources: crates/mnem-core/src/retrieve/retriever.rs

Warnings and Diagnostics

The retrieval system emits warnings when GraphRAG features encounter issues:

Warning Code	Feature	Description
`community_filter`	Community Filter	No-op community filter triggered
`graph_mode`	PPR	PPR ran without substrate graph
`graph_expand`	Expansion	Authored adjacency list was empty
`min_confidence`	Confidence	Results fell below confidence floor
`warnings_truncated`	Diagnostics	Warning list was truncated

Sources: crates/mnem-core/src/retrieve/warnings.rs

Best Practices

Enable community filtering for queries requiring holistic context
Tune graph_expand based on graph density—larger graphs need smaller expansion radii
Calibrate confidence thresholds per use case using the calibration module
Use structural chunking for codebases to capture function-level entity granularity
Set context_sentence on high-value nodes to improve contextual retrieval

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

high [feature] hermes support

The project may affect permissions, credentials, data exposure, or host boundaries.

medium README/documentation is current enough for a first validation pass.

The project should not be treated as fully validated until this signal is reviewed.

medium [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page

Users cannot judge support quality until recent activity, releases, and issue response are checked.

medium Maintainer activity is unknown

Users cannot judge support quality until recent activity, releases, and issue response are checked.

Doramagic extracted 8 source-linked risk signals. Review them before installing or handing real data to the project.

1. Security or permission risk: [feature] hermes support

Severity: high
Finding: Security or permission risk is backed by a source signal: [feature] hermes support. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/Uranid/mnem/issues/27

2. Capability assumption: README/documentation is current enough for a first validation pass.

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.assumptions | github_repo:1221867246 | https://github.com/Uranid/mnem | README/documentation is current enough for a first validation pass.

3. Maintenance risk: [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page

Severity: medium
Finding: Maintenance risk is backed by a source signal: [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/Uranid/mnem/issues/23

4. Maintenance risk: Maintainer activity is unknown

Severity: medium
Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | last_activity_observed missing

5. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: downstream_validation.risk_items | github_repo:1221867246 | https://github.com/Uranid/mnem | no_demo; severity=medium

6. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: risks.scoring_risks | github_repo:1221867246 | https://github.com/Uranid/mnem | no_demo; severity=medium

7. Maintenance risk: issue_or_pr_quality=unknown

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | issue_or_pr_quality=unknown

8. Maintenance risk: release_recency=unknown

Severity: low
Finding: release_recency=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 3 (count of project-level external discussion links exposed on this manual page)

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using mnem with real data or production workflows.

[[feature] hermes support](https://github.com/Uranid/mnem/issues/27) - github / github_issue
[[bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page](https://github.com/Uranid/mnem/issues/23) - github / github_issue
README/documentation is current enough for a first validation pass. - GitHub / issue

Source: Project Pack community evidence and pitfall evidence