Doramagic Project Pack · Human Manual

mnem

Introduction to mnem

Related topics: System Architecture, Installation Guide

mnem is a Rust-based knowledge management system designed for AI agents. It provides a structured approach to storing, retrieving, and managing information using a DAG-based (Directed Acyclic Graph) storage architecture with content-addressed data structures.

Overview

mnem serves as a personal knowledge graph for AI agents, enabling them to:

  • Store structured information with nodes, edges, and properties in a version-controlled repository
  • Ingest various document formats including Markdown, PDF, plain text, code files, and conversation logs
  • Retrieve relevant context using vector search, sparse ranking, and token-budget packing
  • Track changes through a commit-based operation log with cryptographic signatures
  • Support branching for experimental or temporary state management

Sources: crates/mnem-core/src/lib.rs:1-30

Architecture

mnem is organized as a monorepo with multiple Rust crates:

graph TD
    subgraph "mnem Repository Structure"
        CLI["mnem-cli<br/>Command Line Interface"]
        HTTP["mnem-http<br/>HTTP API Server"]
        INGEST["mnem-ingest<br/>Document Ingestion"]
        CORE["mnem-core<br/>Core Data Model & Retrieval"]
    end
    
    CLI --> CORE
    HTTP --> CORE
    INGEST --> CORE
    
    INGEST --> |"parse/chunk/extract"| RAW[("Raw Source<br/>.md .pdf .txt .json")]
    CORE --> |"store/retrieve"| GRAPH[("Knowledge<br/>Graph")]

Crate Responsibilities

| Crate | Purpose |
| --- | --- |
| mnem-core | Core data models (Node, Edge, Commit, Operation), DAG-CBOR codec, prolly trees, vector/sparse indexing, agent-facing retrieval |
| mnem-ingest | Document parsing, chunking strategies, entity extraction (rule-based, KeyBERT, or LLM) |
| mnem-cli | Terminal interface for all operations |
| mnem-http | REST API for remote agent access |

Sources: crates/mnem-core/src/lib.rs:15-25

Core Data Model

Nodes

Nodes are the fundamental unit of information storage. Each node contains:

graph LR
    subgraph "Node Structure"
        NTYPE["ntype<br/>Node Type Label"]
        CTX["context_sentence<br/>Positional Cue"]
        SUM["summary<br/>LLM-facing Text"]
        PROPS["props<br/>Property Map"]
        CONTENT["content<br/>Opaque Payload"]
    end

| Field | Type | Description |
| --- | --- | --- |
| ntype | String | Semantic label (e.g., Fact, Doc, Person) |
| context_sentence | Option<String> | LLM-generated placement cue for contextual retrieval |
| summary | Option<String> | Primary text for embedding and retrieval |
| props | BTreeMap<String, Ipld> | Structured key-value metadata |
| content | Option<Bytes> | Opaque payload (document body, file data) |

The context_sentence field implements Anthropic's 2024 contextual retrieval approach, storing an LLM-generated one-sentence placement cue that captures positional and relational context. Sources: crates/mnem-core/src/objects/node.rs:30-75

Edges

Edges represent relationships between nodes. They are typed links with source and target references.

Operations and Commits

| Concept | Description |
| --- | --- |
| Operation | A single atomic change to the repository state |
| Commit | A snapshot referencing a sequence of operations |
| View | Current head state with references and tombstones |

Operations include metadata for provenance:

pub struct Operation {
    pub author: String,
    pub agent_id: Option<String>,
    pub task_id: Option<String>,
    pub host: Option<String>,
    pub time: u64,
    pub description: String,
    pub signature: Option<Signature>,
}

Sources: crates/mnem-core/src/objects/operation.rs:15-35

Document Ingestion Pipeline

The ingestion system (mnem-ingest) handles the transformation of raw documents into chunked, indexed nodes:

flowchart LR
    RAW["Raw Source<br/>.md .pdf .txt"] --> PARSE["Parse"]
    PARSE --> SECTION["Sections"]
    SECTION --> CHUNK["Chunk"]
    CHUNK --> EXTRACT["Extract Entities<br/>Relations"]
    EXTRACT --> NODE["Nodes + Edges"]
    NODE --> STORE["Commit to Store"]

Sources: crates/mnem-ingest/src/lib.rs:20-45

Supported Source Types

| Source | Extensions | Strategy | Default Chunker |
| --- | --- | --- | --- |
| Markdown | .md, .markdown | CommonMark + GFM | Paragraph |
| Text | .txt, unknown | Plain text | SentenceRecursive |
| PDF | .pdf | Text layer extraction | SentenceRecursive |
| Conversation | .json, .jsonl | Chat export formats | Session |
| Code | .rs, .py, .js, .ts, .go, .java, .c, .cpp, .rb, .cs | Tree-sitter parsing | Structural |

Sources: crates/mnem-ingest/src/pipeline.rs:45-60
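
The extension-to-chunker mapping in the table above can be sketched as a single match. This is an illustrative sketch rather than mnem's actual code: `SourceKind`, `Chunker`, and `classify` are hypothetical names, and lowercasing the extension before matching is an assumption.

```rust
use std::path::Path;

// Hypothetical enums mirroring the rows of the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind { Markdown, Text, Pdf, Conversation, Code }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Chunker { Paragraph, SentenceRecursive, Session, Structural }

/// Pick a source kind and default chunker from a file extension.
/// Unknown or missing extensions fall back to plain text.
pub fn classify(path: &str) -> (SourceKind, Chunker) {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase()) // assumption: matching is case-insensitive
        .unwrap_or_default();
    match ext.as_str() {
        "md" | "markdown" => (SourceKind::Markdown, Chunker::Paragraph),
        "pdf" => (SourceKind::Pdf, Chunker::SentenceRecursive),
        "json" | "jsonl" => (SourceKind::Conversation, Chunker::Session),
        "rs" | "py" | "js" | "ts" | "go" | "java" | "c" | "cpp" | "rb" | "cs" => {
            (SourceKind::Code, Chunker::Structural)
        }
        _ => (SourceKind::Text, Chunker::SentenceRecursive),
    }
}
```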

Chunker Strategies

Five chunking strategies are available:

| Strategy | Description | Use Case |
| --- | --- | --- |
| Paragraph | Splits on double-newlines | Markdown documents |
| Recursive | Token-budgeted word-window sliding | Backwards compatibility |
| SentenceRecursive | Sentence-aware token packing using Unicode boundaries | Prose (Text, PDF) |
| Session | Groups messages up to max_messages | Conversation logs |
| Structural | One chunk per section | Code (function/class level) |

The SentenceRecursive chunker is the preferred strategy for prose as it prevents cutting mid-sentence and produces more uniform chunk sizes. Token counts are estimated via whitespace split for speed and determinism. Sources: crates/mnem-ingest/src/chunk.rs:1-45
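
As a rough illustration of sentence-aware packing with whitespace-based token estimates, consider the sketch below. It is not the mnem-ingest implementation: the function names are hypothetical, and the sentence splitter only looks for `.`, `!`, and `?`, whereas the text says the real chunker uses Unicode boundaries.

```rust
/// Estimate tokens by counting whitespace-separated words.
pub fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Naive splitter: a sentence ends after '.', '!' or '?'.
/// Assumption: stands in for proper Unicode sentence segmentation.
pub fn split_sentences(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, c) in text.char_indices() {
        if matches!(c, '.' | '!' | '?') {
            let end = i + c.len_utf8();
            let sentence = text[start..end].trim();
            if !sentence.is_empty() {
                out.push(sentence);
            }
            start = end;
        }
    }
    let tail = text[start..].trim();
    if !tail.is_empty() {
        out.push(tail);
    }
    out
}

/// Greedily pack whole sentences into chunks of at most `max_tokens`
/// estimated tokens. A sentence longer than the budget gets its own chunk.
pub fn pack_sentences(text: &str, max_tokens: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_tokens = 0;
    for sentence in split_sentences(text) {
        let tokens = estimate_tokens(sentence);
        if current_tokens > 0 && current_tokens + tokens > max_tokens {
            chunks.push(std::mem::take(&mut current));
            current_tokens = 0;
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(sentence);
        current_tokens += tokens;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because chunks close only at sentence boundaries, no chunk ends mid-sentence. The cost is that a chunk can exceed the budget when a single sentence is longer than `max_tokens`.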

Entity Extraction

mnem supports multiple extraction providers:

| Provider | Method |
| --- | --- |
| rule (default) | Capitalized phrase heuristic |
| keybert | Statistical keyword extraction (requires feature flag) |
| ollama | LLM-based extraction (requires feature flag) |
| none | Suppress entity extraction |

Sources: crates/mnem-cli/src/commands/ingest.rs:25-40
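
To give a feel for what a capitalized-phrase heuristic does, here is a minimal sketch. The exact rules of mnem's `rule` provider are not described in this manual, so the function name and every detail below (punctuation handling, treating sentence-initial words as candidates) are assumptions.

```rust
/// Collect runs of consecutive capitalized words as candidate entities.
/// Assumption: a run ends at a lowercase word or at trailing punctuation.
pub fn capitalized_phrases(text: &str) -> Vec<String> {
    let mut phrases = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for raw in text.split_whitespace() {
        // Strip surrounding punctuation such as quotes, commas, periods.
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        let capitalized = word.chars().next().map_or(false, char::is_uppercase);
        if capitalized {
            current.push(word);
        }
        // Trailing punctuation ("Berlin.", "Alice,") closes the current run.
        let ends_clause = raw.ends_with(|c: char| !c.is_alphanumeric());
        if (!capitalized || ends_clause) && !current.is_empty() {
            phrases.push(current.join(" "));
            current.clear();
        }
    }
    if !current.is_empty() {
        phrases.push(current.join(" "));
    }
    phrases
}
```

A heuristic like this is fast and deterministic, but it also picks up sentence-initial words. That is one reason a system might offer statistical or LLM-based providers as alternatives.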

Retrieval System

The retrieval layer composes multiple ranking strategies to deliver relevant context to agents under a token budget:

graph TD
    QUERY["Query"] --> VEC["Vector Search"]
    QUERY --> SPARSE["Sparse Ranking"]
    VEC --> RERANK["Rerank"]
    SPARSE --> RERANK
    RERANK --> PACK["Token Budget Packing"]
    PACK --> RESULT["Context for Agent"]

The retriever renders nodes in a YAML-like format:

ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>

  • ntype and id are always present
  • context appears before summary (per Anthropic's contextual-retrieval recipe)
  • summary is clipped at 8192 chars by default
  • Scalar props are emitted in BTreeMap order; non-scalar props are skipped

Sources: crates/mnem-core/src/retrieve/mod.rs:1-50

CLI Interface

The mnem CLI provides commands for repository management:

| Command | Description |
| --- | --- |
| mnem ingest <path> | Parse and commit documents to the graph |
| mnem tag | Manage versioned references (create, list, delete) |
| mnem branch | Create and manage branches |

Ingest Command Options

| Option | Default | Description |
| --- | --- | --- |
| --chunker | auto | Strategy: auto, paragraph, recursive, sentence_recursive, session, structural |
| --max-tokens | 512 | Target tokens per chunk |
| --overlap | 32 | Overlap tokens between chunks |
| --recursive | false | Walk directory trees |
| --extractor | none | Entity extraction provider |
| --ner-provider | rule | NER method: rule, none |

Sources: crates/mnem-cli/src/commands/ingest.rs:50-70
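
The interaction between `--max-tokens` and `--overlap` is easiest to see with a word-window sketch in the spirit of the Recursive chunker. This is an illustration under stated assumptions, not the shipped chunker: `sliding_windows` is a hypothetical name, and tokens are approximated by whitespace-separated words.

```rust
/// Slide a window of `max_tokens` words over the text, stepping forward by
/// `max_tokens - overlap` so consecutive chunks share `overlap` words.
pub fn sliding_windows(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() || max_tokens == 0 {
        return Vec::new();
    }
    // Guard against overlap >= max_tokens, which would never advance.
    let step = max_tokens.saturating_sub(overlap).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With the documented defaults (512 and 32), each window would advance by 480 words under this model.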

HTTP API

The HTTP server exposes REST endpoints for remote agent access:

| Endpoint | Method | Description |
| --- | --- | --- |
| /v1/ingest | POST | Ingest documents with JSON or multipart payload |
| /v1/branches | GET | List all branches |
| /v1/branches | POST | Create a new branch |

Ingest Request Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| chunker | String | Strategy: auto, paragraph, recursive, session |
| max_tokens | u32 | Target tokens per chunk |
| overlap | u32 | Overlap tokens between chunks |
| author | String | Required commit author |
| message | String | Optional commit message |
| extractor | String | Extraction provider |
| ner_provider | String | NER method override |

Sources: crates/mnem-http/src/handlers_ingest.rs:15-45

Key Design Principles

  1. No unsafe code: The entire mnem-core crate enforces #![forbid(unsafe_code)] Sources: crates/mnem-core/src/lib.rs:30
  2. Canonical encoding: Every object type preserves byte-exact round-trip property (decode(encode(x)) == x)
  3. Deterministic retrieval: Node props use BTreeMap for consistent iteration order
  4. Extensible architecture: Sidecar support for external tools (docling, unstructured) via feature flags
  5. Branch support: Tags and branches enable experimental state management without losing history

Sources: crates/mnem-core/src/lib.rs:1-30

Installation Guide

Related topics: Introduction to mnem

Overview

The mnem project is a Git-like version control system designed specifically for AI agent knowledge management. It provides versioned storage, retrieval, and synchronization of structured knowledge nodes. This guide covers the supported installation methods, system requirements, and configuration steps needed to get mnem running on your platform.

The project is organized as a Rust monorepo with multiple crates and language bindings. Installation options include native binaries via multiple package managers, Python packages, Docker containers, and prebuilt releases. Sources: crates/mnem-core/src/lib.rs:1-20

System Requirements

Supported Platforms

mnem supports the following platforms and architectures:

| Platform | Architecture | Notes |
| --- | --- | --- |
| Linux | x86_64, aarch64 | Full support |
| macOS | arm64 (Apple Silicon), x86_64 | Rosetta 2 compatible |
| Windows | x86_64 | Full support |

Sources: py-packages/mnem-cli/README.md:1-20

Runtime Dependencies

| Component | Requirement | Purpose |
| --- | --- | --- |
| Rust toolchain | 1.70+ (stable) | Building from source |
| Python | 3.9+ | Python bindings (mnem-py) |
| Node.js | 18+ | npm package |
| Docker | 20.10+ | Container deployment |

Installation Methods

CLI Installation via pip

The simplest method to install the mnem CLI is through Python's package manager:

pip install mnem-cli
mnem --version

On first run, mnem automatically downloads the correct prebuilt binary for your platform from the GitHub release assets and caches it in ~/.mnem_cli/. Subsequent calls run the cached binary directly. Sources: py-packages/mnem-cli/README.md:1-15

CLI Installation via Cargo

For users with the Rust toolchain installed, install from crates.io:

cargo install --locked mnem-cli --features bundled-embedder

The --features bundled-embedder flag compiles the embedder dependency into the binary, making it self-contained without external embedding services. Sources: crates/mnem-cli/src/main.rs:1-50

CLI Installation via npm

Node.js users can install globally via npm:

npm install -g mnem-cli

Sources: py-packages/mnem-cli/README.md:1-20

Prebuilt Binaries

Download prebuilt binaries directly from the GitHub Releases page. Binaries are available for all supported platforms in the release assets.

After downloading, make the binary executable:

chmod +x mnem-*-x86_64-unknown-linux-gnu
./mnem-*-x86_64-unknown-linux-gnu --version

Docker Installation

Container-based deployment is available via Docker. The project includes both a Dockerfile and docker-compose.yml for containerized deployments.

To build the Docker image:

docker build -t mnem:latest .

For orchestrated deployments using docker-compose:

docker-compose up -d

Sources: Dockerfile, docker-compose.yml

Python Bindings

For programmatic access from Python applications, install the Python bindings package:

pip install mnem-py

The package is imported as pymnem and lets Python applications interact with mnem repositories. Sources: py-packages/mnem-cli/README.md:1-30

Build from Source

Prerequisites

  • Rust 1.70 or later (stable toolchain)
  • Cargo (included with Rust)
  • Git

Build Steps

# Clone the repository
git clone https://github.com/Uranid/mnem.git
cd mnem

# Build the CLI
cargo build --release --bin mnem

# Build all crates
cargo build --release

Feature Flags

The project supports several feature flags to customize the build:

| Feature | Description |
| --- | --- |
| bundled-embedder | Embedder for local vector storage |
| keybert | Statistical keyphrase extraction |
| ollama | LLM-based extraction via Ollama |
| sidecar-docling | PDF extraction via docling CLI |
| sidecar-unstructured | PDF extraction via unstructured |

Sources: crates/mnem-ingest/src/lib.rs:1-60

Initial Configuration

Repository Initialization

After installation, initialize a new mnem repository:

mnem init

This creates the .mnem/ directory with the repository database (repo.redb). Sources: crates/mnem-cli/src/main.rs:1-80

Configuration File

The CLI reads configuration from .mnem/config.toml in the repository root. Configuration includes:

[user]
name = "Your Name"
email = "you@example.com"
agent_id = "agent-identifier"

[llm]
provider = "ollama"  # or "openai", "anthropic"
model = "llama3.2"
base_url = "http://localhost:11434"
timeout_secs = 120

The author string for commits follows the format name <email> when both are present. If only one is available, it uses that value alone. When neither is configured, it falls back to the agent_id or defaults to "mnem-cli". Sources: crates/mnem-cli/src/config.rs:1-50
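
The fallback order described above can be written as one match. This is a sketch of the stated rules only: `resolve_author` and its signature are hypothetical and not taken from crates/mnem-cli/src/config.rs.

```rust
/// Build the commit author string from optional config values.
/// Order: "name <email>", then whichever of the two exists,
/// then agent_id, then the literal "mnem-cli".
pub fn resolve_author(
    name: Option<&str>,
    email: Option<&str>,
    agent_id: Option<&str>,
) -> String {
    match (name, email) {
        (Some(n), Some(e)) => format!("{} <{}>", n, e),
        (Some(n), None) => n.to_string(),
        (None, Some(e)) => e.to_string(),
        (None, None) => agent_id.unwrap_or("mnem-cli").to_string(),
    }
}
```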

Repository Path Resolution

The CLI automatically searches for the .mnem/ directory by walking up from the current working directory, similar to Git's behavior. You can override this with the -R / --repo flag:

mnem -R ~/notes status

Sources: crates/mnem-cli/src/main.rs:1-80
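
The upward search for `.mnem/` can be approximated with `Path::ancestors` from the standard library. This is a sketch of the idea only: `find_repo_root` is a hypothetical helper, and the real CLI may apply further checks, such as validating the database file.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` toward the filesystem root and return the first
/// directory that contains a `.mnem` subdirectory, if any.
pub fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields start, its parent, grandparent, ...
        .find(|dir| dir.join(".mnem").is_dir())
        .map(Path::to_path_buf)
}
```

An explicit `-R` / `--repo` value would bypass this search and use the given path directly.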

HTTP Server Deployment

Starting the Server

The HTTP server provides REST API access to mnem repositories:

mnem serve --port 8080

API Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /v1/ingest | POST | Ingest documents |
| /v1/branches | GET | List branches |
| /v1/branches | POST | Create branch |
| /v1/retrieve | POST | Query knowledge |

Ingest Configuration

The ingest endpoint accepts JSON payloads with the following parameters:

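| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| content | String | Yes | - | Content to ingest |
| chunker | String | No | auto | Chunking strategy |
| max_tokens | u32 | No | 512 | Target tokens per chunk |
| overlap | u32 | No | 32 | Overlap tokens |
| author | String | Yes | - | Commit author |
| message | String | No | "mnem http ingest" | Commit message |
| extractor | String | No | "none" | Entity extractor |
| ner_provider | String | No | "rule" | NER provider |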

Sources: crates/mnem-http/src/handlers_ingest.rs:1-50

Verification

Verify Installation

After installation, verify the CLI is working:

mnem --version
mnem status

First-Run Wizard

On first run with no repository present, mnem launches a first-run wizard to help configure the basic settings. Returning users see the mnem status output directly. Sources: crates/mnem-cli/src/main.rs:1-80

Alternative Installation Summary

| Method | Command | Notes |
| --- | --- | --- |
| pip | pip install mnem-cli | Auto-downloads binary |
| cargo | cargo install --locked mnem-cli --features bundled-embedder | Self-contained binary |
| npm | npm install -g mnem-cli | Node.js integration |
| Docker | docker-compose up -d | Containerized deployment |
| Binary | Download from Releases | Manual installation |

Sources: py-packages/mnem-cli/README.md:1-20

Next Steps

After installation, consult these related guides:

  • Quick Start - Create your first repository and add content
  • Configuration Reference - Complete configuration options
  • Ingest Guide - Document ingestion and chunking strategies
  • Retrieve Guide - Query your knowledge base

Sources: py-packages/mnem-cli/README.md:1-20

System Architecture

Related topics: Core Components, Storage Backend

Overview

mnem is a content-addressed, CRDT-based (Conflict-free Replicated Data Types) knowledge management system designed for agentic workflows. The system provides immutable content-addressed storage with a secondary vector index for retrieval-augmented generation (RAG) applications.

The architecture follows a modular design with distinct crates handling different concerns:

| Crate | Purpose |
| --- | --- |
| mnem-core | Core data types, CRDT operations, storage, indexing, retrieval |
| mnem-ingest | Source parsing, chunking, and entity extraction |
| mnem-cli | Command-line interface |
| mnem-http | HTTP API server |

Sources: crates/mnem-core/src/lib.rs:1-30

Core Components

Related topics: System Architecture, Hybrid Retrieval System

The mnem system is built around a set of core components that work together to provide a versioned, graph-based knowledge management system. The core is implemented entirely in Rust with #![forbid(unsafe_code)], ensuring memory safety throughout the codebase. Every object type preserves byte-exact canonical-encoding round-trip properties (decode(encode(x)) == x and encode(decode(b)) == b). Sources: crates/mnem-core/src/lib.rs

System Architecture Overview

mnem implements a content-addressed graph database with prolly trees for efficient storage and retrieval. The architecture separates concerns between data structures (objects), storage (store), repository management (repo), and retrieval (retrieve).

graph TD
    subgraph "mnem-core"
        OBJ[objects: Node, Edge, Commit, View]
        PRO[prolly: TreeChunk, Builder, Cursor]
        STORE[store: Blockstore, OpHeadsStore]
        REPO[repo: ReadonlyRepo, Transaction]
        IDX[index: Query, BruteForceVectorIndex]
        RET[retrieve: Retriever]
        CODEC[codec: DAG-CBOR, DAG-JSON]
    end
    
    subgraph "mnem-ingest"
        ING[Ingester Pipeline]
        CHUNK[Chunking Strategies]
        PARSE[Parsers: MD, PDF, Code, JSON]
    end
    
    subgraph "External Interfaces"
        CLI[mnem-cli]
        HTTP[mnem-http]
        MCP[MCP Server]
    end
    
    ING -->|adds nodes/edges| REPO
    REPO -->|reads/writes| STORE
    RET -->|queries| IDX
    IDX -->|indexes| OBJ
    CODEC -->|encodes/decodes| OBJ
    CLI --> REPO
    HTTP --> REPO

Data Objects

The fundamental building blocks of the mnem knowledge graph are the core object types defined in crates/mnem-core/src/objects/. Each object is serializable via DAG-CBOR for canonical encoding. Sources: crates/mnem-core/src/lib.rs

Node

The Node is the primary unit of knowledge storage. It represents a single fact, entity, or chunk of content within the graph.

// Simplified structure from crates/mnem-core/src/objects/node.rs
pub struct Node {
    pub id: NodeId,                                    // Unique identifier
    pub ntype: String,                                 // Node type label (e.g., "Fact", "Doc")
    pub summary: Option<String>,                       // LLM-facing retrieval text
    pub props: BTreeMap<String, Ipld>,                 // Property map
    pub content: Option<Bytes>,                        // Optional opaque payload
    pub context_sentence: Option<String>,              // Positional chunk prefix
    pub ext: Option<BTreeMap<String, Ipld>>,           // Forward-compat extension map
}

| Field | Type | Description |
| --- | --- | --- |
| id | NodeId | Unique content-addressed identifier |
| ntype | String | Free-form type label for the node |
| summary | Option<String> | Text summary for LLM retrieval under token budget |
| props | BTreeMap<String, Ipld> | Structured metadata with any DAG-CBOR value |
| content | Option<Bytes> | Opaque payload (document body, file data) |
| context_sentence | Option<String> | LLM-generated placement cue per Anthropic's contextual retrieval recipe |
| ext | Option<BTreeMap> | Forward-compat extension map preserving unknown fields |

The summary field is designed for LLM consumption—the field agents read when assembling context under a token budget. It is distinct from props (structured) and content (opaque payload). Sources: crates/mnem-core/src/objects/node.rs

The context_sentence field implements Anthropic's 2024 Contextual Retrieval approach, which reports a 49% to 67% reduction in retrieval failures when the cue is present. mnem stores it on the node so the render path can surface it back to the agent for faithful source attribution. Sources: crates/mnem-core/src/objects/node.rs

Edge

Edges connect nodes and represent relationships between entities.

// From crates/mnem-core/src/objects/edge.rs
pub struct Edge {
    pub src: NodeId,           // Source node ID
    pub rel: String,           // Relation label (e.g., "works_at", "extracted_from")
    pub dst: NodeId,           // Destination node ID
    pub props: BTreeMap<String, Ipld>,  // Optional edge properties
}

Edges are used to create graph relationships like works_at, lives_in, traveling_with, has_preference, and extracted_from. The mnem-cli integration guidelines recommend using the compound mnem_commit_relation tool when both endpoints are entities: it resolves or creates both nodes and adds the edge in one call. Sources: crates/mnem-cli/src/integrate.rs

Commit

The Commit object represents a point-in-time snapshot of the repository state.

// From crates/mnem-core/src/objects/commit.rs
pub struct Commit {
    pub message: String,           // Commit message
    pub author: Author,            // Author information
    pub timestamp: Timestamp,       // Commit timestamp
    pub root: NodeId,              // Root of the node tree
    pub ops: Vec<Operation>,        // Operations applied in this commit
}

View

The View contains repository metadata including branch references and commit heads.

// Referenced in crates/mnem-http/src/handlers.rs
pub struct View {
    pub heads: Vec<Cid>,           // Current head commit CIDs
    pub refs: BTreeMap<String, RefTarget>,  // Named references
}

The View exposes branch information via the HTTP API with the schema mnem.v1.branches. Sources: crates/mnem-http/src/handlers.rs

Operation

Operations represent individual changes applied to the repository. They are collected within commits to provide a complete audit trail.

Repository Layer

The repository layer provides the main interface for interacting with the knowledge graph. It is defined across several modules in crates/mnem-core/src/repo/. Sources: crates/mnem-core/src/repo/mod.rs

ReadonlyRepo

ReadonlyRepo provides a read-only view into the repository state.

// Simplified from crates/mnem-core/src/repo/mod.rs
pub trait ReadonlyRepo {
    fn view(&self) -> &View;
    fn blockstore(&self) -> &dyn Blockstore;
}

Transaction

Transaction enables write operations to the repository. All changes are staged until explicitly committed.

// From crates/mnem-core/src/repo/transaction.rs
pub struct Transaction {
    // Internal state managing pending operations
}

impl Transaction {
    pub fn add_node(&mut self, node: Node) -> Result<NodeId, Error>;
    pub fn add_edge(&mut self, edge: Edge) -> Result<EdgeId, Error>;
    pub fn commit(self, author: Author, message: String) -> Result<ReadonlyRepo, Error>;
}

The ingest method on the Ingester pipeline uses Transaction to add nodes and edges:

Parse, chunk, extract, and write into tx. Does not commit. bytes is the raw source payload; kind says how to parse it. Returns an IngestResult with counts and elapsed time. The commit_cid field is left None - callers who want a CID should call tx.commit(...) afterwards and stash the returned ReadonlyRepo's head commit CID. Sources: crates/mnem-ingest/src/pipeline.rs

Merge and Conflict Detection

The merge system handles combining divergent repository states.

// From crates/mnem-core/src/repo/merge.rs
pub fn detect_conflicts(
    repo: &ReadonlyRepo,
    left: Cid,
    right: Cid,
    lca: Option<Cid>,
) -> Result<MergeConflicts, Error>;

Conflict detection supports an explicit ConflictPolicy for customizing merge behavior. The detector loads tombstone sets via the Views attached to each commit's operation. Sources: crates/mnem-core/src/repo/conflict.rs
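
The core idea of three-way conflict detection can be shown on plain key-value maps: a key conflicts when both sides changed it relative to the common ancestor and disagree on the result. The sketch below is a deliberately simplified illustration. It does not model mnem's tombstones, Views, or ConflictPolicy, and `conflicting_keys` is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return the keys that both `left` and `right` changed relative to `base`
/// in different ways. A missing key counts as a deletion.
pub fn conflicting_keys(
    base: &BTreeMap<String, String>,
    left: &BTreeMap<String, String>,
    right: &BTreeMap<String, String>,
) -> BTreeSet<String> {
    // Every key that appears in any of the three states.
    let keys: BTreeSet<&String> = base
        .keys()
        .chain(left.keys())
        .chain(right.keys())
        .collect();
    keys.into_iter()
        .filter(|k| {
            let (b, l, r) = (base.get(*k), left.get(*k), right.get(*k));
            // Changed on both sides, and the two sides disagree.
            l != b && r != b && l != r
        })
        .cloned()
        .collect()
}
```

Keys changed on only one side, or changed identically on both, are not reported. In the flow described here, those are the cases where an automatic merge is possible.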

graph LR
    A[Commit A] -->|diverged| B[Common Ancestor]
    C[Commit B] -->|diverged| B
    B --> D[Detect Conflicts]
    D --> E{MergeConflicts?}
    E -->|Yes| F[Surface conflicts to caller]
    E -->|No| G[Auto-merge possible]

Prolly Trees

mnem uses prolly trees (probabilistic trees) for efficient storage and lookup of the node graph. This is implemented in crates/mnem-core/src/prolly/. Sources: crates/mnem-core/src/prolly/tree.rs

graph TD
    subgraph "Prolly Tree Structure"
        ROOT[Root Node / TreeChunk] --> LEFT[Left Child TreeChunk]
        ROOT --> RIGHT[Right Child TreeChunk]
        LEFT --> LL[Leaf TreeChunk]
        LEFT --> LR[Leaf TreeChunk]
        RIGHT --> RL[Leaf TreeChunk]
        RIGHT --> RR[Leaf TreeChunk]
    end
    
    style ROOT fill:#e1f5fe
    style LL fill:#f3e5f5
    style LR fill:#f3e5f5
    style RL fill:#f3e5f5
    style RR fill:#f3e5f5

The prolly tree implementation includes:

| Component | Purpose |
| --- | --- |
| TreeChunk | Immutable chunk containing sorted entries |
| Builder | Constructs new trees from operations |
| Cursor | Navigates tree structure for lookups |
| diff | Computes differences between trees |
| merge | Merges divergent tree versions |

Prolly trees provide logarithmic-time lookups and efficient diffing for collaborative editing scenarios. Sources: crates/mnem-core/src/lib.rs
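
What makes a prolly tree cheap to diff is that chunk boundaries are chosen from the content itself, so the same set of keys always produces the same chunks regardless of insertion history. The sketch below shows that idea for one leaf level. It is an assumption-laden illustration: the hash (FNV-1a), the boundary rule, and the names `is_boundary` and `chunk_keys` are not taken from crates/mnem-core/src/prolly/.

```rust
/// FNV-1a, used here only as a small deterministic hash for illustration.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// A key closes a chunk when its hash is divisible by `target`,
/// giving an expected chunk size of roughly `target` entries.
pub fn is_boundary(key: &str, target: u64) -> bool {
    fnv1a(key.as_bytes()) % target.max(1) == 0
}

/// Split sorted keys into chunks at content-defined boundaries.
pub fn chunk_keys(sorted_keys: &[&str], target: u64) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    for key in sorted_keys {
        current.push(key.to_string());
        if is_boundary(key, target) {
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Since a boundary depends only on the key at that position, inserting one key can only split the chunk it lands in. The chunks around it keep their content and therefore their content addresses.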

Storage Layer

The storage layer abstracts over different backend implementations.

Blockstore

// From crates/mnem-core/src/lib.rs
pub trait Blockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error>;
    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error>;
}

OpHeadsStore

// From crates/mnem-core/src/lib.rs
pub trait OpHeadsStore {
    fn get_heads(&self) -> Result<Vec<Cid>, Error>;
    fn set_heads(&mut self, heads: &[Cid]) -> Result<(), Error>;
}

The codebase includes in-memory reference implementations of both traits for testing and development. Sources: crates/mnem-core/src/lib.rs
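
An in-memory store of this kind might look like the sketch below. It is not the reference implementation from mnem-core: `Cid` is reduced to a `String` alias, `Error` is a stand-in type, and the trait is restated locally so the example is self-contained.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Stand-ins for the real types; both are assumptions made for this sketch.
pub type Cid = String;

#[derive(Debug)]
pub struct Error(pub String);

pub trait Blockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error>;
    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error>;
}

/// `put` takes `&self`, so the map needs interior mutability.
#[derive(Default)]
pub struct MemoryBlockstore {
    blocks: Mutex<HashMap<Cid, Vec<u8>>>,
}

impl Blockstore for MemoryBlockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error> {
        let blocks = self.blocks.lock().map_err(|e| Error(e.to_string()))?;
        Ok(blocks.get(cid).cloned())
    }

    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error> {
        let mut blocks = self.blocks.lock().map_err(|e| Error(e.to_string()))?;
        blocks.insert(cid.clone(), data.to_vec());
        Ok(())
    }
}
```

A real content-addressed store would also check that `cid` matches the hash of `data`. That step is omitted here.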

Index System

Secondary indexes enable efficient querying of the knowledge graph.

Query

The primary query interface for searching nodes and edges.

BruteForceVectorIndex

A vector index implementation for semantic search capabilities. This works in conjunction with the retrieve module to provide dense + sparse retrieval lanes that capture positional and relational context. Sources: crates/mnem-core/src/lib.rs
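
A brute-force index scores every stored vector against the query and keeps the best matches. The sketch below uses cosine similarity, which is an assumption, since the manual does not state the metric. `cosine` and `top_k` are hypothetical names, not the BruteForceVectorIndex API.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every item against the query and return the `k` best matches,
/// highest similarity first. Cost is linear in the number of items.
pub fn top_k(query: &[f32], items: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = items
        .iter()
        .map(|(id, vector)| (id.clone(), cosine(query, vector)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```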

Retrieval System

The retrieve module provides the agent-facing interface for context assembly.

// From crates/mnem-core/src/retrieve/mod.rs
pub struct Retriever { /* ... */ }

The retriever composes:

  1. Filters - Pre-filter nodes by type, properties, or time range
  2. Vector ranking - Dense embeddings from the configured embedder
  3. Sparse ranking - BM25-style keyword matching
  4. Token-budget packing - Assembles context within LLM token limits
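
The last step, token-budget packing, can be sketched as a greedy pass over the ranked candidates. The policy shown here (skip an item that does not fit and keep trying smaller ones) is an assumption, and `pack_budget` is a hypothetical name. The manual does not say how the Retriever handles items that overflow the budget.

```rust
/// Walk candidates in rank order and keep each one whose token cost still
/// fits in the remaining budget. Input pairs are (node id, token cost).
pub fn pack_budget(ranked: &[(String, usize)], budget: usize) -> Vec<String> {
    let mut used = 0;
    let mut packed = Vec::new();
    for (id, cost) in ranked {
        if used + *cost <= budget {
            used += *cost;
            packed.push(id.clone());
        }
    }
    packed
}
```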

Node Rendering

Nodes are rendered to a compact, deterministic YAML-like format suitable for LLM consumption:

ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>

  • ntype and id are always present
  • context is emitted if node.context_sentence is Some (sits BEFORE summary per Anthropic's contextual-retrieval recipe)
  • summary is emitted if node.summary is Some, clipped at DEFAULT_RENDER_SUMMARY_CAP_CHARS (8192) chars
  • Scalar props (String, Integer, Float, Bool) are emitted in BTreeMap order
  • Non-scalar props (Link, Map, List, Bytes, Null) are skipped
  • Opaque content bytes are never rendered Sources: crates/mnem-core/src/retrieve/mod.rs
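
The rendering rules above translate fairly directly into code. The sketch below uses simplified stand-ins: `Value` replaces `Ipld`, `RenderNode` carries only the rendered fields, and `render` is a hypothetical function name. The cap constant mirrors the 8192-character default mentioned above.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for Ipld; only the variants needed here.
pub enum Value {
    String(String),
    Integer(i64),
    Float(f64),
    Bool(bool),
    List(Vec<Value>),
    Null,
}

/// Simplified stand-in for Node with only the fields that are rendered.
pub struct RenderNode {
    pub ntype: String,
    pub id: String,
    pub context_sentence: Option<String>,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Value>,
}

pub const SUMMARY_CAP_CHARS: usize = 8192;

pub fn render(node: &RenderNode) -> String {
    let mut out = format!("ntype: {}\nid: {}\n", node.ntype, node.id);
    // context goes before summary.
    if let Some(context) = &node.context_sentence {
        out.push_str(&format!("context: {}\n", context));
    }
    if let Some(summary) = &node.summary {
        // Clip by characters, not bytes, to avoid splitting a code point.
        let clipped: String = summary.chars().take(SUMMARY_CAP_CHARS).collect();
        out.push_str(&format!("summary: {}\n", clipped));
    }
    // BTreeMap iterates in key order, which keeps the output deterministic.
    for (key, value) in &node.props {
        let text = match value {
            Value::String(s) => s.clone(),
            Value::Integer(i) => i.to_string(),
            Value::Float(f) => f.to_string(),
            Value::Bool(b) => b.to_string(),
            Value::List(_) | Value::Null => continue, // non-scalars are skipped
        };
        out.push_str(&format!("{}: {}\n", key, text));
    }
    out
}
```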

Identification System

mnem uses phantom-typed identifiers for type safety:

| Type | Description |
| --- | --- |
| NodeId | Identifies a node in the graph |
| EdgeId | Identifies an edge |
| ChangeId | Identifies a change operation |
| OperationId | Identifies an operation |
| Link<T> | Phantom-typed link to any type |

All CIDs are content-addressed, ensuring that the same content always produces the same identifier. Sources: crates/mnem-core/src/lib.rs
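
Phantom typing means the identifier carries its target type at compile time while storing only the raw address at runtime. A minimal sketch of the pattern follows. The field names, the `String` payload, and the marker types are assumptions, not the definitions in mnem-core.

```rust
use std::marker::PhantomData;

/// A link that remembers, in its type only, what it points to.
pub struct Link<T> {
    cid: String,
    _target: PhantomData<T>, // zero-sized; no runtime cost
}

impl<T> Link<T> {
    pub fn new(cid: impl Into<String>) -> Self {
        Link { cid: cid.into(), _target: PhantomData }
    }

    pub fn cid(&self) -> &str {
        &self.cid
    }
}

// Marker types standing in for the real object types.
pub struct NodeMarker;
pub struct EdgeMarker;

pub type NodeId = Link<NodeMarker>;
pub type EdgeId = Link<EdgeMarker>;

/// Accepts only node identifiers; passing an EdgeId does not compile.
pub fn describe_node(id: &NodeId) -> String {
    format!("node:{}", id.cid())
}
```

Both aliases wrap the same representation, but the compiler rejects a call such as `describe_node(&EdgeId::new("x"))`, which is the type-safety benefit the table refers to.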

Codec System

The codec system provides canonical encoding and decoding:

// Simplified signatures from crates/mnem-core/src/lib.rs (bodies elided)
pub mod codec {
    pub fn encode<T: Encode>(value: &T) -> Vec<u8> { /* ... */ }
    pub fn decode<T: Decode>(bytes: &[u8]) -> Result<T, Error> { /* ... */ }
}

  • DAG-CBOR - Primary serialization format with canonical encoding guarantees
  • DAG-JSON - Debug export format for human inspection

Every object type preserves the byte-exact canonical-encoding round-trip property. Sources: crates/mnem-core/src/lib.rs

Signing System

The sign module provides Ed25519 signing and revocation-list verification for trust and integrity:

// From crates/mnem-core/src/lib.rs
pub mod sign {
    // Ed25519 signing operations
    // Revocation-list verification
}

Chunking Integration

While chunking is primarily handled by the mnem-ingest crate, the core objects are designed to work seamlessly with chunked content:

The Chunk type is used throughout the system:

// Referenced from crates/mnem-ingest/src/chunk.rs
pub struct Chunk {
    pub content: String,
    pub tokens_estimate: usize,  // Fast whitespace-split estimation
}

Chunks preserve source order: section 0's chunks come before section 1's. Empty sections are skipped silently. Sources: crates/mnem-ingest/src/chunk.rs

Summary

The mnem core components form a layered architecture:

| Layer | Components | Responsibility |
| --- | --- | --- |
| Objects | Node, Edge, Commit, View, Operation | Core data structures |
| Storage | Blockstore, OpHeadsStore | Persistence abstraction |
| Trees | ProllyTree, TreeChunk, Builder | Efficient ordered storage |
| Repository | ReadonlyRepo, Transaction | Access control and mutation |
| Index | Query, VectorIndex | Secondary access paths |
| Retrieval | Retriever | Agent-facing context assembly |
| Codec | DAG-CBOR, DAG-JSON | Canonical serialization |
| Crypto | Ed25519 signing | Integrity and trust |

This architecture enables mnem to serve as a versioned, collaborative knowledge graph with strong consistency guarantees and efficient retrieval capabilities for LLM integration.

Source: https://github.com/Uranid/mnem / Human Manual

Hybrid Retrieval System

Related topics: Embedding Providers

Overview

The Hybrid Retrieval System in mnem is an agent-facing retrieval subsystem that composes multiple ranking strategies—vector (dense), sparse, and graph-based expansion—into a unified token-budgeted context assembly pipeline. It is designed for LLM consumption, enabling autonomous agents to fetch relevant nodes from the repository under strict token budgets.

The system lives in crates/mnem-core/src/retrieve/ and is exposed via HTTP API (crates/mnem-http/src/handlers.rs) and CLI (crates/mnem-cli/).

Sources: crates/mnem-core/src/lib.rs:18-23

Architecture

graph TD
    subgraph "Retrieval Entry Points"
        HTTP[HTTP API: POST /v1/retrieve]
        CLI[CLI: mnem retrieve]
    end
    
    subgraph "Hybrid Retrieval Core"
        RT[Retriever]
        HF[Hybrid Fuser]
        VQ[Vector Query]
        SQ[Sparse Query]
        GQ[Graph Expansion]
        TB[Token Budget Packer]
    end
    
    subgraph "Indexes"
        VI[Vector Index]
        SI[Sparse Index]
        GI[Graph Index]
    end
    
    HTTP --> RT
    CLI --> RT
    RT --> HF
    HF --> VQ
    HF --> SQ
    HF --> GQ
    VQ --> VI
    SQ --> SI
    GQ --> GI
    HF --> TB --> Output[LLM Context]

Core Components

Retriever

The Retriever struct is the main facade for retrieval operations. It orchestrates query planning, index selection, and result fusion.

Key Responsibilities:

  • Accept a query string and configuration parameters
  • Dispatch parallel queries to vector, sparse, and graph indexes
  • Fuse ranked results using configurable strategies
  • Pack results into token budgets suitable for LLM context windows

Sources: crates/mnem-core/src/retrieve/mod.rs:1-50
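
The last responsibility, packing ranked results into a token budget, can be sketched as a greedy loop. This is a minimal illustration and not mnem's actual packer: `Candidate`, `estimate_tokens`, and `pack_into_budget` are hypothetical names, the token estimate is a plain whitespace split, and skipping oversized items (rather than stopping at the first one that does not fit) is an assumption.

```rust
/// Hypothetical ranked candidate: rendered node text plus its fused score.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub text: String,
    pub score: f32,
}

/// Whitespace-split token estimate: fast and deterministic, not tokenizer-exact.
pub fn estimate_tokens(text: &str) -> u32 {
    text.split_whitespace().count() as u32
}

/// Greedily packs candidates, highest score first, within `budget` tokens.
/// Candidates that do not fit are skipped so smaller ones can still be used.
pub fn pack_into_budget(mut candidates: Vec<Candidate>, budget: u32) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0u32;
    let mut packed = Vec::new();
    for candidate in candidates {
        let cost = estimate_tokens(&candidate.text);
        if used.saturating_add(cost) <= budget {
            used += cost;
            packed.push(candidate);
        }
    }
    packed
}
```

Calling `pack_into_budget(candidates, 4096)` returns the highest-scoring subset whose estimated size stays within 4096 tokens, ordered best first.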

Node Rendering

Before results reach the LLM, nodes are rendered to a compact, deterministic YAML-like text representation:

ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>
...

Rendering Rules:

| Field | Condition | Notes |
|---|---|---|
| ntype | Always | Node type identifier |
| id | Always | UUID |
| context | If node.context_sentence is Some | Position cue, emitted BEFORE summary |
| summary | If node.summary is Some | Clipped at 8192 chars by default |
| Scalar props | Always | Strings, integers, floats, booleans in BTreeMap order |
| Non-scalar props | Skipped | Links, Maps, Lists, Bytes, Null |

Sources: crates/mnem-core/src/retrieve/mod.rs:60-95
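
The rendering rules can be sketched as a small function. This is an illustration under stated assumptions, not the code in retrieve/mod.rs: `Prop` is a hypothetical stand-in for the crate's `Ipld` value type, `RenderNode` is a simplified node shape, and `render_node` and `SUMMARY_CLIP_CHARS` are names invented for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the property value type (the real crate uses `Ipld`).
#[derive(Debug, Clone)]
pub enum Prop {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    List(Vec<Prop>), // non-scalar: skipped when rendering
    Null,            // non-scalar: skipped when rendering
}

/// Hypothetical, simplified node shape for illustration only.
pub struct RenderNode {
    pub ntype: String,
    pub id: String,
    pub context_sentence: Option<String>,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Prop>,
}

/// Default summary clip in characters, taken from the rendering rules.
pub const SUMMARY_CLIP_CHARS: usize = 8192;

pub fn render_node(node: &RenderNode) -> String {
    let mut out = String::new();
    out.push_str(&format!("ntype: {}\n", node.ntype));
    out.push_str(&format!("id: {}\n", node.id));
    // The context cue is emitted before the summary.
    if let Some(ctx) = &node.context_sentence {
        out.push_str(&format!("context: {}\n", ctx));
    }
    if let Some(summary) = &node.summary {
        // Clip by characters so multi-byte text is never split mid-codepoint.
        let clipped: String = summary.chars().take(SUMMARY_CLIP_CHARS).collect();
        out.push_str(&format!("summary: {}\n", clipped));
    }
    // BTreeMap iterates in key order, which keeps the output deterministic.
    for (key, value) in &node.props {
        let rendered = match value {
            Prop::Str(s) => s.clone(),
            Prop::Int(i) => i.to_string(),
            Prop::Float(f) => f.to_string(),
            Prop::Bool(b) => b.to_string(),
            Prop::List(_) | Prop::Null => continue,
        };
        out.push_str(&format!("{}: {}\n", key, rendered));
    }
    out
}
```

Because the property map is ordered, two nodes with the same fields always render to the same text, which matters when the rendered form is counted against a token budget.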

Context Sentence (Anthropic Contextual Retrieval)

mnem implements Anthropic's 2024 Contextual Retrieval recipe. Each node may carry an optional context_sentence field—an LLM-generated one-sentence placement cue.

"This paragraph is from Section 3 of a legal contract between Alice and Bob's employer..."

The ingest pipeline prepends this to summary before embedding so both dense and sparse lanes capture positional and relational context.

Sources: crates/mnem-core/src/objects/node.rs:95-115

Retrieval Configuration

CLI Configuration Keys

| Key | Type | Default | Description |
|---|---|---|---|
| retrieve.limit | usize | | Maximum results to return |
| retrieve.budget | u32 | | Token budget for result packing |
| retrieve.vector_cap | usize | | Vector index candidate cap |
| retrieve.graph_expand | usize | | Graph neighbor expansion count |
| retrieve.graph_depth | usize | | Graph traversal depth |
| retrieve.graph_decay | u32 | | Decay factor for graph scores |
| retrieve.rerank_top_k | usize | | Top-K for re-ranking |
| retrieve.hyde_max_tokens | usize | | Max tokens for HyDE hypothesis |
| rerank.model | String | | Re-ranker model identifier |
| rerank.base_url | String | | Re-ranker service base URL |

Sources: crates/mnem-cli/src/config.rs:1-100

HTTP API Parameters

The POST /v1/retrieve endpoint accepts the following JSON body:

| Field | Type | Default | Description |
|---|---|---|---|
| query | String | Required | Search query |
| limit | usize | 20 | Result limit (clamped to MAX_RETRIEVE_LIMIT) |
| vector_cap | usize | | Vector candidate cap (clamped to MAX_VECTOR_CAP) |
| rerank_top_k | usize | | Re-rank candidate count (clamped to MAX_RERANK_TOP_K) |
| hyde | bool | false | Enable HyDE extractive summarization |
| summarize | bool | false | Enable centroid + MMR summarization |
| summarize_k | usize | 3 | Summary sentences count |

Clamping Constants:

  • MAX_RETRIEVE_LIMIT — Prevents unbounded result sets
  • MAX_VECTOR_CAP — Bounds vector search candidates
  • MAX_RERANK_TOP_K — Limits re-ranking computation

Sources: crates/mnem-http/src/handlers.rs:200-280

Chunking Strategies

The retrieval system operates on pre-chunked content. The ingest pipeline supports five chunking strategies, selectable per source kind:

| Strategy | Source Kind | Configuration | Behavior |
|---|---|---|---|
| Paragraph | Markdown | None | Splits on double-newline boundaries |
| SentenceRecursive | Text | max_tokens, overlap | Sentence-aware token-budgeted packing using Unicode UAX #29 boundaries |
| SentenceRecursive | PDF | max_tokens=512, overlap=64 | Same as above with larger defaults |
| Session | Conversation | max_messages=10 | Groups messages until role returns to user or max reached |
| Structural | Code | None | One chunk per section (function/class body from tree-sitter parser) |
| Recursive | (legacy) | max_tokens, overlap | Token-budgeted word-window sliding window |

Sources: crates/mnem-ingest/src/chunk.rs:1-100

Auto-Chunking

The auto_chunker(kind, heuristics) function selects optimal strategies:

match kind {
    SourceKind::Markdown => ChunkerKind::Paragraph,
    SourceKind::Text => ChunkerKind::SentenceRecursive { max_tokens: 256, overlap: 32 },
    SourceKind::Pdf => ChunkerKind::SentenceRecursive { max_tokens: 512, overlap: 64 },
    SourceKind::Conversation => ChunkerKind::Session { max_messages: 10 },
    SourceKind::Code(_) => ChunkerKind::Structural,
}

Sources: crates/mnem-ingest/src/chunk.rs:40-65

Source Kind Taxonomy

| Kind | Extensions | Parser | Index Type |
|---|---|---|---|
| Markdown | .md, .markdown | parse_markdown | Hybrid |
| Pdf | .pdf | Sidecar (docling/unstructured) | Hybrid |
| Conversation | .json, .jsonl | Session parser | Session |
| Text | Other/unspecified | Raw text | Hybrid |
| Code(Rust) | .rs | Tree-sitter | Structural |
| Code(Python) | .py, .pyi | Tree-sitter | Structural |
| Code(JavaScript) | .js, .mjs, .cjs | Tree-sitter | Structural |
| Code(TypeScript) | .ts, .tsx, .mts, .cts | Tree-sitter | Structural |
| Code(Go) | .go | Tree-sitter | Structural |
| Code(Java) | .java | Tree-sitter | Structural |
| Code(C) | .c, .h | Tree-sitter | Structural |
| Code(Cpp) | .cpp, .cc, .cxx, .hpp | Tree-sitter | Structural |
| Code(Ruby) | .rb, .gemspec, .rake, .erb | Tree-sitter | Structural |
| Code(CSharp) | .cs, .csx | Tree-sitter | Structural |

Sources: crates/mnem-ingest/src/types.rs:1-80

Retrieval Flow

sequenceDiagram
    participant Client
    participant Retriever
    participant VectorIndex
    participant SparseIndex
    participant GraphIndex
    participant Fuser
    participant TokenBudgetPacker
    participant LLM

    Client->>Retriever: query + config
    Retriever->>VectorIndex: vector_search(query)
    Retriever->>SparseIndex: sparse_search(query)
    Retriever->>GraphIndex: graph_expand(seed_nodes)
    VectorIndex-->>Fuser: ranked_candidates
    SparseIndex-->>Fuser: ranked_candidates
    GraphIndex-->>Fuser: ranked_candidates
    Fuser->>Fuser: reciprocal_rank_fusion
    Fuser->>TokenBudgetPacker: fused_results
    alt summarize=true
        TokenBudgetPacker->>TokenBudgetPacker: centroid_MMR_extraction
    end
    TokenBudgetPacker-->>LLM: token_budgeted_context
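
The reciprocal_rank_fusion step in the flow above can be illustrated with a short sketch. It assumes the standard RRF formula (each lane contributes 1 / (k + rank) per item) and the conventional constant k = 60; the constant mnem actually uses, and the function name and signature here, are assumptions.

```rust
use std::collections::HashMap;

/// Conventional RRF smoothing constant; the value mnem uses is not stated here.
pub const RRF_K: f64 = 60.0;

/// Fuses several ranked lists of node ids into one list, best first.
/// Each lane contributes 1 / (k + rank) per item, with rank starting at 1.
pub fn reciprocal_rank_fusion(lanes: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for lane in lanes {
        for (idx, id) in lane.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending, then by id so ties come out deterministically.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

RRF only looks at ranks, so the vector, sparse, and graph lanes can be combined without normalizing their raw scores against each other.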

HyDE (Hypothetical Document Embeddings)

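When hyde=true, the system generates extractive summaries of the top-M candidate nodes before final ranking. This adapts the HyDE (Hypothetical Document Embeddings) pattern: classic HyDE embeds an LLM-generated hypothetical answer to the query, whereas here the hypotheses are derived from already-retrieved candidates: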

  1. Initial candidates are retrieved
  2. Extractive summarization produces hypotheses
  3. Hypotheses are re-embedded and ranked
  4. Final top-K are packed into the context budget

Sources: crates/mnem-http/src/handlers.rs:250-270

Branch Name Validation

The HTTP API validates branch names before creating commit references during ingest operations:

Invalid characters and sequences: space, tab, newline, null, ~, ^, :, ?, *, [, \, @{, .., //
Invalid patterns: leading /, trailing /, trailing ., trailing .lock

Sources: crates/mnem-http/src/handlers.rs:180-210
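
The rules above translate directly into a predicate. The sketch below is an illustration rather than the handler's actual code: `is_valid_branch_name` is a hypothetical name, and rejecting the empty string is an extra assumption.

```rust
/// Returns true when `name` passes the branch-name rules listed above.
/// Rejecting the empty string is an extra assumption of this sketch.
pub fn is_valid_branch_name(name: &str) -> bool {
    const FORBIDDEN_CHARS: [char; 11] =
        [' ', '\t', '\n', '\0', '~', '^', ':', '?', '*', '[', '\\'];
    const FORBIDDEN_SEQS: [&str; 3] = ["@{", "..", "//"];

    if name.is_empty() {
        return false;
    }
    // Single forbidden characters anywhere in the name.
    if name.chars().any(|c| FORBIDDEN_CHARS.contains(&c)) {
        return false;
    }
    // Forbidden multi-character sequences.
    if FORBIDDEN_SEQS.iter().any(|seq| name.contains(seq)) {
        return false;
    }
    // Positional rules: no leading or trailing slash, no trailing dot or ".lock".
    if name.starts_with('/') || name.ends_with('/') {
        return false;
    }
    if name.ends_with('.') || name.ends_with(".lock") {
        return false;
    }
    true
}
```

Names such as `main` and `feature/ingest-v2` pass, while `a..b`, `/lead`, and `head.lock` are rejected.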

Extractor Integration

The retrieval system works in conjunction with the entity extraction pipeline. Extractors produce entity spans and relation spans that populate the graph index:

| Extractor | Provider | Features |
|---|---|---|
| RuleExtractor | Default (NER) | Capitalized phrase heuristic, verb-window regex relations |
| KeyBertAdapter | Statistical | Requires keybert feature flag |
| LLM | Ollama | Requires ollama feature flag |

Sources: crates/mnem-ingest/src/extract.rs:1-100

Configuration Example

[retrieve]
limit = 20
budget = 4096
vector_cap = 100
graph_expand = 5
graph_depth = 2
graph_decay = 80
rerank_top_k = 10
hyde_max_tokens = 256

[rerank]
model = "cross-encoder/ms-marco-MiniLM-L-6-v2"
base_url = "http://localhost:8080"

[ner]
provider = "rule"  # or "none"

Sources: crates/mnem-cli/src/config.rs:50-120

Sources: crates/mnem-core/src/lib.rs:18-23

Embedding Providers

Related topics: Hybrid Retrieval System

Embedding Providers is a pluggable subsystem in the mnem monorepo that abstracts the generation of vector embeddings for text content. It lives in the crates/mnem-embed-providers crate and is consumed by mnem-cli, mnem-http, and mnem-mcp to support dense vector indexing and semantic retrieval.

Architecture Overview

The provider system follows a strategy pattern with runtime-configurable backends. Each provider implements the same Embedder trait, returning Vec<f32> vectors regardless of the underlying implementation (HTTP API, local model, ONNX runtime).

graph TD
    A["mnem-cli / mnem-http / mnem-mcp"] --> B["mnem-embed-providers"]
    B --> C["ProviderConfig"]
    C --> D["OpenAI Provider"]
    C --> E["Ollama Provider"]
    C --> F["ONNX Provider"]
    D --> G["REST API / OpenAI Compatible"]
    E --> H["Local Ollama Server"]
    F --> I["Local ONNX Runtime"]
    
    J["config.toml / ENV vars"] --> B

Sources: crates/mnem-cli/src/commands/mod.rs:1-50

Supported Providers

| Provider | Backend Type | Model Selection | Configuration |
|---|---|---|---|
| OpenAI | Remote REST API | Via model field | base_url, api_key, timeout_secs |
| Ollama | Local REST API | Via model field | base_url (default: http://localhost:11434), timeout_secs |
| ONNX | Local ONNX Runtime | Bundled all-MiniLM-L6-v2 | No network required |

Sources: crates/mnem-cli/src/config.rs:1-80

OpenAI Provider

Sends text to OpenAI's embedding API or any OpenAI-compatible endpoint. Requires:

  • base_url: API endpoint (default: https://api.openai.com/v1)
  • api_key: Authentication token
  • model: Embedding model identifier

Ollama Provider

Connects to a local Ollama server for running open-source embedding models. Default endpoint is http://localhost:11434. The provider sets a 120-second timeout by default.

ONNX Provider

Runs inference entirely offline using the ONNX Runtime with the all-MiniLM-L6-v2 model. This is the bundled default when mnem is compiled with the bundled-embedder feature, providing zero-configuration embeddings for single-machine deployments.

Sources: crates/mnem-mcp/src/tools/embed.rs:1-60

Configuration Resolution

Embedding providers are configured through a precedence chain that varies slightly between consumer applications.

mnem-cli Precedence

| Priority | Source | Fields |
|---|---|---|
| 1 | Environment variables | MNEM_EMBED_PROVIDER, MNEM_EMBED_MODEL, MNEM_EMBED_API_KEY_ENV, MNEM_EMBED_BASE_URL, MNEM_EMBED_DIM |
| 2 | ~/.mnem/config.toml | [embed] section |
| 3 | <repo>/config.toml | [embed] section |
| 4 | Bundled ONNX fallback | When compiled with bundled-embedder feature |

Sources: crates/mnem-cli/src/config.rs:80-120
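
The precedence chain can be pictured as a series of `Option` fallbacks. The sketch below is a simplification under stated assumptions: `EmbedSettings` and `resolve_embed` are hypothetical, and it treats each source as all-or-nothing, whereas the real resolver may merge individual fields across sources.

```rust
/// Hypothetical, flattened view of an embed configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct EmbedSettings {
    pub provider: String,
    pub model: String,
}

/// Picks the first configured source in priority order:
/// environment, user config, repo config, then the bundled ONNX fallback.
pub fn resolve_embed(
    env: Option<EmbedSettings>,
    user_cfg: Option<EmbedSettings>,
    repo_cfg: Option<EmbedSettings>,
    bundled_enabled: bool,
) -> Option<EmbedSettings> {
    env.or(user_cfg).or(repo_cfg).or_else(|| {
        // Only available when the bundled-embedder feature is compiled in.
        bundled_enabled.then(|| EmbedSettings {
            provider: "onnx".to_string(),
            model: "all-MiniLM-L6-v2".to_string(),
        })
    })
}
```

A `None` result corresponds to the case where no embedder is configured and no bundled fallback exists.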

mnem-http Precedence

| Priority | Source | Behavior |
|---|---|---|
| 1 | POST /v1/embed request body | Per-request model override |
| 2 | <data_dir>/config.toml | Server-wide [embed] section |

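The HTTP server loads the embed configuration at startup on a best-effort basis. A missing or unreadable config.toml yields None without logging, and a malformed [embed] section logs a warning; in neither case is server startup blocked, and auto-embed simply remains disabled.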

fn load_embed_config(data_dir: &Path) -> Option<mnem_embed_providers::ProviderConfig> {
    #[derive(serde::Deserialize)]
    struct MiniCfg {
        embed: Option<mnem_embed_providers::ProviderConfig>,
    }
    let path = data_dir.join("config.toml");
    let s = std::fs::read_to_string(&path).ok()?;
    match toml::from_str::<MiniCfg>(&s) {
        Ok(parsed) => parsed.embed,
        Err(e) => {
            tracing::warn!(path = %path.display(), error = %e,
                "config.toml [embed] parse failed; auto-embed disabled"
            );
            None
        }
    }
}

Sources: crates/mnem-http/src/lib.rs:1-50

mnem-mcp Precedence

The MCP server uses a simplified three-tier chain without the global ~/.mnem/config.toml lookup (design point: per-repo isolation):

  1. MNEM_EMBED_* environment variables
  2. <repo>/config.toml [embed] section
  3. Bundled ONNX fallback (only when bundled-embedder feature is compiled)

Sources: crates/mnem-mcp/src/tools/embed.rs:20-40

ProviderConfig Schema

The configuration is parsed from TOML into a discriminated union:

pub enum ProviderConfig {
    Openai(OpenaiConfig),
    Ollama(OllamaConfig),
    Onnx(OnnxConfig),
}

Each variant carries only the parameters relevant to that provider, keeping the configuration minimal.

Error Handling

All embedding operations return EmbedError, which is mapped from transport failures into actionable diagnostics:

graph LR
    A["ureq::Error"] --> B{"EmbedError"}
    B --> C["RateLimited"]
    B --> D["BadRequest<br/>status + body"]
    B --> E["Server<br/>status + body"]
    B --> F["Network<br/>transport message"]
    B --> G["Decode<br/>JSON parse failure"]

Sources: crates/mnem-embed-providers/src/http.rs:1-50
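
One way to picture the mapping in the diagram is a classifier over HTTP status codes. The variant names come from the diagram, but everything else is an assumption of this sketch: the field shapes, the status ranges (429 as RateLimited, other 4xx as BadRequest, the rest as Server), and the `classify_status` and `is_retryable` helpers.

```rust
/// Mirror of the variants in the diagram above; field shapes are assumed.
#[derive(Debug, PartialEq)]
pub enum EmbedError {
    RateLimited,
    BadRequest { status: u16, body: String },
    Server { status: u16, body: String },
    Network(String),
    Decode(String),
}

/// Classifies a non-success HTTP response (status ranges are an assumption).
pub fn classify_status(status: u16, body: &str) -> EmbedError {
    match status {
        429 => EmbedError::RateLimited,
        400..=499 => EmbedError::BadRequest { status, body: body.to_string() },
        _ => EmbedError::Server { status, body: body.to_string() },
    }
}

/// Hypothetical helper: errors that may succeed if the call is repeated later.
pub fn is_retryable(err: &EmbedError) -> bool {
    matches!(
        err,
        EmbedError::RateLimited | EmbedError::Server { .. } | EmbedError::Network(_)
    )
}
```

Keeping the status and body on BadRequest and Server is what lets a caller turn the error into a specific, actionable message.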

Error Display for Users

When embedding fails, mnem-cli formats the error into a short, actionable one-liner suitable for eprintln!:

| Provider | Common Cause | Suggestion |
|---|---|---|
| OpenAI | Invalid API key | Check MNEM_EMBED_API_KEY_ENV |
| Ollama | Server not running | Verify ollama serve is active |
| ONNX | Missing model file | Ensure all-MiniLM-L6-v2 is bundled |

The format_embed_failure function accepts a context parameter ("embedding" for writes, "query embedding" for retrieval) to tailor suggestions.

Sources: crates/mnem-cli/src/commands/mod.rs:50-100

Integration with Node Storage

Embedding vectors are stored on Node objects for use during semantic retrieval:

pub struct Node {
    pub id: NodeId,
    pub label: String,
    pub summary: Option<String>,           // LLM-facing retrieval text
    pub content: Option<Bytes>,           // Opaque payload
    pub context_sentence: Option<String>,  // Anthropic contextual retrieval prefix
    pub props: BTreeMap<String, Ipld>,    // Structured properties
}

The summary field is the primary text indexed by the dense embedder. The context_sentence (per Anthropic's 2024 Contextual Retrieval recipe) is prepended to summary before embedding to capture positional context; Anthropic reports that the technique reduces retrieval failures by 49% to 67%.

Sources: crates/mnem-core/src/objects/node.rs:1-50

Bundled Embedder Feature

The bundled-embedder Cargo feature compiles in an ONNX provider with all-MiniLM-L6-v2. When enabled:

  • mnem embed works out-of-the-box without external services
  • The MCP mnem_retrieve tool has a tier-3 fallback when no explicit vector provider is configured
  • Ideal for air-gapped environments or local-first workflows

When not enabled, a missing embedder configuration produces a warning during ingest: nodes are created without vectors, and mnem reindex is suggested as the recovery path.

Summary

Embedding Providers abstracts vector generation behind a common interface, supporting three backends with distinct deployment profiles:

  • OpenAI: Cloud-hosted, highest quality, requires API credentials
  • Ollama: Self-hosted, flexible model selection, local compute
  • ONNX: Offline-capable, bundled model, zero-configuration

Configuration flows from environment variables through TOML files, with graceful fallback behavior that never prevents core operations from functioning.

Sources: crates/mnem-cli/src/commands/mod.rs:1-50

Storage Backend

Related topics: System Architecture

The storage backend is a critical subsystem in mnem that provides persistent storage for the content-addressable object graph. It abstracts storage operations behind well-defined traits, enabling pluggable storage implementations while maintaining a consistent API for the core data layer.

Architecture Overview

The storage backend follows a trait-based abstraction pattern where mnem-core defines the storage interfaces and concrete implementations are provided by backend crates. This separation allows the core logic to remain independent of specific storage technologies.

graph TD
    subgraph "Application Layer"
        CLI[mnem-cli]
        HTTP[mnem-http]
    end
    
    subgraph "mnem-core"
        Repo[Repository]
        Transaction[Transaction]
        Objects[Node / Edge / Commit]
    end
    
    subgraph "Storage Traits"
        Blockstore[BlockStore Trait]
        OpHeadsStore[OpHeadsStore Trait]
        KnnEdgesStore[KnnEdgesStore Trait]
    end
    
    subgraph "Backend Implementations"
        RedbBackend[mnem-backend-redb]
    end
    
    CLI --> Repo
    HTTP --> Repo
    Repo --> Transaction
    Transaction --> Blockstore
    Transaction --> OpHeadsStore
    Repo --> KnnEdgesStore
    Blockstore --> RedbBackend
    OpHeadsStore --> RedbBackend
    KnnEdgesStore --> RedbBackend

Core Storage Traits

The storage layer is built on three fundamental traits that define the contract between the core library and storage implementations.

BlockStore Trait

The BlockStore trait provides low-level operations for storing and retrieving binary data blocks identified by Content Identifiers (CIDs).

// crates/mnem-core/src/store/blockstore.rs
pub trait Blockstore: Send + Sync {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn put(&self, block: &[u8]) -> Result<Cid>;
    fn put_many<I>(&self, blocks: I) -> Result<Vec<Cid>>
    where
        I: IntoIterator<Item = Vec<u8>>,
        I::IntoIter: Send + Sync;
}

| Method | Purpose | Return Type |
|---|---|---|
| get(cid) | Retrieve a block by its CID | Result<Option<Vec<u8>>> |
| put(block) | Store a single block, returning its CID | Result<Cid> |
| put_many(blocks) | Batch insert multiple blocks | Result<Vec<Cid>> |

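The trait follows a content-addressed storage model, the same one used by CAR (Content Addressable aRchive) files: each block is keyed by a CID derived from a hash of its bytes, so integrity can be verified by re-hashing the content.

Sources: crates/mnem-core/src/store/blockstore.rs:1-20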

OpHeadsStore Trait

The OpHeadsStore trait manages operation heads, which are references to the latest operations in the operation log. It supports both single-head and multi-head scenarios with conflict detection.

// crates/mnem-core/src/store/op_heads.rs
pub trait OpHeadsStore: Send + Sync {
    fn get_heads(&self) -> Result<Vec<Cid>>;
    fn put_head(&self, op: &Op) -> Result<()>;
    fn put_heads(&self, ops: &[Op]) -> Result<()>;
    fn merge_heads(&self, merged: Vec<Cid>) -> Result<()>;
}

| Method | Purpose |
|---|---|
| get_heads() | Retrieve all current operation head CIDs |
| put_head(op) | Atomically update the single head |
| put_heads(ops) | Set multiple operation heads |
| merge_heads(merged) | Replace heads with merged result after conflict resolution |

Sources: crates/mnem-core/src/store/op_heads.rs:1-50

KnnEdgesStore Trait

The KnnEdgesStore trait provides specialized storage for k-nearest-neighbor graph edges, enabling efficient vector similarity searches.

// Backend interface for KNN edge storage
pub trait KnnEdgesStore: Send + Sync {
    fn insert(&self, source_id: NodeId, embedding: &[f32]) -> Result<()>;
    fn search(&self, query: &[f32], k: usize) -> Result<Vec<(NodeId, f32)>>;
}

Sources: crates/mnem-backend-redb/src/knn_edges_store.rs:1-30
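
To make the insert/search contract concrete, here is an exact brute-force stand-in. It is not the redb-backed implementation, which is described as using HNSW for approximate search: `BruteForceKnn` is hypothetical, `u64` stands in for `NodeId`, cosine similarity as the score is an assumption, and the sketch takes `&mut self` where the trait takes `&self`.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero magnitude.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical in-memory store; `u64` stands in for `NodeId`.
#[derive(Default)]
pub struct BruteForceKnn {
    items: Vec<(u64, Vec<f32>)>,
}

impl BruteForceKnn {
    pub fn insert(&mut self, id: u64, embedding: &[f32]) {
        self.items.push((id, embedding.to_vec()));
    }

    /// Exact top-k by cosine similarity, highest first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut scored: Vec<(u64, f32)> = self
            .items
            .iter()
            .map(|(id, vector)| (*id, cosine(query, vector)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A linear scan like this is useful as a correctness baseline in tests: an approximate index should return mostly the same ids for the same query.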

Redb Backend Implementation

The mnem-backend-redb crate provides the reference implementation using the Redb embedded database, a fast, lightweight key-value store written in Rust.

Module Structure

mnem-backend-redb/
├── src/
│   ├── lib.rs           # Main entry point and configuration
│   ├── blockstore.rs    # BlockStore implementation
│   └── knn_edges_store.rs # KNN edge storage with HNSW

Initialization

The backend initializes by opening or creating a Redb database file:

// crates/mnem-backend-redb/src/lib.rs
pub struct Backend {
    db: redb::Database,
    path: PathBuf,
}

impl Backend {
    pub fn open(path: &Path) -> Result<Self> {
        let db = redb::Database::create(path)?;
        Ok(Self { db, path: path.to_path_buf() })
    }
}

Sources: crates/mnem-backend-redb/src/lib.rs:1-50

BlockStore Implementation

The Redb blockstore implementation wraps the database with CAR-compatible semantics:

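// Simplified sketch of crates/mnem-backend-redb/src/blockstore.rs.
// Assumes BLOCKS_TABLE is a redb::TableDefinition<&[u8], &[u8]>, DAG_CBOR is the
// 0x71 codec constant, and the crate's Error converts from redb's error types.
use multihash_codetable::{Code, MultihashDigest};
use redb::ReadableTable;

impl Blockstore for RedbBlockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>> {
        let key = cid.to_bytes();
        let read_txn = self.db.begin_read()?;
        let table = read_txn.open_table(BLOCKS_TABLE)?;
        Ok(table.get(key.as_slice())?.map(|v| v.value().to_vec()))
    }

    fn put(&self, block: &[u8]) -> Result<Cid> {
        // The CID is derived from a SHA-256 multihash of the block bytes.
        let hash = Code::Sha2_256.digest(block);
        let cid = Cid::new_v1(DAG_CBOR, hash);
        let write_txn = self.db.begin_write()?;
        {
            // Store the block with the CID bytes as key.
            let mut table = write_txn.open_table(BLOCKS_TABLE)?;
            table.insert(cid.to_bytes().as_slice(), block)?;
        }
        write_txn.commit()?;
        Ok(cid)
    }
    // put_many is omitted from this excerpt.
}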

Sources: crates/mnem-backend-redb/src/blockstore.rs:1-100

Storage Format

The redb backend organizes data into multiple tables:

| Table Name | Key Type | Value Type | Purpose |
|---|---|---|---|
| blocks | CID bytes | Raw block data | Content-addressed storage |
| op_heads | Fixed key | CID bytes | Operation head references |
| knn_edges | NodeId | Serialized edges | Vector similarity graph |

Transaction Model

mnem implements a transactional write model through the Transaction type in the repository layer. Transactions provide ACID-like semantics for graph modifications.

graph LR
    A[Begin Transaction] --> B[Add Nodes]
    B --> C[Add Edges]
    C --> D[Commit]
    D --> E[Update OpHeads]
    E --> F[Success]
    
    D --> G[Abort]
    G --> H[Rollback]

Write Operations

Transactions support atomic batch operations:

// Conceptual transaction interface
impl Transaction {
    pub fn add_node(&mut self, node: Node) -> Result<NodeId>;
    pub fn add_edge(&mut self, source: NodeId, target: NodeId, relation: &str) -> Result<()>;
    pub fn commit(self) -> Result<Commit>;
}

Sources: crates/mnem-core/src/repo/mod.rs:1-100

Commit Structure

Each commit creates an immutable snapshot of the repository state:

// crates/mnem-core/src/objects/commit.rs
pub struct Commit {
    pub operation: Operation,
    pub parent: Option<Cid>,
    pub author: Author,
    pub message: String,
    pub timestamp: DateTime<Utc>,
}

Sources: crates/mnem-core/src/objects/commit.rs:1-50

Data Persistence Flow

sequenceDiagram
    participant App as Application
    participant Tx as Transaction
    participant BS as BlockStore
    participant OHS as OpHeadsStore
    participant Redb as Redb DB

    App->>Tx: begin()
    Tx->>Tx: add_node(node)
    Tx->>BS: put(block)
    BS->>Redb: write(cid, data)
    Tx->>Tx: add_edge(src, dst)
    Tx->>BS: put(block)
    Tx->>Tx: commit()
    Tx->>BS: put(commit_block)
    Tx->>OHS: put_head(new_op)
    OHS->>Redb: update_heads()
    Redb-->>Tx: success
    Tx-->>App: commit_cid

Configuration

Storage backend behavior is configured through the Config structure:

// crates/mnem-cli/src/config.rs
pub struct Config {
    pub store: Option<StoreConfig>,
    pub data_dir: PathBuf,
    // ...
}

pub struct StoreConfig {
    pub path: PathBuf,
    pub flush_interval_ms: Option<u64>,
}

Sources: crates/mnem-cli/src/config.rs:1-100

Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| path | PathBuf | data/ | Base directory for storage files |
| flush_interval_ms | u64 | 1000 | Periodic flush interval in milliseconds |

Object Types and Serialization

Node Storage

Nodes are serialized using DAG-CBOR and stored as blocks:

// crates/mnem-core/src/objects/node.rs
pub struct Node {
    pub id: NodeId,
    pub ntype: NodeType,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Ipld>,
    pub content: Option<Bytes>,
    pub context_sentence: Option<String>,
}

Sources: crates/mnem-core/src/objects/node.rs:1-100

Edge Storage

Edges link nodes with labeled relationships:

// crates/mnem-core/src/objects/edge.rs
pub struct Edge {
    pub source: NodeId,
    pub target: NodeId,
    pub relation: String,
    pub confidence: Option<f32>,
}

Sources: crates/mnem-core/src/objects/edge.rs:1-50

Error Handling

Storage operations return the Error type defined in the core crate:

// crates/mnem-core/src/store/mod.rs
pub enum Error {
    #[error("block not found: {0}")]
    BlockNotFound(Cid),
    #[error("serialization failed: {0}")]
    SerializationFailed(String),
    #[error("database error: {0}")]
    DatabaseError(String),
}

Sources: crates/mnem-core/src/store/mod.rs:1-100

Error Recovery

| Error Type | Recovery Strategy |
|---|---|
| BlockNotFound | Indicates data corruption; repository repair required |
| SerializationFailed | Check data integrity; may indicate schema mismatch |
| DatabaseError | Retry operation; check disk space and permissions |

Indexes and Secondary Storage

Vector Index

The KNN edges store maintains a vector index for similarity search operations:

graph TD
    A[Query Vector] --> B[KnnEdgesStore]
    B --> C[HNSW Index]
    C --> D[Approximate KNN Search]
    D --> E[Top-K Results]

The implementation uses the HNSW (Hierarchical Navigable Small World) algorithm for efficient approximate nearest neighbor search.

Sources: crates/mnem-backend-redb/src/knn_edges_store.rs:1-100

Query Interface

The retrieve module composes vector search with graph traversal:

// crates/mnem-core/src/retrieve/mod.rs
pub struct Retriever {
    blockstore: Arc<dyn Blockstore>,
    knn_edges: Arc<dyn KnnEdgesStore>,
    // ...
}

Sources: crates/mnem-core/src/store/op_heads.rs:1-50

Ingestion Pipeline

Related topics: Core Components

The Ingestion Pipeline is the core system in mnem responsible for transforming external source documents (Markdown, PDFs, code files, conversations) into structured graph nodes within the repository. It handles parsing, chunking, entity extraction, and writing to the graph transaction—all without committing, allowing callers to control transaction boundaries.

Overview

The pipeline orchestrates a multi-stage process:

  1. Detection: Determine source kind from file extension or explicit configuration
  2. Parsing: Convert raw bytes into a list of Section objects
  3. Chunking: Split sections into semantically meaningful Chunk objects
  4. Extraction: Optionally identify entities and relations via rule-based or LLM providers
  5. Writing: Add nodes and edges to a borrowed Transaction

graph TD
    A[Raw Bytes] --> B[SourceKind Detection]
    B --> C[Parser Selection]
    C --> D[Parse to Sections]
    D --> E[Chunker Strategy]
    E --> F[Extract Entities & Relations]
    F --> G[Transaction Write]
    G --> H[IngestResult]
    
    C -->|md| C1[Markdown Parser]
    C -->|pdf| C2[PDF Parser]
    C -->|code| C3[Tree-sitter Parser]
    C -->|json/jsonl| C4[Conversation Parser]
    C -->|text| C5[Plain Text]

Source Kind Detection

The Ingester automatically detects the source kind based on file extension. This determines both the parser and the default chunking strategy.

| Extension(s) | SourceKind | Default Chunker |
|---|---|---|
| .md, .markdown | Markdown | Paragraph |
| .txt | Text | SentenceRecursive (256 tokens, 32 overlap) |
| .pdf | Pdf | SentenceRecursive (512 tokens, 64 overlap) |
| .json, .jsonl | Conversation | Session (max 10 messages) |
| .rs | Code(Rust) | Structural |
| .py, .pyi | Code(Python) | Structural |
| .js, .mjs, .cjs | Code(JavaScript) | Structural |
| .ts, .tsx, .mts, .cts | Code(TypeScript) | Structural |
| .go | Code(Go) | Structural |
| .java | Code(Java) | Structural |
| .c, .h | Code(C) | Structural |
| .cpp, .cc, .cxx, .hpp | Code(Cpp) | Structural |
| .rb, .gemspec, .rake | Code(Ruby) | Structural |
| .cs, .csx | Code(CSharp) | Structural |
| Unknown or no extension | Text | SentenceRecursive |

Sources: pipeline.rs:source_kind_from_ext() types.rs:SourceKind types.rs:CodeLanguage::from_extension()
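
The extension table maps naturally onto a `match`. The sketch below is illustrative and not the code in pipeline.rs: the enums are redeclared locally, `source_kind_from_path` is a hypothetical name, and lower-casing the extension before matching is an assumption.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeLanguage {
    Rust, Python, JavaScript, TypeScript, Go, Java, C, Cpp, Ruby, CSharp,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Markdown,
    Text,
    Pdf,
    Conversation,
    Code(CodeLanguage),
}

/// Maps a file path to a source kind using the extension table above.
/// Lower-casing the extension is an assumption of this sketch.
pub fn source_kind_from_path(path: &str) -> SourceKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("md" | "markdown") => SourceKind::Markdown,
        Some("pdf") => SourceKind::Pdf,
        Some("json" | "jsonl") => SourceKind::Conversation,
        Some("rs") => SourceKind::Code(CodeLanguage::Rust),
        Some("py" | "pyi") => SourceKind::Code(CodeLanguage::Python),
        Some("js" | "mjs" | "cjs") => SourceKind::Code(CodeLanguage::JavaScript),
        Some("ts" | "tsx" | "mts" | "cts") => SourceKind::Code(CodeLanguage::TypeScript),
        Some("go") => SourceKind::Code(CodeLanguage::Go),
        Some("java") => SourceKind::Code(CodeLanguage::Java),
        Some("c" | "h") => SourceKind::Code(CodeLanguage::C),
        Some("cpp" | "cc" | "cxx" | "hpp") => SourceKind::Code(CodeLanguage::Cpp),
        Some("rb" | "gemspec" | "rake") => SourceKind::Code(CodeLanguage::Ruby),
        Some("cs" | "csx") => SourceKind::Code(CodeLanguage::CSharp),
        // `.txt`, unknown extensions, and extensionless files such as README.
        _ => SourceKind::Text,
    }
}
```

Because the fallback arm returns Text, detection never fails; an unrecognized file is simply treated as prose.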

Supported File Formats

Markdown (`.md`, `.markdown`)

Parsed using CommonMark + GitHub Flavored Markdown (GFM) support. The parser extracts headings with depth information, creating section boundaries that respect document structure. Each heading becomes a section boundary.

PDF (`.pdf`)

Pure-Rust text-layer extraction using pdf-extract. One section per page is created, with heading set to "Page {n}" at depth 1. PDFs with fewer than 100 text characters per page are flagged as potentially scanned. Malformed PDFs return Error::ParseFailed. Sources: pdf.rs:MIN_TEXT_PER_PAGE pdf.rs:parse_pdf()

Code Files

Parsed using tree-sitter for supported languages (Rust, Python, JavaScript, TypeScript, Go, Java, C, Cpp, Ruby, CSharp). The parser extracts function and class bodies as sections, preserving structural boundaries. Sources: code.rs

Conversations (`.json`, `.jsonl`)

Supports chat exports from ChatGPT, Claude, and generic conversation formats. Messages are extracted with role (user/assistant/system), content, and timestamps when available. Sources: conversation.rs

Plain Text (`.txt` and others)

Falls back to plain text parsing for unknown extensions, including files without extensions like README.

Chunker Strategies

The ChunkerKind enum defines five chunking strategies. Callers can override the auto-selected strategy via CLI or API.

Strategy Selection

graph TD
    A[SourceKind] --> B[auto_chunker]
    B -->|Markdown| C[Paragraph]
    B -->|Text| D[SentenceRecursive<br/>256 tokens, 32 overlap]
    B -->|Pdf| E[SentenceRecursive<br/>512 tokens, 64 overlap]
    B -->|Conversation| F[Session<br/>max 10 messages]
    B -->|Code| G[Structural]

Paragraph Chunker

Splits each section's body on double-newline boundaries. Fast and deterministic, ideal for Markdown where authoring structure already matches desired chunk boundaries. Sources: chunk.rs:ChunkerKind::Paragraph
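
A minimal sketch of the double-newline split follows. `paragraph_chunks` is a hypothetical name, and trimming whitespace and dropping empty paragraphs are assumptions that the real chunker may not share.

```rust
/// Splits a section body into paragraphs on blank-line boundaries.
/// Trimming and dropping empty paragraphs are assumptions of this sketch.
pub fn paragraph_chunks(body: &str) -> Vec<String> {
    body.split("\n\n")
        .map(str::trim)
        .filter(|paragraph| !paragraph.is_empty())
        .map(str::to_string)
        .collect()
}
```

A single newline inside a paragraph is preserved, so hard-wrapped Markdown stays in one chunk.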

Recursive Chunker

Token-budgeted word-window sliding window with configurable overlap. Kept for backwards compatibility. Sources: chunk.rs:ChunkerKind::Recursive
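
The sliding window can be sketched as follows, assuming one token per whitespace-separated word. `sliding_window_chunks` is a hypothetical name, and the requirement that overlap be smaller than max_tokens is an assumption needed for the window to advance.

```rust
/// Splits `text` into windows of at most `max_tokens` whitespace-separated
/// words, with `overlap` words shared between consecutive windows.
pub fn sliding_window_chunks(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(max_tokens > 0, "max_tokens must be positive");
    assert!(overlap < max_tokens, "overlap must be smaller than max_tokens");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Unlike the sentence-aware variant, this window can cut mid-sentence, which is one reason SentenceRecursive is preferred for prose.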

SentenceRecursive Chunker

Sentence-aware token-budgeted packing using Unicode sentence boundaries (UAX #29). Preferred for prose:

  • Chunks never cut mid-sentence
  • Overlap measured at sentence granularity
  • Average chunk size is more uniform

Default for Text (256 tokens, 32 overlap) and Pdf (512 tokens, 64 overlap) source kinds. Sources: chunk.rs:ChunkerKind::SentenceRecursive chunk.rs:auto_chunker()

Session Chunker

Groups contiguous conversation messages into session chunks. Boundaries fire on:

  • Role returning to user, OR
  • Reaching max_messages (default: 10)

Preserves turn ordering. Default for Conversation source kind. Sources: chunk.rs:ChunkerKind::Session
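
The two boundary rules can be sketched as below. This is one reading of the description, not the code in chunk.rs: `Role`, `Message`, and `session_chunks` are hypothetical, and "role returning to user" is interpreted as a user message that follows a non-user message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    System,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Groups contiguous messages into sessions. A new session starts when the role
/// returns to `User` after a non-user message, or when the current session
/// already holds `max_messages` messages.
pub fn session_chunks(messages: &[Message], max_messages: usize) -> Vec<Vec<Message>> {
    assert!(max_messages > 0, "max_messages must be positive");
    let mut chunks: Vec<Vec<Message>> = Vec::new();
    let mut current: Vec<Message> = Vec::new();
    for msg in messages {
        let role_returned_to_user = msg.role == Role::User
            && current.last().map_or(false, |prev| prev.role != Role::User);
        if !current.is_empty() && (role_returned_to_user || current.len() >= max_messages) {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(msg.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Each chunk keeps its messages in their original order, so a question and the replies that follow it stay together.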

Structural Chunker

One chunk per section. Used for code sources where each section is already a function or class body extracted by the tree-sitter parser. Sources: chunk.rs:ChunkerKind::Structural

Entity Extraction

The pipeline optionally extracts entities and relations using configured extractors.

RuleExtractor (Default)

Delegates entity detection to a NerProvider (default: capitalized-phrase heuristic) and detects relations by proximity using a verb-window regex. Supported relation patterns include joined, founded, acquired, owns, and hired, among others. Sources: extract.rs:RuleExtractor extract.rs:verb_window

Optional: OllamaExtractor

Schema-constrained NER via a local Ollama server (gated behind ollama feature). Hallucinated spans are verified against section text and rejected. Failures degrade gracefully to empty results, keeping the rule-based baseline as the load-bearing path. Sources: lib.rs:extract_llm
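
The span check can be sketched as a filter over the extractor's output. How mnem actually verifies spans is not stated; this sketch assumes byte offsets plus a claimed surface form, and `EntitySpan` and `verify_spans` are hypothetical names.

```rust
/// Hypothetical entity span returned by an LLM extractor: byte offsets into
/// the section text plus the surface form the model claims to have found.
#[derive(Debug, Clone, PartialEq)]
pub struct EntitySpan {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

/// Keeps only spans whose claimed text really occurs at the claimed offsets.
/// Out-of-range or non-UTF-8-boundary offsets are rejected rather than panicking.
pub fn verify_spans(section: &str, spans: Vec<EntitySpan>) -> Vec<EntitySpan> {
    spans
        .into_iter()
        .filter(|span| section.get(span.start..span.end) == Some(span.text.as_str()))
        .collect()
}
```

Using `str::get` instead of indexing means a bad offset from the model results in a dropped span, which matches the degrade-to-empty behavior described above.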

Optional: KeyBertAdapter

Statistical entity extraction adapter driven by the server's configured embedder (gated behind keybert feature). Sources: lib.rs:extract_keybert

Pipeline API

Ingester Configuration

pub struct IngestConfig {
    pub chunker: ChunkerKind,
    pub extractor: ExtractorKind,
    pub ner_provider: Option<NerProviderKind>,
    pub include_text: bool,
}

Sources: pipeline.rs:IngestConfig lib.rs:IngestConfig

Core Method

pub fn ingest(
    &self,
    tx: &mut Transaction,
    bytes: &[u8],
    kind: SourceKind,
) -> Result<IngestResult, Error>

Returns an IngestResult with counts and elapsed time. The commit_cid field is left None—callers who want a CID should call tx.commit(...) afterwards.

Errors:

  • Error::ParseFailed — parser rejects the input
  • Error::UnsupportedSource — source kind not covered
  • Error::Commit — upstream codec/blockstore failures from Transaction::add_node/add_edge

Sources: pipeline.rs:Ingester::ingest()

CLI Integration

The mnem ingest command provides CLI access to the pipeline.

mnem ingest notes.md
mnem ingest --text "The quick brown fox"
mnem ingest --chunker recursive --max-tokens 1024 book.pdf
mnem ingest --recursive docs/

CLI Options

| Flag | Description | Default |
|---|---|---|
| --chunker | Strategy selection | auto |
| --max-tokens | Target tokens per chunk | 512 |
| --overlap | Overlap tokens (recursive) | 32 |
| --recursive | Walk directory trees | false |
| --ntype | Root Doc node label | Doc |
| -m, --message | Commit message | Auto-generated |

Sources: commands/ingest.rs:Args

Output Nodes

The pipeline writes three node types and one edge type to the graph:

  1. Doc node — Root node representing the ingested document
  2. Chunk nodes — Smaller content pieces with summary, content, context_sentence, and props fields
  3. Entity nodes — Extracted entities with span information
  4. Relation edges — Connections between entities based on relation extraction

Contextual Retrieval

Each chunk optionally stores a context_sentence: an LLM-generated, one-sentence placement cue (e.g., "This paragraph is from Section 3 of a legal contract..."). It is prepended to the summary before embedding, following Anthropic's 2024 Contextual Retrieval recipe, which reports a 49% to 67% reduction in retrieval failures. Sources: node.rs:Node.context_sentence
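
Prepending the cue before embedding can be sketched as below. The function name embedding_text and the single-space separator are assumptions made for illustration; the source only states that the context sentence comes before the summary.

```rust
/// Builds the text handed to the embedder: context sentence first, then summary.
/// A missing or blank context sentence leaves the summary unchanged.
fn embedding_text(summary: &str, context_sentence: Option<&str>) -> String {
    match context_sentence.map(str::trim).filter(|s| !s.is_empty()) {
        Some(context) => format!("{} {}", context, summary),
        None => summary.to_string(),
    }
}
```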

Sidecar Support

For PDFs with poor text-layer extraction, the pipeline supports escalation to external tools:

  • docling (gated behind sidecar-docling feature)
  • unstructured-ingest (gated behind sidecar-unstructured feature)

Sidecars are invoked when built-in PDF extraction quality is insufficient. Sources: lib.rs:sidecar

Token Estimation

Token counts are estimated via whitespace split (tokens_estimate field on Chunk). This is intentionally fast and deterministic. Matching the accuracy of the cl100k tokenizer is a documented future improvement. Sources: chunk.rs:token estimation comment
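
A whitespace-split estimate is a one-liner. The function below reuses the field name tokens_estimate for illustration only; it is not the crate's actual helper.

```rust
/// Estimates a token count by counting whitespace-separated words.
/// Fast and deterministic, but only an approximation of a real tokenizer.
fn tokens_estimate(text: &str) -> usize {
    text.split_whitespace().count()
}
```

Because subword tokenizers usually split words further, an estimate like this tends to undercount relative to model tokens.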

Error Handling

| Error Type | Cause | Recovery |
|---|---|---|
| ParseFailed | Malformed input, encryption | Return error, don't create nodes |
| UnsupportedSource | Unknown source kind | Return error |
| Commit | Blockstore failure | Return error |
| Sidecar errors | Missing binary, CLI failure | Return Error::Sidecar |
| LLM extraction failure | Timeout, schema mismatch | Degrade to empty Vec |

Sources: pipeline.rs:ingest errors lib.rs:Error types

Sources: pipeline.rs:source_kind_from_ext() types.rs:SourceKind types.rs:CodeLanguage::from_extension()

CLI Commands Reference

This page documents all command-line interface commands available in mnem-cli, the primary user-facing tool for interacting with mnem repositories.

Overview

The mnem CLI provides a unified interface for managing a local knowledge graph repository. It supports operations including repository initialization, content ingestion, node/edge manipulation, branching, tagging, retrieval, and third-party tool integration.

graph TD
    A[mnem CLI] --> B[Repository Operations]
    A --> C[Content Ingestion]
    A --> D[Graph Manipulation]
    A --> E[Version Control]
    A --> F[Retrieval]
    A --> G[Integration]
    
    B --> B1[init]
    B --> B2[status]
    
    C --> C1[ingest]
    
    D --> D1[add node]
    D --> D2[add edge]
    
    E --> E1[log]
    E --> E2[show]
    E --> E3[refs]
    E --> E4[tag]
    E --> E5[branches]
    
    F --> F1[retrieve]
    
    G --> G1[integrate]

Sources: crates/mnem-cli/src/main.rs:40-90

Global Options

The following options are available for all commands:

| Option | Short | Description |
|---|---|---|
| --repo <PATH> | -R | Path to the repository directory (.mnem/). Defaults to walking up from the current directory, like git does. |

Sources: crates/mnem-cli/src/main.rs:33-36

Repository Operations

init

Initializes a new mnem repository.

mnem init [OPTIONS]

| Option | Description |
|---|---|
| --path <PATH> | Custom repository path |
| --name <NAME> | Repository name |
| --author <NAME> | Default author name |
| --email <EMAIL> | Default author email |

status

Prints current op-head, head commit, ref summary, and label counts.

mnem status [OPTIONS]
# Examples:
mnem status                    # current op + head commit + ref count
mnem -R ~/notes status         # explicit repo path

Content Ingestion

ingest

Parses external source files into the graph, creating Doc + Chunk + Entity nodes.

mnem ingest <PATH> [OPTIONS]

#### Supported Source Types

| Extension | Source Kind | Chunker Strategy |
|---|---|---|
| .md, .markdown | Markdown | Paragraph |
| .txt | Plain Text | SentenceRecursive (256 tokens, 32 overlap) |
| .pdf | PDF | SentenceRecursive (512 tokens, 64 overlap) |
| .json, .jsonl | Conversation | Session (10 messages max) |
| .rs | Rust Code | Structural |
| .py, .pyi | Python Code | Structural |
| .js, .mjs, .cjs | JavaScript Code | Structural |
| .ts, .tsx, .mts, .cts | TypeScript Code | Structural |
| .go | Go Code | Structural |
| .java | Java Code | Structural |
| .c, .h | C Code | Structural |
| .cpp, .cc, .cxx, .hpp, .hxx | C++ Code | Structural |
| .rb, .gemspec, .rake, .erb | Ruby Code | Structural |
| .cs, .csx | C# Code | Structural |
| Other | Text | SentenceRecursive |

Sources: crates/mnem-cli/src/commands/ingest.rs:1-50
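
The extension table can be read as a lookup function. The sketch below is a simplified illustration: it collapses all code languages into one Code variant and takes a path string, both of which are assumptions, so it should not be taken as the signature of the crate's own source_kind_from_ext().

```rust
use std::path::Path;

/// Simplified source kinds for this sketch (the real enum distinguishes code languages).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceKind {
    Markdown,
    Text,
    Pdf,
    Conversation,
    Code,
}

/// Maps a file path to a source kind by its extension, case-insensitively.
/// Unknown or missing extensions fall back to plain text.
fn source_kind_from_ext(path: &str) -> SourceKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("md" | "markdown") => SourceKind::Markdown,
        Some("pdf") => SourceKind::Pdf,
        Some("json" | "jsonl") => SourceKind::Conversation,
        Some(
            "rs" | "py" | "pyi" | "js" | "mjs" | "cjs" | "ts" | "tsx" | "mts" | "cts"
            | "go" | "java" | "c" | "h" | "cpp" | "cc" | "cxx" | "hpp" | "hxx"
            | "rb" | "gemspec" | "rake" | "erb" | "cs" | "csx",
        ) => SourceKind::Code,
        _ => SourceKind::Text,
    }
}
```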

#### ingest Options

| Option | Description | Default |
|---|---|---|
| <PATH> | File or directory to ingest | Required (unless --text) |
| --text | Inline text to ingest | - |
| --ntype <LABEL> | Root Doc node label | Doc |
| --chunker <STRATEGY> | Chunker strategy | auto |
| --max-tokens <N> | Target tokens per chunk | 512 |
| --overlap <N> | Overlap tokens between chunks | 32 |
| --recursive | Walk directory trees | false |
| -m, --message <MSG> | Commit message | Auto-generated |

#### Chunker Strategies

| Strategy | Description |
|---|---|
| auto | Picks strategy based on source kind |
| paragraph | Splits on double-newline (Markdown) |
| recursive | Token-budgeted sliding window |
| sentence_recursive | Sentence-aware token packing |
| session | Groups conversation messages |
| structural | One chunk per section (code) |

Sources: crates/mnem-cli/src/commands/ingest.rs:60-80
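
To make the interplay of --max-tokens and --overlap concrete, here is a minimal sketch of a token-budgeted sliding window. It assumes whitespace tokens and a fixed step of max_tokens minus overlap; the function name sliding_window_chunks is hypothetical and the crate's recursive chunker may behave differently at boundaries.

```rust
/// Splits text into windows of at most `max_tokens` whitespace tokens,
/// where each window repeats the last `overlap` tokens of the previous one.
fn sliding_window_chunks(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    // The window must advance, so the overlap has to be smaller than the budget.
    assert!(max_tokens > 0 && overlap < max_tokens, "overlap must be smaller than max_tokens");
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + max_tokens).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With a budget of 4 and an overlap of 1, ten tokens produce three windows, each sharing one token with its predecessor.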

Graph Manipulation

add node

Creates a new node in the graph.

mnem add node [OPTIONS]

| Option | Description |
|---|---|
| -s, --summary <TEXT> | Node summary |
| --label <LABEL> | Node type label |
| --prop <KEY=VALUE> | Property (can be repeated) |
| --context-sentence <TEXT> | Positional context for retrieval |

# Examples:
mnem add node -s "Alice lives in Berlin"
mnem add node --label Person --prop name=Alice --prop city=Berlin -s "Alice is a climber"

add edge

Creates a directed edge between two nodes.

mnem add edge [OPTIONS]

| Option | Description |
|---|---|
| --from <UUID> | Source node UUID |
| --to <UUID> | Target node UUID |
| --label <LABEL> | Edge type label |
| --prop <KEY=VALUE> | Property (can be repeated) |

# Examples:
mnem add edge --from <src-uuid> --to <dst-uuid> --label knows

Sources: crates/mnem-cli/src/main.rs:65-80

Version Control

log

Walks the op-log backwards from the current head.

mnem log [OPTIONS]

| Option | Description |
|---|---|
| --limit <N> | Maximum number of operations to show |
| --format <FORMAT> | Output format (short, full, json) |

show

Shows the full detail of one operation.

mnem show <OPERATION_ID>
# Examples:
mnem show 01HZ...

refs

Manages symbolic references to commits.

mnem refs <SUBCOMMAND>

#### refs Subcommands

| Subcommand | Description |
|---|---|
| list | List every ref in the current view |
| set <name> <target> | Set ref to point at a target CID |
| delete <name> | Delete a ref |

# Examples:
mnem refs list
mnem refs set feature_branch 01HXYZ...
mnem refs delete old_branch

Sources: crates/mnem-cli/src/commands/refs.rs:1-45

tag

Manages named tags that point to commits.

mnem tag <SUBCOMMAND>

#### tag Subcommands

| Subcommand | Description |
|---|---|
| list | List every refs/tags/<name> ref with its target CID |
| create <name> | Create a new tag |
| delete <name> | Delete a tag |

#### tag create Options

| Option | Description |
|---|---|
| <name> | Tag name (stored as refs/tags/<name>) |
| target | Optional commit CID, ref name, branch shortname, or HEAD |
| --from <CID> | Commit CID / ref / branch to point the tag at |

# Examples:
mnem tag list
mnem tag create v0.9
mnem tag create release-2024 --from 01HZ...
mnem tag delete v0.9

Sources: crates/mnem-cli/src/commands/tag.rs:1-60

branches

Manages named branches in the repository.

mnem branches [OPTIONS]

| Option | Description |
|---|---|
| --list | List all branches |
| --create <NAME> | Create a new branch |
| --delete <NAME> | Delete a branch |
| --switch <NAME> | Switch to a branch |

#### Branch Output Format

{
  "schema": "mnem.v1.branches",
  "branches": [
    {"name": "main", "head": "<commit-cid>", "is_current": true},
    ...
  ]
}

Retrieval

retrieve

Searches the graph for nodes matching a query.

mnem retrieve [OPTIONS] <QUERY>

| Option | Description | Default |
|---|---|---|
| --top-k <N> | Number of results to return | 10 |
| --max-tokens <N> | Maximum tokens in response | 4096 |
| --include <FIELD> | Fields to include (summary, context, props) | All |
| --format <FORMAT> | Output format (text, json) | text |

# Examples:
mnem retrieve "query"
mnem retrieve --top-k 5 --max-tokens 2048 "machine learning"

Integration

integrate

Integrates mnem system prompts with third-party AI tools.

mnem integrate <HOST> [OPTIONS]

#### Supported Hosts

| Host | System Prompt Path |
|---|---|
| claude-code | ~/.claude/CLAUDE.md |
| gemini-cli | ~/.gemini/GEMINI.md |
| cursor | ~/.cursor/rules/mnem.mdc |
| continue | ~/.continue/config.json |
| zed | ~/.config/zed/settings.json (Linux) or ~/Library/Application Support/Zed/settings.json (macOS) |

#### integrate Options

| Option | Description |
|---|---|
| --install | Install system prompt to host |
| --uninstall | Remove system prompt from host |
| --status | Show integration status |

graph LR
    A[mnem integrate] --> B{Host Selection}
    B --> C[Claude Code]
    B --> D[Cursor]
    B --> E[Continue]
    B --> F[Zed]
    B --> G[Gemini CLI]
    
    C --> H[Markdown Marker]
    D --> H
    E --> I[JSON Field: systemMessage]
    F --> J[JSON Field: assistant.system_prompt]
    G --> H

Sources: crates/mnem-cli/src/integrate.rs:1-60

Configuration

The CLI loads configuration from ~/.config/mnem/config.toml or .mnem/config.toml in the repository root.

| Setting | Description |
|---|---|
| user.name | Author name for commits |
| user.email | Author email for commits |
| user.agent_id | Agent identifier fallback |
| llm.provider | LLM provider (ollama, openai, anthropic) |
| llm.model | Model name |
| llm.base_url | API base URL (default: http://localhost:11434) |
| llm.timeout_secs | Request timeout (default: 120) |

#### Author String Format

The author string for commits follows this precedence:

  1. name <email> if both present
  2. name if only name present
  3. email if only email present
  4. agent_id if only that present
  5. mnem-cli as fallback

Sources: crates/mnem-cli/src/config.rs:1-80
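
The precedence list maps naturally onto a single match. The sketch below assumes each setting is either present or absent (Option); the function name author_string is hypothetical, and how the real CLI treats empty strings is not stated in the source.

```rust
/// Resolves the commit author string from optional config values,
/// following the precedence: name <email>, name, email, agent_id, "mnem-cli".
fn author_string(name: Option<&str>, email: Option<&str>, agent_id: Option<&str>) -> String {
    match (name, email, agent_id) {
        (Some(n), Some(e), _) => format!("{} <{}>", n, e),
        (Some(n), None, _) => n.to_string(),
        (None, Some(e), _) => e.to_string(),
        (None, None, Some(a)) => a.to_string(),
        (None, None, None) => "mnem-cli".to_string(),
    }
}
```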

Command Pipeline

The following diagram shows how commands interact with the repository:

graph TD
    subgraph "CLI Layer"
        A[mnem CLI] --> B[Commands]
        B --> C[Ingest]
        B --> D[Add]
        B --> E[Retrieve]
        B --> F[Refs/Tags]
    end
    
    subgraph "Core Layer"
        C --> G[Ingester Pipeline]
        G --> H[Parser]
        H --> I[Chunker]
        I --> J[Extractor]
        J --> K[Transaction]
        
        D --> K
        F --> K
        E --> L[Retriever]
    end
    
    subgraph "Storage Layer"
        K --> M[Transaction]
        M --> N[Blockstore]
        M --> O[OpHeadsStore]
        L --> P[VectorIndex]
        P --> N
    end

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Repository not found |
| 4 | Object not found |
| 5 | Conflict detected |

Sources: crates/mnem-cli/src/main.rs:40-90

GraphRAG Implementation

Related topics: Hybrid Retrieval System, Core Components

GraphRAG (Graph-based Retrieval Augmented Generation) is a hybrid retrieval approach that combines vector similarity search with graph-structured knowledge representation. In mnem, the GraphRAG implementation provides community-based entity extraction, confidence scoring, and intelligent graph traversal for enhanced context retrieval.

Overview

The mnem GraphRAG system operates as a layered architecture that:

  1. Extracts entities and relationships from ingested documents
  2. Builds a knowledge graph with typed edges and communities
  3. Enables community-aware retrieval that goes beyond simple vector similarity
  4. Provides confidence-calibrated results suitable for agentic workflows

The implementation lives in crates/mnem-graphrag/ and integrates with the core retrieval pipeline in crates/mnem-core/src/retrieve/.

Core Components

Module Structure

| Module | Purpose |
|---|---|
| lib.rs | Main entry point and public API exports |
| community.rs | Community detection and hierarchy management |
| calibration.rs | Confidence score calibration utilities |
| confidence.rs | Confidence scoring algorithms |
| summarize.rs | Community and entity summarization |

Community Detection (`community.rs`)

Community detection partitions the knowledge graph into semantically coherent clusters. The implementation supports hierarchical community structures where:

  • Leaf communities contain tightly interconnected entities
  • Parent communities aggregate related sub-communities
  • Cross-community edges connect related concepts across boundaries

Communities are used during retrieval to:

  • Expand candidate sets by including related entities within the same community
  • Filter results to the most relevant community cluster
  • Enable "zoom-in" and "zoom-out" traversal patterns

Sources: crates/mnem-graphrag/src/community.rs
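
A "zoom-out" over a hierarchy like this amounts to following parent links from a leaf community to the root. The sketch below is illustrative only: it uses plain u32 identifiers in place of CommunityId, and the index and zoom_out helpers are hypothetical names.

```rust
use std::collections::HashMap;

/// Minimal community record for this sketch (u32 stands in for CommunityId).
#[derive(Debug, Clone)]
struct Community {
    id: u32,
    parent: Option<u32>,
}

/// Indexes communities by id for constant-time parent lookups.
fn index(communities: Vec<Community>) -> HashMap<u32, Community> {
    communities.into_iter().map(|c| (c.id, c)).collect()
}

/// Returns the chain of community ids from `start` up to the root.
/// Stops at an unknown id, and guards against accidental cycles.
fn zoom_out(communities: &HashMap<u32, Community>, start: u32) -> Vec<u32> {
    let mut path = Vec::new();
    let mut cursor = Some(start);
    while let Some(id) = cursor {
        if path.contains(&id) {
            break; // cycle guard: a well-formed hierarchy never revisits an id
        }
        match communities.get(&id) {
            Some(community) => {
                path.push(id);
                cursor = community.parent;
            }
            None => break,
        }
    }
    path
}
```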

Confidence Scoring (`confidence.rs`)

Every extracted entity and relationship receives a confidence score based on:

  • Extraction evidence: Frequency and clarity of mentions in source documents
  • Graph connectivity: Number and strength of edges connecting to other entities
  • Source reliability: Document-level trust signals from the ingest pipeline

Confidence scores are normalized to a [0.0, 1.0] range and drive downstream filtering decisions.

Sources: crates/mnem-graphrag/src/confidence.rs
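
One simple way to combine three signals into a normalized score is a clamped weighted sum. The sketch below is purely illustrative: the Signals struct, the confidence function, and the 0.5 / 0.3 / 0.2 weights are assumptions, not values taken from confidence.rs.

```rust
/// Per-entity input signals, each expected in [0.0, 1.0] (hypothetical struct).
struct Signals {
    extraction_evidence: f32,
    graph_connectivity: f32,
    source_reliability: f32,
}

/// Combines the signals into a confidence score in [0.0, 1.0].
/// The weights are illustrative assumptions, not the crate's values.
fn confidence(signals: &Signals) -> f32 {
    // Treat NaN as "no evidence" and clamp out-of-range inputs.
    let unit = |v: f32| if v.is_nan() { 0.0 } else { v.clamp(0.0, 1.0) };
    let score = 0.5 * unit(signals.extraction_evidence)
        + 0.3 * unit(signals.graph_connectivity)
        + 0.2 * unit(signals.source_reliability);
    score.clamp(0.0, 1.0)
}
```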

Calibration (`calibration.rs`)

Calibration ensures that confidence scores accurately reflect true extraction quality. The module provides:

  • Score distribution analysis: Histogram-based validation of score distributions
  • Threshold tuning: Per-use-case threshold adjustment for precision/recall tradeoffs
  • Calibration curves: Tools for evaluating score reliability

Sources: crates/mnem-graphrag/src/calibration.rs
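
A calibration curve can be approximated by bucketing scored predictions and comparing each bucket's mean confidence with its observed accuracy. The sketch below assumes equal-width bins and finite scores; CalibrationBin and calibration_curve are hypothetical names rather than the module's API.

```rust
/// Summary of one confidence bucket (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct CalibrationBin {
    count: usize,
    mean_confidence: f32,
    accuracy: f32,
}

/// Buckets (confidence, was_correct) samples into equal-width bins over [0, 1].
/// In a well-calibrated scorer, mean_confidence and accuracy are close in every bin.
fn calibration_curve(samples: &[(f32, bool)], bins: usize) -> Vec<CalibrationBin> {
    let bins = bins.max(1);
    let mut counts = vec![0usize; bins];
    let mut confidence_sums = vec![0.0f32; bins];
    let mut correct = vec![0usize; bins];
    for &(confidence, is_correct) in samples {
        let c = confidence.clamp(0.0, 1.0);
        // A confidence of exactly 1.0 belongs to the last bin.
        let idx = ((c * bins as f32) as usize).min(bins - 1);
        counts[idx] += 1;
        confidence_sums[idx] += c;
        if is_correct {
            correct[idx] += 1;
        }
    }
    (0..bins)
        .map(|i| {
            let n = counts[i];
            if n == 0 {
                CalibrationBin { count: 0, mean_confidence: 0.0, accuracy: 0.0 }
            } else {
                CalibrationBin {
                    count: n,
                    mean_confidence: confidence_sums[i] / n as f32,
                    accuracy: correct[i] as f32 / n as f32,
                }
            }
        })
        .collect()
}
```

A bin whose mean confidence sits well above its accuracy signals overconfidence, which is where a stricter threshold would help.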

Summarization (`summarize.rs`)

The summarization module generates concise descriptions for:

  • Individual entities: One-sentence summaries capturing core identity
  • Relationships: Edge labels and descriptions explaining connections
  • Communities: Multi-sentence overviews of community purpose and membership

Summaries are stored as context_sentence on Node objects, enabling contextual retrieval patterns described in the Anthropic Contextual Retrieval paper.

Sources: crates/mnem-graphrag/src/summarize.rs

Retrieval Integration

Community Filter (`community_filter.rs`)

The retrieval pipeline integrates GraphRAG through the community filter stage. When enabled, the retriever:

  1. Identifies the community containing the top-scoring candidate
  2. Expands the candidate set to include other high-confidence entities in that community
  3. Re-ranks the expanded set using the configured reranker

graph TD
    A[Query Embedding] --> B[Vector Search]
    B --> C[Initial Candidates]
    C --> D[Community Detection]
    D --> E[Community Expansion]
    E --> F[Reranker]
    F --> G[Final Results]

Sources: crates/mnem-core/src/retrieve/community_filter.rs
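
The three stages can be sketched as a single additive expansion step. Everything here is hypothetical scaffolding: the Candidate struct, the expand_by_community function, the min_confidence parameter, and the use of a plain score sort as a stand-in for the configured reranker.

```rust
use std::collections::HashSet;

/// A retrieval candidate (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: u32,
    community: Option<u32>,
    confidence: f32,
    score: f32,
}

/// Adds high-confidence members of the top candidate's community, then re-sorts.
/// Expansion is additive: no existing candidate is dropped.
fn expand_by_community(
    mut candidates: Vec<Candidate>,
    pool: &[Candidate],
    min_confidence: f32,
) -> Vec<Candidate> {
    // Stage 1: find the community of the top-scoring candidate.
    let top_community = candidates
        .iter()
        .max_by(|a, b| a.score.total_cmp(&b.score))
        .and_then(|c| c.community);
    let community = match top_community {
        Some(community) => community,
        None => return candidates, // nothing to expand from
    };
    // Stage 2: pull in unseen, sufficiently confident members of that community.
    let mut seen: HashSet<u32> = candidates.iter().map(|c| c.id).collect();
    for entity in pool {
        if entity.community == Some(community)
            && entity.confidence >= min_confidence
            && seen.insert(entity.id)
        {
            candidates.push(entity.clone());
        }
    }
    // Stage 3: stand-in for the reranker, sorting by score in descending order.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates
}
```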

Retriever Configuration

The Retriever struct in crates/mnem-core/src/retrieve/retriever.rs exposes GraphRAG-related options:

| Parameter | Type | Default | Description |
|---|---|---|---|
| graph_expand | Option<usize> | None | Expansion radius for community-based retrieval |
| graph_decay | Option<f32> | None | Decay factor for graph traversal weights |
| graph_depth | Option<usize> | None | Maximum traversal depth |
| community_filter_enabled | bool | false | Enable community-based filtering |
| ppr_size_gate | Option<usize> | None | PPR personalization size threshold |

PPR-Based Expansion

For larger graphs, mnem supports Personalized PageRank (PPR) based expansion using the adjacency index:

adjacency_index: Option<Arc<dyn AdjacencyIndex + Send + Sync>>

When the adjacency index is available, PPR mode provides:

  • Personalized scoring based on seed nodes
  • Cohesive community member inclusion

When the index is unavailable, retrieval falls back to the historical decay walk.

Sources: crates/mnem-core/src/retrieve/retriever.rs:10-50
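
For readers unfamiliar with Personalized PageRank, the sketch below shows the textbook power-iteration form over a plain adjacency list. It is not mnem's implementation: the function name, the alpha restart probability, the fixed iteration count, and the choice to return dangling-node mass to the seeds are all assumptions, and seed indices are assumed to be valid.

```rust
/// Computes Personalized PageRank scores by power iteration.
/// `adj[u]` lists the out-neighbors of node `u`; `alpha` is the restart probability.
fn personalized_pagerank(adj: &[Vec<usize>], seeds: &[usize], alpha: f64, iters: usize) -> Vec<f64> {
    let n = adj.len();
    let mut personalization = vec![0.0; n];
    if n == 0 || seeds.is_empty() {
        return personalization;
    }
    // Restart distribution: uniform over the seed nodes.
    for &s in seeds {
        personalization[s] += 1.0 / seeds.len() as f64;
    }
    let mut rank = personalization.clone();
    for _ in 0..iters {
        // Every step restarts at the seeds with probability alpha.
        let mut next: Vec<f64> = personalization.iter().map(|p| alpha * p).collect();
        for (u, neighbors) in adj.iter().enumerate() {
            if neighbors.is_empty() {
                // Dangling node: hand its mass back to the seeds.
                for (v, p) in personalization.iter().enumerate() {
                    next[v] += (1.0 - alpha) * rank[u] * p;
                }
            } else {
                let share = (1.0 - alpha) * rank[u] / neighbors.len() as f64;
                for &v in neighbors {
                    next[v] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

Nodes that cannot be reached from any seed keep a score of zero, which is what makes the ranking "personalized" to the seed set.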

Data Flow

Ingest Pipeline to GraphRAG

graph LR
    A[Source File] --> B[Parser]
    B --> C[Chunker]
    C --> D[Entity Extractor]
    D --> E[Graph Builder]
    E --> F[Community Detection]
    F --> G[Confidence Scoring]
    G --> H[Committed Nodes/Edges]

The ingest pipeline in crates/mnem-ingest/src/pipeline.rs coordinates:

  1. Parsing: Detect source type and extract raw content
  2. Chunking: Split into manageable units using auto-selected chunker
  3. Extraction: Rule-based or LLM-powered entity extraction
  4. Graph building: Create nodes and edges in the transaction
  5. Commit: Persist to the IPLD-based object store

Sources: crates/mnem-ingest/src/pipeline.rs

Chunk Strategy by Source Type

| Source Kind | Chunker | Tokens | Overlap |
|---|---|---|---|
| Markdown | Paragraph | - | - |
| Text | SentenceRecursive | 256 | 32 |
| PDF | SentenceRecursive | 512 | 64 |
| Conversation | Session | 10 messages | - |
| Code | Structural | - | - |

Sources: crates/mnem-ingest/src/chunk.rs
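
The table above is effectively the body of an auto-selection function. The sketch below mirrors it with simplified enums; the variant payload names (max_tokens, overlap, max_messages) and the exact signature are assumptions and may differ from auto_chunker() in chunk.rs.

```rust
/// Simplified source kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceKind {
    Markdown,
    Text,
    Pdf,
    Conversation,
    Code,
}

/// Chunker choices with their parameters (field names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChunkerKind {
    Paragraph,
    SentenceRecursive { max_tokens: usize, overlap: usize },
    Session { max_messages: usize },
    Structural,
}

/// Picks the default chunker for a source kind, following the table above.
fn auto_chunker(kind: SourceKind) -> ChunkerKind {
    match kind {
        SourceKind::Markdown => ChunkerKind::Paragraph,
        SourceKind::Text => ChunkerKind::SentenceRecursive { max_tokens: 256, overlap: 32 },
        SourceKind::Pdf => ChunkerKind::SentenceRecursive { max_tokens: 512, overlap: 64 },
        SourceKind::Conversation => ChunkerKind::Session { max_messages: 10 },
        SourceKind::Code => ChunkerKind::Structural,
    }
}
```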

Configuration

TOML Configuration

[retrieve]
limit = 20              # Maximum results
budget = 8192           # Token budget
vector_cap = 10         # Vector search candidates
graph_expand = 5        # Community expansion size
graph_depth = 3         # Traversal depth
rerank_top_k = 5        # Final reranking pool

[community]
enabled = true          # Enable community filtering
min_community_size = 3  # Minimum entities per community

CLI Configuration

# Ingest with community extraction
mnem ingest --extractor keybert docs/

# Retrieve with community expansion
mnem retrieve "query" --graph-expand 10

# Configure via config command
mnem config set retrieve.graph_expand 5

Sources: crates/mnem-cli/src/config.rs

API Reference

Core Types

#### Community

pub struct Community {
    pub id: CommunityId,
    pub parent: Option<CommunityId>,
    pub members: Vec<EntityId>,
    pub summary: Option<String>,
    pub depth: u32,
}

#### Entity

pub struct Entity {
    pub id: EntityId,
    pub ntype: String,
    pub summary: Option<String>,
    pub context_sentence: Option<String>,
    pub confidence: f32,
    pub community: Option<CommunityId>,
}

Public API (`lib.rs`)

| Function | Signature | Description |
|---|---|---|
| detect_communities | (graph: &Graph) -> Vec<Community> | Run community detection |
| score_entity | (entity: &Entity, graph: &Graph) -> f32 | Calculate confidence |
| calibrate_scores | (scores: Vec<f32>) -> Vec<f32> | Apply calibration |
| summarize_community | (community: &Community) -> String | Generate summary |
| expand_from_seed | (seed: &[NodeId], depth: usize) -> Vec<NodeId> | Graph expansion |

Architecture Diagram

graph TD
    subgraph "Ingest Layer"
        I1[Markdown]
        I2[PDF]
        I3[Code]
        I4[Conversation]
    end
    
    subgraph "Extract Layer"
        E1[Rule Extractor]
        E2[LLM Extractor]
        E3[KeyBERT Adapter]
    end
    
    subgraph "Graph Layer"
        G1[Node Builder]
        G2[Edge Builder]
        G3[Community Detector]
    end
    
    subgraph "Score Layer"
        S1[Confidence Scorer]
        S2[Calibrator]
    end
    
    subgraph "Retrieve Layer"
        R1[Vector Index]
        R2[Community Filter]
        R3[Reranker]
    end
    
    I1 --> E1
    I2 --> E2
    I3 --> E1
    I4 --> E3
    
    E1 --> G1
    E2 --> G1
    E3 --> G1
    
    G1 --> G2
    G2 --> G3
    
    G3 --> S1
    S1 --> S2
    
    S2 --> R1
    R1 --> R2
    R2 --> R3

Experimental Features

E1: Community Expander

Experiment E1 enables community-expansion during retrieval:

  • When cfg.enabled is false (default): Stage is a no-op
  • When enabled: Top-N seeds' communities pull in additional cohesive members
  • Additive only: Never drops existing candidates
  • Matrix v4 showed a 29pp R@10 regression with the old drop-filter semantics

E2: PPR Graph Expansion

Experiment E2 introduces Personalized PageRank for graph expansion:

  • Uses optional AdjacencyIndex for efficient neighborhood queries
  • Falls back to historical decay walk when index unavailable
  • Maintains byte-identical retrieval for default configuration

Sources: crates/mnem-core/src/retrieve/retriever.rs

Warnings and Diagnostics

The retrieval system emits warnings when GraphRAG features encounter issues:

| Warning Code | Feature | Description |
|---|---|---|
| community_filter | Community Filter | No-op community filter triggered |
| graph_mode | PPR | PPR ran without substrate graph |
| graph_expand | Expansion | Authored adjacency list was empty |
| min_confidence | Confidence | Results fell below confidence floor |
| warnings_truncated | Diagnostics | Warning list was truncated |

Sources: crates/mnem-core/src/retrieve/warnings.rs

Best Practices

  1. Enable community filtering for queries requiring holistic context
  2. Tune graph_expand based on graph density: denser graphs need smaller expansion radii
  3. Calibrate confidence thresholds per use case using the calibration module
  4. Use structural chunking for codebases to capture function-level entity granularity
  5. Set context_sentence on high-value nodes to improve contextual retrieval

Sources: crates/mnem-graphrag/src/community.rs

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

high [feature] hermes support

The project may affect permissions, credentials, data exposure, or host boundaries.

medium README/documentation is current enough for a first validation pass.

The project should not be treated as fully validated until this signal is reviewed.

medium [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page

Users cannot judge support quality until recent activity, releases, and issue response are checked.

medium Maintainer activity is unknown

Users cannot judge support quality until recent activity, releases, and issue response are checked.

Doramagic extracted 8 source-linked risk signals. Review them before installing or handing real data to the project.

1. Security or permission risk: [feature] hermes support

  • Severity: high
  • Finding: Security or permission risk is backed by a source signal: [feature] hermes support. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Uranid/mnem/issues/27

2. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:1221867246 | https://github.com/Uranid/mnem | README/documentation is current enough for a first validation pass.

3. Maintenance risk: [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Uranid/mnem/issues/23

4. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | last_activity_observed missing

5. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:1221867246 | https://github.com/Uranid/mnem | no_demo; severity=medium

6. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:1221867246 | https://github.com/Uranid/mnem | no_demo; severity=medium

7. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | issue_or_pr_quality=unknown

8. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources 3

Count of project-level external discussion links exposed on this manual page.

Use Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using mnem with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence