# https://github.com/Uranid/mnem Project Documentation

Generated: 2026-05-15 07:30:14 UTC

## Table of Contents

- [Introduction to mnem](#page-introduction)
- [Installation Guide](#page-installation)
- [System Architecture](#page-architecture)
- [Core Components](#page-core-components)
- [Hybrid Retrieval System](#page-hybrid-retrieval)
- [Embedding Providers](#page-embed-providers)
- [Storage Backend](#page-storage-backend)
- [Ingestion Pipeline](#page-ingestion)
- [CLI Commands Reference](#page-cli-commands)
- [GraphRAG Implementation](#page-graphrag)

<a id='page-introduction'></a>

## Introduction to mnem

### Related Pages

Related topics: [System Architecture](#page-architecture), [Installation Guide](#page-installation)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [crates/mnem-core/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/lib.rs)
- [crates/mnem-core/src/objects/node.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
- [crates/mnem-core/src/objects/operation.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/operation.rs)
- [crates/mnem-ingest/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/lib.rs)
- [crates/mnem-ingest/src/chunk.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/chunk.rs)
- [crates/mnem-ingest/src/pipeline.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/pipeline.rs)
- [crates/mnem-ingest/src/types.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/types.rs)
</details>

mnem is a Rust-based knowledge management system designed for AI agents. It provides a structured approach to storing, retrieving, and managing information using a DAG-based (Directed Acyclic Graph) storage architecture with content-addressed data structures.

## Overview

mnem serves as a personal knowledge graph for AI agents, enabling them to:

- **Store structured information** with nodes, edges, and properties in a version-controlled repository
- **Ingest various document formats** including Markdown, PDF, plain text, code files, and conversation logs
- **Retrieve relevant context** using vector search, sparse ranking, and token-budget packing
- **Track changes** through a commit-based operation log with cryptographic signatures
- **Support branching** for experimental or temporary state management

Source: [crates/mnem-core/src/lib.rs:1-30]()

## Architecture

mnem is organized as a monorepo with multiple Rust crates:

```mermaid
graph TD
    subgraph "mnem Repository Structure"
        CLI["mnem-cli<br/>Command Line Interface"]
        HTTP["mnem-http<br/>HTTP API Server"]
        INGEST["mnem-ingest<br/>Document Ingestion"]
        CORE["mnem-core<br/>Core Data Model & Retrieval"]
    end
    
    CLI --> CORE
    HTTP --> CORE
    INGEST --> CORE
    
    INGEST --> |"parse/chunk/extract"| RAW[("Raw Source<br/>.md .pdf .txt .json")]
    CORE --> |"store/retrieve"| GRAPH[("Knowledge<br/>Graph")]
```

### Crate Responsibilities

| Crate | Purpose |
|-------|---------|
| `mnem-core` | Core data models (`Node`, `Edge`, `Commit`, `Operation`), DAG-CBOR codec, prolly trees, vector/sparse indexing, agent-facing retrieval |
| `mnem-ingest` | Document parsing, chunking strategies, entity extraction (rule-based, KeyBERT, or LLM) |
| `mnem-cli` | Terminal interface for all operations |
| `mnem-http` | REST API for remote agent access |

Source: [crates/mnem-core/src/lib.rs:15-25]()

## Core Data Model

### Nodes

Nodes are the fundamental unit of information storage. Each node contains:

```mermaid
graph LR
    subgraph "Node Structure"
        NTYPE["ntype<br/>Node Type Label"]
        CTX["context_sentence<br/>Positional Cue"]
        SUM["summary<br/>LLM-facing Text"]
        PROPS["props<br/>Property Map"]
        CONTENT["content<br/>Opaque Payload"]
    end
```

| Field | Type | Description |
|-------|------|-------------|
| `ntype` | `String` | Semantic label (e.g., `Fact`, `Doc`, `Person`) |
| `context_sentence` | `Option<String>` | LLM-generated placement cue for contextual retrieval |
| `summary` | `Option<String>` | Primary text for embedding and retrieval |
| `props` | `BTreeMap<String, Ipld>` | Structured key-value metadata |
| `content` | `Option<Bytes>` | Opaque payload (document body, file data) |

The `context_sentence` field implements Anthropic's 2024 contextual retrieval approach, storing an LLM-generated one-sentence placement cue that captures positional and relational context. Source: [crates/mnem-core/src/objects/node.rs:30-75]()

### Edges

Edges represent relationships between nodes. They are typed links with source and target references.

### Operations and Commits

| Concept | Description |
|---------|-------------|
| `Operation` | A single atomic change to the repository state |
| `Commit` | A snapshot referencing a sequence of operations |
| `View` | Current head state with references and tombstones |

Operations include metadata for provenance:

```rust
pub struct Operation {
    pub author: String,
    pub agent_id: Option<String>,
    pub task_id: Option<String>,
    pub host: Option<String>,
    pub time: u64,
    pub description: String,
    pub signature: Option<Signature>,
}
```

Source: [crates/mnem-core/src/objects/operation.rs:15-35]()

## Document Ingestion Pipeline

The ingestion system (`mnem-ingest`) handles the transformation of raw documents into chunked, indexed nodes:

```mermaid
flowchart LR
    RAW["Raw Source<br/>.md .pdf .txt"] --> PARSE["Parse"]
    PARSE --> SECTION["Sections"]
    SECTION --> CHUNK["Chunk"]
    CHUNK --> EXTRACT["Extract Entities<br/>Relations"]
    EXTRACT --> NODE["Nodes + Edges"]
    NODE --> STORE["Commit to Store"]
```

Source: [crates/mnem-ingest/src/lib.rs:20-45]()

### Supported Source Types

| Source | Extensions | Strategy | Default Chunker |
|--------|------------|----------|-----------------|
| Markdown | `.md`, `.markdown` | CommonMark + GFM | `Paragraph` |
| Text | `.txt`, unknown | Plain text | `SentenceRecursive` |
| PDF | `.pdf` | Text layer extraction | `SentenceRecursive` |
| Conversation | `.json`, `.jsonl` | Chat export formats | `Session` |
| Code | `.rs`, `.py`, `.js`, `.ts`, `.go`, `.java`, `.c`, `.cpp`, `.rb`, `.cs` | Tree-sitter parsing | `Structural` |

Source: [crates/mnem-ingest/src/pipeline.rs:45-60]()

### Chunker Strategies

Five chunking strategies are available:

| Strategy | Description | Use Case |
|----------|-------------|----------|
| `Paragraph` | Splits on double-newlines | Markdown documents |
| `Recursive` | Token-budgeted word-window sliding | Backwards compatibility |
| `SentenceRecursive` | Sentence-aware token packing using Unicode boundaries | Prose (Text, PDF) |
| `Session` | Groups messages up to `max_messages` | Conversation logs |
| `Structural` | One chunk per section | Code (function/class level) |

The `SentenceRecursive` chunker is the preferred strategy for prose because it never cuts mid-sentence and produces more uniform chunk sizes. Token counts are estimated via whitespace split for speed and determinism. Source: [crates/mnem-ingest/src/chunk.rs:1-45]()
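
The idea of sentence-aware packing with whitespace token estimates can be sketched as follows. This is a minimal illustration, not the crate's implementation: the function names are hypothetical, the sentence splitter is a naive punctuation rule rather than Unicode sentence segmentation, and chunk overlap is left out.

```rust
/// Estimate the token count of `text` by splitting on whitespace.
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Naive sentence splitter (assumption): a sentence ends at '.', '!' or '?'
/// when that character is followed by whitespace or the end of the input.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let mut chars = text.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if matches!(c, '.' | '!' | '?') {
            let at_boundary = chars.peek().map_or(true, |&(_, next)| next.is_whitespace());
            if at_boundary {
                let end = i + c.len_utf8();
                let sentence = text[start..end].trim();
                if !sentence.is_empty() {
                    sentences.push(sentence);
                }
                start = end;
            }
        }
    }
    // Whatever follows the last terminator is kept as a final sentence.
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail);
    }
    sentences
}

/// Greedily pack whole sentences into chunks of at most `max_tokens`
/// estimated tokens. A single sentence that exceeds the budget becomes
/// its own (oversized) chunk instead of being cut.
fn pack_sentences(text: &str, max_tokens: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut current_tokens = 0;
    for sentence in split_sentences(text) {
        let n = estimate_tokens(sentence);
        if !current.is_empty() && current_tokens + n > max_tokens {
            chunks.push(current.join(" "));
            current.clear();
            current_tokens = 0;
        }
        current.push(sentence);
        current_tokens += n;
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}
```

Calling `pack_sentences(text, 512)` on prose yields chunks that always end on a sentence boundary. Because the estimate depends only on whitespace, the same input always produces the same chunks.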

### Entity Extraction

mnem supports multiple extraction providers:

| Provider | Method |
|----------|--------|
| `rule` (default) | Capitalized phrase heuristic |
| `keybert` | Statistical keyword extraction (requires feature flag) |
| `ollama` | LLM-based extraction (requires feature flag) |
| `none` | Suppress entity extraction |

Source: [crates/mnem-cli/src/commands/ingest.rs:25-40]()

## Retrieval System

The retrieval layer composes multiple ranking strategies to deliver relevant context to agents under a token budget:

```mermaid
graph TD
    QUERY["Query"] --> VEC["Vector Search"]
    QUERY --> SPARSE["Sparse Ranking"]
    VEC --> RERANK["Rerank"]
    SPARSE --> RERANK
    RERANK --> PACK["Token Budget Packing"]
    PACK --> RESULT["Context for Agent"]
```

The retriever renders nodes in a YAML-like format:

```text
ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>
```

- `ntype` and `id` are always present
- `context` appears before `summary` (per Anthropic's contextual-retrieval recipe)
- `summary` is clipped at 8192 chars by default
- Scalar props are emitted in BTreeMap order; non-scalar props are skipped

Source: [crates/mnem-core/src/retrieve/mod.rs:1-50]()

## CLI Interface

The `mnem` CLI provides commands for repository management:

| Command | Description |
|---------|-------------|
| `mnem ingest <path>` | Parse and commit documents to the graph |
| `mnem tag` | Manage versioned references (create, list, delete) |
| `mnem branch` | Create and manage branches |

### Ingest Command Options

| Option | Default | Description |
|--------|---------|-------------|
| `--chunker` | `auto` | Strategy: `auto`, `paragraph`, `recursive`, `sentence_recursive`, `session`, `structural` |
| `--max-tokens` | `512` | Target tokens per chunk |
| `--overlap` | `32` | Overlap tokens between chunks |
| `--recursive` | false | Walk directory trees |
| `--extractor` | `none` | Entity extraction provider |
| `--ner-provider` | `rule` | NER method: `rule`, `none` |

Source: [crates/mnem-cli/src/commands/ingest.rs:50-70]()

## HTTP API

The HTTP server exposes REST endpoints for remote agent access:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/ingest` | POST | Ingest documents with JSON or multipart payload |
| `/v1/branches` | GET | List all branches |
| `/v1/branches` | POST | Create a new branch |

### Ingest Request Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `chunker` | String | Strategy: `auto`, `paragraph`, `recursive`, `session` |
| `max_tokens` | u32 | Target tokens per chunk |
| `overlap` | u32 | Overlap tokens between chunks |
| `author` | String | Required commit author |
| `message` | String | Optional commit message |
| `extractor` | String | Extraction provider |
| `ner_provider` | String | NER method override |

Source: [crates/mnem-http/src/handlers_ingest.rs:15-45]()

## Key Design Principles

1. **No unsafe code**: The entire `mnem-core` crate enforces `#![forbid(unsafe_code)]`. Source: [crates/mnem-core/src/lib.rs:30]()

2. **Canonical encoding**: Every object type preserves a byte-exact round-trip property (`decode(encode(x)) == x`)

3. **Deterministic retrieval**: Node props use `BTreeMap` for consistent iteration order

4. **Extensible architecture**: Sidecar support for external tools (docling, unstructured) via feature flags

5. **Branch support**: Tags and branches enable experimental state management without losing history

---

<a id='page-installation'></a>

## Installation Guide

### Related Pages

Related topics: [Introduction to mnem](#page-introduction)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [crates/mnem-cli/src/main.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/main.rs)
- [crates/mnem-cli/src/config.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)
- [py-packages/mnem-cli/README.md](https://github.com/Uranid/mnem/blob/main/py-packages/mnem-cli/README.md)
- [crates/mnem-ingest/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/lib.rs)
- [crates/mnem-core/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/lib.rs)
- [crates/mnem-cli/src/commands/ingest.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/ingest.rs)
</details>

## Overview

The mnem project is a Git-like version control system designed specifically for AI agent knowledge management. It provides versioned storage, retrieval, and synchronization of structured knowledge nodes. This guide covers all supported installation methods, system requirements, and configuration steps to get mnem running on your platform.

The project is organized as a Rust monorepo with multiple crates and language bindings. Installation options include native binaries via multiple package managers, Python packages, Docker containers, and prebuilt releases. Source: [crates/mnem-core/src/lib.rs:1-20]()

## System Requirements

### Supported Platforms

mnem supports the following platforms and architectures:

| Platform | Architecture | Notes |
|----------|--------------|-------|
| Linux | x86_64, aarch64 | Full support |
| macOS | arm64 (Apple Silicon), x86_64 | Rosetta 2 compatible |
| Windows | x86_64 | Full support |

Source: [py-packages/mnem-cli/README.md:1-20]()

### Runtime Dependencies

| Component | Requirement | Purpose |
|-----------|-------------|---------|
| Rust toolchain | 1.70+ (stable) | Building from source |
| Python | 3.9+ | Python bindings (`mnem-py`) |
| Node.js | 18+ | npm package |
| Docker | 20.10+ | Container deployment |

## Installation Methods

### CLI Installation via pip

The simplest method to install the mnem CLI is through Python's package manager:

```bash
pip install mnem-cli
mnem --version
```

On first run, `mnem` automatically downloads the correct prebuilt binary for your platform from the GitHub release assets and caches it in `~/.mnem_cli/`. Subsequent calls run the cached binary directly. Source: [py-packages/mnem-cli/README.md:1-15]()

### CLI Installation via Cargo

For users with the Rust toolchain installed, install from crates.io:

```bash
cargo install --locked mnem-cli --features bundled-embedder
```

The `--features bundled-embedder` flag compiles the embedder dependency into the binary, making it self-contained without external embedding services. Source: [crates/mnem-cli/src/main.rs:1-50]()

### CLI Installation via npm

Node.js users can install globally via npm:

```bash
npm install -g mnem-cli
```

Source: [py-packages/mnem-cli/README.md:1-20]()

### Prebuilt Binaries

Download prebuilt binaries directly from the [GitHub Releases](https://github.com/Uranid/mnem/releases) page. Binaries are available for all supported platforms in the release assets.

After downloading, make the binary executable:

```bash
chmod +x mnem-*-x86_64-unknown-linux-gnu
./mnem-*-x86_64-unknown-linux-gnu --version
```

### Docker Installation

Container-based deployment is available via Docker. The project includes both a `Dockerfile` and `docker-compose.yml` for containerized deployments.

To build the Docker image:

```bash
docker build -t mnem:latest .
```

For orchestrated deployments using docker-compose:

```bash
docker-compose up -d
```

Source: [Dockerfile](https://github.com/Uranid/mnem/blob/main/Dockerfile), [docker-compose.yml](https://github.com/Uranid/mnem/blob/main/docker-compose.yml)

## Python Bindings

For programmatic access from Python applications, install the Python bindings package:

```bash
pip install mnem-py
```

This package provides the `pymnem` module (`import pymnem`), which lets Python applications interact with mnem repositories. Source: [py-packages/mnem-cli/README.md:1-30]()

## Build from Source

### Prerequisites

- Rust 1.70 or later (stable toolchain)
- Cargo (included with Rust)
- Git

### Build Steps

```bash
# Clone the repository
git clone https://github.com/Uranid/mnem.git
cd mnem

# Build the CLI
cargo build --release --bin mnem

# Build all crates
cargo build --release
```

### Feature Flags

The project supports several feature flags to customize the build:

| Feature | Description |
|---------|-------------|
| `bundled-embedder` | Compiles a local embedder into the binary, so no external embedding service is required |
| `keybert` | Statistical keyphrase extraction |
| `ollama` | LLM-based extraction via Ollama |
| `sidecar-docling` | PDF extraction via docling CLI |
| `sidecar-unstructured` | PDF extraction via unstructured |

Source: [crates/mnem-ingest/src/lib.rs:1-60]()

## Initial Configuration

### Repository Initialization

After installation, initialize a new mnem repository:

```bash
mnem init
```

This creates the `.mnem/` directory with the repository database (`repo.redb`). Source: [crates/mnem-cli/src/main.rs:1-80]()

### Configuration File

The CLI reads configuration from `.mnem/config.toml` in the repository root. Configuration includes:

```toml
[user]
name = "Your Name"
email = "your.email@example.com"
agent_id = "agent-identifier"

[llm]
provider = "ollama"  # or "openai", "anthropic"
model = "llama3.2"
base_url = "http://localhost:11434"
timeout_secs = 120
```

The author string for commits follows the format `name <email>` when both are present. If only one is available, that value is used alone. When neither is configured, the CLI falls back to the `agent_id`, or to `"mnem-cli"` if that is also unset. Source: [crates/mnem-cli/src/config.rs:1-50]()
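
The fallback order described above can be sketched as a small function. This is an illustration only: `UserConfig` and `resolve_author` are hypothetical names mirroring the `[user]` table, and the sketch treats any set value (even an empty string) as present, which may differ from the CLI.

```rust
/// Hypothetical mirror of the `[user]` table in `.mnem/config.toml`.
#[derive(Default)]
struct UserConfig {
    name: Option<String>,
    email: Option<String>,
    agent_id: Option<String>,
}

/// Resolve the commit author string following the documented fallback order:
/// `name <email>`, then whichever of the two is set, then `agent_id`,
/// then the literal `"mnem-cli"`.
fn resolve_author(user: &UserConfig) -> String {
    match (user.name.as_deref(), user.email.as_deref()) {
        (Some(name), Some(email)) => format!("{name} <{email}>"),
        (Some(name), None) => name.to_string(),
        (None, Some(email)) => email.to_string(),
        (None, None) => user
            .agent_id
            .clone()
            .unwrap_or_else(|| "mnem-cli".to_string()),
    }
}
```

With only `agent_id = "agent-7"` configured, `resolve_author` returns `agent-7`. With an empty configuration it returns `mnem-cli`.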

### Repository Path Resolution

The CLI automatically searches for the `.mnem/` directory by walking up from the current working directory, similar to Git's behavior. You can override this with the `-R` / `--repo` flag:

```bash
mnem -R ~/notes status
```

Source: [crates/mnem-cli/src/main.rs:1-80]()
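
A Git-style upward search of this kind can be written with the standard library alone. The sketch below is an assumption about the general shape of such a lookup, not the CLI's code. `find_repo_root` and `resolve_repo` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Walk up from `start` (inclusive) and return the first ancestor
/// directory that contains a `.mnem/` directory.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".mnem").is_dir())
        .map(Path::to_path_buf)
}

/// Resolve the repository root: an explicit override (as with `-R` / `--repo`)
/// wins, otherwise search upward from the current working directory.
fn resolve_repo(override_path: Option<&Path>, cwd: &Path) -> Option<PathBuf> {
    match override_path {
        Some(path) => Some(path.to_path_buf()),
        None => find_repo_root(cwd),
    }
}
```

`Path::ancestors` yields the starting path first and then each parent in turn, so the nearest enclosing repository is found before any outer one.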

## HTTP Server Deployment

### Starting the Server

The HTTP server provides REST API access to mnem repositories:

```bash
mnem serve --port 8080
```

### API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/v1/ingest` | POST | Ingest documents |
| `/v1/branches` | GET | List branches |
| `/v1/branches` | POST | Create branch |
| `/v1/retrieve` | POST | Query knowledge |

### Ingest Configuration

The ingest endpoint accepts JSON payloads with the following parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `content` | String | Yes | - | Content to ingest |
| `chunker` | String | No | `auto` | Chunking strategy |
| `max_tokens` | u32 | No | 512 | Target tokens per chunk |
| `overlap` | u32 | No | 32 | Overlap tokens |
| `author` | String | Yes | - | Commit author |
| `message` | String | No | `"mnem http ingest"` | Commit message |
| `extractor` | String | No | `"none"` | Entity extractor |
| `ner_provider` | String | No | `"rule"` | NER provider |

Source: [crates/mnem-http/src/handlers_ingest.rs:1-50]()
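
To make the required fields and defaults in the table concrete, here is a hedged model of the request. The `IngestRequest` type and its constructor are hypothetical and do not come from `mnem-http`. Treating an empty string as "missing" is also an assumption of this sketch.

```rust
/// Hypothetical request model mirroring the documented ingest parameters.
#[derive(Debug, PartialEq)]
struct IngestRequest {
    content: String,
    author: String,
    chunker: String,
    max_tokens: u32,
    overlap: u32,
    message: String,
    extractor: String,
    ner_provider: String,
}

impl IngestRequest {
    /// Build a request from the two required fields and fill every
    /// optional field with the default listed in the table.
    fn new(content: &str, author: &str) -> Result<Self, String> {
        if content.is_empty() {
            return Err("`content` is required".to_string());
        }
        if author.is_empty() {
            return Err("`author` is required".to_string());
        }
        Ok(Self {
            content: content.to_string(),
            author: author.to_string(),
            chunker: "auto".to_string(),
            max_tokens: 512,
            overlap: 32,
            message: "mnem http ingest".to_string(),
            extractor: "none".to_string(),
            ner_provider: "rule".to_string(),
        })
    }
}
```

A client that sends only `content` and `author` would therefore be ingested with the `auto` chunker, 512-token chunks, and a 32-token overlap.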

## Verification

### Verify Installation

After installation, verify the CLI is working:

```bash
mnem --version
mnem status
```

### First-Run Wizard

On first run with no repository present, mnem launches a first-run wizard to help configure the basic settings. Returning users see the `mnem status` output directly. Source: [crates/mnem-cli/src/main.rs:1-80]()

## Alternative Installation Summary

| Method | Command | Notes |
|--------|---------|-------|
| pip | `pip install mnem-cli` | Auto-downloads binary |
| cargo | `cargo install --locked mnem-cli --features bundled-embedder` | Self-contained binary |
| npm | `npm install -g mnem-cli` | Node.js integration |
| Docker | `docker-compose up -d` | Containerized deployment |
| Binary | Download from Releases | Manual installation |

Source: [py-packages/mnem-cli/README.md:1-20]()

## Next Steps

After installation, consult these related guides:

- **Quick Start** - Create your first repository and add content
- **Configuration Reference** - Complete configuration options
- **Ingest Guide** - Document ingestion and chunking strategies
- **Retrieve Guide** - Query your knowledge base

---

<a id='page-architecture'></a>

## System Architecture

### Related Pages

Related topics: [Core Components](#page-core-components), [Storage Backend](#page-storage-backend)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [crates/mnem-core/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/lib.rs)
- [crates/mnem-core/src/id/link.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/id/link.rs)
- [crates/mnem-core/src/objects/node.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
- [crates/mnem-core/src/retrieve/mod.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/retrieve/mod.rs)
- [crates/mnem-ingest/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/lib.rs)
- [crates/mnem-ingest/src/pipeline.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/pipeline.rs)
- [crates/mnem-ingest/src/types.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/types.rs)
- [crates/mnem-ingest/src/chunk.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/chunk.rs)
</details>

## Overview

mnem is a content-addressed, CRDT-based (Conflict-free Replicated Data Types) knowledge management system designed for agentic workflows. The system provides immutable content-addressed storage with a secondary vector index for retrieval-augmented generation (RAG) applications.

The architecture follows a modular design with distinct crates handling different concerns:

| Crate | Purpose |
|-------|---------|
| `mnem-core` | Core data types, CRDT operations, storage, indexing, retrieval |
| `mnem-ingest` | Source parsing, chunking, and entity extraction |
| `mnem-cli` | Command-line interface |
| `mnem-http` | HTTP API server |

Source: [crates/mnem-core/src/lib.rs:1-30]()

---

## Core Data Model

### Content Identifiers (CIDs)

Content in mnem is identified by cryptographic hashes using IPLD CIDs (Content Identifiers). A CID is a self-describing content address: it encodes the CID version, the codec of the content, and a multihash that names the hash function and carries the digest.

The system uses DAG-CBOR encoding for canonical binary serialization, ensuring byte-exact round-trip encoding/decoding:

```text
decode(encode(x)) == x
encode(decode(b)) == b
```

Source: [crates/mnem-core/src/lib.rs:35-40]()
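
Both round-trip directions hold only if the decoder rejects every non-canonical byte string. The toy codec below illustrates that point. It is deliberately not DAG-CBOR: the length-prefixed format, the 255-byte limit, and all function names are assumptions made for the sketch.

```rust
use std::collections::BTreeMap;

/// Append one string as `len: u8` followed by its UTF-8 bytes.
fn write_str(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() <= usize::from(u8::MAX), "string too long for this sketch");
    out.push(s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

/// Toy canonical encoding: key/value pairs in sorted key order.
fn encode(map: &BTreeMap<String, String>) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in map {
        write_str(&mut out, key);
        write_str(&mut out, value);
    }
    out
}

/// Read one length-prefixed UTF-8 string, advancing `pos` on success.
fn read_str(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = usize::from(*bytes.get(*pos)?);
    let start = *pos + 1;
    let end = start.checked_add(len)?;
    let slice = bytes.get(start..end)?;
    *pos = end;
    String::from_utf8(slice.to_vec()).ok()
}

/// Decode, rejecting anything that is not in canonical form:
/// truncated input, invalid UTF-8, or keys that are not strictly increasing.
fn decode(bytes: &[u8]) -> Option<BTreeMap<String, String>> {
    let mut map = BTreeMap::new();
    let mut pos = 0;
    let mut last_key: Option<String> = None;
    while pos < bytes.len() {
        let key = read_str(bytes, &mut pos)?;
        let value = read_str(bytes, &mut pos)?;
        if let Some(prev) = &last_key {
            if *prev >= key {
                return None; // duplicate or out-of-order key
            }
        }
        last_key = Some(key.clone());
        map.insert(key, value);
    }
    Some(map)
}
```

Because `decode` accepts only the one byte layout that `encode` can produce for a given map, `encode(decode(b)) == b` holds for every accepted `b`. That property is what makes a content hash over the bytes a stable identity for the value.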

### Phantom-Typed Links

mnem introduces `Link<T>`, a phantom-typed CID that encodes the type of the referenced content at the type level. While a bare CID points to "some content," a `Link<T>` points to "content that is a `T`."

```rust
pub struct Link<T: ?Sized> {
    cid: Cid,
    _target: PhantomData<fn() -> T>,
}
```

The phantom type prevents reference-mixing bugs at compile time. For example, a method declared as `fn parents(&self) -> &[Link<Commit>]` can only hand out commit links, and passing a `Link<Node>` where a `Link<Commit>` is expected is a compile-time type error.

On the wire, `Link<T>` is identical to a `Cid`: same bytes, same CBOR tag. The phantom type exists solely for Rust-level type safety.

Source: [crates/mnem-core/src/id/link.rs:1-35]()
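
A self-contained sketch shows the pattern in use. `Cid`, `Node`, and `Commit` here are simplified stand-ins (assumptions), and `Link::new`, `Link::cid`, and `parent_cid` are hypothetical helpers added for illustration. Only the struct layout follows the excerpt above.

```rust
use std::marker::PhantomData;

/// Stand-in for a real CID (assumption): just the raw bytes.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Cid(Vec<u8>);

/// Marker types standing in for the real object types.
struct Node;
struct Commit;

struct Link<T: ?Sized> {
    cid: Cid,
    _target: PhantomData<fn() -> T>,
}

impl<T: ?Sized> Link<T> {
    fn new(cid: Cid) -> Self {
        Link { cid, _target: PhantomData }
    }

    /// Drop the type tag and expose the untyped address.
    fn cid(&self) -> &Cid {
        &self.cid
    }
}

/// Accepts only links to commits. Calling this with a `&Link<Node>`
/// is rejected by the compiler with a mismatched-types error.
fn parent_cid(link: &Link<Commit>) -> &Cid {
    link.cid()
}
```

`PhantomData` is zero-sized, so a `Link<T>` occupies exactly as much memory as the `Cid` it wraps. The type parameter costs nothing at runtime.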

### Node Object

The fundamental unit of content storage is the `Node`:

```rust
pub struct Node {
    pub id: NodeId,
    pub ntype: NodeType,
    pub context_sentence: Option<String>,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Ipld>,
    pub content: Option<Bytes>,
    pub ext: Option<BTreeMap<String, Ipld>>,
}
```

| Field | Purpose |
|-------|---------|
| `id` | Unique identifier |
| `ntype` | Semantic type (e.g., `Chunk`, `Concept`, `Entity`) |
| `context_sentence` | LLM-generated positional cue for contextual retrieval |
| `summary` | Token-efficient content representation for LLM consumption |
| `props` | Structured key-value properties |
| `content` | Opaque payload (document body, file data) |
| `ext` | Forward-compatibility extension map |

The `context_sentence` field implements Anthropic's 2024 Contextual Retrieval technique, storing LLM-generated placement cues alongside nodes. Anthropic reports that the technique reduces retrieval failures by 49%, and by 67% when combined with reranking.

Source: [crates/mnem-core/src/objects/node.rs:1-80]()

---

## Repository Structure

mnem uses a CRDT-based repository model with the following object types:

```mermaid
graph TD
    A[View] --> B[Commit]
    B --> C[Node]
    B --> D[Edge]
    C --> E[Node Content]
    D --> F["Link#lt;Node#gt;"]
    D --> G["Link#lt;Node#gt;"]
    
    B --> H[Operation]
    H --> I[ChangeId]
    H --> J[OperationId]
```

### Core Objects

| Type | Description |
|------|-------------|
| `Node` | Atomic content unit with type, summary, and properties |
| `Edge` | Directed relationship between two nodes |
| `Commit` | Atomic snapshot of changes with parent references |
| `Operation` | Individual change record for sync |
| `View` | Immutable repository head view |
| `IndexSet` | Secondary index collection |

Source: [crates/mnem-core/src/lib.rs:10-20]()

### Blockstore Abstraction

The system abstracts storage through the `Blockstore` trait, allowing different backends:

```rust
pub trait Blockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn put(&self, data: &[u8]) -> Result<Cid>;
}
```

Reference implementations include in-memory stores, with pluggable backends for production use.
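
An in-memory backend for a trait of this shape might look like the sketch below. It is an assumption-laden illustration, not the crate's reference store. `Cid` is replaced by a 64-bit non-cryptographic hash, `StoreResult` stands in for the crate's `Result` alias, and `MemoryBlockstore` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Stand-in content address (assumption): a 64-bit non-cryptographic hash.
/// A real CID carries a cryptographic multihash instead.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Cid(u64);

/// Stand-in for the crate's own `Result` alias.
type StoreResult<T> = std::result::Result<T, String>;

trait Blockstore {
    fn get(&self, cid: &Cid) -> StoreResult<Option<Vec<u8>>>;
    fn put(&self, data: &[u8]) -> StoreResult<Cid>;
}

/// Blocks live in a mutex-guarded map so that `put` can take `&self`.
#[derive(Default)]
struct MemoryBlockstore {
    blocks: Mutex<HashMap<Cid, Vec<u8>>>,
}

impl Blockstore for MemoryBlockstore {
    fn get(&self, cid: &Cid) -> StoreResult<Option<Vec<u8>>> {
        let blocks = self.blocks.lock().map_err(|e| e.to_string())?;
        Ok(blocks.get(cid).cloned())
    }

    fn put(&self, data: &[u8]) -> StoreResult<Cid> {
        // The address is derived from the content itself.
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let cid = Cid(hasher.finish());
        let mut blocks = self.blocks.lock().map_err(|e| e.to_string())?;
        // Identical content maps to the same key, so repeated puts are idempotent.
        blocks.entry(cid).or_insert_with(|| data.to_vec());
        Ok(cid)
    }
}
```

Because the key is computed from the bytes, storing the same block twice returns the same address and keeps a single copy. Content-addressed stores deduplicate this way by construction.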

---

## Ingest Pipeline

The ingest pipeline handles parsing, chunking, and extracting content from various source types.

```mermaid
graph LR
    A[Raw Bytes] --> B[Source Detection]
    B --> C[Parser]
    C --> D[Chunker]
    D --> E[Extractor]
    E --> F[Transaction]
    F --> G[Commit]
```

### Source Types

The system automatically detects source types from file extensions:

| Extension | SourceKind | Chunker Strategy |
|-----------|------------|------------------|
| `md`, `markdown` | `Markdown` | Paragraph |
| `pdf` | `Pdf` | SentenceRecursive (512 tokens, 64 overlap) |
| `json`, `jsonl` | `Conversation` | Session (10 messages) |
| `rs` | `Code(Rust)` | Structural |
| `py`, `pyi` | `Code(Python)` | Structural |
| `js`, `mjs`, `cjs` | `Code(JavaScript)` | Structural |
| `ts`, `tsx`, `mts`, `cts` | `Code(TypeScript)` | Structural |
| Other | `Text` | SentenceRecursive (256 tokens, 32 overlap) |

Source: [crates/mnem-ingest/src/types.rs:1-50]()
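
The extension table can be read as a single `match`. The sketch below restates it in code. The enum definitions and the `detect_source_kind` name are hypothetical, and lowercasing the extension before matching is an assumption of this sketch.

```rust
use std::path::Path;

/// Hypothetical language tag for code sources.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Lang {
    Rust,
    Python,
    JavaScript,
    TypeScript,
}

/// Simplified stand-in for the ingest crate's source kind.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SourceKind {
    Markdown,
    Pdf,
    Conversation,
    Code(Lang),
    Text,
}

/// Map a file path to a source kind by its extension.
/// Unknown or missing extensions fall back to plain text.
fn detect_source_kind(path: &Path) -> SourceKind {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("md" | "markdown") => SourceKind::Markdown,
        Some("pdf") => SourceKind::Pdf,
        Some("json" | "jsonl") => SourceKind::Conversation,
        Some("rs") => SourceKind::Code(Lang::Rust),
        Some("py" | "pyi") => SourceKind::Code(Lang::Python),
        Some("js" | "mjs" | "cjs") => SourceKind::Code(Lang::JavaScript),
        Some("ts" | "tsx" | "mts" | "cts") => SourceKind::Code(Lang::TypeScript),
        _ => SourceKind::Text,
    }
}
```

Routing every unrecognized extension to `Text` means ingestion never fails on an unfamiliar file type. The file is simply chunked as prose.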

### Chunker Strategies

| Strategy | Configuration | Use Case |
|----------|---------------|----------|
| `Paragraph` | - | Markdown documents |
| `SentenceRecursive` | `max_tokens`, `overlap` | Text, PDFs |
| `Session` | `max_messages` | Chat/conversation logs |
| `Structural` | - | Code (tree-sitter based) |
| `Recursive` | `max_tokens`, `overlap` | Generic recursive splitting |

For code parsing, mnem uses tree-sitter to extract named language constructs:

| Language | Extracted Items |
|----------|-----------------|
| Rust | `function_item`, `struct_item`, `enum_item`, `trait_item` |
| Python | `function_definition`, `class_definition` |
| JavaScript/TypeScript | `function_declaration`, `class_declaration`, `method_definition` |
| Go | `function_declaration`, `type_declaration` |

Source: [crates/mnem-ingest/src/chunk.rs:1-50]()

### Auto-Chunker Defaults

The `auto_chunker` function maps source kinds to default chunking strategies:

```rust
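/// Maps each source kind to its default chunking strategy.
/// Note: `heuristics` is not consulted in this excerpt.
pub fn auto_chunker(kind: SourceKind, heuristics: ChunkerAuto) -> ChunkerKind {
    match kind {
        // Markdown is split on its existing paragraph structure.
        SourceKind::Markdown => ChunkerKind::Paragraph,
        // Plain text uses smaller chunks than PDF text.
        SourceKind::Text => ChunkerKind::SentenceRecursive {
            max_tokens: 256,
            overlap: 32,
        },
        SourceKind::Pdf => ChunkerKind::SentenceRecursive {
            max_tokens: 512,
            overlap: 64,
        },
        // Conversations are grouped by message count, not tokens.
        SourceKind::Conversation => ChunkerKind::Session {
            max_messages: 10,
        },
        SourceKind::Code(_) => ChunkerKind::Structural,
    }
}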
```

Source: [crates/mnem-ingest/src/chunk.rs:1-60]()

---

## Retrieval System

The retrieval system provides agent-facing context assembly with token-budget packing.

### Node Rendering

Nodes are rendered to a compact, deterministic YAML-like format for LLM consumption:

```text
ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>
```

Rendering rules:
- `ntype` and `id` are always present
- `context` appears before `summary` (per Anthropic's contextual retrieval recipe)
- `summary` is clipped at 8192 chars by default (configurable via `MNEM_RENDER_SUMMARY_CAP_CHARS`)
- Scalar props (`String`, `Integer`, `Float`, `Bool`) are emitted in BTreeMap order
- Non-scalar props (`Link`, `Map`, `List`, `Bytes`, `Null`) are skipped

Source: [crates/mnem-core/src/retrieve/mod.rs:1-50]()
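
The rendering rules translate fairly directly into code. The following is a sketch under stated assumptions: `Value` is a simplified stand-in for `Ipld`, `Node` carries a plain string `id`, `render_node` is a hypothetical name, and the `context` and `summary` lines are omitted when the field is unset.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for an IPLD value (assumption).
enum Value {
    String(String),
    Integer(i64),
    Float(f64),
    Bool(bool),
    List(Vec<Value>),
    Null,
}

/// Simplified node with only the fields the renderer reads.
struct Node {
    id: String,
    ntype: String,
    context_sentence: Option<String>,
    summary: Option<String>,
    props: BTreeMap<String, Value>,
}

/// Render a node as `key: value` lines, clipping the summary
/// to at most `summary_cap_chars` characters.
fn render_node(node: &Node, summary_cap_chars: usize) -> String {
    let mut out = format!("ntype: {}\nid: {}\n", node.ntype, node.id);
    if let Some(context) = &node.context_sentence {
        out.push_str(&format!("context: {context}\n"));
    }
    if let Some(summary) = &node.summary {
        // Clip by characters, not bytes, so multi-byte text is never split.
        let clipped: String = summary.chars().take(summary_cap_chars).collect();
        out.push_str(&format!("summary: {clipped}\n"));
    }
    // BTreeMap iterates in sorted key order, which keeps the output deterministic.
    for (key, value) in &node.props {
        let scalar = match value {
            Value::String(s) => s.clone(),
            Value::Integer(i) => i.to_string(),
            Value::Float(f) => f.to_string(),
            Value::Bool(b) => b.to_string(),
            Value::List(_) | Value::Null => continue, // non-scalars are skipped
        };
        out.push_str(&format!("{key}: {scalar}\n"));
    }
    out
}
```

Two nodes with the same fields always render to the same string regardless of the order in which props were inserted. That matters when the rendered text feeds a cache key or a reproducible prompt.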

### Retrieval Pipeline

```mermaid
graph TD
    A[Query] --> B[Filters]
    B --> C[Vector Ranking]
    C --> D[Sparse Ranking]
    D --> E[Token Budget Packing]
    E --> F[Rendered Context]
```

The `Retriever` composes:
1. **Filters** - Property-based pre-filtering
2. **Vector Index** - Dense embedding similarity search
3. **Sparse Ranking** - BM25-style keyword matching
4. **Token Budget Packing** - Context window optimization
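
The final packing step can be illustrated with a simple greedy strategy. This sketch is not taken from the `Retriever`: the `Candidate` type, the whitespace token estimate, and the choice to skip an oversized item and keep trying smaller ones are all assumptions.

```rust
/// A ranked retrieval candidate: its rendered text and a relevance
/// score (higher is better). Hypothetical type for this sketch.
struct Candidate {
    rendered: String,
    score: f64,
}

/// Estimate tokens by whitespace split (an assumption of this sketch).
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedy budget packing: visit candidates in descending score order
/// and keep each one that still fits in the remaining token budget.
fn pack_under_budget(mut candidates: Vec<Candidate>, budget: usize) -> Vec<Candidate> {
    // `total_cmp` gives a total order over floats, so the sort cannot panic on NaN.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut remaining = budget;
    let mut packed = Vec::new();
    for candidate in candidates {
        let cost = estimate_tokens(&candidate.rendered);
        if cost <= remaining {
            remaining -= cost;
            packed.push(candidate);
        }
    }
    packed
}
```

Skipping an item that does not fit, rather than stopping at it, fills more of the budget. The cost is that a lower-ranked item may be included while a higher-ranked one is dropped. A strict prefix cut is the alternative when rank order must be preserved.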

---

## Module Organization

### mnem-core

Core library providing:

| Module | Exports |
|--------|---------|
| `id` | `Cid`, `NodeId`, `ChangeId`, `OperationId`, `Link<T>` |
| `codec` | DAG-CBOR encode/decode, DAG-JSON debug export |
| `objects` | `Node`, `Edge`, `Commit`, `Operation`, `View` |
| `prolly` | Prolly tree algorithms, chunks, builder |
| `store` | `Blockstore`, `OpHeadsStore` traits |
| `repo` | `ReadonlyRepo`, `Transaction` facade |
| `index` | `Query`, `BruteForceVectorIndex` |
| `retrieve` | `Retriever` for agent-facing context |
| `sign` | Ed25519 signing, revocation-list verification |

### mnem-ingest

Content ingestion library with optional LLM features:

| Module | Description |
|--------|-------------|
| `chunk` | Text chunking strategies |
| `code` | Tree-sitter based code parsing |
| `conversation` | Chat log processing |
| `extract` | Rule-based entity extraction |
| `extract_keybert` | Statistical keyphrase extraction (feature-gated) |
| `extract_llm` | LLM-based extraction (feature-gated) |
| `md` | Markdown parsing |
| `pdf` | PDF processing |
| `sidecar` | External parser integration |

Source: [crates/mnem-ingest/src/lib.rs:1-50]()

---

## Configuration

### LLM Providers

mnem supports multiple LLM providers for extraction and synthesis:

| Provider | Config Keys |
|----------|-------------|
| OpenAI | `llm.provider=openai`, `llm.model`, `llm.api_key_env`, `llm.base_url` |
| Ollama | `llm.provider=ollama`, `llm.model`, `llm.base_url` (defaults to `http://localhost:11434`) |

### Retrieval Configuration

| Key | Description | Default |
|-----|-------------|---------|
| `retrieve.limit` | Max results to return | - |
| `retrieve.budget` | Token budget for context | - |
| `retrieve.vector_cap` | Max vectors per query | - |
| `retrieve.graph_expand` | Neighbor expansion count | - |
| `retrieve.graph_depth` | Traversal depth | - |
| `retrieve.rerank_top_k` | Top-k for reranking | - |
| `retrieve.hyde_max_tokens` | HyDE document tokens | - |

### Rerank Providers

Supported rerankers: `cohere`, `voyage`, `jina`

### NER Providers

| Provider | Description |
|----------|-------------|
| `rule` | Capitalized-phrase heuristic (default) |
| `none` | Disable entity extraction |

---

## Crate Dependencies

```mermaid
graph TD
    CLI[mnem-cli] --> CORE[mnem-core]
    CLI --> INGEST[mnem-ingest]
    HTTP[mnem-http] --> CORE
    HTTP --> INGEST
    INGEST --> CORE
```

All crates maintain `#![forbid(unsafe_code)]`, so no unsafe code is permitted in the repository.

Source: [crates/mnem-core/src/lib.rs:30-45]()

---

## Summary

The mnem system architecture implements a layered, modular design:

1. **Storage Layer** - Content-addressed immutable storage with CRDT semantics
2. **Data Model** - Nodes, edges, commits with phantom-typed links for type safety
3. **Ingest Layer** - Pluggable parsers and chunkers for diverse source types
4. **Retrieval Layer** - Hybrid vector/sparse search with token-budget aware packing
5. **Interface Layer** - CLI and HTTP APIs for integration with agents and tools

The architecture prioritizes determinism, type safety, and modularity, making it suitable for building reliable agentic knowledge systems.

---

<a id='page-core-components'></a>

## Core Components

### Related Pages

Related topics: [System Architecture](#page-architecture), [Hybrid Retrieval System](#page-hybrid-retrieval)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [crates/mnem-core/src/objects/commit.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/commit.rs)
- [crates/mnem-core/src/objects/node.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
- [crates/mnem-core/src/objects/edge.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/edge.rs)
- [crates/mnem-core/src/repo/mod.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/repo/mod.rs)
- [crates/mnem-core/src/repo/transaction.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/repo/transaction.rs)
- [crates/mnem-core/src/repo/merge.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/repo/merge.rs)
- [crates/mnem-core/src/prolly/tree.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/prolly/tree.rs)
</details>

# Core Components

The mnem system is built around a set of core components that work together to provide a versioned, graph-based knowledge management system. The core is implemented entirely in Rust with `#![forbid(unsafe_code)]`, ensuring memory safety throughout the codebase. Every object type preserves byte-exact canonical-encoding round-trip properties (`decode(encode(x)) == x` and `encode(decode(b)) == b`). Source: [crates/mnem-core/src/lib.rs]()

## System Architecture Overview

mnem implements a content-addressed graph database with prolly trees for efficient storage and retrieval. The architecture separates concerns between data structures (`objects`), storage (`store`), repository management (`repo`), and retrieval (`retrieve`).

```mermaid
graph TD
    subgraph "mnem-core"
        OBJ[objects: Node, Edge, Commit, View]
        PRO[prolly: TreeChunk, Builder, Cursor]
        STORE[store: Blockstore, OpHeadsStore]
        REPO[repo: ReadonlyRepo, Transaction]
        IDX[index: Query, BruteForceVectorIndex]
        RET[retrieve: Retriever]
        CODEC[codec: DAG-CBOR, DAG-JSON]
    end
    
    subgraph "mnem-ingest"
        ING[Ingester Pipeline]
        CHUNK[Chunking Strategies]
        PARSE[Parsers: MD, PDF, Code, JSON]
    end
    
    subgraph "External Interfaces"
        CLI[mnem-cli]
        HTTP[mnem-http]
        MCP[MCP Server]
    end
    
    ING -->|adds nodes/edges| REPO
    REPO -->|reads/writes| STORE
    RET -->|queries| IDX
    IDX -->|indexes| OBJ
    CODEC -->|encodes/decodes| OBJ
    CLI --> REPO
    HTTP --> REPO
```

## Data Objects

The fundamental building blocks of the mnem knowledge graph are the core object types defined in `crates/mnem-core/src/objects/`. Each object is serializable via DAG-CBOR for canonical encoding. Source: [crates/mnem-core/src/lib.rs]()

### Node

The `Node` is the primary unit of knowledge storage. It represents a single fact, entity, or chunk of content within the graph.

```rust
// Simplified structure from crates/mnem-core/src/objects/node.rs
pub struct Node {
    pub id: NodeId,                                    // Unique identifier
    pub ntype: String,                                 // Node type label (e.g., "Fact", "Doc")
    pub summary: Option<String>,                       // LLM-facing retrieval text
    pub props: BTreeMap<String, Ipld>,                 // Property map
    pub content: Option<Bytes>,                        // Optional opaque payload
    pub context_sentence: Option<String>,              // Positional chunk prefix
    pub ext: Option<BTreeMap<String, Ipld>>,           // Forward-compat extension map
}
```

| Field | Type | Description |
|-------|------|-------------|
| `id` | `NodeId` | Unique content-addressed identifier |
| `ntype` | `String` | Free-form type label for the node |
| `summary` | `Option<String>` | Text summary for LLM retrieval under token budget |
| `props` | `BTreeMap<String, Ipld>` | Structured metadata with any DAG-CBOR value |
| `content` | `Option<Bytes>` | Opaque payload (document body, file data) |
| `context_sentence` | `Option<String>` | LLM-generated placement cue per Anthropic's contextual retrieval recipe |
| `ext` | `Option<BTreeMap>` | Forward-compat extension map preserving unknown fields |

The `summary` field is designed for LLM consumption: it is the field agents read when assembling context under a token budget. It is distinct from `props` (structured) and `content` (opaque payload). Source: [crates/mnem-core/src/objects/node.rs]()

The `context_sentence` field follows Anthropic's 2024 Contextual Retrieval approach, which reports a 49% to 67% reduction in retrieval failures when such context is present. mnem stores it on the node so the render path can surface it back to the agent for faithful source attribution. Source: [crates/mnem-core/src/objects/node.rs]()

### Edge

Edges connect nodes and represent relationships between entities.

```rust
// From crates/mnem-core/src/objects/edge.rs
pub struct Edge {
    pub src: NodeId,           // Source node ID
    pub rel: String,           // Relation label (e.g., "works_at", "extracted_from")
    pub dst: NodeId,           // Destination node ID
    pub props: BTreeMap<String, Ipld>,  // Optional edge properties
}
```

Edges are used to create graph relationships like `works_at`, `lives_in`, `traveling_with`, `has_preference`, and `extracted_from`. The mnem-cli integration guidelines recommend using the compound `mnem_commit_relation` tool when both endpoints are entities: it resolves or creates both nodes and adds the edge in one call. Source: [crates/mnem-cli/src/integrate.rs]()
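
The resolve-or-create behaviour can be illustrated with a small in-memory sketch. Everything in it is hypothetical: `ToyGraph`, `resolve_or_create`, and `commit_relation` are illustrative names, node IDs are plain integers, and nodes are matched by name. None of this is taken from the mnem API.

```rust
use std::collections::BTreeMap;

/// Toy graph: nodes are keyed by name, edges are (src, rel, dst) triples.
#[derive(Default)]
pub struct ToyGraph {
    ids_by_name: BTreeMap<String, u64>,
    pub edges: Vec<(u64, String, u64)>,
}

impl ToyGraph {
    /// Return the existing ID for `name`, or allocate the next one.
    pub fn resolve_or_create(&mut self, name: &str) -> u64 {
        let next = self.ids_by_name.len() as u64;
        *self.ids_by_name.entry(name.to_string()).or_insert(next)
    }

    /// Resolve or create both endpoints, then add the edge, in one call.
    pub fn commit_relation(&mut self, src: &str, rel: &str, dst: &str) -> (u64, u64) {
        let s = self.resolve_or_create(src);
        let d = self.resolve_or_create(dst);
        self.edges.push((s, rel.to_string(), d));
        (s, d)
    }

    /// Number of distinct nodes created so far.
    pub fn node_count(&self) -> usize {
        self.ids_by_name.len()
    }
}
```

Calling `commit_relation("Alice", "works_at", "Acme")` and then `commit_relation("Alice", "lives_in", "Paris")` reuses the same ID for `Alice`, so the graph ends up with three nodes and two edges.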

### Commit

The `Commit` object represents a point-in-time snapshot of the repository state.

```rust
// From crates/mnem-core/src/objects/commit.rs
pub struct Commit {
    pub message: String,           // Commit message
    pub author: Author,            // Author information
    pub timestamp: Timestamp,       // Commit timestamp
    pub root: NodeId,              // Root of the node tree
    pub ops: Vec<Operation>,        // Operations applied in this commit
}
```

### View

The `View` contains repository metadata including branch references and commit heads.

```rust
// Referenced in crates/mnem-http/src/handlers.rs
pub struct View {
    pub heads: Vec<Cid>,           // Current head commit CIDs
    pub refs: BTreeMap<String, RefTarget>,  // Named references
}
```

The View exposes branch information via the HTTP API with the schema `mnem.v1.branches`. Source: [crates/mnem-http/src/handlers.rs]()

### Operation

Operations represent individual changes applied to the repository. They are collected within commits to provide a complete audit trail.

## Repository Layer

The repository layer provides the main interface for interacting with the knowledge graph. It is defined across several modules in `crates/mnem-core/src/repo/`. Source: [crates/mnem-core/src/repo/mod.rs]()

### ReadonlyRepo

`ReadonlyRepo` provides a read-only view into the repository state.

```rust
// Simplified from crates/mnem-core/src/repo/mod.rs
pub trait ReadonlyRepo {
    fn view(&self) -> &View;
    fn blockstore(&self) -> &dyn Blockstore;
}
```

### Transaction

`Transaction` enables write operations to the repository. All changes are staged until explicitly committed.

```rust
// From crates/mnem-core/src/repo/transaction.rs
pub struct Transaction {
    // Internal state managing pending operations
}

impl Transaction {
    pub fn add_node(&mut self, node: Node) -> Result<NodeId, Error>;
    pub fn add_edge(&mut self, edge: Edge) -> Result<EdgeId, Error>;
    pub fn commit(self, author: Author, message: String) -> Result<ReadonlyRepo, Error>;
}
```
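
The stage-then-commit pattern can be sketched with toy types. This is a minimal illustration, not the mnem implementation: `ToyNode`, `ToyRepo`, `ToyTransaction`, and `start_transaction` are hypothetical names, and the real `commit` also takes an author and returns a `Result`.

```rust
/// Hypothetical stand-in for the real node type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyNode {
    pub ntype: String,
    pub summary: String,
}

/// Committed, read-only state.
#[derive(Debug, Default, Clone)]
pub struct ToyRepo {
    pub nodes: Vec<ToyNode>,
    pub log: Vec<String>,
}

/// Pending changes; nothing is visible in the repo until `commit`.
pub struct ToyTransaction {
    base: ToyRepo,
    staged: Vec<ToyNode>,
}

impl ToyRepo {
    pub fn start_transaction(&self) -> ToyTransaction {
        ToyTransaction { base: self.clone(), staged: Vec::new() }
    }
}

impl ToyTransaction {
    /// Stage a node and return its index in the eventual repo.
    pub fn add_node(&mut self, node: ToyNode) -> usize {
        self.staged.push(node);
        self.base.nodes.len() + self.staged.len() - 1
    }

    /// Consume the transaction and produce a new read-only repo.
    pub fn commit(mut self, message: &str) -> ToyRepo {
        self.base.nodes.append(&mut self.staged);
        self.base.log.push(message.to_string());
        self.base
    }
}
```

Because `commit` takes `self` by value, a transaction cannot be reused after it has been committed, which mirrors the `commit(self, ...)` signature shown above.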

The `ingest` method on the Ingester pipeline uses Transaction to add nodes and edges:

> Parse, chunk, extract, and write into `tx`. Does **not** commit. `bytes` is the raw source payload; `kind` says how to parse it. Returns an `IngestResult` with counts and elapsed time. The `commit_cid` field is left `None` - callers who want a CID should call `tx.commit(...)` afterwards and stash the returned `ReadonlyRepo`'s head commit CID. Source: [crates/mnem-ingest/src/pipeline.rs]()

### Merge and Conflict Detection

The merge system handles combining divergent repository states.

```rust
// From crates/mnem-core/src/repo/merge.rs
pub fn detect_conflicts(
    repo: &ReadonlyRepo,
    left: Cid,
    right: Cid,
    lca: Option<Cid>,
) -> Result<MergeConflicts, Error>;
```

Conflict detection supports an explicit `ConflictPolicy` for customizing merge behavior. The detector loads tombstone sets via the Views attached to each commit's operation. Source: [crates/mnem-core/src/repo/conflict.rs]()

```mermaid
graph LR
    A[Commit A] -->|diverged| B[Common Ancestor]
    C[Commit B] -->|diverged| B
    B --> D[Detect Conflicts]
    D --> E{MergeConflicts?}
    E -->|Yes| F[Surface conflicts to caller]
    E -->|No| G[Auto-merge possible]
```
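
The flow in the diagram corresponds to a classic three-way comparison against the common ancestor. The sketch below shows that idea on plain key-value maps. It is an assumption-laden illustration: `toy_detect_conflicts` is a hypothetical name, and the real detector works on commits, tombstones, and a `ConflictPolicy` rather than string maps.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Keys whose values were changed differently on both sides relative to the base.
pub fn toy_detect_conflicts(
    base: &BTreeMap<String, String>,
    left: &BTreeMap<String, String>,
    right: &BTreeMap<String, String>,
) -> BTreeSet<String> {
    let keys: BTreeSet<&String> = base.keys().chain(left.keys()).chain(right.keys()).collect();
    keys.into_iter()
        .filter(|k| {
            let (b, l, r) = (base.get(*k), left.get(*k), right.get(*k));
            // Conflict: both sides diverge from the base and disagree with each other.
            l != b && r != b && l != r
        })
        .cloned()
        .collect()
}
```

A key changed on only one side, or changed to the same value on both sides, is not reported, which is the case the diagram labels as auto-mergeable.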

## Prolly Trees

mnem uses prolly trees (probabilistic B-trees) for efficient storage and lookup of the node graph. This is implemented in `crates/mnem-core/src/prolly/`. Source: [crates/mnem-core/src/prolly/tree.rs]()

```mermaid
graph TD
    subgraph "Prolly Tree Structure"
        ROOT[Root Node / TreeChunk] --> LEFT[Left Child TreeChunk]
        ROOT --> RIGHT[Right Child TreeChunk]
        LEFT --> LL[Leaf TreeChunk]
        LEFT --> LR[Leaf TreeChunk]
        RIGHT --> RL[Leaf TreeChunk]
        RIGHT --> RR[Leaf TreeChunk]
    end
    
    style ROOT fill:#e1f5fe
    style LL fill:#f3e5f5
    style LR fill:#f3e5f5
    style RL fill:#f3e5f5
    style RR fill:#f3e5f5
```

The prolly tree implementation includes:

| Component | Purpose |
|-----------|---------|
| `TreeChunk` | Immutable chunk containing sorted entries |
| `Builder` | Constructs new trees from operations |
| `Cursor` | Navigates tree structure for lookups |
| `diff` | Computes differences between trees |
| `merge` | Merges divergent tree versions |

Prolly trees provide logarithmic-time lookups and efficient diffing for collaborative editing scenarios. Source: [crates/mnem-core/src/lib.rs]()
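
What makes diffing cheap in a prolly tree is that chunk boundaries are decided by the content of the entries, not by their position. The sketch below shows that general idea for a single leaf level. It is a minimal illustration under stated assumptions: the FNV-1a hash, the `fanout` parameter, and the function names are all hypothetical and are not taken from the mnem source.

```rust
/// FNV-1a 64-bit hash, used here only because it is tiny and deterministic.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// A key closes its chunk when its hash falls in a 1-in-`fanout` bucket.
pub fn is_boundary(key: &str, fanout: u64) -> bool {
    fnv1a(key.as_bytes()) % fanout.max(1) == 0
}

/// Split sorted keys into chunks at content-defined boundaries.
pub fn chunk_sorted_keys(keys: &[&str], fanout: u64) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    for key in keys {
        current.push(key.to_string());
        if is_boundary(key, fanout) {
            // The boundary key is the last entry of the chunk it closes.
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Since a boundary depends only on the key itself, inserting or removing one key can only change the chunks next to it; the rest keep the same contents and therefore the same content address.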

## Storage Layer

The storage layer abstracts over different backend implementations.

### Blockstore

```rust
// From crates/mnem-core/src/lib.rs
pub trait Blockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>, Error>;
    fn put(&self, cid: &Cid, data: &[u8]) -> Result<(), Error>;
}
```

### OpHeadsStore

```rust
// From crates/mnem-core/src/lib.rs
pub trait OpHeadsStore {
    fn get_heads(&self) -> Result<Vec<Cid>, Error>;
    fn set_heads(&mut self, heads: &[Cid]) -> Result<(), Error>;
}
```

The codebase includes in-memory reference implementations of both traits for testing and development. Source: [crates/mnem-core/src/lib.rs]()
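
An in-memory implementation of a `Blockstore`-shaped trait might look like the sketch below. It is not the reference implementation from the repository: `ToyCid`, `ToyError`, `ToyBlockstore`, and `MemoryBlockstore` are hypothetical stand-ins that only mirror the `get`/`put` signatures shown above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-ins: the real `Cid` and `Error` types live in mnem-core.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ToyCid(pub u64);

#[derive(Debug)]
pub struct ToyError(pub String);

pub trait ToyBlockstore {
    fn get(&self, cid: &ToyCid) -> Result<Option<Vec<u8>>, ToyError>;
    fn put(&self, cid: &ToyCid, data: &[u8]) -> Result<(), ToyError>;
}

/// `put` takes `&self`, so the map sits behind a `Mutex` for interior mutability.
#[derive(Default)]
pub struct MemoryBlockstore {
    blocks: Mutex<HashMap<ToyCid, Vec<u8>>>,
}

impl ToyBlockstore for MemoryBlockstore {
    fn get(&self, cid: &ToyCid) -> Result<Option<Vec<u8>>, ToyError> {
        let blocks = self.blocks.lock().map_err(|e| ToyError(e.to_string()))?;
        Ok(blocks.get(cid).cloned())
    }

    fn put(&self, cid: &ToyCid, data: &[u8]) -> Result<(), ToyError> {
        let mut blocks = self.blocks.lock().map_err(|e| ToyError(e.to_string()))?;
        blocks.insert(cid.clone(), data.to_vec());
        Ok(())
    }
}
```

A missing block is reported as `Ok(None)` rather than as an error, matching the `Result<Option<Vec<u8>>, Error>` return type of `get`.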

## Index System

Secondary indexes enable efficient querying of the knowledge graph.

### Query

The primary query interface for searching nodes and edges.

### BruteForceVectorIndex

A vector index implementation for semantic search. It works in conjunction with the retrieve module to provide dense + sparse retrieval lanes that capture positional and relational context. Source: [crates/mnem-core/src/lib.rs]()
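
As the name suggests, a brute-force index scores every stored vector against the query. The sketch below shows that technique with cosine similarity. The function names, the `(u64, Vec<f32>)` entry layout, and the choice of cosine as the metric are assumptions made for illustration, not details confirmed by the source.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Score every stored vector against the query and return the top `k` (id, score) pairs.
pub fn brute_force_top_k(index: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = index
        .iter()
        .map(|(id, v)| (*id, cosine(v, query)))
        .collect();
    // Highest similarity first; ties broken by id for determinism.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}
```

The cost is linear in the number of stored vectors, which is simple and exact but becomes the bottleneck on large repositories.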

## Retrieval System

The retrieve module provides the agent-facing interface for context assembly.

```rust
// From crates/mnem-core/src/retrieve/mod.rs
pub struct Retriever { /* ... */ }
```

The retriever composes:

1. **Filters** - Pre-filter nodes by type, properties, or time range
2. **Vector ranking** - Dense embeddings from the configured embedder
3. **Sparse ranking** - BM25-style keyword matching
4. **Token-budget packing** - Assembles context within LLM token limits
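
The last step, token-budget packing, can be sketched as a greedy pass over the ranked results. This is an illustration only: `estimate_tokens` and `pack_within_budget` are hypothetical names, the whitespace-based estimate is an assumption, and the real packer may use a different skipping or truncation policy.

```rust
/// Rough token count: number of whitespace-separated words.
pub fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily keep ranked items, in rank order, while they fit in `budget` tokens.
pub fn pack_within_budget<'a>(ranked: &[&'a str], budget: usize) -> Vec<&'a str> {
    let mut used = 0;
    let mut packed = Vec::new();
    for item in ranked {
        let cost = estimate_tokens(item);
        if used + cost > budget {
            continue; // skip items that do not fit; smaller ones may still fit
        }
        used += cost;
        packed.push(*item);
    }
    packed
}
```

With a budget of 5 and items costing 3, 5, and 2 tokens, the first and third items are kept and the second is skipped.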

### Node Rendering

Nodes are rendered to a compact, deterministic YAML-like format suitable for LLM consumption:

```text
ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>
```

- `ntype` and `id` are always present
- `context` is emitted if `node.context_sentence` is `Some` (sits BEFORE summary per Anthropic's contextual-retrieval recipe)
- `summary` is emitted if `node.summary` is `Some`, clipped at `DEFAULT_RENDER_SUMMARY_CAP_CHARS` (8192) chars
- Scalar props (`String`, `Integer`, `Float`, `Bool`) are emitted in BTreeMap order
- Non-scalar props (`Link`, `Map`, `List`, `Bytes`, `Null`) are skipped
- Opaque `content` bytes are never rendered

Source: [crates/mnem-core/src/retrieve/mod.rs]()
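
The rendering rules above can be sketched as a single function. The types here are hypothetical stand-ins (`ToyValue` for the IPLD value type, `RenderNode` for `Node`), and the cap is passed as a parameter instead of using `DEFAULT_RENDER_SUMMARY_CAP_CHARS`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for scalar and non-scalar property values.
pub enum ToyValue {
    Str(String),
    Int(i64),
    Bool(bool),
    List(Vec<ToyValue>),
}

pub struct RenderNode {
    pub ntype: String,
    pub id: String,
    pub context_sentence: Option<String>,
    pub summary: Option<String>,
    pub props: BTreeMap<String, ToyValue>,
}

pub fn render(node: &RenderNode, summary_cap_chars: usize) -> String {
    let mut out = format!("ntype: {}\nid: {}\n", node.ntype, node.id);
    if let Some(ctx) = &node.context_sentence {
        out.push_str(&format!("context: {ctx}\n")); // emitted before the summary
    }
    if let Some(summary) = &node.summary {
        // Clip by characters, not bytes, so multi-byte text is never split.
        let clipped: String = summary.chars().take(summary_cap_chars).collect();
        out.push_str(&format!("summary: {clipped}\n"));
    }
    for (key, value) in &node.props {
        match value {
            ToyValue::Str(s) => out.push_str(&format!("{key}: {s}\n")),
            ToyValue::Int(i) => out.push_str(&format!("{key}: {i}\n")),
            ToyValue::Bool(b) => out.push_str(&format!("{key}: {b}\n")),
            ToyValue::List(_) => {} // non-scalar props are skipped
        }
    }
    out
}
```

Iterating a `BTreeMap` yields keys in sorted order, which is what makes the output deterministic for a given node.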

## Identification System

mnem uses phantom-typed identifiers for type safety:

| Type | Description |
|------|-------------|
| `NodeId` | Identifies a node in the graph |
| `EdgeId` | Identifies an edge |
| `ChangeId` | Identifies a change operation |
| `OperationId` | Identifies an operation |
| `Link<T>` | Phantom-typed link to any type |

All CIDs are content-addressed, ensuring that the same content always produces the same identifier. Source: [crates/mnem-core/src/lib.rs]()
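
Phantom typing means the target type exists only at compile time. The sketch below shows the general Rust technique with `std::marker::PhantomData`. The `NodeTag` and `EdgeTag` markers, the `u64` payload, and `load_node` are hypothetical; the real `Link<T>` presumably wraps a CID.

```rust
use std::marker::PhantomData;

pub struct NodeTag;
pub struct EdgeTag;

/// A raw identifier tagged with the type it points to. The tag costs nothing at runtime.
pub struct Link<T> {
    raw: u64,
    _target: PhantomData<T>,
}

impl<T> Link<T> {
    pub fn new(raw: u64) -> Self {
        Link { raw, _target: PhantomData }
    }

    pub fn raw(&self) -> u64 {
        self.raw
    }
}

/// Accepts only node links; passing a `Link<EdgeTag>` is a compile-time error.
pub fn load_node(link: &Link<NodeTag>) -> String {
    format!("node#{}", link.raw())
}
```

Two links with the same raw value but different tags are distinct types, so mixing up a node reference and an edge reference is caught by the compiler rather than at runtime.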

## Codec System

The codec system provides canonical encoding and decoding:

```rust
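// Simplified signature sketch from crates/mnem-core/src/lib.rs (bodies elided).
// Free functions in a module take no `self` receiver.
pub mod codec {
    pub fn encode<T: Encode>(value: &T) -> Vec<u8> { /* ... */ }
    pub fn decode<T: Decode>(bytes: &[u8]) -> Result<T, Error> { /* ... */ }
}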
```

- **DAG-CBOR** - Primary serialization format with canonical encoding guarantees
- **DAG-JSON** - Debug export format for human inspection

Every object type preserves the byte-exact canonical-encoding round-trip property. Source: [crates/mnem-core/src/lib.rs]()
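
The two round-trip properties only hold together if the decoder rejects every non-canonical input. The sketch below demonstrates this on a toy length-prefixed encoding of a string map. It is not DAG-CBOR and not the mnem codec; the format and the function names are assumptions chosen to keep the example short.

```rust
use std::collections::BTreeMap;

/// Encode as: u32 entry count, then each key and value as u32 length + UTF-8 bytes.
/// `BTreeMap` iteration is sorted, so equal maps always produce equal bytes.
pub fn encode(map: &BTreeMap<String, String>) -> Vec<u8> {
    let mut out = (map.len() as u32).to_be_bytes().to_vec();
    for (k, v) in map {
        for s in [k, v] {
            out.extend_from_slice(&(s.len() as u32).to_be_bytes());
            out.extend_from_slice(s.as_bytes());
        }
    }
    out
}

fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let c = bytes.get(*pos..*pos + 4)?;
    *pos += 4;
    Some(u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
}

fn read_str(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = read_u32(bytes, pos)? as usize;
    let c = bytes.get(*pos..*pos + len)?;
    *pos += len;
    String::from_utf8(c.to_vec()).ok()
}

/// Decode, rejecting input that is not canonical
/// (unsorted or duplicate keys, truncated data, or trailing bytes).
pub fn decode(bytes: &[u8]) -> Option<BTreeMap<String, String>> {
    let mut pos = 0;
    let count = read_u32(bytes, &mut pos)?;
    let mut map = BTreeMap::new();
    let mut prev: Option<String> = None;
    for _ in 0..count {
        let key = read_str(bytes, &mut pos)?;
        let value = read_str(bytes, &mut pos)?;
        if prev.as_ref().map_or(false, |p| *p >= key) {
            return None; // keys must be strictly increasing
        }
        prev = Some(key.clone());
        map.insert(key, value);
    }
    if pos == bytes.len() { Some(map) } else { None }
}
```

Because unsorted input is refused instead of being silently reordered, any byte string that decodes successfully re-encodes to exactly the same bytes.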

## Signing System

The `sign` module provides Ed25519 signing and revocation-list verification for trust and integrity:

```rust
// From crates/mnem-core/src/lib.rs
pub mod sign {
    // Ed25519 signing operations
    // Revocation-list verification
}
```

## Chunking Integration

Chunking is primarily handled by the `mnem-ingest` crate, but the core objects are designed to work with chunked content. The `Chunk` type carries the chunk text together with a token estimate:

```rust
// Referenced from crates/mnem-ingest/src/chunk.rs
pub struct Chunk {
    pub content: String,
    pub tokens_estimate: usize,  // Fast whitespace-split estimation
}
```

Chunks preserve source order: section 0's chunks come before section 1's. Empty sections are skipped silently. Source: [crates/mnem-ingest/src/chunk.rs]()

## Summary

The mnem core components form a layered architecture:

| Layer | Components | Responsibility |
|-------|------------|----------------|
| **Objects** | Node, Edge, Commit, View, Operation | Core data structures |
| **Storage** | Blockstore, OpHeadsStore | Persistence abstraction |
| **Trees** | ProllyTree, TreeChunk, Builder | Efficient ordered storage |
| **Repository** | ReadonlyRepo, Transaction | Access control and mutation |
| **Index** | Query, VectorIndex | Secondary access paths |
| **Retrieval** | Retriever | Agent-facing context assembly |
| **Codec** | DAG-CBOR, DAG-JSON | Canonical serialization |
| **Crypto** | Ed25519 signing | Integrity and trust |

This architecture enables mnem to serve as a versioned, collaborative knowledge graph with strong consistency guarantees and efficient retrieval capabilities for LLM integration.

---

<a id='page-hybrid-retrieval'></a>

## Hybrid Retrieval System

### Related Pages

Related topics: [Embedding Providers](#page-embed-providers)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [crates/mnem-core/src/retrieve/mod.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/retrieve/mod.rs)
- [crates/mnem-core/src/objects/node.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
- [crates/mnem-http/src/handlers.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-http/src/handlers.rs)
- [crates/mnem-cli/src/config.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)
- [crates/mnem-ingest/src/types.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/types.rs)
- [crates/mnem-ingest/src/chunk.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/chunk.rs)
- [crates/mnem-core/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/lib.rs)
</details>

# Hybrid Retrieval System

## Overview

The Hybrid Retrieval System in mnem is an agent-facing retrieval subsystem that composes multiple ranking strategies—vector (dense), sparse, and graph-based expansion—into a unified token-budgeted context assembly pipeline. It is designed for LLM consumption, enabling autonomous agents to fetch relevant nodes from the repository under strict token budgets.

The system lives in `crates/mnem-core/src/retrieve/` and is exposed via HTTP API (`crates/mnem-http/src/handlers.rs`) and CLI (`crates/mnem-cli/`).

Source: [crates/mnem-core/src/lib.rs:18-23]()

## Architecture

```mermaid
graph TD
    subgraph "Retrieval Entry Points"
        HTTP[HTTP API: POST /v1/retrieve]
        CLI[CLI: mnem retrieve]
    end
    
    subgraph "Hybrid Retrieval Core"
        RT[Retriever]
        HF[Hybrid Fuser]
        VQ[Vector Query]
        SQ[Sparse Query]
        GQ[Graph Expansion]
        TB[Token Budget Packer]
    end
    
    subgraph "Indexes"
        VI[Vector Index]
        SI[Sparse Index]
        GI[Graph Index]
    end
    
    HTTP --> RT
    CLI --> RT
    RT --> HF
    HF --> VQ
    HF --> SQ
    HF --> GQ
    VQ --> VI
    SQ --> SI
    GQ --> GI
    HF --> TB --> Output[LLM Context]
```

## Core Components

### Retriever

The `Retriever` struct is the main facade for retrieval operations. It orchestrates query planning, index selection, and result fusion.

**Key Responsibilities:**
- Accept a query string and configuration parameters
- Dispatch parallel queries to vector, sparse, and graph indexes
- Fuse ranked results using configurable strategies
- Pack results into token budgets suitable for LLM context windows

Source: [crates/mnem-core/src/retrieve/mod.rs:1-50]()

### Node Rendering

Before results reach the LLM, nodes are rendered to a compact, deterministic YAML-like text representation:

```text
ntype: <ntype>
id: <uuid>
context: <context_sentence>
summary: <summary>
<prop_key>: <prop_value>
...
```

**Rendering Rules:**

| Field | Condition | Notes |
|-------|-----------|-------|
| `ntype` | Always | Node type identifier |
| `id` | Always | UUID |
| `context` | If `node.context_sentence` is `Some` | Position cue, emitted BEFORE summary |
| `summary` | If `node.summary` is `Some` | Clipped at 8192 chars by default |
| Scalar props | Always | Strings, integers, floats, booleans in BTreeMap order |
| Non-scalar props | Skipped | Links, Maps, Lists, Bytes, Null |

Source: [crates/mnem-core/src/retrieve/mod.rs:60-95]()

### Context Sentence (Anthropic Contextual Retrieval)

mnem implements Anthropic's 2024 Contextual Retrieval recipe. Each node may carry an optional `context_sentence` field—an LLM-generated one-sentence placement cue.

> "This paragraph is from Section 3 of a legal contract between Alice and Bob's employer..."

The ingest pipeline prepends this to `summary` before embedding so both dense and sparse lanes capture positional and relational context.

Source: [crates/mnem-core/src/objects/node.rs:95-115]()
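
The prepend step described above is small enough to sketch directly. `embedding_text` is a hypothetical helper, and the single-space separator is an assumption; the source does not say how the two strings are joined.

```rust
/// Build the text handed to the embedder: context sentence first, then summary.
pub fn embedding_text(context_sentence: Option<&str>, summary: &str) -> String {
    match context_sentence {
        // A blank context sentence is treated the same as a missing one.
        Some(ctx) if !ctx.trim().is_empty() => format!("{} {}", ctx.trim(), summary),
        _ => summary.to_string(),
    }
}
```

Feeding the same combined string to both the dense and the sparse lane keeps the two rankings aligned on what each chunk is about.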

## Retrieval Configuration

### CLI Configuration Keys

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `retrieve.limit` | `usize` | — | Maximum results to return |
| `retrieve.budget` | `u32` | — | Token budget for result packing |
| `retrieve.vector_cap` | `usize` | — | Vector index candidate cap |
| `retrieve.graph_expand` | `usize` | — | Graph neighbor expansion count |
| `retrieve.graph_depth` | `usize` | — | Graph traversal depth |
| `retrieve.graph_decay` | `u32` | — | Decay factor for graph scores |
| `retrieve.rerank_top_k` | `usize` | — | Top-K for re-ranking |
| `retrieve.hyde_max_tokens` | `usize` | — | Max tokens for HyDE hypothesis |
| `rerank.model` | `String` | — | Re-ranker model identifier |
| `rerank.base_url` | `String` | — | Re-ranker service base URL |

Source: [crates/mnem-cli/src/config.rs:1-100]()

### HTTP API Parameters

The `POST /v1/retrieve` endpoint accepts the following JSON body:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | `String` | Required | Search query |
| `limit` | `usize` | 20 | Result limit (clamped to `MAX_RETRIEVE_LIMIT`) |
| `vector_cap` | `usize` | — | Vector candidate cap (clamped to `MAX_VECTOR_CAP`) |
| `rerank_top_k` | `usize` | — | Re-rank candidate count (clamped to `MAX_RERANK_TOP_K`) |
| `hyde` | `bool` | false | Enable HyDE extractive summarization |
| `summarize` | `bool` | false | Enable centroid + MMR summarization |
| `summarize_k` | `usize` | 3 | Summary sentences count |

**Clamping Constants:**
- `MAX_RETRIEVE_LIMIT` — Prevents unbounded result sets
- `MAX_VECTOR_CAP` — Bounds vector search candidates
- `MAX_RERANK_TOP_K` — Limits re-ranking computation

Source: [crates/mnem-http/src/handlers.rs:200-280]()
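
Clamping a request parameter amounts to applying the default and then taking the minimum with the server-side cap. In the sketch below, the default of 20 comes from the table above, while the value of `MAX_RETRIEVE_LIMIT` and the helper name `effective_limit` are assumptions for illustration.

```rust
// Assumed value for illustration; the real constant lives in handlers.rs.
pub const MAX_RETRIEVE_LIMIT: usize = 100;
// Default taken from the parameter table.
pub const DEFAULT_RETRIEVE_LIMIT: usize = 20;

/// Apply the default when the field is absent, then clamp to the server-side maximum.
pub fn effective_limit(requested: Option<usize>) -> usize {
    requested
        .unwrap_or(DEFAULT_RETRIEVE_LIMIT)
        .min(MAX_RETRIEVE_LIMIT)
}
```

An oversized request is silently reduced to the cap instead of being rejected, which is what "clamped" implies in the table.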

## Chunking Strategies

The retrieval system operates on pre-chunked content. The ingest pipeline supports five chunking strategies, selectable per source kind:

| Strategy | Source Kind | Configuration | Behavior |
|----------|-------------|---------------|----------|
| `Paragraph` | Markdown | None | Splits on double-newline boundaries |
| `SentenceRecursive` | Text | `max_tokens=256`, `overlap=32` | Sentence-aware token-budgeted packing using Unicode UAX #29 boundaries |
| `SentenceRecursive` | PDF | `max_tokens=512`, `overlap=64` | Same as above with larger defaults |
| `Session` | Conversation | `max_messages=10` | Groups messages until role returns to `user` or max reached |
| `Structural` | Code | None | One chunk per section (function/class body from tree-sitter parser) |
| `Recursive` | (legacy) | `max_tokens`, `overlap` | Token-budgeted word-window sliding window |

Source: [crates/mnem-ingest/src/chunk.rs:1-100]()
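
The legacy `Recursive` strategy is described as a token-budgeted word window with overlap. A minimal sketch of that technique follows. It assumes one whitespace-separated word per token, and `word_windows` is a hypothetical name, not the function in `chunk.rs`.

```rust
/// Split `text` into windows of at most `max_tokens` words, where each window
/// repeats the last `overlap` words of the previous one.
pub fn word_windows(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_tokens > 0 && overlap < max_tokens,
        "overlap must be smaller than max_tokens"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `max_tokens = 4` and `overlap = 1`, the text `a b c d e f g` yields `a b c d` and `d e f g`: the shared word `d` gives the second chunk some context from the first.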

### Auto-Chunking

The `auto_chunker(kind, heuristics)` function selects optimal strategies:

```rust
match kind {
    SourceKind::Markdown => ChunkerKind::Paragraph,
    SourceKind::Text => ChunkerKind::SentenceRecursive { max_tokens: 256, overlap: 32 },
    SourceKind::Pdf => ChunkerKind::SentenceRecursive { max_tokens: 512, overlap: 64 },
    SourceKind::Conversation => ChunkerKind::Session { max_messages: 10 },
    SourceKind::Code(_) => ChunkerKind::Structural,
}
```

Source: [crates/mnem-ingest/src/chunk.rs:40-65]()

## Source Kind Taxonomy

| Kind | Extensions | Parser | Index Type |
|------|------------|--------|------------|
| `Markdown` | `.md`, `.markdown` | `parse_markdown` | Hybrid |
| `Pdf` | `.pdf` | Sidecar (docling/unstructured) | Hybrid |
| `Conversation` | `.json`, `.jsonl` | Session parser | Session |
| `Text` | Other/unspecified | Raw text | Hybrid |
| `Code(Rust)` | `.rs` | Tree-sitter | Structural |
| `Code(Python)` | `.py`, `.pyi` | Tree-sitter | Structural |
| `Code(JavaScript)` | `.js`, `.mjs`, `.cjs` | Tree-sitter | Structural |
| `Code(TypeScript)` | `.ts`, `.tsx`, `.mts`, `.cts` | Tree-sitter | Structural |
| `Code(Go)` | `.go` | Tree-sitter | Structural |
| `Code(Java)` | `.java` | Tree-sitter | Structural |
| `Code(C)` | `.c`, `.h` | Tree-sitter | Structural |
| `Code(Cpp)` | `.cpp`, `.cc`, `.cxx`, `.hpp` | Tree-sitter | Structural |
| `Code(Ruby)` | `.rb`, `.gemspec`, `.rake`, `.erb` | Tree-sitter | Structural |
| `Code(CSharp)` | `.cs`, `.csx` | Tree-sitter | Structural |

Source: [crates/mnem-ingest/src/types.rs:1-80]()
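
The extension column of the table can be read as a lookup function. The sketch below encodes it with a hypothetical `ToySourceKind` enum and `kind_for` helper. Lower-casing the extension before matching is an assumption, as is representing the language as a string instead of a dedicated enum.

```rust
#[derive(Debug, PartialEq)]
pub enum ToySourceKind {
    Markdown,
    Pdf,
    Conversation,
    Code(&'static str),
    Text,
}

/// Map a file name to a source kind by its extension (case-insensitive here).
pub fn kind_for(file_name: &str) -> ToySourceKind {
    let ext = file_name
        .rsplit_once('.')
        .map(|(_, e)| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("md" | "markdown") => ToySourceKind::Markdown,
        Some("pdf") => ToySourceKind::Pdf,
        Some("json" | "jsonl") => ToySourceKind::Conversation,
        Some("rs") => ToySourceKind::Code("Rust"),
        Some("py" | "pyi") => ToySourceKind::Code("Python"),
        Some("js" | "mjs" | "cjs") => ToySourceKind::Code("JavaScript"),
        Some("ts" | "tsx" | "mts" | "cts") => ToySourceKind::Code("TypeScript"),
        Some("go") => ToySourceKind::Code("Go"),
        Some("java") => ToySourceKind::Code("Java"),
        Some("c" | "h") => ToySourceKind::Code("C"),
        Some("cpp" | "cc" | "cxx" | "hpp") => ToySourceKind::Code("Cpp"),
        Some("rb" | "gemspec" | "rake" | "erb") => ToySourceKind::Code("Ruby"),
        Some("cs" | "csx") => ToySourceKind::Code("CSharp"),
        // Anything else, including files without an extension, is plain text.
        _ => ToySourceKind::Text,
    }
}
```

Unknown or missing extensions fall through to `Text`, matching the "Other/unspecified" row.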

## Retrieval Flow

```mermaid
sequenceDiagram
    participant Client
    participant Retriever
    participant VectorIndex
    participant SparseIndex
    participant GraphIndex
    participant Fuser
    participant TokenBudgetPacker
    participant LLM

    Client->>Retriever: query + config
    Retriever->>VectorIndex: vector_search(query)
    Retriever->>SparseIndex: sparse_search(query)
    Retriever->>GraphIndex: graph_expand(seed_nodes)
    VectorIndex-->>Fuser: ranked_candidates
    SparseIndex-->>Fuser: ranked_candidates
    GraphIndex-->>Fuser: ranked_candidates
    Fuser->>Fuser: reciprocal_rank_fusion
    Fuser->>TokenBudgetPacker: fused_results
    alt summarize=true
        TokenBudgetPacker->>TokenBudgetPacker: centroid_MMR_extraction
    end
    TokenBudgetPacker-->>LLM: token_budgeted_context
```
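
The `reciprocal_rank_fusion` step in the diagram refers to a standard rank-combination technique: each list contributes `1 / (k + rank)` for every item it contains. The sketch below implements the textbook form. The signature, the string IDs, and the constant `k = 60` used in the usage note are assumptions; the source does not show mnem's actual fuser.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
/// with ranks starting at 1.
pub fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k = 60`, a node ranked second in the vector list and first in the sparse list outranks a node that appears first in only one list, because it collects a contribution from both lanes.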

## HyDE (Hypothetical Document Embeddings)

When `hyde=true`, the system generates extractive summaries of the top-M candidate nodes before final ranking. This adapts the HyDE (Hypothetical Document Embeddings) pattern, in which a hypothetical document, rather than the raw query, is embedded for ranking:

1. Initial candidates are retrieved
2. Extractive summarization produces hypotheses
3. Hypotheses are re-embedded and ranked
4. Final top-K are packed into the context budget

Source: [crates/mnem-http/src/handlers.rs:250-270]()

## Branch Name Validation

The HTTP API validates branch names before creating commit references during ingest operations:

```text
Invalid characters: space, tab, newline, null, ~, ^, :, ?, *, [, \
Invalid sequences: @{, .., //
Invalid patterns: leading /, trailing /, trailing ., trailing .lock
```

Source: [crates/mnem-http/src/handlers.rs:180-210]()
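
The rules listed above translate into a short predicate. `is_valid_branch_name` is a hypothetical name, and rejecting the empty string is an added assumption; the handler's actual function and error reporting are not shown in the source.

```rust
/// Return `true` if `name` passes the rules listed above.
pub fn is_valid_branch_name(name: &str) -> bool {
    const BAD_CHARS: &[char] = &[' ', '\t', '\n', '\0', '~', '^', ':', '?', '*', '[', '\\'];
    const BAD_SEQUENCES: &[&str] = &["@{", "..", "//"];
    !name.is_empty() // assumption: an empty name is never valid
        && !name.contains(BAD_CHARS)
        && !BAD_SEQUENCES.iter().any(|seq| name.contains(*seq))
        && !name.starts_with('/')
        && !name.ends_with('/')
        && !name.ends_with('.')
        && !name.ends_with(".lock")
}
```

Names such as `main` and `feature/ingest-v2` pass, while `a..b`, `x.lock`, and `/x` are rejected.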

## Extractor Integration

The retrieval system works in conjunction with the entity extraction pipeline. Extractors produce entity spans and relation spans that populate the graph index:

| Extractor | Provider | Features |
|-----------|----------|----------|
| `RuleExtractor` | Default (NER) | Capitalized phrase heuristic, verb-window regex relations |
| `KeyBertAdapter` | Statistical | Requires `keybert` feature flag |
| `LLM` | Ollama | Requires `ollama` feature flag |

Source: [crates/mnem-ingest/src/extract.rs:1-100]()

## Configuration Example

```toml
[retrieve]
limit = 20
budget = 4096
vector_cap = 100
graph_expand = 5
graph_depth = 2
graph_decay = 80
rerank_top_k = 10
hyde_max_tokens = 256

[rerank]
model = "cross-encoder/ms-marco-MiniLM-L-6-v2"
base_url = "http://localhost:8080"

[ner]
provider = "rule"  # or "none"
```

Source: [crates/mnem-cli/src/config.rs:50-120]()

## See Also

- [Node Data Model](objects/node.md) — Node structure with summary and context_sentence
- [Chunking Strategies](ingest/chunk.md) — Five chunker implementations
- [HTTP API Reference](http/api.md) — REST endpoint documentation
- [Entity Extraction](ingest/extract.md) — NER and relation extraction

---

<a id='page-embed-providers'></a>

## Embedding Providers

### Related Pages

Related topics: [Hybrid Retrieval System](#page-hybrid-retrieval)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [crates/mnem-cli/src/commands/mod.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/mod.rs)
- [crates/mnem-cli/src/config.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)
- [crates/mnem-http/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-http/src/lib.rs)
- [crates/mnem-mcp/src/tools/embed.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-mcp/src/tools/embed.rs)
- [crates/mnem-core/src/objects/node.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
- [crates/mnem-embed-providers/src/http.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-embed-providers/src/http.rs)
</details>

# Embedding Providers

Embedding Providers is a pluggable subsystem in the mnem monorepo that abstracts the generation of vector embeddings for text content. It lives in the `crates/mnem-embed-providers` crate and is consumed by `mnem-cli`, `mnem-http`, and `mnem-mcp` to support dense vector indexing and semantic retrieval.

## Architecture Overview

The provider system follows a **strategy pattern** with runtime-configurable backends. Each provider implements the same `Embedder` trait, returning `Vec<f32>` vectors regardless of the underlying implementation (HTTP API, local model, ONNX runtime).

```mermaid
graph TD
    A["mnem-cli / mnem-http / mnem-mcp"] --> B["mnem-embed-providers"]
    B --> C["ProviderConfig"]
    C --> D["OpenAI Provider"]
    C --> E["Ollama Provider"]
    C --> F["ONNX Provider"]
    D --> G["REST API / OpenAI Compatible"]
    E --> H["Local Ollama Server"]
    F --> I["Local ONNX Runtime"]
    
    J["config.toml / ENV vars"] --> B
```

Source: [crates/mnem-cli/src/commands/mod.rs:1-50](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/mod.rs)
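
The strategy pattern described above can be sketched with a trait object. The trait shape below is a guess: the source only states that providers share an `Embedder` trait and return `Vec<f32>`. The method name `embed`, the `String` error type, `HashingEmbedder`, and `embed_all` are hypothetical, and the hashing backend is a deterministic stand-in with no semantic meaning.

```rust
/// Hypothetical shape of the shared trait; the real signature may differ.
pub trait Embedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

/// Deterministic stand-in backend: counts bytes into a fixed number of buckets.
pub struct HashingEmbedder {
    pub dim: usize,
}

impl Embedder for HashingEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        if self.dim == 0 {
            return Err("dimension must be non-zero".to_string());
        }
        let mut v = vec![0.0f32; self.dim];
        for (i, b) in text.bytes().enumerate() {
            v[(i + b as usize) % self.dim] += 1.0;
        }
        Ok(v)
    }
}

/// Callers depend only on the trait, so backends can be swapped at runtime.
pub fn embed_all(embedder: &dyn Embedder, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
    texts.iter().map(|t| embedder.embed(t)).collect()
}
```

Because `embed_all` takes `&dyn Embedder`, the same call site works whether the configured backend is remote, local, or bundled.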

## Supported Providers

| Provider | Backend Type | Model Selection | Configuration |
|----------|-------------|-----------------|---------------|
| **OpenAI** | Remote REST API | Via `model` field | `base_url`, `api_key`, `timeout_secs` |
| **Ollama** | Local REST API | Via `model` field | `base_url` (default: `http://localhost:11434`), `timeout_secs` |
| **ONNX** | Local ONNX Runtime | Bundled `all-MiniLM-L6-v2` | No network required |

Source: [crates/mnem-cli/src/config.rs:1-80](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)

### OpenAI Provider

Sends text to OpenAI's embedding API or any OpenAI-compatible endpoint. Requires:
- `base_url`: API endpoint (default: `https://api.openai.com/v1`)
- `api_key`: Authentication token
- `model`: Embedding model identifier

### Ollama Provider

Connects to a local Ollama server for running open-source embedding models. Default endpoint is `http://localhost:11434`. The provider sets a 120-second timeout by default.

### ONNX Provider

Runs inference entirely offline using the ONNX Runtime with the `all-MiniLM-L6-v2` model. This is the **bundled default** when mnem is compiled with the `bundled-embedder` feature, providing zero-configuration embeddings for single-machine deployments.

Source: [crates/mnem-mcp/src/tools/embed.rs:1-60](https://github.com/Uranid/mnem/blob/main/crates/mnem-mcp/src/tools/embed.rs)

## Configuration Resolution

Embedding providers are configured through a **precedence chain** that varies slightly between consumer applications.

### mnem-cli Precedence

| Priority | Source | Fields |
|----------|--------|--------|
| 1 | Environment variables | `MNEM_EMBED_PROVIDER`, `MNEM_EMBED_MODEL`, `MNEM_EMBED_API_KEY_ENV`, `MNEM_EMBED_BASE_URL`, `MNEM_EMBED_DIM` |
| 2 | `~/.mnem/config.toml` | `[embed]` section |
| 3 | `<repo>/config.toml` | `[embed]` section |
| 4 | Bundled ONNX fallback | When compiled with `bundled-embedder` feature |

Source: [crates/mnem-cli/src/config.rs:80-120](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)
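
A precedence chain like the one in the table reduces to "take the first source that is configured". The sketch below assumes that each source is resolved as a whole; the real resolver may merge individual fields across sources. `EmbedSettings` and `resolve_embed_settings` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct EmbedSettings {
    pub provider: String,
    pub model: String,
}

/// Pick the first configured source, in the priority order of the table.
pub fn resolve_embed_settings(
    from_env: Option<EmbedSettings>,
    from_global_config: Option<EmbedSettings>,
    from_repo_config: Option<EmbedSettings>,
    bundled_fallback: Option<EmbedSettings>,
) -> Option<EmbedSettings> {
    from_env
        .or(from_global_config)
        .or(from_repo_config)
        .or(bundled_fallback)
}
```

Passing `None` for `bundled_fallback` models a build without the `bundled-embedder` feature, in which case the result can be `None` and no embedder is available.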

### mnem-http Precedence

| Priority | Source | Behavior |
|----------|--------|----------|
| 1 | `POST /v1/embed` request body | Per-request model override |
| 2 | `<data_dir>/config.toml` | Server-wide `[embed]` section |

The HTTP server loads embed configuration lazily at startup. A malformed `[embed]` section logs a warning but does not prevent server startup—auto-embed simply remains disabled.

```rust
fn load_embed_config(data_dir: &Path) -> Option<mnem_embed_providers::ProviderConfig> {
    #[derive(serde::Deserialize)]
    struct MiniCfg {
        embed: Option<mnem_embed_providers::ProviderConfig>,
    }
    let path = data_dir.join("config.toml");
    let s = std::fs::read_to_string(&path).ok()?;
    match toml::from_str::<MiniCfg>(&s) {
        Ok(parsed) => parsed.embed,
        Err(e) => {
            tracing::warn!(path = %path.display(), error = %e,
                "config.toml [embed] parse failed; auto-embed disabled"
            );
            None
        }
    }
}
```

Source: [crates/mnem-http/src/lib.rs:1-50](https://github.com/Uranid/mnem/blob/main/crates/mnem-http/src/lib.rs)

### mnem-mcp Precedence

The MCP server uses a simplified three-tier chain without the global `~/.mnem/config.toml` lookup (design point: per-repo isolation):

1. `MNEM_EMBED_*` environment variables
2. `<repo>/config.toml` `[embed]` section
3. Bundled ONNX fallback (only when `bundled-embedder` feature is compiled)

Source: [crates/mnem-mcp/src/tools/embed.rs:20-40](https://github.com/Uranid/mnem/blob/main/crates/mnem-mcp/src/tools/embed.rs)

## ProviderConfig Schema

The configuration is parsed from TOML into a discriminated union:

```rust
pub enum ProviderConfig {
    Openai(OpenaiConfig),
    Ollama(OllamaConfig),
    Onnx(OnnxConfig),
}
```

Each variant carries only the parameters relevant to that provider, keeping the configuration minimal.

## Error Handling

All embedding operations return `EmbedError`, which is mapped from transport failures into actionable diagnostics:

```mermaid
graph LR
    A["ureq::Error"] --> B{"EmbedError"}
    B --> C["RateLimited"]
    B --> D["BadRequest<br/>status + body"]
    B --> E["Server<br/>status + body"]
    B --> F["Network<br/>transport message"]
    B --> G["Decode<br/>JSON parse failure"]
```

Source: [crates/mnem-embed-providers/src/http.rs:1-50](https://github.com/Uranid/mnem/blob/main/crates/mnem-embed-providers/src/http.rs)
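
The diagram names the error variants but not the rules that select them. The sketch below shows one plausible classification of HTTP responses. The status ranges (429 for rate limiting, other 4xx for bad requests, 5xx for server errors) are assumptions, and `ToyEmbedError` and `classify_status` are hypothetical names, not the code in `http.rs`.

```rust
#[derive(Debug, PartialEq)]
pub enum ToyEmbedError {
    RateLimited,
    BadRequest { status: u16, body: String },
    Server { status: u16, body: String },
    Network(String),
    Decode(String),
}

/// Classify a non-success HTTP response. The status ranges are assumptions.
pub fn classify_status(status: u16, body: &str) -> ToyEmbedError {
    match status {
        429 => ToyEmbedError::RateLimited,
        400..=499 => ToyEmbedError::BadRequest { status, body: body.to_string() },
        500..=599 => ToyEmbedError::Server { status, body: body.to_string() },
        other => ToyEmbedError::Network(format!("unexpected status {other}")),
    }
}
```

Keeping the status and body on the `BadRequest` and `Server` variants is what allows the CLI to print a specific suggestion instead of a generic failure message.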

### Error Display for Users

When embedding fails, mnem-cli formats the error into a short, actionable one-liner suitable for `eprintln!`:

| Provider | Common Cause | Suggestion |
|----------|--------------|------------|
| OpenAI | Invalid API key | Check `MNEM_EMBED_API_KEY_ENV` |
| Ollama | Server not running | Verify `ollama serve` is active |
| ONNX | Missing model file | Ensure `all-MiniLM-L6-v2` is bundled |

The `format_embed_failure` function accepts a `context` parameter (`"embedding"` for writes, `"query embedding"` for retrieval) to tailor suggestions.

Sources: [crates/mnem-cli/src/commands/mod.rs:50-100](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/mod.rs)

## Integration with Node Storage

Embedding vectors are stored on `Node` objects for use during semantic retrieval:

```rust
pub struct Node {
    pub id: NodeId,
    pub label: String,
    pub summary: Option<String>,           // LLM-facing retrieval text
    pub content: Option<Bytes>,           // Opaque payload
    pub context_sentence: Option<String>,  // Anthropic contextual retrieval prefix
    pub props: BTreeMap<String, Ipld>,    // Structured properties
}
```

The `summary` field is the primary text indexed by the dense embedder. The `context_sentence` (following Anthropic's 2024 Contextual Retrieval article) is prepended to `summary` before embedding to capture positional context; Anthropic reports that the technique reduces retrieval failures by 49% to 67%, depending on the configuration.

Sources: [crates/mnem-core/src/objects/node.rs:1-50](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
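
As a rough illustration of the prepending step, the text handed to the embedder could be assembled as below. The `embedding_text` helper and the single-space separator are assumptions made for this sketch; the source does not show how mnem joins the two fields.

```rust
/// Builds the string that would be sent to the dense embedder (hypothetical helper).
/// A missing or blank context sentence leaves the summary unchanged.
pub fn embedding_text(summary: &str, context_sentence: Option<&str>) -> String {
    match context_sentence {
        // Separator is an assumption: a single space between context and summary.
        Some(ctx) if !ctx.trim().is_empty() => format!("{} {}", ctx.trim(), summary),
        _ => summary.to_string(),
    }
}
```

The stored `summary` stays untouched; only the embedded text carries the extra context.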

## Bundled Embedder Feature

The `bundled-embedder` Cargo feature compiles in an ONNX provider with `all-MiniLM-L6-v2`. When enabled:

- `mnem embed` works out-of-the-box without external services
- The MCP `mnem_retrieve` tool has a tier-3 fallback when no explicit vector provider is configured
- Ideal for air-gapped environments or local-first workflows

When the feature is not enabled, a missing embedder configuration produces a warning during ingest; nodes are created without vectors, and the warning suggests `mnem reindex` as the recovery path.

## Summary

The embedding provider layer abstracts vector generation behind a common interface and supports three backends with distinct deployment profiles:

- **OpenAI**: Cloud-hosted, highest quality, requires API credentials
- **Ollama**: Self-hosted, flexible model selection, local compute
- **ONNX**: Offline-capable, bundled model, zero-configuration

Configuration flows from environment variables through TOML files, with graceful fallback behavior that never prevents core operations from functioning.

---

<a id='page-storage-backend'></a>

## Storage Backend

### Related Pages

Related topics: [System Architecture](#page-architecture)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [crates/mnem-core/src/store/mod.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/store/mod.rs)
- [crates/mnem-core/src/store/blockstore.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/store/blockstore.rs)
- [crates/mnem-core/src/store/op_heads.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/store/op_heads.rs)
- [crates/mnem-backend-redb/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-backend-redb/src/lib.rs)
- [crates/mnem-backend-redb/src/blockstore.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-backend-redb/src/blockstore.rs)
- [crates/mnem-backend-redb/src/knn_edges_store.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-backend-redb/src/knn_edges_store.rs)
</details>

# Storage Backend

The storage backend is a critical subsystem in mnem that provides persistent storage for the content-addressable object graph. It abstracts storage operations behind well-defined traits, enabling pluggable storage implementations while maintaining a consistent API for the core data layer.

## Architecture Overview

The storage backend follows a trait-based abstraction pattern where `mnem-core` defines the storage interfaces and concrete implementations are provided by backend crates. This separation allows the core logic to remain independent of specific storage technologies.

```mermaid
graph TD
    subgraph "Application Layer"
        CLI[mnem-cli]
        HTTP[mnem-http]
    end
    
    subgraph "mnem-core"
        Repo[Repository]
        Transaction[Transaction]
        Objects[Node / Edge / Commit]
    end
    
    subgraph "Storage Traits"
        Blockstore[BlockStore Trait]
        OpHeadsStore[OpHeadsStore Trait]
        KnnEdgesStore[KnnEdgesStore Trait]
    end
    
    subgraph "Backend Implementations"
        RedbBackend[mnem-backend-redb]
    end
    
    CLI --> Repo
    HTTP --> Repo
    Repo --> Transaction
    Transaction --> Blockstore
    Transaction --> OpHeadsStore
    Repo --> KnnEdgesStore
    Blockstore --> RedbBackend
    OpHeadsStore --> RedbBackend
    KnnEdgesStore --> RedbBackend
```

## Core Storage Traits

The storage layer is built on three fundamental traits that define the contract between the core library and storage implementations.

### BlockStore Trait

The `Blockstore` trait provides low-level operations for storing and retrieving binary data blocks identified by Content Identifiers (CIDs).

```rust
// crates/mnem-core/src/store/blockstore.rs
pub trait Blockstore: Send + Sync {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn put(&self, block: &[u8]) -> Result<Cid>;
    fn put_many<I>(&self, blocks: I) -> Result<Vec<Cid>>
    where
        I: IntoIterator<Item = Vec<u8>>,
        I::IntoIter: Send + Sync;
}
```

| Method | Purpose | Return Type |
|--------|---------|-------------|
| `get(cid)` | Retrieve a block by its CID | `Result<Option<Vec<u8>>>` |
| `put(block)` | Store a single block, returning its CID | `Result<Cid>` |
| `put_many(blocks)` | Batch insert multiple blocks | `Result<Vec<Cid>>` |

The trait follows the content-addressed storage pattern also used by CAR (Content Addressable aRchive) files: each block's key is derived from a hash of its bytes, so integrity can be verified by re-hashing the data. Sources: [crates/mnem-core/src/store/blockstore.rs:1-20]()
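
To make the content-addressing idea concrete, here is a minimal in-memory sketch. It is not mnem's implementation: `MemBlockstore` and `FakeCid` are hypothetical, and a 64-bit standard-library hash stands in for a real multihash-based CID.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Stand-in for a real CID: a 64-bit hash of the block bytes (assumption).
pub type FakeCid = u64;

/// In-memory store keyed by the hash of each block's content.
#[derive(Default)]
pub struct MemBlockstore {
    blocks: Mutex<HashMap<FakeCid, Vec<u8>>>,
}

impl MemBlockstore {
    /// Derives the key from the bytes, so identical blocks share one entry.
    pub fn put(&self, block: &[u8]) -> FakeCid {
        let mut hasher = DefaultHasher::new();
        block.hash(&mut hasher);
        let cid = hasher.finish();
        self.blocks.lock().unwrap().insert(cid, block.to_vec());
        cid
    }

    /// Returns a copy of the block, or `None` if the key is unknown.
    pub fn get(&self, cid: &FakeCid) -> Option<Vec<u8>> {
        self.blocks.lock().unwrap().get(cid).cloned()
    }

    /// Number of distinct blocks currently stored.
    pub fn len(&self) -> usize {
        self.blocks.lock().unwrap().len()
    }
}
```

Writing the same bytes twice returns the same key and stores a single copy, which is what makes deduplication and integrity checks cheap in a content-addressed design.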

### OpHeadsStore Trait

The `OpHeadsStore` trait manages operation heads, which are references to the latest operations in the operation log. It supports both single-head and multi-head scenarios with conflict detection.

```rust
// crates/mnem-core/src/store/op_heads.rs
pub trait OpHeadsStore: Send + Sync {
    fn get_heads(&self) -> Result<Vec<Cid>>;
    fn put_head(&self, op: &Op) -> Result<()>;
    fn put_heads(&self, ops: &[Op]) -> Result<()>;
    fn merge_heads(&self, merged: Vec<Cid>) -> Result<()>;
}
```

| Method | Purpose |
|--------|---------|
| `get_heads()` | Retrieve all current operation head CIDs |
| `put_head(op)` | Atomically update the single head |
| `put_heads(ops)` | Set multiple operation heads |
| `merge_heads(merged)` | Replace heads with merged result after conflict resolution |

Sources: [crates/mnem-core/src/store/op_heads.rs:1-50]()
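
The single-head and multi-head cases can be illustrated with a small in-memory model. Everything here is an assumption made for the sketch: `MemOpHeads`, the `FakeCid` alias, and the `is_diverged` helper do not come from the source, and the real trait takes `Op` values and returns `Result`.

```rust
use std::sync::Mutex;

/// Stand-in for a real CID (assumption).
pub type FakeCid = u64;

/// In-memory model of the set of current operation heads.
#[derive(Default)]
pub struct MemOpHeads {
    heads: Mutex<Vec<FakeCid>>,
}

impl MemOpHeads {
    pub fn get_heads(&self) -> Vec<FakeCid> {
        self.heads.lock().unwrap().clone()
    }

    /// Replaces the head set with a single head.
    pub fn put_head(&self, head: FakeCid) {
        *self.heads.lock().unwrap() = vec![head];
    }

    /// Sets several heads at once, stored sorted and de-duplicated.
    pub fn put_heads(&self, heads: &[FakeCid]) {
        let mut sorted = heads.to_vec();
        sorted.sort_unstable();
        sorted.dedup();
        *self.heads.lock().unwrap() = sorted;
    }

    /// Hypothetical helper: more than one head means the log has diverged.
    pub fn is_diverged(&self) -> bool {
        self.heads.lock().unwrap().len() > 1
    }

    /// Replaces the diverged heads with the result of a merge.
    pub fn merge_heads(&self, merged: Vec<FakeCid>) {
        self.put_heads(&merged);
    }
}
```

In this model, conflict detection reduces to checking whether more than one head is present before accepting a new write.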

### KnnEdgesStore Trait

The `KnnEdgesStore` trait provides specialized storage for k-nearest-neighbor graph edges, enabling efficient vector similarity searches.

```rust
// Backend interface for KNN edge storage
pub trait KnnEdgesStore: Send + Sync {
    fn insert(&self, source_id: NodeId, embedding: &[f32]) -> Result<()>;
    fn search(&self, query: &[f32], k: usize) -> Result<Vec<(NodeId, f32)>>;
}
```

Sources: [crates/mnem-backend-redb/src/knn_edges_store.rs:1-30]()
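
The `insert`/`search` contract can be demonstrated with an exact, brute-force index. This is a teaching sketch only: `BruteForceKnn` is hypothetical, `NodeId` is modeled as a `u64`, cosine similarity is an assumed metric, and a linear scan replaces whatever approximate index the backend actually uses.

```rust
use std::sync::Mutex;

/// Stand-in for mnem's `NodeId` (assumption).
pub type NodeId = u64;

/// Exact nearest-neighbor search by scanning every stored vector.
#[derive(Default)]
pub struct BruteForceKnn {
    vectors: Mutex<Vec<(NodeId, Vec<f32>)>>,
}

/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl BruteForceKnn {
    pub fn insert(&self, id: NodeId, embedding: &[f32]) {
        self.vectors.lock().unwrap().push((id, embedding.to_vec()));
    }

    /// Returns up to `k` (node, similarity) pairs, most similar first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(NodeId, f32)> {
        let mut scored: Vec<(NodeId, f32)> = self
            .vectors
            .lock()
            .unwrap()
            .iter()
            .map(|(id, v)| (*id, cosine(query, v)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A linear scan costs time proportional to the number of stored vectors per query, which is the cost an approximate index is meant to avoid on large graphs.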

## Redb Backend Implementation

The `mnem-backend-redb` crate provides the reference implementation using the Redb embedded database, a fast, lightweight key-value store written in Rust.

### Module Structure

```
mnem-backend-redb/
├── src/
│   ├── lib.rs           # Main entry point and configuration
│   ├── blockstore.rs    # BlockStore implementation
│   └── knn_edges_store.rs # KNN edge storage with HNSW
```

### Initialization

The backend initializes by opening or creating a Redb database file:

```rust
// crates/mnem-backend-redb/src/lib.rs
pub struct Backend {
    db: redb::Database,
    path: PathBuf,
}

impl Backend {
    pub fn open(path: &Path) -> Result<Self> {
        let db = redb::Database::create(path)?;
        Ok(Self { db, path: path.to_path_buf() })
    }
}
```

Sources: [crates/mnem-backend-redb/src/lib.rs:1-50]()

### BlockStore Implementation

The Redb blockstore implementation wraps the database with CAR-compatible semantics:

```rust
// crates/mnem-backend-redb/src/blockstore.rs
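// Simplified sketch; assumes BLOCKS_TABLE is a TableDefinition<&[u8], &[u8]>.
// `put_many` is omitted for brevity.
impl Blockstore for RedbBlockstore {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>> {
        let key = cid.to_bytes();
        // Reads go through a read-only redb transaction.
        let txn = self.db.begin_read()?;
        let table = txn.open_table(BLOCKS_TABLE)?;
        Ok(table.get(key.as_slice())?.map(|v| v.value().to_vec()))
    }

    fn put(&self, block: &[u8]) -> Result<Cid> {
        // The CID is derived from the SHA-256 digest of the block.
        let hash = multihash::Sha256::digest(block);
        let cid = Cid::new_v1(DAG_CBOR, hash);
        // Store the block with the CID bytes as key, then commit the write.
        let txn = self.db.begin_write()?;
        {
            let mut table = txn.open_table(BLOCKS_TABLE)?;
            table.insert(cid.to_bytes().as_slice(), block)?;
        }
        txn.commit()?;
        Ok(cid)
    }
}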
```

Sources: [crates/mnem-backend-redb/src/blockstore.rs:1-100]()

### Storage Format

The redb backend organizes data into multiple tables:

| Table Name | Key Type | Value Type | Purpose |
|------------|----------|------------|---------|
| `blocks` | CID bytes | Raw block data | Content-addressed storage |
| `op_heads` | Fixed key | CID bytes | Operation head references |
| `knn_edges` | NodeId | Serialized edges | Vector similarity graph |

## Transaction Model

mnem implements a transactional write model through the `Transaction` type in the repository layer. Transactions provide ACID-like semantics for graph modifications.

```mermaid
graph LR
    A[Begin Transaction] --> B[Add Nodes]
    B --> C[Add Edges]
    C --> D[Commit]
    D --> E[Update OpHeads]
    E --> F[Success]
    
    D --> G[Abort]
    G --> H[Rollback]
```

### Write Operations

Transactions support atomic batch operations:

```rust
// Conceptual transaction interface
impl Transaction {
    pub fn add_node(&mut self, node: Node) -> Result<NodeId>;
    pub fn add_edge(&mut self, source: NodeId, target: NodeId, relation: &str) -> Result<()>;
    pub fn commit(self) -> Result<Commit>;
}
```

Sources: [crates/mnem-core/src/repo/mod.rs:1-100]()
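
The all-or-nothing behavior can be modeled by staging writes in memory and applying them only on commit. The `Graph` and `Tx` types below are hypothetical and far simpler than mnem's `Transaction`; node IDs are plain vector indices and nothing is persisted.

```rust
/// Toy graph: nodes are summaries, edges are (source, target, relation).
#[derive(Debug, Default, PartialEq)]
pub struct Graph {
    pub nodes: Vec<String>,
    pub edges: Vec<(usize, usize, String)>,
}

/// Stages writes against a borrowed graph until `commit` is called.
pub struct Tx<'a> {
    graph: &'a mut Graph,
    pending_nodes: Vec<String>,
    pending_edges: Vec<(usize, usize, String)>,
}

impl<'a> Tx<'a> {
    pub fn begin(graph: &'a mut Graph) -> Self {
        Tx { graph, pending_nodes: Vec::new(), pending_edges: Vec::new() }
    }

    /// Returns the index the node will have once committed.
    pub fn add_node(&mut self, summary: &str) -> usize {
        self.pending_nodes.push(summary.to_string());
        self.graph.nodes.len() + self.pending_nodes.len() - 1
    }

    pub fn add_edge(&mut self, source: usize, target: usize, relation: &str) {
        self.pending_edges.push((source, target, relation.to_string()));
    }

    /// Applies all staged writes at once and returns how many were written.
    /// Dropping the `Tx` without calling this discards them.
    pub fn commit(self) -> usize {
        let written = self.pending_nodes.len() + self.pending_edges.len();
        self.graph.nodes.extend(self.pending_nodes);
        self.graph.edges.extend(self.pending_edges);
        written
    }
}
```

Because `commit` takes `self` by value, a committed transaction cannot be reused, mirroring the `commit(self)` signature shown above.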

### Commit Structure

Each commit creates an immutable snapshot of the repository state:

```rust
// crates/mnem-core/src/objects/commit.rs
pub struct Commit {
    pub operation: Operation,
    pub parent: Option<Cid>,
    pub author: Author,
    pub message: String,
    pub timestamp: DateTime<Utc>,
}
```

Sources: [crates/mnem-core/src/objects/commit.rs:1-50]()

## Data Persistence Flow

```mermaid
sequenceDiagram
    participant App as Application
    participant Tx as Transaction
    participant BS as BlockStore
    participant OHS as OpHeadsStore
    participant Redb as Redb DB

    App->>Tx: begin()
    Tx->>Tx: add_node(node)
    Tx->>BS: put(block)
    BS->>Redb: write(cid, data)
    Tx->>Tx: add_edge(src, dst)
    Tx->>BS: put(block)
    Tx->>Tx: commit()
    Tx->>BS: put(commit_block)
    Tx->>OHS: put_head(new_op)
    OHS->>Redb: update_heads()
    Redb-->>Tx: success
    Tx-->>App: commit_cid
```

## Configuration

Storage backend behavior is configured through the `Config` structure:

```rust
// crates/mnem-cli/src/config.rs
pub struct Config {
    pub store: Option<StoreConfig>,
    pub data_dir: PathBuf,
    // ...
}

pub struct StoreConfig {
    pub path: PathBuf,
    pub flush_interval_ms: Option<u64>,
}
```

Sources: [crates/mnem-cli/src/config.rs:1-100]()

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `path` | `PathBuf` | `data/` | Base directory for storage files |
| `flush_interval_ms` | `u64` | `1000` | Periodic flush interval in milliseconds |

## Object Types and Serialization

### Node Storage

Nodes are serialized using DAG-CBOR and stored as blocks:

```rust
// crates/mnem-core/src/objects/node.rs
pub struct Node {
    pub id: NodeId,
    pub ntype: NodeType,
    pub summary: Option<String>,
    pub props: BTreeMap<String, Ipld>,
    pub content: Option<Bytes>,
    pub context_sentence: Option<String>,
}
```

Sources: [crates/mnem-core/src/objects/node.rs:1-100]()

### Edge Storage

Edges link nodes with labeled relationships:

```rust
// crates/mnem-core/src/objects/edge.rs
pub struct Edge {
    pub source: NodeId,
    pub target: NodeId,
    pub relation: String,
    pub confidence: Option<f32>,
}
```

Sources: [crates/mnem-core/src/objects/edge.rs:1-50]()

## Error Handling

Storage operations return the `Error` type defined in the core crate:

```rust
// crates/mnem-core/src/store/mod.rs
pub enum Error {
    #[error("block not found: {0}")]
    BlockNotFound(Cid),
    #[error("serialization failed: {0}")]
    SerializationFailed(String),
    #[error("database error: {0}")]
    DatabaseError(String),
}
```

Sources: [crates/mnem-core/src/store/mod.rs:1-100]()

### Error Recovery

| Error Type | Recovery Strategy |
|------------|-------------------|
| `BlockNotFound` | Indicates data corruption; repository repair required |
| `SerializationFailed` | Check data integrity; may indicate schema mismatch |
| `DatabaseError` | Retry operation; check disk space and permissions |

## Indexes and Secondary Storage

### Vector Index

The KNN edges store maintains a vector index for similarity search operations:

```mermaid
graph TD
    A[Query Vector] --> B[KnnEdgesStore]
    B --> C[HNSW Index]
    C --> D[Approximate KNN Search]
    D --> E[Top-K Results]
```

The implementation uses the HNSW (Hierarchical Navigable Small World) algorithm for efficient approximate nearest-neighbor search. Sources: [crates/mnem-backend-redb/src/knn_edges_store.rs:1-100]()

### Query Interface

The retrieve module composes vector search with graph traversal:

```rust
// crates/mnem-core/src/retrieve/mod.rs
pub struct Retriever {
    blockstore: Arc<dyn Blockstore>,
    knn_edges: Arc<dyn KnnEdgesStore>,
    // ...
}
```

## Related Documentation

- [Repository Model](../core-concepts/repository-model) - Transaction and commit management
- [Object Schema](../core-concepts/object-schema) - Node, Edge, and Commit structures
- [Retrieval System](../features/retrieval) - Query and retrieval workflows
- [Configuration Guide](../getting-started/configuration) - Storage configuration options

---

<a id='page-ingestion'></a>

## Ingestion Pipeline

### Related Pages

Related topics: [Core Components](#page-core-components)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [crates/mnem-ingest/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/lib.rs)
- [crates/mnem-ingest/src/pipeline.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/pipeline.rs)
- [crates/mnem-ingest/src/chunk.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/chunk.rs)
- [crates/mnem-ingest/src/pdf.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/pdf.rs)
- [crates/mnem-ingest/src/types.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/types.rs)
- [crates/mnem-cli/src/commands/ingest.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/ingest.rs)
- [crates/mnem-core/src/objects/node.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/objects/node.rs)
</details>

# Ingestion Pipeline

The Ingestion Pipeline is the core system in mnem responsible for transforming external source documents (Markdown, PDFs, code files, conversations) into structured graph nodes within the repository. It handles parsing, chunking, entity extraction, and writing to the graph transaction—all without committing, allowing callers to control transaction boundaries.

## Overview

The pipeline orchestrates a multi-stage process:

1. **Detection** — Determine source kind from file extension or explicit configuration
2. **Parsing** — Convert raw bytes into a list of `Section` objects
3. **Chunking** — Split sections into semantically meaningful `Chunk` objects
4. **Extraction** — Optionally identify entities and relations via rule-based or LLM providers
5. **Writing** — Add nodes and edges to a borrowed `Transaction`

```mermaid
graph TD
    A[Raw Bytes] --> B[SourceKind Detection]
    B --> C[Parser Selection]
    C --> D[Parse to Sections]
    D --> E[Chunker Strategy]
    E --> F[Extract Entities & Relations]
    F --> G[Transaction Write]
    G --> H[IngestResult]
    
    C -->|md| C1[Markdown Parser]
    C -->|pdf| C2[PDF Parser]
    C -->|code| C3[Tree-sitter Parser]
    C -->|json/jsonl| C4[Conversation Parser]
    C -->|text| C5[Plain Text]
```

## Source Kind Detection

The `Ingester` automatically detects the source kind based on file extension. This determines both the parser and the default chunking strategy.

| Extension(s) | SourceKind | Default Chunker |
|--------------|------------|-----------------|
| `.md`, `.markdown` | `Markdown` | `Paragraph` |
| `.txt` | `Text` | `SentenceRecursive` (256 tokens, 32 overlap) |
| `.pdf` | `Pdf` | `SentenceRecursive` (512 tokens, 64 overlap) |
| `.json`, `.jsonl` | `Conversation` | `Session` (max 10 messages) |
| `.rs` | `Code(Rust)` | `Structural` |
| `.py`, `.pyi` | `Code(Python)` | `Structural` |
| `.js`, `.mjs`, `.cjs` | `Code(JavaScript)` | `Structural` |
| `.ts`, `.tsx`, `.mts`, `.cts` | `Code(TypeScript)` | `Structural` |
| `.go` | `Code(Go)` | `Structural` |
| `.java` | `Code(Java)` | `Structural` |
| `.c`, `.h` | `Code(C)` | `Structural` |
| `.cpp`, `.cc`, `.cxx`, `.hpp` | `Code(Cpp)` | `Structural` |
| `.rb`, `.gemspec`, `.rake` | `Code(Ruby)` | `Structural` |
| `.cs`, `.csx` | `Code(CSharp)` | `Structural` |
| Unknown or no extension | `Text` | `SentenceRecursive` |

Sources: [pipeline.rs:source_kind_from_ext()]() [types.rs:SourceKind]() [types.rs:CodeLanguage::from_extension()]()
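
The table above amounts to a lookup on the file extension with a plain-text fallback. The sketch below covers only a few rows and is not the real `source_kind_from_ext`: the reduced `SourceKind` enum, the `source_kind_from_path` name, and the case-insensitive comparison are assumptions.

```rust
use std::path::Path;

/// Reduced stand-in for mnem's `SourceKind`; code languages are plain names here.
#[derive(Debug, PartialEq)]
pub enum SourceKind {
    Markdown,
    Text,
    Pdf,
    Conversation,
    Code(&'static str),
}

/// Maps a path to a source kind by extension (hypothetical helper).
/// Unknown or missing extensions fall back to `Text`.
pub fn source_kind_from_path(path: &str) -> SourceKind {
    // Lowercasing the extension is an assumption of this sketch.
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("md") | Some("markdown") => SourceKind::Markdown,
        Some("pdf") => SourceKind::Pdf,
        Some("json") | Some("jsonl") => SourceKind::Conversation,
        Some("rs") => SourceKind::Code("Rust"),
        Some("py") | Some("pyi") => SourceKind::Code("Python"),
        Some("go") => SourceKind::Code("Go"),
        _ => SourceKind::Text,
    }
}
```

The catch-all arm is what lets extensionless files such as `README` be ingested as plain text instead of being rejected.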

## Supported File Formats

### Markdown (`.md`, `.markdown`)

Parsed using CommonMark + GitHub Flavored Markdown (GFM) support. The parser extracts headings with depth information, creating section boundaries that respect document structure. Each heading becomes a section boundary.

### PDF (`.pdf`)

Pure-Rust text-layer extraction using `pdf-extract`. One section is created per page, with the heading set to `"Page {n}"` at depth 1. PDFs with fewer than 100 text characters per page are flagged as potentially scanned. Malformed PDFs return `Error::ParseFailed`. Sources: [pdf.rs:MIN_TEXT_PER_PAGE]() [pdf.rs:parse_pdf()]()

### Code Files

Parsed using tree-sitter for supported languages (Rust, Python, JavaScript, TypeScript, Go, Java, C, Cpp, Ruby, CSharp). The parser extracts function and class bodies as sections, preserving structural boundaries. Sources: [code.rs]()

### Conversations (`.json`, `.jsonl`)

Supports chat exports from ChatGPT, Claude, and generic conversation formats. Messages are extracted with role (user/assistant/system), content, and timestamps when available. Sources: [conversation.rs]()

### Plain Text (`.txt` and others)

Falls back to plain text parsing for unknown extensions, including files without extensions like `README`.

## Chunker Strategies

The `ChunkerKind` enum defines five chunking strategies. Callers can override the auto-selected strategy via CLI or API.

### Strategy Selection

```mermaid
graph TD
    A[SourceKind] --> B[auto_chunker]
    B -->|Markdown| C[Paragraph]
    B -->|Text| D[SentenceRecursive<br/>256 tokens, 32 overlap]
    B -->|Pdf| E[SentenceRecursive<br/>512 tokens, 64 overlap]
    B -->|Conversation| F[Session<br/>max 10 messages]
    B -->|Code| G[Structural]
```

### Paragraph Chunker

Splits each section's body on double-newline boundaries. Fast and deterministic, ideal for Markdown where authoring structure already matches desired chunk boundaries. Sources: [chunk.rs:ChunkerKind::Paragraph]()
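
A minimal sketch of the double-newline split follows. The `paragraph_chunks` function is a hypothetical stand-in that returns plain strings rather than mnem's `Chunk` type; trimming and dropping empty paragraphs are assumptions of the sketch.

```rust
/// Splits a section body into paragraphs on blank-line boundaries.
/// Surrounding whitespace is trimmed and empty paragraphs are dropped.
pub fn paragraph_chunks(body: &str) -> Vec<String> {
    body.split("\n\n")
        .map(str::trim)
        .filter(|paragraph| !paragraph.is_empty())
        .map(str::to_string)
        .collect()
}
```

Single newlines inside a paragraph are preserved, so hard-wrapped Markdown stays in one chunk.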

### Recursive Chunker

A token-budgeted sliding window over words, with configurable overlap. Kept for backwards compatibility. Sources: [chunk.rs:ChunkerKind::Recursive]()
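
The sliding-window idea can be sketched as follows, treating each whitespace-separated word as one token. The `sliding_window_chunks` function and its parameter names are hypothetical, and the output is plain strings rather than `Chunk` values.

```rust
/// Emits windows of at most `max_tokens` words, where consecutive windows
/// share `overlap` words. Panics if `overlap` is not smaller than `max_tokens`.
pub fn sliding_window_chunks(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_tokens > 0 && overlap < max_tokens,
        "overlap must be smaller than max_tokens"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    // Each window starts `step` words after the previous one.
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

Windows are cut purely by word count, so a boundary can fall in the middle of a sentence.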

### SentenceRecursive Chunker

Sentence-aware token-budgeted packing using Unicode sentence boundaries (UAX #29). Preferred for prose:

- Chunks never cut mid-sentence
- Overlap measured at sentence granularity
- Average chunk size is more uniform

Default for `Text` (256 tokens, 32 overlap) and `Pdf` (512 tokens, 64 overlap) source kinds. Sources: [chunk.rs:ChunkerKind::SentenceRecursive]() [chunk.rs:auto_chunker()]()
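
Sentence-aware packing can be sketched with two simplifications, both assumptions of this example: sentences are split naively after `.`, `!`, or `?` followed by whitespace (not by UAX #29 rules), and the overlap is given as a number of sentences rather than tokens. The function names are hypothetical.

```rust
/// Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let mut prev_terminal = false;
    for (i, ch) in text.char_indices() {
        if prev_terminal && ch.is_whitespace() {
            let sentence = text[start..i].trim();
            if !sentence.is_empty() {
                sentences.push(sentence);
            }
            start = i;
        }
        prev_terminal = matches!(ch, '.' | '!' | '?');
    }
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail);
    }
    sentences
}

/// Packs whole sentences into chunks of at most `max_tokens` words.
/// A sentence longer than the budget still becomes its own chunk.
pub fn sentence_chunks(text: &str, max_tokens: usize, overlap_sentences: usize) -> Vec<String> {
    let sentences = split_sentences(text);
    let tokens = |s: &str| s.split_whitespace().count();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < sentences.len() {
        let mut end = start;
        let mut used = 0;
        // Always take one sentence, then keep packing while the budget allows.
        while end < sentences.len()
            && (end == start || used + tokens(sentences[end]) <= max_tokens)
        {
            used += tokens(sentences[end]);
            end += 1;
        }
        chunks.push(sentences[start..end].join(" "));
        if end == sentences.len() {
            break;
        }
        // Step back by the overlap, but always move forward by at least one sentence.
        start = end.saturating_sub(overlap_sentences).max(start + 1);
    }
    chunks
}
```

Unlike a word window, every chunk boundary here coincides with a sentence boundary.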

### Session Chunker

Groups contiguous conversation messages into session chunks. Boundaries fire on:
- Role returning to `user`, OR
- Reaching `max_messages` (default: 10)

Preserves turn ordering. Default for `Conversation` source kind. Sources: [chunk.rs:ChunkerKind::Session]()
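
The two boundary rules can be sketched as a single pass over the messages. The `Message` struct and `session_chunks` function are hypothetical, and "role returning to `user`" is interpreted here as a `user` message that follows a non-`user` message.

```rust
/// Minimal message record (hypothetical; the real type may carry timestamps).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Groups contiguous messages into sessions, preserving order.
/// A new session starts when the role returns to `user` or when the
/// current session already holds `max_messages` messages.
pub fn session_chunks(messages: &[Message], max_messages: usize) -> Vec<Vec<Message>> {
    assert!(max_messages > 0, "max_messages must be positive");
    let mut chunks: Vec<Vec<Message>> = Vec::new();
    let mut current: Vec<Message> = Vec::new();
    for msg in messages {
        let returns_to_user = msg.role == "user"
            && current.last().map_or(false, |prev| prev.role != "user");
        if !current.is_empty() && (returns_to_user || current.len() >= max_messages) {
            // Close the current session and start a new one.
            chunks.push(std::mem::take(&mut current));
        }
        current.push(msg.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Under this interpretation, each question and the replies that follow it tend to land in the same chunk.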

### Structural Chunker

One chunk per section. Used for code sources where each section is already a function or class body extracted by the tree-sitter parser. Sources: [chunk.rs:ChunkerKind::Structural]()

## Entity Extraction

The pipeline optionally extracts entities and relations using configured extractors.

### RuleExtractor (Default)

Delegates entity detection to a `NerProvider` (default: capitalized-phrase heuristic) and proximity-based relation detection via verb-window regex. Supported relation patterns include: `joined`, `founded`, `acquired`, `owns`, `hired`, etc. Sources: [extract.rs:RuleExtractor]() [extract.rs:verb_window]()
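
A capitalized-phrase heuristic of the kind mentioned above might look like the following. This is a guess at the general technique, not the default `NerProvider`: the `capitalized_phrases` function and its punctuation rules are assumptions, and it will also pick up ordinary sentence-initial words.

```rust
/// Collects runs of consecutive capitalized words as candidate entities.
/// A run ends at a non-capitalized word or at clause punctuation.
pub fn capitalized_phrases(text: &str) -> Vec<String> {
    let mut phrases = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for raw in text.split_whitespace() {
        // Strip surrounding punctuation before checking capitalization.
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        let capitalized = word.chars().next().map_or(false, char::is_uppercase);
        let ends_clause = raw.ends_with(|c: char| matches!(c, '.' | ',' | ';' | ':' | '!' | '?'));
        if capitalized {
            current.push(word);
        }
        if (!capitalized || ends_clause) && !current.is_empty() {
            phrases.push(current.join(" "));
            current.clear();
        }
    }
    if !current.is_empty() {
        phrases.push(current.join(" "));
    }
    phrases
}
```

A relation step could then look for a verb such as `joined` between two neighboring phrases, which is the role the verb-window regex plays in the description above.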

### Optional: OllamaExtractor

Schema-constrained NER via a local Ollama server (gated behind `ollama` feature). Hallucinated spans are verified against section text and rejected. Failures degrade gracefully to empty results, keeping the rule-based baseline as the load-bearing path. Sources: [lib.rs:extract_llm]()

### Optional: KeyBertAdapter

Statistical entity extraction adapter driven by the server's configured embedder (gated behind `keybert` feature). Sources: [lib.rs:extract_keybert]()

## Pipeline API

### Ingester Configuration

```rust
pub struct IngestConfig {
    pub chunker: ChunkerKind,
    pub extractor: ExtractorKind,
    pub ner_provider: Option<NerProviderKind>,
    pub include_text: bool,
}
```

Sources: [pipeline.rs:IngestConfig]() [lib.rs:IngestConfig]()

### Core Method

```rust
pub fn ingest(
    &self,
    tx: &mut Transaction,
    bytes: &[u8],
    kind: SourceKind,
) -> Result<IngestResult, Error>
```

**Returns** an `IngestResult` with counts and elapsed time. The `commit_cid` field is left `None`—callers who want a CID should call `tx.commit(...)` afterwards.

**Errors:**
- `Error::ParseFailed` — parser rejects the input
- `Error::UnsupportedSource` — source kind not covered
- `Error::Commit` — upstream codec/blockstore failures from `Transaction::add_node`/`add_edge`

Sources: [pipeline.rs:Ingester::ingest()]()

## CLI Integration

The `mnem ingest` command provides CLI access to the pipeline.

```bash
mnem ingest notes.md
mnem ingest --text "The quick brown fox"
mnem ingest --chunker recursive --max-tokens 1024 book.pdf
mnem ingest --recursive docs/
```

### CLI Options

| Flag | Description | Default |
|------|-------------|---------|
| `--chunker` | Strategy selection | `auto` |
| `--max-tokens` | Target tokens per chunk | 512 |
| `--overlap` | Overlap tokens (recursive) | 32 |
| `--recursive` | Walk directory trees | false |
| `--ntype` | Root Doc node label | `Doc` |
| `-m`, `--message` | Commit message | Auto-generated |

Sources: [commands/ingest.rs:Args]()

## Output Nodes

The pipeline writes three node types and one edge type to the graph:

1. **Doc node** — Root node representing the ingested document
2. **Chunk nodes** — Smaller content pieces with `summary`, `content`, `context_sentence`, and `props` fields
3. **Entity nodes** — Extracted entities with span information
4. **Relation edges** — Connections between entities based on relation extraction

### Contextual Retrieval

Each chunk optionally stores a `context_sentence`: an LLM-generated one-sentence placement cue (e.g., "This paragraph is from Section 3 of a legal contract..."). It is prepended to the summary before embedding, following Anthropic's 2024 Contextual Retrieval recipe, which reports a 49% to 67% reduction in retrieval failures. Sources: [node.rs:Node.context_sentence]()

## Sidecar Support

For PDFs with poor text-layer extraction, the pipeline supports escalation to external tools:

- **docling** (gated behind `sidecar-docling` feature)
- **unstructured-ingest** (gated behind `sidecar-unstructured` feature)

Sidecars are invoked when built-in PDF extraction quality is insufficient. Sources: [lib.rs:sidecar]()

## Token Estimation

Token counts are estimated by splitting on whitespace (the `tokens_estimate` field on `Chunk`). This is intentionally fast and deterministic. Matching the counts of the `cl100k` tokenizer is a documented future improvement. Sources: [chunk.rs:token estimation comment]()

## Error Handling

| Error Type | Cause | Recovery |
|------------|-------|----------|
| `ParseFailed` | Malformed input, encryption | Return error, don't create nodes |
| `UnsupportedSource` | Unknown source kind | Return error |
| `Commit` | Blockstore failure | Return error |
| Sidecar errors | Missing binary, CLI failure | Return `Error::Sidecar` |
| LLM extraction failure | Timeout, schema mismatch | Degrade to empty Vec |

Sources: [pipeline.rs:ingest errors]() [lib.rs:Error types]()

---

<a id='page-cli-commands'></a>

## CLI Commands Reference

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [crates/mnem-cli/src/main.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/main.rs)
- [crates/mnem-cli/src/commands/ingest.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/ingest.rs)
- [crates/mnem-cli/src/commands/tag.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/tag.rs)
- [crates/mnem-cli/src/commands/refs.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/refs.rs)
- [crates/mnem-cli/src/integrate.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/integrate.rs)
- [crates/mnem-cli/src/config.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)
</details>

# CLI Commands Reference

This page documents all command-line interface commands available in `mnem-cli`, the primary user-facing tool for interacting with mnem repositories.

## Overview

The mnem CLI provides a unified interface for managing a local knowledge graph repository. It supports operations including repository initialization, content ingestion, node/edge manipulation, branching, tagging, retrieval, and third-party tool integration.

```mermaid
graph TD
    A[mnem CLI] --> B[Repository Operations]
    A --> C[Content Ingestion]
    A --> D[Graph Manipulation]
    A --> E[Version Control]
    A --> F[Retrieval]
    A --> G[Integration]
    
    B --> B1[init]
    B --> B2[status]
    
    C --> C1[ingest]
    
    D --> D1[add node]
    D --> D2[add edge]
    
    E --> E1[log]
    E --> E2[show]
    E --> E3[refs]
    E --> E4[tag]
    E --> E5[branches]
    
    F --> F1[retrieve]
    
    G --> G1[integrate]
```

Sources: [crates/mnem-cli/src/main.rs:40-90](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/main.rs)

## Global Options

The following options are available for all commands:

| Option | Short | Description |
|--------|-------|-------------|
| `--repo <PATH>` | `-R` | Path to the repository directory (`.mnem/`). Defaults to walking up from the current directory, like `git` does. |

Sources: [crates/mnem-cli/src/main.rs:33-36](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/main.rs)

## Repository Operations

### init

Initializes a new mnem repository.

```bash
mnem init [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--path <PATH>` | Custom repository path |
| `--name <NAME>` | Repository name |
| `--author <NAME>` | Default author name |
| `--email <EMAIL>` | Default author email |

### status

Prints current op-head, head commit, ref summary, and label counts.

```bash
mnem status [OPTIONS]
```

```bash
# Examples:
mnem status                    # current op + head commit + ref count
mnem -R ~/notes status         # explicit repo path
```

## Content Ingestion

### ingest

Parses external source files into the graph, creating Doc + Chunk + Entity nodes.

```bash
mnem ingest <PATH> [OPTIONS]
```

#### Supported Source Types

| Extension | Source Kind | Chunker Strategy |
|-----------|-------------|------------------|
| `.md`, `.markdown` | Markdown | Paragraph |
| `.txt` | Plain Text | SentenceRecursive (256 tokens, 32 overlap) |
| `.pdf` | PDF | SentenceRecursive (512 tokens, 64 overlap) |
| `.json`, `.jsonl` | Conversation | Session (10 messages max) |
| `.rs` | Rust Code | Structural |
| `.py`, `.pyi` | Python Code | Structural |
| `.js`, `.mjs`, `.cjs` | JavaScript Code | Structural |
| `.ts`, `.tsx`, `.mts`, `.cts` | TypeScript Code | Structural |
| `.go` | Go Code | Structural |
| `.java` | Java Code | Structural |
| `.c`, `.h` | C Code | Structural |
| `.cpp`, `.cc`, `.cxx`, `.hpp`, `.hxx` | C++ Code | Structural |
| `.rb`, `.gemspec`, `.rake`, `.erb` | Ruby Code | Structural |
| `.cs`, `.csx` | C# Code | Structural |
| Other | Text | SentenceRecursive |

Sources: [crates/mnem-cli/src/commands/ingest.rs:1-50](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/ingest.rs)

#### ingest Options

| Option | Description | Default |
|--------|-------------|---------|
| `<PATH>` | File or directory to ingest | Required (unless `--text`) |
| `--text` | Inline text to ingest | - |
| `--ntype <LABEL>` | Root Doc node label | `Doc` |
| `--chunker <STRATEGY>` | Chunker strategy | `auto` |
| `--max-tokens <N>` | Target tokens per chunk | `512` |
| `--overlap <N>` | Overlap tokens between chunks | `32` |
| `--recursive` | Walk directory trees | `false` |
| `-m`, `--message <MSG>` | Commit message | Auto-generated |

#### Chunker Strategies

| Strategy | Description |
|----------|-------------|
| `auto` | Picks strategy based on source kind |
| `paragraph` | Splits on double-newline (Markdown) |
| `recursive` | Token-budgeted sliding window |
| `sentence_recursive` | Sentence-aware token packing |
| `session` | Groups conversation messages |
| `structural` | One chunk per section (code) |

Sources: [crates/mnem-cli/src/commands/ingest.rs:60-80](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/ingest.rs)

## Graph Manipulation

### add node

Creates a new node in the graph.

```bash
mnem add node [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `-s`, `--summary <TEXT>` | Node summary |
| `--label <LABEL>` | Node type label |
| `--prop <KEY=VALUE>` | Property (can be repeated) |
| `--context-sentence <TEXT>` | Positional context for retrieval |

```bash
# Examples:
mnem add node -s "Alice lives in Berlin"
mnem add node --label Person --prop name=Alice --prop city=Berlin -s "Alice is a climber"
```

### add edge

Creates a directed edge between two nodes.

```bash
mnem add edge [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--from <UUID>` | Source node UUID |
| `--to <UUID>` | Target node UUID |
| `--label <LABEL>` | Edge type label |
| `--prop <KEY=VALUE>` | Property (can be repeated) |

```bash
# Examples:
mnem add edge --from <src-uuid> --to <dst-uuid> --label knows
```

Sources: [crates/mnem-cli/src/main.rs:65-80](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/main.rs)

## Version Control

### log

Walks the op-log backwards from the current head.

```bash
mnem log [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--limit <N>` | Maximum number of operations to show |
| `--format <FORMAT>` | Output format (`short`, `full`, `json`) |

### show

Shows the full detail of one operation.

```bash
mnem show <OPERATION_ID>
```

```bash
# Examples:
mnem show 01HZ...
```

### refs

Manages symbolic references to commits.

```bash
mnem refs <SUBCOMMAND>
```

#### refs Subcommands

| Subcommand | Description |
|------------|-------------|
| `list` | List every ref in the current view |
| `set <name> <target>` | Set ref to point at a target CID |
| `delete <name>` | Delete a ref |

```bash
# Examples:
mnem refs list
mnem refs set feature_branch 01HXYZ...
mnem refs delete old_branch
```

Sources: [crates/mnem-cli/src/commands/refs.rs:1-45](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/refs.rs)

### tag

Manages named tags that point to commits.

```bash
mnem tag <SUBCOMMAND>
```

#### tag Subcommands

| Subcommand | Description |
|------------|-------------|
| `list` | List every `refs/tags/<name>` ref with their target CIDs |
| `create <name>` | Create a new tag |
| `delete <name>` | Delete a tag |

#### tag create Options

| Option | Description |
|--------|-------------|
| `<name>` | Tag name (stored as `refs/tags/<name>`) |
| `target` | Optional commit CID, ref name, branch shortname, or `HEAD` |
| `--from <CID>` | Commit CID / ref / branch to point the tag at |

```bash
# Examples:
mnem tag list
mnem tag create v0.9
mnem tag create release-2024 --from 01HZ...
mnem tag delete v0.9
```

Source: [crates/mnem-cli/src/commands/tag.rs:1-60](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/commands/tag.rs)

### branches

Manages named branches in the repository.

```bash
mnem branches [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--list` | List all branches |
| `--create <NAME>` | Create a new branch |
| `--delete <NAME>` | Delete a branch |
| `--switch <NAME>` | Switch to a branch |

#### Branch Output Format

```json
{
  "schema": "mnem.v1.branches",
  "branches": [
    {"name": "main", "head": "<commit-cid>", "is_current": true},
    ...
  ]
}
```

## Retrieval

### retrieve

Searches the graph for nodes matching a query.

```bash
mnem retrieve [OPTIONS] <QUERY>
```

| Option | Description | Default |
|--------|-------------|---------|
| `--top-k <N>` | Number of results to return | `10` |
| `--max-tokens <N>` | Maximum tokens in response | `4096` |
| `--include <FIELD>` | Fields to include (summary, context, props) | All |
| `--format <FORMAT>` | Output format (`text`, `json`) | `text` |

```bash
# Examples:
mnem retrieve "query"
mnem retrieve --top-k 5 --max-tokens 2048 "machine learning"
```

## Integration

### integrate

Integrates mnem system prompts with third-party AI tools.

```bash
mnem integrate <HOST> [OPTIONS]
```

#### Supported Hosts

| Host | System Prompt Path |
|------|-------------------|
| `claude-code` | `~/.claude/CLAUDE.md` |
| `gemini-cli` | `~/.gemini/GEMINI.md` |
| `cursor` | `~/.cursor/rules/mnem.mdc` |
| `continue` | `~/.continue/config.json` |
| `zed` | `~/.config/zed/settings.json` (Linux) or `~/Library/Application Support/Zed/settings.json` (macOS) |

#### integrate Options

| Option | Description |
|--------|-------------|
| `--install` | Install system prompt to host |
| `--uninstall` | Remove system prompt from host |
| `--status` | Show integration status |

```mermaid
graph LR
    A[mnem integrate] --> B{Host Selection}
    B --> C[Claude Code]
    B --> D[Cursor]
    B --> E[Continue]
    B --> F[Zed]
    B --> G[Gemini CLI]
    
    C --> H[Markdown Marker]
    D --> H
    E --> I[JSON Field: systemMessage]
    F --> J[JSON Field: assistant.system_prompt]
    G --> H
```

Source: [crates/mnem-cli/src/integrate.rs:1-60](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/integrate.rs)

## Configuration

The CLI loads configuration from `~/.config/mnem/config.toml` or `.mnem/config.toml` in the repository root.

| Setting | Description |
|---------|-------------|
| `user.name` | Author name for commits |
| `user.email` | Author email for commits |
| `user.agent_id` | Agent identifier fallback |
| `llm.provider` | LLM provider (`ollama`, `openai`, `anthropic`) |
| `llm.model` | Model name |
| `llm.base_url` | API base URL (default: `http://localhost:11434`) |
| `llm.timeout_secs` | Request timeout (default: `120`) |

#### Author String Format

The author string for commits is resolved in this order of precedence:

1. `name <email>` if both are present
2. `name` if only the name is present
3. `email` if only the email is present
4. `agent_id` if only the agent ID is present
5. `mnem-cli` as the fallback

Source: [crates/mnem-cli/src/config.rs:1-80](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)
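The author-string precedence above can be expressed as a single `match`. This is a minimal sketch: the function names `present` and `author_string` are hypothetical, and treating blank strings as absent is an assumption not stated in the source.

```rust
/// Treat blank values as absent (an assumption of this sketch).
fn present(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Build the commit author string using the documented precedence.
/// (Hypothetical helper, not the actual mnem-cli function.)
fn author_string(name: Option<&str>, email: Option<&str>, agent_id: Option<&str>) -> String {
    match (present(name), present(email), present(agent_id)) {
        (Some(n), Some(e), _) => format!("{} <{}>", n, e),
        (Some(n), None, _) => n.to_string(),
        (None, Some(e), _) => e.to_string(),
        (None, None, Some(a)) => a.to_string(),
        (None, None, None) => "mnem-cli".to_string(),
    }
}
```

Matching on the tuple keeps the five cases visibly exhaustive, so the compiler rejects a missing branch.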

## Command Pipeline

The following diagram shows how commands interact with the repository:

```mermaid
graph TD
    subgraph "CLI Layer"
        A[mnem CLI] --> B[Commands]
        B --> C[Ingest]
        B --> D[Add]
        B --> E[Retrieve]
        B --> F[Refs/Tags]
    end
    
    subgraph "Core Layer"
        C --> G[Ingester Pipeline]
        G --> H[Parser]
        H --> I[Chunker]
        I --> J[Extractor]
        J --> K[Transaction]
        
        D --> K
        F --> K
        E --> L[Retriever]
    end
    
    subgraph "Storage Layer"
        K --> M[Transaction]
        M --> N[Blockstore]
        M --> O[OpHeadsStore]
        L --> P[VectorIndex]
        P --> N
    end
```

## Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Success |
| `1` | General error |
| `2` | Invalid arguments |
| `3` | Repository not found |
| `4` | Object not found |
| `5` | Conflict detected |
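
A wrapper script or host tool that shells out to `mnem` could map these codes to a typed value. The enum below is a hypothetical sketch built only from the table above; it is not a type exported by mnem.

```rust
/// Exit codes from the table above (hypothetical enum, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExitStatus {
    Success = 0,
    GeneralError = 1,
    InvalidArguments = 2,
    RepositoryNotFound = 3,
    ObjectNotFound = 4,
    ConflictDetected = 5,
}

impl ExitStatus {
    /// Map a raw process exit code to a known status, if it is listed.
    fn from_code(code: i32) -> Option<Self> {
        match code {
            0 => Some(ExitStatus::Success),
            1 => Some(ExitStatus::GeneralError),
            2 => Some(ExitStatus::InvalidArguments),
            3 => Some(ExitStatus::RepositoryNotFound),
            4 => Some(ExitStatus::ObjectNotFound),
            5 => Some(ExitStatus::ConflictDetected),
            _ => None,
        }
    }
}
```

Returning `None` for unlisted codes keeps the caller from silently treating an unknown failure as one of the documented ones.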

---

<a id='page-graphrag'></a>

## GraphRAG Implementation

### Related Pages

Related topics: [Hybrid Retrieval System](#page-hybrid-retrieval), [Core Components](#page-core-components)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [crates/mnem-graphrag/src/lib.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/lib.rs)
- [crates/mnem-graphrag/src/community.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/community.rs)
- [crates/mnem-graphrag/src/calibration.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/calibration.rs)
- [crates/mnem-graphrag/src/confidence.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/confidence.rs)
- [crates/mnem-graphrag/src/summarize.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/summarize.rs)
- [crates/mnem-core/src/retrieve/community_filter.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/retrieve/community_filter.rs)
</details>

# GraphRAG Implementation

GraphRAG (Graph-based Retrieval Augmented Generation) is a hybrid retrieval approach that combines vector similarity search with graph-structured knowledge representation. In mnem, the GraphRAG implementation provides community-based entity extraction, confidence scoring, and intelligent graph traversal for enhanced context retrieval.

## Overview

The mnem GraphRAG system operates as a layered architecture that:

1. Extracts entities and relationships from ingested documents
2. Builds a knowledge graph with typed edges and communities
3. Enables community-aware retrieval that goes beyond simple vector similarity
4. Provides confidence-calibrated results suitable for agentic workflows

The implementation lives in `crates/mnem-graphrag/` and integrates with the core retrieval pipeline in `crates/mnem-core/src/retrieve/`.

## Core Components

### Module Structure

| Module | Purpose |
|--------|---------|
| `lib.rs` | Main entry point and public API exports |
| `community.rs` | Community detection and hierarchy management |
| `calibration.rs` | Confidence score calibration utilities |
| `confidence.rs` | Confidence scoring algorithms |
| `summarize.rs` | Community and entity summarization |

### Community Detection (`community.rs`)

Community detection partitions the knowledge graph into semantically coherent clusters. The implementation supports hierarchical community structures where:

- **Leaf communities** contain tightly interconnected entities
- **Parent communities** aggregate related sub-communities
- **Cross-community edges** connect related concepts across boundaries

Communities are used during retrieval to:
- Expand candidate sets by including related entities within the same community
- Filter results to the most relevant community cluster
- Enable "zoom-in" and "zoom-out" traversal patterns

Source: [crates/mnem-graphrag/src/community.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/community.rs)
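
The "zoom-out" traversal over a community hierarchy can be pictured as following parent links from a leaf community up to the root. The sketch below uses a simplified stand-in type, `CommunityNode`, with plain integer ids. Both the type and the `zoom_out` function are hypothetical and only illustrate the pattern.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a community record: an id and an optional parent.
struct CommunityNode {
    id: u32,
    parent: Option<u32>,
}

/// "Zoom out" from a community by following parent links toward the root.
/// Returns the chain starting at `start`; stops on an unknown id or a cycle.
fn zoom_out(communities: &HashMap<u32, CommunityNode>, start: u32) -> Vec<u32> {
    let mut chain = Vec::new();
    let mut current = Some(start);
    while let Some(id) = current {
        if chain.contains(&id) {
            break; // defensive: malformed data with a parent cycle
        }
        match communities.get(&id) {
            Some(node) => {
                chain.push(node.id);
                current = node.parent;
            }
            None => break,
        }
    }
    chain
}
```

A "zoom-in" pass would need the inverse mapping (parent to children), which this sketch leaves out.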

### Confidence Scoring (`confidence.rs`)

Every extracted entity and relationship receives a confidence score based on:

- **Extraction evidence**: Frequency and clarity of mentions in source documents
- **Graph connectivity**: Number and strength of edges connecting to other entities
- **Source reliability**: Document-level trust signals from the ingest pipeline

Confidence scores are normalized to a `[0.0, 1.0]` range and drive downstream filtering decisions.

Source: [crates/mnem-graphrag/src/confidence.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/confidence.rs)

### Calibration (`calibration.rs`)

Calibration ensures that confidence scores accurately reflect true extraction quality. The module provides:

- **Score distribution analysis**: Histogram-based validation of score distributions
- **Threshold tuning**: Per-use-case threshold adjustment for precision/recall tradeoffs
- **Calibration curves**: Tools for evaluating score reliability

Source: [crates/mnem-graphrag/src/calibration.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/calibration.rs)
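
One common way to check whether scores "reflect true extraction quality" is to bin them and compare the average confidence in each bin with the observed accuracy. The sketch below computes the expected calibration error (ECE) that way. It is a generic technique, not code from `calibration.rs`, and it assumes a hand-checked sample of correct/incorrect labels is available.

```rust
/// Expected calibration error over equal-width bins on [0, 1].
/// `scores[i]` is a predicted confidence; `correct[i]` records whether that
/// extraction was actually right (labels assumed to come from a checked sample).
fn expected_calibration_error(scores: &[f32], correct: &[bool], bins: usize) -> f32 {
    assert_eq!(scores.len(), correct.len());
    assert!(bins > 0);
    if scores.is_empty() {
        return 0.0;
    }
    let mut count = vec![0usize; bins];
    let mut conf_sum = vec![0.0f32; bins];
    let mut hit_sum = vec![0.0f32; bins];
    for (&score, &ok) in scores.iter().zip(correct) {
        let s = score.clamp(0.0, 1.0);
        // A score of exactly 1.0 belongs to the last bin.
        let b = ((s * bins as f32) as usize).min(bins - 1);
        count[b] += 1;
        conf_sum[b] += s;
        if ok {
            hit_sum[b] += 1.0;
        }
    }
    let n = scores.len() as f32;
    let mut ece = 0.0;
    for b in 0..bins {
        if count[b] == 0 {
            continue;
        }
        let c = count[b] as f32;
        // Weight each bin's |confidence - accuracy| gap by its share of samples.
        ece += (c / n) * (conf_sum[b] / c - hit_sum[b] / c).abs();
    }
    ece
}
```

A value near zero means the scores track observed accuracy closely; the per-bin gaps are the points of a calibration curve.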

### Summarization (`summarize.rs`)

The summarization module generates concise descriptions for:

- **Individual entities**: One-sentence summaries capturing core identity
- **Relationships**: Edge labels and descriptions explaining connections
- **Communities**: Multi-sentence overviews of community purpose and membership

Summaries are stored as `context_sentence` on Node objects, enabling the contextual retrieval patterns described in Anthropic's Contextual Retrieval post.

Source: [crates/mnem-graphrag/src/summarize.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-graphrag/src/summarize.rs)

## Retrieval Integration

### Community Filter (`community_filter.rs`)

The retrieval pipeline integrates GraphRAG through the community filter stage. When enabled, the retriever:

1. Identifies the community containing the top-scoring candidate
2. Expands the candidate set to include other high-confidence entities in that community
3. Re-ranks the expanded set using the configured reranker

```mermaid
graph TD
    A[Query Embedding] --> B[Vector Search]
    B --> C[Initial Candidates]
    C --> D[Community Detection]
    D --> E[Community Expansion]
    E --> F[Reranker]
    F --> G[Final Results]
```

Source: [crates/mnem-core/src/retrieve/community_filter.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-core/src/retrieve/community_filter.rs)
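
The expansion step can be sketched as an additive pass over the candidate list: look up the community of each seed, then append its high-confidence members that are not already present. Everything below is hypothetical (the function name, the lookup tables, and plain integer ids). Re-ranking is left to the caller.

```rust
use std::collections::{HashMap, HashSet};

/// Add same-community members of the top `seeds` candidates to the list.
/// Additive: original candidates keep their order and are never dropped.
/// Members without a confidence entry are treated as 0.0 (an assumption).
fn expand_by_community(
    candidates: &[u64],
    seeds: usize,
    community_of: &HashMap<u64, u32>,
    members_of: &HashMap<u32, Vec<u64>>,
    confidence: &HashMap<u64, f32>,
    min_confidence: f32,
) -> Vec<u64> {
    let mut out = candidates.to_vec();
    let mut seen: HashSet<u64> = candidates.iter().copied().collect();
    for node in candidates.iter().take(seeds) {
        let members = match community_of.get(node).and_then(|c| members_of.get(c)) {
            Some(members) => members,
            None => continue, // seed has no known community
        };
        for &member in members {
            let conf = confidence.get(&member).copied().unwrap_or(0.0);
            if conf >= min_confidence && seen.insert(member) {
                out.push(member);
            }
        }
    }
    out
}
```

With `seeds = 1` this matches the description of expanding from the top-scoring candidate's community only.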

### Retriever Configuration

The `Retriever` struct in `crates/mnem-core/src/retrieve/retriever.rs` exposes GraphRAG-related options:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `graph_expand` | `Option<usize>` | `None` | Expansion radius for community-based retrieval |
| `graph_decay` | `Option<f32>` | `None` | Decay factor for graph traversal weights |
| `graph_depth` | `Option<usize>` | `None` | Maximum traversal depth |
| `community_filter_enabled` | `bool` | `false` | Enable community-based filtering |
| `ppr_size_gate` | `Option<usize>` | `None` | PPR personalization size threshold |

### PPR-Based Expansion

For larger graphs, mnem supports Personalized PageRank (PPR) based expansion using the adjacency index:

```rust
adjacency_index: Option<Arc<dyn AdjacencyIndex + Send + Sync>>
```

When the adjacency index is available, PPR mode provides:
- Personalized scoring based on seed nodes
- Cohesive community member inclusion

When the index is unavailable, expansion falls back to the historical decay walk.

Source: [crates/mnem-core/src/retrieve/retriever.rs:10-50]()
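
For readers unfamiliar with Personalized PageRank, the sketch below shows the textbook power-iteration form on a plain adjacency list. It does not use mnem's `AdjacencyIndex` trait. The function name, the `alpha` restart probability, and the handling of dangling nodes are assumptions made for illustration.

```rust
/// Personalized PageRank by power iteration.
/// `adj[u]` lists the out-neighbors of node `u`; `seeds` share the restart mass.
/// `alpha` is the restart probability (0.15 is a common choice, not a mnem default).
/// Panics if a seed or neighbor index is out of range.
fn personalized_pagerank(adj: &[Vec<usize>], seeds: &[usize], alpha: f64, iters: usize) -> Vec<f64> {
    let n = adj.len();
    let mut restart = vec![0.0; n];
    if n == 0 || seeds.is_empty() {
        return restart;
    }
    for &s in seeds {
        restart[s] += 1.0 / seeds.len() as f64;
    }
    let mut rank = restart.clone();
    for _ in 0..iters {
        // Every step sends `alpha` of the total mass back to the seeds.
        let mut next: Vec<f64> = restart.iter().map(|r| alpha * r).collect();
        for (u, outs) in adj.iter().enumerate() {
            if outs.is_empty() {
                // Dangling node: hand its remaining mass back to the seeds.
                for (v, r) in restart.iter().enumerate() {
                    next[v] += (1.0 - alpha) * rank[u] * r;
                }
            } else {
                let share = (1.0 - alpha) * rank[u] / outs.len() as f64;
                for &v in outs {
                    next[v] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

Nodes that the seeds cannot reach keep a score of zero, which is what makes the ranking "personalized" rather than global.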

## Data Flow

### Ingest Pipeline to GraphRAG

```mermaid
graph LR
    A[Source File] --> B[Parser]
    B --> C[Chunker]
    C --> D[Entity Extractor]
    D --> E[Graph Builder]
    E --> F[Community Detection]
    F --> G[Confidence Scoring]
    G --> H[Committed Nodes/Edges]
```

The ingest pipeline in `crates/mnem-ingest/src/pipeline.rs` coordinates:

1. **Parsing**: Detect source type and extract raw content
2. **Chunking**: Split into manageable units using auto-selected chunker
3. **Extraction**: Rule-based or LLM-powered entity extraction
4. **Graph building**: Create nodes and edges in the transaction
5. **Commit**: Persist to the IPLD-based object store

Source: [crates/mnem-ingest/src/pipeline.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/pipeline.rs)

### Chunk Strategy by Source Type

| Source Kind | Chunker | Tokens | Overlap |
|-------------|---------|--------|---------|
| Markdown | Paragraph | - | - |
| Text | SentenceRecursive | 256 | 32 |
| PDF | SentenceRecursive | 512 | 64 |
| Conversation | Session | 10 messages | - |
| Code | Structural | - | - |

Source: [crates/mnem-ingest/src/chunk.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-ingest/src/chunk.rs)
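
To make the "Tokens" and "Overlap" columns concrete, the sketch below splits text into fixed-size windows that share a few tokens with their neighbors. It is a simplified stand-in, not the `SentenceRecursive` chunker: it counts whitespace-separated words as tokens and ignores sentence boundaries. The function name is hypothetical.

```rust
/// Split whitespace-separated tokens into windows of `size` tokens where
/// consecutive windows share `overlap` tokens.
/// (Simplified illustration; real tokenization and sentence handling differ.)
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Under these assumptions, the Text row of the table would correspond to `chunk_with_overlap(text, 256, 32)` and the PDF row to `chunk_with_overlap(text, 512, 64)`.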

## Configuration

### TOML Configuration

```toml
[retrieve]
limit = 20              # Maximum results
budget = 8192           # Token budget
vector_cap = 10         # Vector search candidates
graph_expand = 5        # Community expansion size
graph_depth = 3         # Traversal depth
rerank_top_k = 5        # Final reranking pool

[community]
enabled = true          # Enable community filtering
min_community_size = 3  # Minimum entities per community
```

### CLI Configuration

```bash
# Ingest with community extraction
mnem ingest --extractor keybert docs/

# Retrieve with community expansion
mnem retrieve "query" --graph-expand 10

# Configure via config command
mnem config set retrieve.graph_expand 5
```

Source: [crates/mnem-cli/src/config.rs](https://github.com/Uranid/mnem/blob/main/crates/mnem-cli/src/config.rs)

## API Reference

### Core Types

#### `Community`

```rust
pub struct Community {
    pub id: CommunityId,
    pub parent: Option<CommunityId>,
    pub members: Vec<EntityId>,
    pub summary: Option<String>,
    pub depth: u32,
}
```

#### `Entity`

```rust
pub struct Entity {
    pub id: EntityId,
    pub ntype: String,
    pub summary: Option<String>,
    pub context_sentence: Option<String>,
    pub confidence: f32,
    pub community: Option<CommunityId>,
}
```

### Public API (`lib.rs`)

| Function | Signature | Description |
|----------|-----------|-------------|
| `detect_communities` | `(graph: &Graph) -> Vec<Community>` | Run community detection |
| `score_entity` | `(entity: &Entity, graph: &Graph) -> f32` | Calculate confidence |
| `calibrate_scores` | `(scores: Vec<f32>) -> Vec<f32>` | Apply calibration |
| `summarize_community` | `(community: &Community) -> String` | Generate summary |
| `expand_from_seed` | `(seed: &[NodeId], depth: usize) -> Vec<NodeId>` | Graph expansion |

## Architecture Diagram

```mermaid
graph TD
    subgraph "Ingest Layer"
        I1[Markdown]
        I2[PDF]
        I3[Code]
        I4[Conversation]
    end
    
    subgraph "Extract Layer"
        E1[Rule Extractor]
        E2[LLM Extractor]
        E3[KeyBERT Adapter]
    end
    
    subgraph "Graph Layer"
        G1[Node Builder]
        G2[Edge Builder]
        G3[Community Detector]
    end
    
    subgraph "Score Layer"
        S1[Confidence Scorer]
        S2[Calibrator]
    end
    
    subgraph "Retrieve Layer"
        R1[Vector Index]
        R2[Community Filter]
        R3[Reranker]
    end
    
    I1 --> E1
    I2 --> E2
    I3 --> E1
    I4 --> E3
    
    E1 --> G1
    E2 --> G1
    E3 --> G1
    
    G1 --> G2
    G2 --> G3
    
    G3 --> S1
    S1 --> S2
    
    S2 --> R1
    R1 --> R2
    R2 --> R3
```

## Experimental Features

### E1: Community Expander

Experiment E1 enables community-expansion during retrieval:

- When `cfg.enabled` is `false` (default): Stage is a no-op
- When enabled: Top-N seeds' communities pull in additional cohesive members
- **Additive only**: Never drops existing candidates
- Matrix v4 showed a 29 pp R@10 regression with the old drop-filter semantics

### E2: PPR Graph Expansion

Experiment E2 introduces Personalized PageRank for graph expansion:

- Uses optional `AdjacencyIndex` for efficient neighborhood queries
- Falls back to historical decay walk when index unavailable
- Maintains byte-identical retrieval for default configuration

Source: [crates/mnem-core/src/retrieve/retriever.rs]()

## Warnings and Diagnostics

The retrieval system emits warnings when GraphRAG features encounter issues:

| Warning Code | Feature | Description |
|--------------|---------|-------------|
| `community_filter` | Community Filter | No-op community filter triggered |
| `graph_mode` | PPR | PPR ran without substrate graph |
| `graph_expand` | Expansion | Authored adjacency list was empty |
| `min_confidence` | Confidence | Results fell below confidence floor |
| `warnings_truncated` | Diagnostics | Warning list was truncated |

Source: [crates/mnem-core/src/retrieve/warnings.rs]()

## Best Practices

1. **Enable community filtering** for queries requiring holistic context
2. **Tune `graph_expand`** based on graph density; larger graphs need smaller expansion radii
3. **Calibrate confidence thresholds** per use case using the calibration module
4. **Use structural chunking** for codebases to capture function-level entity granularity
5. **Set `context_sentence`** on high-value nodes to improve contextual retrieval

## See Also

- [Retrieval System](../core/retrieve.md)
- [Ingest Pipeline](../ingest/pipeline.md)
- [Configuration Guide](../cli/config.md)
- [Contextual Retrieval](../core/contextual_retrieval.md)

---

## Doramagic Pitfall Log

Project: Uranid/mnem

Summary: 8 potential pitfalls found, 1 of them high/blocking. Highest priority: security/permissions pitfall, source evidence: [feature] hermes support.

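## 1. Security/permissions pitfall · Source evidence: [feature] hermes support

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions issue in this project: [feature] hermes support
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Suggested check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_c54919b2b8b340438a9e5aa17291b93a | https://github.com/Uranid/mnem/issues/27 | Source type github_issue exposes a usage condition pending verification.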

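## 2. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- Impact on users: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Safeguard: Assumptions must become verification items; they must not be written as fact before verification results exist.
- Evidence: capability.assumptions | github_repo:1221867246 | https://github.com/Uranid/mnem | README/documentation is current enough for a first validation pass.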

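## 3. Maintenance pitfall · Source evidence: [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified maintenance/versioning issue in this project: [bug] Broken docs links: SPEC.md, ROADMAP.md, and Architecture page
- Impact on users: May raise the cost of trial and production adoption for new users.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_5c74e7a10f774af6b0460b5da009d1b4 | https://github.com/Uranid/mnem/issues/23 | The source discussion mentions Windows-related conditions; re-check before installing or trying it out.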

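## 4. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- Impact on users: New, abandoned, and active projects get lumped together, which lowers recommendation confidence.
- Suggested check: Add signals for the most recent GitHub commit, release, and issue/PR response.
- Safeguard: While maintenance activity is unknown, the recommendation must not be marked as high-trust.
- Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | last_activity_observed missing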

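## 5. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: Downstream has already requested a review; this must not be downplayed on the page.
- Suggested check: Route to the security/permissions governance review queue.
- Safeguard: While downstream risk exists, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | github_repo:1221867246 | https://github.com/Uranid/mnem | no_demo; severity=medium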

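## 6. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.
- Safeguard: Scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:1221867246 | https://github.com/Uranid/mnem | no_demo; severity=medium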

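## 7. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- Impact on users: Users cannot tell whether anyone will maintain the project when they hit a problem.
- Suggested check: Sample recent issues/PRs to see whether they go unhandled for long periods.
- Safeguard: While issue/PR responsiveness is unknown, a maintenance-risk notice is required.
- Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | issue_or_pr_quality=unknown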

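## 8. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- Impact on users: Install commands and docs may lag behind the code, raising the chance that users run into problems.
- Suggested check: Confirm that the latest release/tag matches the install commands in the README.
- Safeguard: While release cadence is unknown or stale, installation instructions must be flagged as possibly drifted.
- Evidence: evidence.maintainer_signals | github_repo:1221867246 | https://github.com/Uranid/mnem | release_recency=unknown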

<!-- canonical_name: Uranid/mnem; human_manual_source: deepwiki_human_wiki -->