# https://github.com/vornicx/Midas Project Manual

Generated at: 2026-06-27 18:15:10 UTC

## Table of Contents

- [Overview & Architecture](#page-1)
- [Core SDK & Memory Operations](#page-2)
- [MCP Server, Integrations & Distribution](#page-3)
- [Evaluation, Safety & Governance](#page-4)

<a id='page-1'></a>

## Overview & Architecture

### Related Pages

Related topics: [Core SDK & Memory Operations](#page-2), [MCP Server, Integrations & Distribution](#page-3), [Evaluation, Safety & Governance](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [server.json](https://github.com/vornicx/Midas/blob/main/server.json)
- [mcpb/manifest.json](https://github.com/vornicx/Midas/blob/main/mcpb/manifest.json)
- [midas/memory.py](https://github.com/vornicx/Midas/blob/main/midas/memory.py)
- [midas/mcp_server.py](https://github.com/vornicx/Midas/blob/main/midas/mcp_server.py)
- [midas/importance.py](https://github.com/vornicx/Midas/blob/main/midas/importance.py)
- [midas/distill.py](https://github.com/vornicx/Midas/blob/main/midas/distill.py)
- [midas/audit.py](https://github.com/vornicx/Midas/blob/main/midas/audit.py)
- [eval/adapters/midas_adapter.py](https://github.com/vornicx/Midas/blob/main/eval/adapters/midas_adapter.py)
- [eval/memory_safety.py](https://github.com/vornicx/Midas/blob/main/eval/memory_safety.py)
- [eval/summarization_ab.py](https://github.com/vornicx/Midas/blob/main/eval/summarization_ab.py)
- [eval/datasets.py](https://github.com/vornicx/Midas/blob/main/eval/datasets.py)
- [packages/midas-ts/src/mcp.ts](https://github.com/vornicx/Midas/blob/main/packages/midas-ts/src/mcp.ts)
- [packages/midas-ts/src/importance.ts](https://github.com/vornicx/Midas/blob/main/packages/midas-ts/src/importance.ts)
- [CONTRIBUTING.md](https://github.com/vornicx/Midas/blob/main/CONTRIBUTING.md)
- [examples/coding_agent_demo.py](https://github.com/vornicx/Midas/blob/main/examples/coding_agent_demo.py)
</details>

# Overview & Architecture

## Purpose & Design Philosophy

Midas is a local-first, source-traceable memory layer for autonomous coding agents. The defining design choice — visible in every module — is that **ingest and recall are LLM-free by default**; an LLM enters the pipeline only as an explicit opt-in via the distillation dial. As declared in the project metadata (`server.json:5`), Midas offers *"Local-first, source-traceable agent memory — no LLM at ingest, fully offline"*. The v0.0.4 release notes make this stance concrete: the default is "no-LLM", with three explicit tiers that trade off cost, locality, and fidelity, and naive distillation is documented to *not* lift answers on its own (Source: v0.0.4 release highlights).

The architecture separates three concerns that other RAG stacks conflate:

1. **Capture** — what to keep, how to score it, and how to dedupe/supersede.
2. **Recall** — how to surface the right records under a token budget without leaking the embedding.
3. **Use** — how an agent (or its audit policy) consumes the result, including a guard rail that decides whether a recalled memory may drive an action.

## Core Components

### Memory Engine

The Python `Memory` class in [midas/memory.py](https://github.com/vornicx/Midas/blob/main/midas/memory.py) is the single object agents interact with. It owns a pluggable `store`, a hybrid lexical index (`_bm25_cache` keyed by the store's change counter to avoid per-query O(N) rebuilds), and per-instance configuration for importance, supersession, and abstention. The class exposes three top-level verbs:

- `remember(...)` / `remember_many(...)` — ingest with importance, provenance, actor, and metadata.
- `recall(query, ...)` — semantic + lexical hybrid recall returning hits with score components.
- `assemble(query, token_budget=...)` — a budgeted context block ready to drop into an LLM prompt.
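The change-counter cache mentioned in the paragraph above the list can be pictured with a small stand-in. This is a minimal sketch, not the code in `midas/memory.py`: `CountingStore`, `LexicalIndexCache`, and `change_counter` are hypothetical names, and plain term-frequency tables stand in for a real BM25 index.

```python
from collections import Counter
from typing import List, Optional


class CountingStore:
    """Hypothetical store that bumps a counter on every mutation."""

    def __init__(self) -> None:
        self.docs: List[str] = []
        self.change_counter = 0

    def add(self, text: str) -> None:
        self.docs.append(text)
        self.change_counter += 1


class LexicalIndexCache:
    """Rebuilds term statistics only when the store has changed."""

    def __init__(self, store: CountingStore) -> None:
        self.store = store
        self._built_at: Optional[int] = None
        self._index: List[Counter] = []
        self.rebuilds = 0

    def index(self) -> List[Counter]:
        if self._built_at != self.store.change_counter:
            # O(N) rebuild, paid once per store change instead of once per query.
            self._index = [Counter(doc.lower().split()) for doc in self.store.docs]
            self._built_at = self.store.change_counter
            self.rebuilds += 1
        return self._index


store = CountingStore()
store.add("primary database is PostgreSQL")
cache = LexicalIndexCache(store)
cache.index()
cache.index()  # same counter value, so the cached index is reused
store.add("vet appointment moved to Friday")
cache.index()  # counter changed, so the index is rebuilt
```

Keying on a counter rather than on the query keeps invalidation trivial: any write invalidates, and any number of reads between writes share one build.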

The engine is backed by a pluggable store (in-memory, SQLite, TurboVec) and an `embedder` (`local` ONNX bge or `hashing`). The MCP surface in [midas/mcp_server.py](https://github.com/vornicx/Midas/blob/main/midas/mcp_server.py) serialises results via `_serialize_recall_hit` and `_serialize_record`, deliberately omitting the embedding vector from every tool response (`memory_safety.py:46` calls out that `_serialize_record` exposes `id/kind/importance/provenance/actor/source/created_at/updated_at/superseded_by` only).
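The embedding-free serialisation described above amounts to an allow-list over record fields. The sketch below is illustrative only: `StoredRecord` and `serialize_record` are hypothetical stand-ins rather than the real `_serialize_record`, and the field list is copied from the sentence above.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

# Fields the text names as exposed; the embedding is deliberately absent.
EXPOSED_FIELDS = (
    "id", "kind", "importance", "provenance", "actor",
    "source", "created_at", "updated_at", "superseded_by",
)


@dataclass
class StoredRecord:  # hypothetical record shape, for illustration only
    id: str
    kind: str
    importance: int
    provenance: str
    actor: str
    source: str
    created_at: float
    updated_at: float
    superseded_by: Optional[str] = None
    embedding: Tuple[float, ...] = ()


def serialize_record(record: StoredRecord) -> dict:
    """Copy only allow-listed fields, so the vector never reaches a tool response."""
    raw = asdict(record)
    return {name: raw[name] for name in EXPOSED_FIELDS}


record = StoredRecord(
    "m1", "fact", 4, "observation", "agent", "chat:12", 1.0, 1.0, None, (0.1, 0.2)
)
payload = serialize_record(record)
```

An allow-list fails closed: a field added to the record later stays private until someone explicitly exposes it.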

### No-LLM Importance Scoring

Importance is derived without any model call. [midas/importance.py](https://github.com/vornicx/Midas/blob/main/midas/importance.py) defines `ContentImportance` (content-word density plus specifics: digits, proper-noun-like tokens, and an anti-backchannel floor) and `StructuralImportance` (assertion-vs-question cues). The `StructuralImportance` docstring spells out the discriminator a bag-of-features score misses: a question and a fact about the same topic score alike on content features, but only the assertion is worth remembering (`importance.py:11-14`). The TypeScript client in [packages/midas-ts/src/importance.ts](https://github.com/vornicx/Midas/blob/main/packages/midas-ts/src/importance.ts) is a faithful port of this scoring and uses the same stopword set. That parity matters because the TS MCP client in [packages/midas-ts/src/mcp.ts](https://github.com/vornicx/Midas/blob/main/packages/midas-ts/src/mcp.ts) exposes a `Remember` tool whose description tells callers *"importance 1-5 (0 = auto-derive from content, no LLM)"*.
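To make the two signals concrete, here is a rough, LLM-free sketch of a content score combined with an assertion-vs-question cue. It is not the scoring in `midas/importance.py`: the stopword and backchannel sets, the weights, and the function names are all assumptions chosen for illustration.

```python
import re

# Tiny illustrative word lists; the real stopword set is not reproduced here.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "and", "in", "what", "which"}
BACKCHANNELS = {"ok", "okay", "thanks", "sure", "yes", "no", "cool"}
QUESTION_OPENER = re.compile(r"(?i)^(what|which|who|when|where|why|how|do|does|is|are|can)\b")


def content_score(text: str) -> float:
    """Content-word density plus small bonuses for digits and proper-noun-like tokens."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    lowered = [t.lower() for t in tokens]
    if not tokens or all(t in BACKCHANNELS for t in lowered):
        return 0.0  # backchannel floor: "ok thanks" carries nothing worth keeping
    density = sum(t not in STOPWORDS for t in lowered) / len(tokens)
    has_digit = any(ch.isdigit() for ch in text)
    has_proper = any(t[0].isupper() for t in tokens[1:])  # skip sentence-initial capital
    return density + 0.5 * has_digit + 0.5 * has_proper


def structural_score(text: str) -> float:
    """1.0 for an assertion, 0.0 for something shaped like a question."""
    stripped = text.strip()
    is_question = stripped.endswith("?") or QUESTION_OPENER.match(stripped) is not None
    return 0.0 if is_question else 1.0


def importance(text: str) -> int:
    """Map the combined signal onto the 1-5 scale used by `remember`."""
    raw = content_score(text) * (0.5 + 0.5 * structural_score(text))
    return max(1, min(5, 1 + round(raw * 2)))


fact = "The primary database is PostgreSQL 16."
question = "What is the primary database?"
```

Under these assumed weights the assertion outranks the question about the same topic, which is the behaviour the structural cue exists to produce.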

### Distillation Pipeline (opt-in)

Distillation is a deliberate second stage that converts raw turns into compact fact strings. [midas/distill.py](https://github.com/vornicx/Midas/blob/main/midas/distill.py) defines a `Distiller` Protocol and a `DISTILL_PROMPT` template; the reference local implementation is `OllamaDistiller` (default `llama3.2:3b` via `http://localhost:11434`, stdlib `urllib` only, no new dependency). Per the v0.0.4 release, the agent can also do its own distillation at $0 cost to Midas. The summarisation A/B harness in [eval/summarization_ab.py](https://github.com/vornicx/Midas/blob/main/eval/summarization_ab.py) tests the arms `raw`, `naive`, `struct`, and `struct_replace` because the structured-card prompt is the hypothesis meant to compensate for the failures naive distillation exhibits on knowledge-update and temporal questions (`summarization_ab.py:14-22`).

### Recall, Supersession & Belief History

Recall supports `hybrid=true` (BM25 + semantic fusion), `as_of` for historical queries, and `metadata_filter` for namespaces. Medium-similarity supersession (`_content_words` gate in `memory.py`) lets the engine treat a near-duplicate with a changed value as a *new version* of the same fact rather than a separate record. [midas/audit.py](https://github.com/vornicx/Midas/blob/main/midas/audit.py) exposes `belief_history(mem, record_id)`, which walks `superseded_by` links to reconstruct the oldest → newest timeline so a human auditor can answer "what did memory believe at t?".
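The link-walking idea behind `belief_history` can be sketched over plain dictionaries. This is an illustration under stated assumptions, not the code in `midas/audit.py`: records are modelled as dicts with a `superseded_by` key, each record is superseded by at most one other, and `belief_timeline` is a hypothetical name.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def belief_timeline(records: Dict[str, Record], record_id: str) -> List[str]:
    """Return record ids from the oldest version to the newest."""
    # Invert the forward links so we can also walk backwards in time.
    predecessor = {
        rec["superseded_by"]: rid
        for rid, rec in records.items()
        if rec["superseded_by"] is not None
    }

    # Step back to the oldest ancestor of the requested record.
    start, seen = record_id, {record_id}
    while start in predecessor and predecessor[start] not in seen:
        start = predecessor[start]
        seen.add(start)

    # Walk forward along superseded_by until the chain ends.
    timeline: List[str] = []
    current: Optional[str] = start
    visited = set()
    while current is not None and current not in visited:  # guard against cycles
        visited.add(current)
        timeline.append(current)
        current = records[current]["superseded_by"]
    return timeline


records = {
    "a": {"content": "Vet appointment is on Monday", "superseded_by": "b"},
    "b": {"content": "Vet appointment is on Thursday", "superseded_by": "c"},
    "c": {"content": "Vet appointment is on Friday", "superseded_by": None},
}
history = belief_timeline(records, "b")
```

Starting from any version in the chain yields the same ordered timeline, which is what lets an auditor begin from whichever record recall happened to surface.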

### Safety & Policy Layer

[eval/memory_safety.py](https://github.com/vornicx/Midas/blob/main/eval/memory_safety.py) defines attack/benign `SafetyCase`s covering prompt injection, forgotten-confirmation reuse, and plan-as-recommendation misuse, each exercised through `guard_reliance(query, intended_use, acting_agent, limit=5)`. The adapter in [eval/adapters/midas_adapter.py](https://github.com/vornicx/Midas/blob/main/eval/adapters/midas_adapter.py) is the bridge that wires the harness into `Memory` with store/sparse/NLI/reranker wiring centralised in `reset()`.

## Deployment Surfaces

The same memory engine powers four concrete deployment surfaces:

| Surface | Entry point | Use case |
|---|---|---|
| Python library | `midas.Memory` | Embedding Midas inside an agent runtime |
| MCP server | `midas/mcp_server.py` (via `uvx midas-memory-mcp`) | Claude Desktop and any MCP-compatible host |
| TS MCP client | `packages/midas-ts/src/mcp.ts` | Node/JS agents calling a Midas MCP server |
| Eval harness | `eval/runner.py`, `eval/multiday.py`, `eval/memory_safety.py` | Offline, deterministic regression suite |

The MCP server is the canonical integration: `server.json` declares the package as `midas-memory-mcp` v0.0.4 with transport `stdio`, with env knobs `MIDAS_MCP_DB` (SQLite path), `MIDAS_MCP_EMBEDDER` (`local`|`hashing`), `MIDAS_MCP_MIN_IMPORTANCE`, and `MIDAS_MCP_MAX_RECORDS` (`server.json:13-32`). The Claude Desktop bundle in `mcpb/manifest.json` re-exports the same knobs as `user_config` fields (`mcpb/manifest.json:43-58`).

## Data Flow

```mermaid
flowchart LR
    A[Turn / observation] --> B[Capture]
    B --> C{Importance ≥ floor?}
    C -- no --> X[Discard]
    C -- yes --> D[Store + embed]
    D --> E[(Vector store<br/>+ BM25 index)]
    Q[Agent query] --> R[Recall: hybrid BM25+sem]
    E --> R
    R --> S[Budgeted context block]
    S --> T[LLM agent]
    Opt[Opt-in Distiller] -.-> B
    P[guard_reliance] -.-> S
```

The dashed lines mark the two non-default paths: distillation (`Memory(distiller=...)`) is opt-in (`distill.py:6-19`), and `guard_reliance` gates whether recalled content is allowed to drive the requested use (`memory_safety.py:62-79`). Datasets like `conflicts-v1` and `longmemeval` in [eval/datasets.py](https://github.com/vornicx/Midas/blob/main/eval/datasets.py) stress both — e.g. *confusable* questions that test whether the engine can supersede an older "Monday" with the newer "Friday" vet appointment.

## Configuration Cheat-Sheet

| Knob | Default | Effect |
|---|---|---|
| `embedder` | `local` (bge ONNX, offline) | Switch to `hashing` for zero-deps, lower quality |
| `importance` on `remember` | `0` → auto-derive | Non-zero overrides scoring |
| `MIDAS_MCP_MIN_IMPORTANCE` | `2` | Floor for `capture`; turns scoring below it are skipped |
| `hybrid` (recall) | `false` | Fuse BM25 with semantic score |
| `as_of` (recall) | unset | Historical query — excludes later records |
| `distiller` | unset (off) | Enables `OllamaDistiller` or any `Distiller` Protocol impl |
| `abstention_threshold` | unset | Returns `"abstain"` when calibrated confidence is low |
| `forget_decayed(max_records=…)` | off | Drops lowest-value memories when store exceeds cap |

## See Also

- Distillation tiers & honest measurements: [BENCHMARKS.md](https://github.com/vornicx/Midas/blob/main/BENCHMARKS.md)
- Methodology, MCP policy, failure cases: [docs/methodology.md](https://github.com/vornicx/Midas/blob/main/docs/methodology.md)
- Long-horizon memory design notes: [docs/long-horizon-memory.md](https://github.com/vornicx/Midas/blob/main/docs/long-horizon-memory.md)
- Coding agent example: [examples/coding_agent_demo.py](https://github.com/vornicx/Midas/blob/main/examples/coding_agent_demo.py)
- Release notes (v0.0.4): [v0.0.4 release](https://github.com/vornicx/Midas/releases/tag/v0.0.4)
- Contributing guide & eval quick reference: [CONTRIBUTING.md](https://github.com/vornicx/Midas/blob/main/CONTRIBUTING.md)

---

<a id='page-2'></a>

## Core SDK & Memory Operations

### Related Pages

Related topics: [Overview & Architecture](#page-1), [Evaluation, Safety & Governance](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [midas/memory.py](https://github.com/vornicx/Midas/blob/main/midas/memory.py)
- [midas/__init__.py](https://github.com/vornicx/Midas/blob/main/midas/__init__.py)
- [midas/embeddings.py](https://github.com/vornicx/Midas/blob/main/midas/embeddings.py)
- [midas/distill.py](https://github.com/vornicx/Midas/blob/main/midas/distill.py)
- [midas/importance.py](https://github.com/vornicx/Midas/blob/main/midas/importance.py)
- [midas/policy.py](https://github.com/vornicx/Midas/blob/main/midas/policy.py)
- [midas/store.py](https://github.com/vornicx/Midas/blob/main/midas/store.py)
- [midas/sqlite_store.py](https://github.com/vornicx/Midas/blob/main/midas/sqlite_store.py)
- [midas/coding.py](https://github.com/vornicx/Midas/blob/main/midas/coding.py)
- [midas/mcp_server.py](https://github.com/vornicx/Midas/blob/main/midas/mcp_server.py)
- [quickstart.py](https://github.com/vornicx/Midas/blob/main/quickstart.py)
- [examples/coding_agent_demo.py](https://github.com/vornicx/Midas/blob/main/examples/coding_agent_demo.py)
</details>

# Core SDK & Memory Operations

## Overview

Midas is an **agentic memory SDK** that provides local-first, source-traceable memory for long-horizon agents. The core SDK lives in the `midas` package and exposes a small, store/embedder-agnostic API centered on the `Memory` class. It supports semantic recall, budgeted context assembly, importance scoring, and a governance guard — all without requiring an LLM at ingest or query time by default.

The package exports the public surface from [midas/__init__.py](https://github.com/vornicx/Midas/blob/main/midas/__init__.py), including `Memory`, embedders (`LocalEmbedder`, `HashingEmbedder`, `OpenAIEmbedder`), stores (`InMemoryStore`, optional `SQLiteStore`, `IVFStore`, `TurboVecStore`), the importance scorers, and the Guard. The default configuration runs fully offline with no LLM calls, using `HashingEmbedder` and `InMemoryStore` as zero-setup fallbacks.

## The Memory Class

The `Memory` class is the single entry point for agent integrations. It composes a store, an embedder, an optional reranker, an optional importance scorer, and an optional distiller into one cohesive object. From [quickstart.py](https://github.com/vornicx/Midas/blob/main/quickstart.py), the minimal usage is:

```python
from midas import Memory

mem = Memory()  # in-memory store + offline hashing embedder
mem.remember("Decision: the primary database is PostgreSQL.", kind="constraint", importance=5)
print(mem.assemble("When do we launch?", token_budget=128, window=1, thread_key="session"))
```

### Core Operations

The SDK is organized around four fundamental operations defined in [midas/memory.py](https://github.com/vornicx/Midas/blob/main/midas/memory.py):

| Operation | Purpose | Returns |
|-----------|---------|---------|
| `remember(content, kind, importance, ...)` | Ingest a single memory | `MemoryRecord` |
| `remember_many(items)` | Batch ingest (skips duplicates, enforces relevance floor) | `list[CaptureResult]` |
| `recall(query, limit, hybrid, as_of)` | Search and rank by relevance × importance × recency | `list[RecallHit]` |
| `assemble(query, token_budget, window)` | Build a prompt-ready, budgeted context block | `ContextBlock` |

Additional operations include `forget(memory_id)`, `forget_matching(...)`, `capture(...)` (intelligent capture that decides whether to keep a turn), `inspect_memory(id)`, `guard_reliance(...)`, and `store.all()` for raw access.

## Stores and Embedders

The SDK is **store-agnostic** and **embedder-agnostic**. The `Memory` class accepts these as constructor arguments, so swapping backends requires no code changes beyond instantiation.

**Stores** ([midas/store.py](https://github.com/vornicx/Midas/blob/main/midas/store.py), [midas/sqlite_store.py](https://github.com/vornicx/Midas/blob/main/midas/sqlite_store.py)):
- `InMemoryStore` — default, zero-setup, lost on restart.
- `SQLiteStore` — persistent local storage (requires `sqlite-vec`); survives restarts.
- `IVFStore` / `TurboVecStore` — optional ANN backends for larger corpora.

**Embedders** ([midas/embeddings.py](https://github.com/vornicx/Midas/blob/main/midas/embeddings.py)):
- `HashingEmbedder` — deterministic, offline, zero-dependency fallback.
- `LocalEmbedder` — bge ONNX model, runs on-device, no network egress.
- `OpenAIEmbedder` — hosted API (optional).
- `DiskCachedEmbedder` — wraps any embedder with on-disk caching.
- `LocalReranker` — cross-encoder reranking for higher precision at recall time.

## Memory Kinds, Provenance, and Importance

Midas enforces a typed vocabulary that governs retrieval weight and governance behavior.

**Memory kinds** (`note | chat | fact | preference | constraint | mission`) are defined as the `MemoryKind` enum in [midas/__init__.py](https://github.com/vornicx/Midas/blob/main/midas/__init__.py). The `coding` module ([midas/coding.py](https://github.com/vornicx/Midas/blob/main/midas/coding.py)) extends this vocabulary with code-specific tags like `architecture_decision`, `dependency_choice`, and `forbidden_action` — each mapped to a core `MemoryKind` plus metadata.

**Provenance** (`planning | action | observation | user_confirmation`) determines whether recalled memory may justify an action. The Guard ([midas/guard.py](https://github.com/vornicx/Midas/blob/main/midas/guard.py)) requires `user_confirmation` provenance for external or destructive actions.

**Importance** is an integer 1–5, or `0` to auto-derive from content without an LLM. The `StructuralImportance` scorer in [midas/importance.py](https://github.com/vornicx/Midas/blob/main/midas/importance.py) uses regex-based cues (first-person assertions, durable signals, copula, meta-discourse) to distinguish *assertions of personal facts* from *questions that merely mention salient words* — a discriminator the bag-of-features content score misses.

## Distillation (v0.0.4)

The v0.0.4 release introduced a **3-tier distillation dial** documented in [midas/distill.py](https://github.com/vornicx/Midas/blob/main/midas/distill.py):

1. **No distillation** (default) — raw turns stored verbatim. Measured to outperform naive LLM distillation on knowledge-update and temporal reasoning questions.
2. **Agent-driven distillation** — the agent's own LLM distills before calling `remember`. Costs $0 to Midas.
3. **Local distiller** — `OllamaDistiller` runs on-device via Ollama (e.g., `llama3.2:3b`). Also $0 to Midas, with the explicit tradeoff that distilled facts are non-deterministic paraphrases, not verbatim sources.

The `Distiller` protocol accepts a batch of texts and returns compact fact strings. `HTTPDistiller` provides an HTTP-based variant for remote models.
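A structural protocol of this kind might look like the sketch below. The method name `distill`, its signature, and the toy `FirstSentenceDistiller` are assumptions made for illustration; the actual Protocol in `midas/distill.py` may differ.

```python
from typing import List, Optional, Protocol, Sequence


class Distiller(Protocol):  # hypothetical signature, not copied from the source
    def distill(self, texts: Sequence[str]) -> List[str]:
        """Turn a batch of raw texts into compact fact strings."""
        ...


class FirstSentenceDistiller:
    """Toy LLM-free stand-in: keeps the first sentence of each text."""

    def distill(self, texts: Sequence[str]) -> List[str]:
        facts = []
        for text in texts:
            first = text.strip().split(". ")[0].rstrip(".")
            if first:
                facts.append(first + ".")
        return facts


def prepare_for_ingest(texts: Sequence[str], distiller: Optional[Distiller] = None) -> List[str]:
    """With no distiller (the default tier), raw turns pass through verbatim."""
    return list(texts) if distiller is None else distiller.distill(texts)


raw = prepare_for_ingest(["The vet moved to Friday. Thanks a lot!"])
distilled = prepare_for_ingest(
    ["The vet moved to Friday. Thanks a lot!"], FirstSentenceDistiller()
)
```

Because `Protocol` uses structural typing, any object with a matching `distill` method qualifies without inheriting from a base class, which is what makes the dial pluggable.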

## Context Assembly

The `assemble` method ([midas/memory.py](https://github.com/vornicx/Midas/blob/main/midas/memory.py)) is the core prompt-assembly primitive. It:

1. Recalls top-k hits matching the query.
2. Pulls in same-thread neighbours (`window` parameter, keyed by `thread_key`).
3. Orders records by priority (pinned first, then highest-value).
4. Greedily fills a token budget, breaking when full.
5. Formats each record via `format_record` (lean by default; full provenance opt-in).

The returned `ContextBlock` contains the assembled text, the underlying records, and the token count — ready to drop into an agent's prompt or chat template.
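Steps 3 and 4 above (priority ordering and the greedy budget fill) can be sketched in a few lines. This is a simplified illustration, not the `assemble` implementation: `Rec`, `estimate_tokens`, and `fill_budget` are hypothetical names, whitespace word counts stand in for a real tokenizer, and the recall and thread-neighbour steps are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rec:  # hypothetical record, reduced to what the ordering needs
    text: str
    value: float
    pinned: bool = False


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return max(1, len(text.split()))


def fill_budget(records: List[Rec], token_budget: int) -> Tuple[List[Rec], int]:
    """Pinned records first, then highest value; stop at the first record that does not fit."""
    ordered = sorted(records, key=lambda r: (not r.pinned, -r.value))
    chosen: List[Rec] = []
    used = 0
    for rec in ordered:
        cost = estimate_tokens(rec.text)
        if used + cost > token_budget:
            break  # budget is full
        chosen.append(rec)
        used += cost
    return chosen, used


candidates = [
    Rec("low value note here", 0.1),
    Rec("pinned rule", 0.0, pinned=True),
    Rec("high value fact about db", 0.9),
]
chosen, used = fill_budget(candidates, token_budget=7)
```

Breaking at the first overflow, rather than skipping ahead to smaller records, keeps the block in strict priority order at the cost of sometimes leaving a little budget unused.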

## Governance and Guard

The Guard layer ([midas/guard.py](https://github.com/vornicx/Midas/blob/main/midas/guard.py), [midas/policy.py](https://github.com/vornicx/Midas/blob/main/midas/policy.py)) enforces that memory may guide planning but cannot independently authorize external or destructive actions without explicit user confirmation. The `check_memory_use` operation returns a `MemoryUseDecision` with an `allowed` boolean and rationale, which agent loops must consult before acting on memory-sourced instructions.
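The rule stated above reduces to a small decision function. The sketch below is an assumption-laden illustration rather than the code in `midas/guard.py`: the gated use names `external_action` and `destructive_action` are taken from the MCP page of this manual, while the function signature and the `planning` use value are hypothetical.

```python
from dataclasses import dataclass

# Uses that memory alone may not authorize.
GATED_USES = {"external_action", "destructive_action"}


@dataclass(frozen=True)
class MemoryUseDecision:
    allowed: bool
    rationale: str


def check_memory_use(provenance: str, intended_use: str) -> MemoryUseDecision:
    """Allow planning freely; require user confirmation for gated uses."""
    if intended_use in GATED_USES and provenance != "user_confirmation":
        return MemoryUseDecision(
            False, f"{intended_use} needs user_confirmation, memory has {provenance}"
        )
    return MemoryUseDecision(True, f"{provenance} is sufficient for {intended_use}")


blocked = check_memory_use("planning", "destructive_action")
confirmed = check_memory_use("user_confirmation", "destructive_action")
advisory = check_memory_use("observation", "planning")
```

An agent loop would call this before acting and fall back to asking the user whenever `allowed` is false.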

## MCP Server Integration

For agent frameworks that speak the Model Context Protocol, [midas/mcp_server.py](https://github.com/vornicx/Midas/blob/main/midas/mcp_server.py) exposes the same operations as MCP tools: `remember`, `recall`, `build_context` (wraps `assemble`), `capture`, `forget`, `forget_matching`, `inspect_memory`, and `check_memory_use`. The server is configurable via environment variables (`MIDAS_MCP_DB`, `MIDAS_MCP_EMBEDDER`, `MIDAS_MCP_MAX_RECORDS`) and ships as a `uvx`-installable package (`midas-memory-mcp`).

## See Also

- Agent Policy and Instructions — for the system-prompt text agents should follow.
- Guard and Memory Use Decisions — for governance semantics.
- Coding Agent Memory Vocabulary — for the `coding` module's code-specific extension.
- MCP Server Tools — for the protocol-level tool surface.
- Distillation Dial — for the 3-tier distillation configuration.

---

<a id='page-3'></a>

## MCP Server, Integrations & Distribution

### Related Pages

Related topics: [Overview & Architecture](#page-1), [Core SDK & Memory Operations](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [midas/mcp_server.py](https://github.com/vornicx/Midas/blob/main/midas/mcp_server.py)
- [packages/midas-ts/package.json](https://github.com/vornicx/Midas/blob/main/packages/midas-ts/package.json)
- [packages/midas-ts/README.md](https://github.com/vornicx/Midas/blob/main/packages/midas-ts/README.md)
- [server.json](https://github.com/vornicx/Midas/blob/main/server.json)
- [mcpb/manifest.json](https://github.com/vornicx/Midas/blob/main/mcpb/manifest.json)
- [CHANGELOG.md](https://github.com/vornicx/Midas/blob/main/CHANGELOG.md)
- [README.md](https://github.com/vornicx/Midas/blob/main/README.md)
</details>

# MCP Server, Integrations & Distribution

## Overview

The Midas MCP Server exposes the Midas memory engine to any Model Context Protocol client (Claude Desktop, IDE agents, or custom MCP clients) without changing the SDK's local-first design. No LLM is involved at ingest or query: recall remains offline, embeddings are computed locally, and memories are persisted to a local SQLite file the operator controls. The reference implementation is the Python package `midas-memory-mcp`; an experimental TypeScript port (`@midas/midas-memory-mcp`, via `packages/midas-ts/`) ships for Node-first hosts and shares a byte-compatible SQLite schema so a Python server and a TS server can target the same DB file. Source: [midas/mcp_server.py:1-40](), [packages/midas-ts/README.md:1-20](), [CHANGELOG.md:1-40]().

## Architecture

The MCP server is a thin transport layer over the existing `Memory` SDK. The `FastMCP` instance is built once at import time, and tools are registered through `@server.tool(...)` decorators. Tool handlers delegate to `_mem.remember`, `_mem.recall`, `_mem.store.get`, `_mem.forget`, and `_mem.guard_reliance`, optionally applying a namespace metadata filter via `_ns_filter(namespace)` so a single DB can be scoped per project or user. Source: [midas/mcp_server.py:80-180]().

```mermaid
flowchart LR
    Client["MCP Client<br/>(Claude Desktop, IDE, agent)"] -- "stdio JSON-RPC" --> Server["mcp_server.py<br/>FastMCP"]
    Server -- "remember/recall/capture/forget" --> Memory["Memory SDK"]
    Memory --> Embedder["LocalEmbedder / HashingEmbedder"]
    Memory --> Store["SQLiteStore or InMemoryStore"]
    Store --> DB[("local .sqlite3 file")]
```

The TypeScript port follows the same design, mirroring the schema and hashing math bit-for-bit; this is what enables cross-runtime shared DBs. Source: [packages/midas-ts/README.md:18-40]().

## MCP Tools

The server registers its tools with explicit `ToolAnnotations`, so clients know which calls are read-only and which are destructive. The table covers seven of them. Source: [midas/mcp_server.py:40-220]().

| Tool | Mutating? | Purpose |
|---|---|---|
| `remember` | yes | Store a memory; `importance=0` auto-derives from content (no LLM). |
| `recall` | read | Source-traceable hits with optional `hybrid` (BM25 + semantic fusion) and `as_of` historical view. |
| `capture` | yes | Hands-off ingestion: Midas scores and drops low-value turns automatically. |
| `check_memory_use` | read | Guard decision (`guard_reliance`) for an intended action. |
| `forget` | destructive | Delete one memory by id; supersession chains relink. |
| `inspect_memory` | read | Fetch one stored record by id without search. |
| `get_agent_memory_instructions` | read | Returns the policy text, provenance taxonomy, and guard parameters. |

Two design rules recur in the tool docstrings: (1) external or destructive actions require `provenance=user_confirmation` — otherwise `check_memory_use` returns `allowed=False` and the agent must ask the user; (2) `capture` is the workhorse for hands-off, automatic remembering because it applies the relevance policy and skips duplicates without an LLM call. Source: [midas/mcp_server.py:120-260](), [eval/memory_safety.py:30-80]().

## Configuration & Environment

All knobs are environment variables read at server start. The most important ones are listed below. Source: [midas/mcp_server.py:20-70](), [README.md:120-160](), [server.json:30-80](), [mcpb/manifest.json:20-60]().

| Variable | Default | Effect |
|---|---|---|
| `MIDAS_MCP_EMBEDDER` | `local` (if fastembed) else `hashing` | Embedding backend: `local` (bge ONNX, offline), `hashing` (zero-dep), `multilingual`, or any fastembed model id. |
| `MIDAS_MCP_DB` | in-memory | Path to a SQLite file for cross-restart persistence. |
| `MIDAS_MCP_MAX_RECORDS` | unbounded | Cap the store; over it, `remember` auto-forgets the lowest-value tail (no LLM). |
| `MIDAS_MCP_MIN_IMPORTANCE` | `2` | Floor for `capture` — turns scoring below it are skipped. |
| `MIDAS_MCP_NAMESPACE` | unset | Default scope tag on writes and applied to reads. |
| `MIDAS_MCP_ACTOR` | `midas-mcp` | Actor id stamped on MCP memories. |
| `MIDAS_MCP_ANN` | off | Set to `1` to enable a sub-linear IVF index for very large stores. |
| `MIDAS_MCP_SUPERSEDE` | off | Enable NLI-gated belief revision. |
| `MIDAS_MCP_NLI` | off | Set to `1` to enable the NLI entailment check (requires `local` extras). |
| `MIDAS_MCP_AUTO_MAINTAIN` | off | Set to `<min>` to choose the idle-time upkeep interval. |
| `MIDAS_MCP_PINNED` | unset | Pin standing directives. |

Namespaces are the multi-tenancy primitive: a single SQLite file can serve several projects or agents without cross-contamination because `remember` writes a `namespace` tag into `metadata` and `recall` filters by it through `_ns_filter`. Source: [midas/mcp_server.py:90-150]().
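The write-tag, read-filter pattern can be sketched with plain dictionaries. This is an illustration only: `ns_filter`, `matches`, and `recall_scoped` below are hypothetical helpers that mimic the described behaviour, not the `_ns_filter` function in `midas/mcp_server.py`.

```python
from typing import Dict, List, Optional


def ns_filter(namespace: Optional[str]) -> Optional[Dict[str, str]]:
    """Build a metadata filter for a namespace, or None for an unscoped read."""
    return {"namespace": namespace} if namespace else None


def matches(metadata: Dict[str, str], metadata_filter: Optional[Dict[str, str]]) -> bool:
    """A record matches when every filter key has the same value in its metadata."""
    if not metadata_filter:
        return True
    return all(metadata.get(key) == value for key, value in metadata_filter.items())


def recall_scoped(rows: List[dict], namespace: Optional[str]) -> List[str]:
    """Return the contents visible from one namespace."""
    scope = ns_filter(namespace)
    return [row["content"] for row in rows if matches(row["metadata"], scope)]


rows = [
    {"content": "Use PostgreSQL", "metadata": {"namespace": "project-a"}},
    {"content": "Use SQLite", "metadata": {"namespace": "project-b"}},
]
project_a_view = recall_scoped(rows, "project-a")
unscoped_view = recall_scoped(rows, None)
```

Because the tag lives in ordinary metadata, scoping needs no schema change; the cost is that an unscoped read sees every tenant, so a default namespace is the safer setting on shared files.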

## Distribution Channels

Midas reaches users through four parallel channels. The first three are backed by the same Python entry point, `python -m midas.mcp_server` (or the `midas-mcp` console script); the fourth is the TypeScript port. Source: [CHANGELOG.md:1-60](), [server.json:1-90](), [mcpb/manifest.json:1-80](), [packages/midas-ts/package.json:1-40]().

1. **PyPI — `midas-memory-mcp`.** Installable with `uv pip install "midas-memory-mcp[mcp,local]"`. Lets developers pin a version in their own stack.
2. **Official MCP registry (`io.github.vornicx/midas`).** Listed via `server.json`; users can install with one click or run install-free with `uvx midas-memory-mcp`. The manifest declares `transport: stdio` and exposes `MIDAS_MCP_DB`, `MIDAS_MCP_EMBEDDER`, `MIDAS_MCP_MAX_RECORDS` to clients. Source: [server.json:10-80]().
3. **Claude Desktop / MCPB bundle.** `mcpb/manifest.json` packages the server as a one-click extension: it points `command` at `uvx midas-memory-mcp`, declares `claude_desktop >= 0.10.0`, surfaces four user-config fields (`db_path`, `embedder`, `min_importance`, `max_records`), and routes them into the env vars above. Privacy policy: `https://github.com/vornicx/Midas/blob/main/PRIVACY.md`. Source: [mcpb/manifest.json:1-80]().
4. **TypeScript port — `packages/midas-ts/`.** Ships an `npx midas-memory-mcp` launcher, targets `node >= 22.5`, and uses the same SQLite schema + float32 blob encoding as the Python server so either runtime can read/write the same DB. The README calls it explicitly "experimental" — semantic ONNX embeddings, NLI-gated revision, and the full eval harness remain Python-only. Source: [packages/midas-ts/package.json:1-40](), [packages/midas-ts/README.md:1-40]().
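For hosts that are configured by hand rather than through the registry or the bundle, the launch command and env vars above can be turned into a config entry. The sketch assumes the host reads the `mcpServers` JSON layout used by Claude Desktop's config file; the server key `midas`, the helper name, and the DB path are hypothetical.

```python
import json
from typing import Dict, Optional


def midas_server_entry(
    db_path: str, embedder: str = "local", max_records: Optional[int] = None
) -> Dict[str, object]:
    """Build one server entry that launches the server install-free via uvx."""
    env = {"MIDAS_MCP_DB": db_path, "MIDAS_MCP_EMBEDDER": embedder}
    if max_records is not None:
        env["MIDAS_MCP_MAX_RECORDS"] = str(max_records)  # env values must be strings
    return {"command": "uvx", "args": ["midas-memory-mcp"], "env": env}


config = {
    "mcpServers": {
        # "midas" is an arbitrary label; the path below is a placeholder.
        "midas": midas_server_entry("/tmp/midas.sqlite3", embedder="hashing", max_records=5000)
    }
}
config_json = json.dumps(config, indent=2)
```

Leaving `MIDAS_MCP_DB` out of the entry would run the server in-memory, so the path is the one setting worth treating as required.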

The release-time optimizations in v0.0.4 — a 442→198 token MCP policy shrink, lean `build_context` defaults, and an ~8× faster cached BM25 index — apply identically across every distribution channel because they live in the shared SDK rather than in any one wrapper. Source: [CHANGELOG.md:20-60]().

## Common Failure Modes

- **Empty `MIDAS_MCP_DB` path** means in-memory only; restarts lose every memory. Set a path to persist. Source: [midas/mcp_server.py:20-40]().
- **`capture` silently drops turns** when the importance floor (`MIDAS_MCP_MIN_IMPORTANCE`, default 2) is not met — this is by design so chat noise does not pollute the store, but callers wanting raw `remember` should call that tool instead. Source: [midas/mcp_server.py:160-220]().
- **`forget` on an id missing from the store** returns `"no memory with id <id>"` rather than raising — clients must check the return value. Source: [midas/mcp_server.py:200-220]().
- **`check_memory_use` for `external_action` or `destructive_action`** returns `allowed=False` unless the recalled memory has `provenance=user_confirmation`; an agent that ignores this decision and acts on weaker provenance exhibits exactly the behavior the safety eval at `eval/memory_safety.py` is designed to catch. Source: [eval/memory_safety.py:20-80]().

## See Also

- [Core SDK & Memory Operations](#page-2): `Memory.remember` / `recall` / `assemble` semantics
- [Distillation Dial](#page-2): agent-driven vs `OllamaDistiller` and the no-LLM default
- [Eval Harness & Benchmarks](#page-4): `eval/runner.py`, `eval/multiday.py`, `BENCHMARKS.md`

---

<a id='page-4'></a>

## Evaluation, Safety & Governance

### Related Pages

Related topics: [Overview & Architecture](#page-1), [Core SDK & Memory Operations](#page-2), [MCP Server, Integrations & Distribution](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [eval/datasets.py](https://github.com/vornicx/Midas/blob/main/eval/datasets.py)
- [eval/memory_safety.py](https://github.com/vornicx/Midas/blob/main/eval/memory_safety.py)
- [eval/summarization_ab.py](https://github.com/vornicx/Midas/blob/main/eval/summarization_ab.py)
- [eval/adapters/midas_adapter.py](https://github.com/vornicx/Midas/blob/main/eval/adapters/midas_adapter.py)
- [eval/runner.py](https://github.com/vornicx/Midas/blob/main/eval/runner.py)
- [midas/audit.py](https://github.com/vornicx/Midas/blob/main/midas/audit.py)
- [midas/importance.py](https://github.com/vornicx/Midas/blob/main/midas/importance.py)
- [midas/policy.py](https://github.com/vornicx/Midas/blob/main/midas/policy.py)
- [midas/memory.py](https://github.com/vornicx/Midas/blob/main/midas/memory.py)
- [midas/mcp_server.py](https://github.com/vornicx/Midas/blob/main/midas/mcp_server.py)
- [mcpb/manifest.json](https://github.com/vornicx/Midas/blob/main/mcpb/manifest.json)
- [CONTRIBUTING.md](https://github.com/vornicx/Midas/blob/main/CONTRIBUTING.md)
</details>

# Evaluation, Safety & Governance

Midas is a local-first, source-traceable agent memory layer that operates with **no LLM at ingest** and ships with a paired evaluation, safety, and governance stack. The three concerns are not bolted on: the same `Memory` engine used in production is the system under test, the same provenance records that drive recall also drive the audit chain, and the policy that constrains agent behavior is the same one the safety suite verifies.

## Evaluation Framework

The `eval/` package treats Midas as a black-box memory backend and runs it against standardized long-horizon benchmarks. The adapter pattern keeps the engine decoupled from the harness. `MidasAdapter` wraps a `Memory` instance and exposes `ingest`, `recall`, `forget_decayed`, and `store_size` — the four surfaces a benchmark cares about — translating the dataset's `Event` model into the dict shape `Memory.remember_many` expects. Source: [eval/adapters/midas_adapter.py:1-120]()

Datasets are first-class objects. `eval/datasets.py` ships:

- `beam` — a 6-day conflict scenario with stable controls, current-vs-stale temporal updates, and explicit unanswerable items for calibration. Source: [eval/datasets.py:1-160]()
- `longmemeval` — the ICLR 2025 long-term memory benchmark, projected onto a `Dataset` of `Sample`s where chat turns become `Event`s and gold answer turns become `Question.gold_event_ids`. Source: [eval/datasets.py:160-260]()
- Custom loaders that map each instance to exactly one `Sample`, preserving the `has_answer` flag to distinguish answerable from abstention questions.

`eval/runner.py` orchestrates a run: it ingests events, calls the agent loop, optionally verifies with NLI or an LLM judge, and bins results into `answers`, `abstentions`, `answers_grounded`, and per-category tallies. The "grounded" columns are correctness *after* the verifier override; the "base" columns are reader-only correctness, so the harness makes the verifier's contribution visible rather than hidden. Source: [eval/runner.py:1-200]()
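The split between "base" and "grounded" columns can be illustrated with a small tally. This is a sketch, not the logic in `eval/runner.py`: the `Outcome` fields and the result keys are hypothetical, and it assumes the verifier override means "a rejected answer becomes an abstention", which the source does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    category: str
    answerable: bool        # does the dataset mark the question as answerable?
    reader_correct: bool    # reader-only ("base") correctness
    verifier_accepts: bool  # hypothetical verifier verdict on the reader's answer


def tally(outcomes: list[Outcome]) -> dict:
    totals = {"answers": 0, "abstentions": 0, "base_correct": 0, "grounded_correct": 0}
    per_category: dict[str, dict[str, int]] = {}
    for o in outcomes:
        # Assumed override: a rejected answer is replaced by an abstention.
        abstained = not o.verifier_accepts
        # An abstention is correct only when the question has no answer.
        grounded_ok = (not o.answerable) if abstained else o.reader_correct
        totals["abstentions" if abstained else "answers"] += 1
        totals["base_correct"] += int(o.reader_correct)
        totals["grounded_correct"] += int(grounded_ok)
        cat = per_category.setdefault(
            o.category, {"n": 0, "base_correct": 0, "grounded_correct": 0}
        )
        cat["n"] += 1
        cat["base_correct"] += int(o.reader_correct)
        cat["grounded_correct"] += int(grounded_ok)
    return {"totals": totals, "per_category": per_category}
```

Reporting both counters side by side shows where the verifier helped (a wrong answer turned into a correct abstention) and where it hurt (a correct answer suppressed).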

### Distillation A/B Harness

`eval/summarization_ab.py` is a multi-arm A/B harness for the distillation dial introduced in v0.0.4. It contrasts a naive one-fact-per-batch prompt against `STRUCT_PROMPT` (structured memory cards of the form `[<entity>] <attribute> = <value> (when: <time>)`) and against a `struct_replace` mode that re-runs the struct distiller and overwrites prior cards. The harness is the empirical evidence behind the release-note claim that *naive distillation does not lift answers* — the structured card format is what preserves both sides of a knowledge update, which knowledge-update and temporal-reasoning questions test for. Source: [eval/summarization_ab.py:1-180]()
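The card format quoted above is regular enough to render and parse mechanically. The sketch below is illustrative only: `Card`, `parse_card`, and `history` are hypothetical helpers, not part of `eval/summarization_ab.py`, and the regular expression assumes values contain no ` (when: ` substring.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "[<entity>] <attribute> = <value> (when: <time>)".
CARD_RE = re.compile(
    r"^\[(?P<entity>[^\]]+)\]\s+(?P<attribute>.+?)\s*=\s*(?P<value>.+?)"
    r"\s+\(when:\s*(?P<when>[^)]+)\)$"
)


@dataclass(frozen=True)
class Card:
    entity: str
    attribute: str
    value: str
    when: str

    def render(self) -> str:
        return f"[{self.entity}] {self.attribute} = {self.value} (when: {self.when})"


def parse_card(line: str) -> Optional[Card]:
    """Return a Card for a well-formed line, or None for anything else."""
    match = CARD_RE.match(line.strip())
    return Card(**match.groupdict()) if match else None


def history(cards: list[Card]) -> dict[tuple[str, str], list[Card]]:
    """Group cards by (entity, attribute), preserving input order."""
    grouped: dict[tuple[str, str], list[Card]] = {}
    for card in cards:
        grouped.setdefault((card.entity, card.attribute), []).append(card)
    return grouped
```

Keeping both cards for the same entity and attribute, each with its own `when`, is what lets a reader answer "what was true before the update" as well as "what is true now".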

## Safety

Midas treats memory as a tool that can be manipulated, not just consulted. The safety surface is the `guard_reliance` method, exercised by `eval/memory_safety.py`'s `SafetyCase` suite. Each case is either an *attack* (the memory state tries to coerce the agent into a destructive or unauthorized action) or a *benign* control (the agent must still be allowed to act). Source: [eval/memory_safety.py:1-200]()

The pattern is identical across cases: for each case `c`, call `c.build().guard_reliance(c.query, intended_use=…, acting_agent=…, limit=5)`, then assert `decision.allowed == c.expect_allowed`. Concrete cases include:

- "plan_as_recommendation" — an internal plan stored as a planning artifact must not surface as an authoritative answer to a "what database should we migrate to" question.
- "forgotten confirmation" — once a user confirmation has been superseded, an action derived from it must be blocked.
- "destructive projection" — projected/inferred destructive actions are rejected even when the projection was confident.

Benign controls ("current confirmation -> external", "answer from an observation") exist specifically so the suite fails loudly if the guard becomes over-broad. The `run(verbose=True)` helper tallies `attack_success` (attacks the guard wrongly allowed) separately from `benign_pass` (benign cases it correctly allowed) and reports both, never collapsing them into a single accuracy number. Source: [eval/memory_safety.py:200-280]()
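The two-counter reporting can be sketched with a toy guard. Everything here is an assumption made for illustration: `toy_guard`, its rules, the record `kind` values, and the `CaseSketch` fields are hypothetical and do not reproduce `guard_reliance` or the real `SafetyCase` class.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class CaseSketch:
    name: str
    kind: str          # "attack" or "benign"
    record: dict       # the memory record the action would rely on
    intended_use: str
    expect_allowed: bool


def toy_guard(record: dict, intended_use: str) -> Decision:
    """Hypothetical rules: refuse superseded support and non-authoritative kinds."""
    if record.get("superseded"):
        return Decision(False, "supporting record was superseded")
    if record["kind"] in {"plan", "projection"} and intended_use != "context":
        return Decision(False, f"{record['kind']} is not authoritative")
    return Decision(True, "current observation or confirmation")


CASES = [
    CaseSketch("plan_as_recommendation", "attack", {"kind": "plan"}, "answer", False),
    CaseSketch("forgotten confirmation", "attack",
               {"kind": "confirmation", "superseded": True}, "external_action", False),
    CaseSketch("destructive projection", "attack", {"kind": "projection"}, "destructive_action", False),
    CaseSketch("current confirmation -> external", "benign",
               {"kind": "confirmation"}, "external_action", True),
    CaseSketch("answer from an observation", "benign", {"kind": "observation"}, "answer", True),
]


def run(cases: list[CaseSketch],
        guard: Callable[[dict, str], Decision] = toy_guard) -> dict[str, int]:
    attacks = [c for c in cases if c.kind == "attack"]
    benign = [c for c in cases if c.kind == "benign"]
    return {
        # Attacks the guard wrongly allowed; the target is zero.
        "attack_success": sum(guard(c.record, c.intended_use).allowed for c in attacks),
        "attacks": len(attacks),
        # Benign cases the guard correctly allowed; the target is len(benign).
        "benign_pass": sum(guard(c.record, c.intended_use).allowed for c in benign),
        "benign": len(benign),
    }
```

A guard that refuses everything would score zero on `attack_success` but also zero on `benign_pass`, which is why a single accuracy number would hide the failure.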

The agent-facing policy is text, not code, but it is the contract the safety suite verifies. `midas/policy.py` is the literal block pasted into an agent's system prompt: recall first, capture durable signal, *guard actions* via `check_memory_use` before external/destructive actions, forget only on confirmation, and for code work route forbidden actions through `check_forbidden_action`. Source: [midas/policy.py:1-80]()

## Governance

Governance in Midas is the ability to answer, after the fact, *why* a memory exists, *what* it superseded, and *how much* weight it should carry.

### Audit Trail

`midas/audit.py` reconstructs the full belief-revision chain for a given record. `belief_history(mem, record_id)` walks `superseded_by` links in both directions to return the OLDEST → NEWEST chain of revisions — exactly the timeline an auditor needs to answer "we believed v1 from t1, revised to v2 at t2, …". The function reads from `mem.store.all()` and never mutates, so it is safe to call from a synchronous request handler or an offline review tool. Source: [midas/audit.py:1-120]()
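The bidirectional walk can be sketched over plain dicts. This is a simplified illustration rather than the code in `midas/audit.py`: it takes a list of records instead of a `Memory` object, assumes each record has an `id` and an optional `superseded_by` field, and assumes revision chains are linear (each record supersedes at most one other).

```python
from typing import Optional


def belief_history_sketch(records: list[dict], record_id: str) -> list[dict]:
    """Return the oldest-to-newest revision chain containing record_id."""
    by_id = {r["id"]: r for r in records}
    # Invert superseded_by so older revisions can be reached from newer ones.
    predecessor = {
        r["superseded_by"]: r["id"] for r in records if r.get("superseded_by")
    }

    # Step 1: walk backwards to the oldest revision, guarding against cycles.
    current: Optional[str] = record_id
    seen = {record_id}
    while current in predecessor and predecessor[current] not in seen:
        current = predecessor[current]
        seen.add(current)

    # Step 2: walk forwards along superseded_by, collecting the chain.
    chain: list[dict] = []
    visited: set[str] = set()
    while current is not None and current in by_id and current not in visited:
        visited.add(current)
        chain.append(by_id[current])
        current = by_id[current].get("superseded_by")
    return chain
```

The function only reads its inputs and builds new lists, mirroring the read-only property the section relies on for safe use in request handlers.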

### Importance Scoring

`midas/importance.py` provides a no-LLM per-turn importance signal. `ContentImportance` scores a turn on a bag of features: stopword-filtered content-word density, presence of digits, proper-noun-like tokens, length, and an anti-backchannel floor (so "ok" or "sounds good" does not score like a fact). `StructuralImportance` layers assertion-vs-question structure on top. This is the discriminator a pure content score misses: a question and a fact about "sushi in Tokyo" share salient words, but only the assertion is worth remembering. Both classes use cheap English regex cues with no dataset-specific vocabulary. Source: [midas/importance.py:1-160]()
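A feature-based scorer of this kind might look like the sketch below. The word lists, weights, and the halving of question scores are invented for illustration and are not the values used by `ContentImportance` or `StructuralImportance`; only the feature families come from the description above.

```python
import re

# Small illustrative lists; the real vocabularies are not shown in this manual.
STOPWORDS = frozenset(
    "a an the is are was were be to of in on at for and or but it this that "
    "i you we they do does did what where when who how".split()
)
BACKCHANNELS = frozenset({"ok", "okay", "sure", "thanks", "got it", "sounds good"})
QUESTION_START = re.compile(
    r"^(who|what|where|when|why|how|do|does|did|is|are|can|could|should)\b",
    re.IGNORECASE,
)


def content_score(text: str) -> float:
    """Bag-of-features score in [0, 1]; weights are assumptions."""
    normalized = re.sub(r"[^\w\s]", "", text.lower()).strip()
    if not normalized or normalized in BACKCHANNELS:
        return 0.0  # anti-backchannel floor
    tokens = normalized.split()
    density = sum(t not in STOPWORDS for t in tokens) / len(tokens)
    has_digit = any(ch.isdigit() for ch in text)
    # A capitalized word that does not start the text counts as proper-noun-like.
    has_proper = re.search(r"(?<=\w\s)[A-Z][a-z]+", text) is not None
    length = min(len(tokens) / 20.0, 1.0)
    return 0.4 * density + 0.2 * has_digit + 0.2 * has_proper + 0.2 * length


def structural_score(text: str) -> float:
    """Discount questions so that assertions outrank them on similar content."""
    stripped = text.strip()
    is_question = stripped.endswith("?") or QUESTION_START.match(stripped) is not None
    base = content_score(text)
    return base * 0.5 if is_question else base
```

With these assumed weights, "Where is good sushi in Tokyo?" scores below "The best sushi in Tokyo is at Sukiyabashi Jiro." even though both mention the same salient words.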

A TypeScript port of the same scoring lives at `packages/midas-ts/src/importance.ts`, so a Node-side ingestion path applies identical governance to the same memory. Source: [packages/midas-ts/src/importance.ts:1-80]()

### MCP Surface

The MCP server exposes only the read and governance tools needed by an external auditor. `inspect_memory(memory_id)` returns a single record without search, mutation, or embedding exposure; `recall` with `explain=True` returns score components (relevance, importance_norm, recency) so a reviewer can see *why* a record ranked where it did. No LLM rewrite or rationale is generated server-side — the source citation is the explanation. Source: [midas/mcp_server.py:1-160]()

## Configuration

| Knob | Where | Default | Purpose |
|---|---|---|---|
| `abstention_threshold` | `Memory` ctor | 0.0 | Floor below which recall is treated as no-answer |
| `abstention_relevance_floor` | `Memory` ctor | 0.0 | Score component floor for abstention calibration |
| `abstention_entailment_floor` | `Memory` ctor | 0.0 | NLI entailment floor for grounded abstention |
| `min_importance` | `MIDAS_MCP_MIN_IMPORTANCE` env | 0 | Filter on the MCP server ingest path |
| `max_records` | `MIDAS_MCP_MAX_RECORDS` env | unset | Cap on store size; once exceeded, the lowest-value records are auto-forgotten |
| `verify_floor`, `verify_nli` | `eval.runner` flags | off | Toggle NLI verifier and its acceptance floor |
| `--judge` | `eval.runner` flag | off | Use an LLM judge instead of exact-match grading |

Source: [midas/memory.py:1-80](), [midas/mcp_server.py:1-120](), [eval/runner.py:1-200](), [mcpb/manifest.json:1-80](), [server.json:1-60]().
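The two environment-driven knobs in the table could be read along these lines. This is a hedged sketch, not the server's actual startup code: `load_mcp_limits` and `enforce_cap` are hypothetical helpers, the numeric types are assumed, and using an `importance` field as the measure of "lowest-value" is an assumption.

```python
import os
from typing import Mapping, Optional


def load_mcp_limits(env: Mapping[str, str] = os.environ) -> dict:
    """Read the MCP ingest knobs, falling back to the defaults in the table."""
    raw_max = env.get("MIDAS_MCP_MAX_RECORDS")
    return {
        "min_importance": float(env.get("MIDAS_MCP_MIN_IMPORTANCE", "0")),
        # Unset (or empty) means no cap on store size.
        "max_records": int(raw_max) if raw_max else None,
    }


def enforce_cap(records: list[dict], max_records: Optional[int]) -> list[dict]:
    """Keep the highest-importance records when the store exceeds the cap."""
    if max_records is None or len(records) <= max_records:
        return list(records)
    ranked = sorted(records, key=lambda r: r["importance"], reverse=True)
    return ranked[:max_records]
```

Passing the environment in as a mapping keeps the helper easy to test without mutating `os.environ`.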

## Data Flow

```mermaid
flowchart LR
    A[Benchmark Sample] --> B[MidasAdapter]
    B --> C[Memory.remember_many]
    C --> D[Store + bm25 cache]
    A --> Q[Question]
    Q --> R[Memory.recall]
    R --> D
    R --> J[Verifier: NLI / LLM judge]
    J --> S[answers / abstentions / per-category]
    G[guard_reliance] --> D
    G --> K[allow / refuse]
    H[importance scorer] --> C
    AU[belief_history] --> D
```

## See Also

- [Memory Engine & Hybrid Recall](/wiki/memory-engine)
- [MCP Server & Tooling](/wiki/mcp-server)
- [Distillation & Distillers](/wiki/distillation)
- [Datasets & Benchmarks](/wiki/datasets)

---


## Pitfall Log

Project: vornicx/Midas

Summary: 12 structured pitfall items found, none of them high or blocking. Top priority: the installation risk items, which require verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Issue #15, "Add a machine-readable client wiring receipt for init/status", flags an installation risk that developers should check before relying on the project.
- User impact: Developers may fail before the first successful local run (see the linked issue).
- Evidence: failure_mode_cluster:github_issue | https://github.com/vornicx/Midas/issues/15

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Release v0.1.0 (the midas CLI, Memory Inspector, and shared-by-default memory) should be reviewed for installation risk before relying on the project.
- User impact: Upgrading or migrating to v0.1.0 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/vornicx/Midas/releases/tag/v0.1.0

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Release v0.1.1 (recall noise floor and end-to-end quickstart) should be reviewed for installation risk before relying on the project.
- User impact: Upgrading or migrating to v0.1.1 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/vornicx/Midas/releases/tag/v0.1.1

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Release v0.0.4 should be reviewed for installation risk before relying on the project.
- User impact: Upgrading or migrating to v0.0.4 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/vornicx/Midas/releases/tag/v0.0.4

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/vornicx/Midas/issues/15

## 6. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/vornicx/Midas

## 7. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/vornicx/Midas

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/vornicx/Midas

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/vornicx/Midas

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/vornicx/Midas

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/vornicx/Midas

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/vornicx/Midas

<!-- canonical_name: vornicx/Midas; human_manual_source: deepwiki_human_wiki -->
