Doramagic Project Pack · Human Manual
Midas
Local-first, eval-first memory for long-horizon AI agents — no LLM at ingest. Python SDK + MCP server with source-traceable recall, belief revision, selective forgetting, and reproducible benchmarks.
Overview & Architecture
Related topics: Core SDK & Memory Operations, MCP Server, Integrations & Distribution, Evaluation, Safety & Governance
Purpose & Design Philosophy
Midas is a local-first, source-traceable memory layer for autonomous coding agents. The defining design choice — visible in every module — is that ingest and recall are LLM-free by default; an LLM enters the pipeline only as an explicit opt-in via the distillation dial. As declared in the project metadata (server.json:5), Midas offers *"Local-first, source-traceable agent memory — no LLM at ingest, fully offline"*. The v0.0.4 release notes make this stance concrete: the default is "no-LLM", with three explicit tiers that trade off cost, locality, and fidelity, and naive distillation is documented to *not* lift answers on its own (Source: v0.0.4 release highlights).
The architecture separates three concerns that other RAG stacks conflate:
- Capture — what to keep, how to score it, and how to dedupe/supersede.
- Recall — how to surface the right records under a token budget without leaking the embedding.
- Use — how an agent (or its audit policy) consumes the result, including a guard rail that decides whether a recalled memory may drive an action.
Core Components
Memory Engine
The Python Memory class in midas/memory.py is the single object agents interact with. It owns a pluggable store, a hybrid lexical index (_bm25_cache keyed by the store's change counter to avoid per-query O(N) rebuilds), and per-instance configuration for importance, supersession, and abstention. The class exposes three top-level verbs:
- `remember(...)` / `remember_many(...)`: ingest with importance, provenance, actor, and metadata.
- `recall(query, ...)`: semantic + lexical hybrid recall returning hits with score components.
- `assemble(query, token_budget=...)`: a budgeted context block ready to drop into an LLM prompt.
The engine is backed by a pluggable store (in-memory, SQLite, TurboVec) and an embedder (local ONNX bge or hashing). The MCP surface in midas/mcp_server.py serialises results via _serialize_recall_hit and _serialize_record, deliberately omitting the embedding vector from every tool response (memory_safety.py:46 calls out that _serialize_record exposes id/kind/importance/provenance/actor/source/created_at/updated_at/superseded_by only).
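The change-counter cache mentioned above can be illustrated with a small, self-contained sketch. `TinyStore`, `CachedLexicalIndex`, and the document-frequency table are hypothetical stand-ins rather than the Midas classes; only the idea (rebuild lexical statistics when the store's counter moves, otherwise reuse them) comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TinyStore:
    """Hypothetical store that bumps a counter on every write."""
    docs: list[str] = field(default_factory=list)
    change_counter: int = 0

    def add(self, text: str) -> None:
        self.docs.append(text)
        self.change_counter += 1


class CachedLexicalIndex:
    """Rebuilds term statistics only when the store's counter has moved."""

    def __init__(self, store: TinyStore) -> None:
        self.store = store
        self._cache = None  # (counter value, document-frequency table)
        self.rebuilds = 0

    def doc_freq(self) -> dict[str, int]:
        version = self.store.change_counter
        if self._cache is not None and self._cache[0] == version:
            return self._cache[1]  # cache hit: no O(N) pass over the store
        df: dict[str, int] = {}
        for doc in self.store.docs:
            for term in set(doc.lower().split()):
                df[term] = df.get(term, 0) + 1
        self._cache = (version, df)
        self.rebuilds += 1
        return df
```

Repeated queries between writes reuse the cached table; the first query after a write pays for one rebuild.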
No-LLM Importance Scoring
Importance is derived without any model call. midas/importance.py defines ContentImportance (content-word density + specifics: digits, proper-noun-likes, anti-backchannel floor) and StructuralImportance (assertion-vs-question cues). The StructuralImportance docstring spells out the discriminator a bag-of-features score misses: a question and a fact about the same topic score alike on content features, but only the assertion is worth remembering (importance.py:11-14). The TypeScript client mirrors this exact scoring in packages/midas-ts/src/importance.ts, using the same stopword set as a faithful port — important because the TS MCP client shown in packages/midas-ts/src/mcp.ts exposes a Remember tool whose description tells callers *"importance 1-5 (0 = auto-derive from content, no LLM)"*.
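As a rough illustration of the two signals described above, the sketch below scores content features first and then damps anything that looks like a question. The stopword set, the regex, the weights, and the function names are all assumptions made for this example; they are not the values used in `midas/importance.py`.

```python
import re

# Illustrative stopword list; the real scorer ships its own set.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on",
             "we", "i", "it", "do", "what", "should"}

# Assumed question cues: a trailing "?" or an interrogative opener.
QUESTION_CUE = re.compile(
    r"\?\s*$|^\s*(what|who|when|where|why|how|do|does|should|can)\b", re.I
)


def content_score(text: str) -> float:
    """Bag-of-features score: content-word density plus specifics."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    if len(tokens) < 3:
        return 0.0  # anti-backchannel floor: "ok", "sounds good"
    content = [t for t in tokens if t.lower() not in STOPWORDS]
    density = len(content) / len(tokens)
    has_digit = any(ch.isdigit() for ch in text)
    has_proper = any(t[0].isupper() for t in tokens[1:])  # proper-noun-like
    return density + 0.5 * has_digit + 0.5 * has_proper


def structural_score(text: str) -> float:
    """Damp questions so only assertions keep their full content score."""
    base = content_score(text)
    return 0.25 * base if QUESTION_CUE.search(text) else base
```

With these assumed weights, "The launch date is March 14 in Tokyo." outranks "What is the launch date in Tokyo?" even though both mention the same salient words.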
Distillation Pipeline (opt-in)
Distillation is a deliberate second stage that converts raw turns into compact fact strings. midas/distill.py defines a Distiller Protocol and a DISTILL_PROMPT template; the reference local implementation is OllamaDistiller (default llama3.2:3b via http://localhost:11434, stdlib urllib only, no new dependency). Per the v0.0.4 release, the agent can also do its own distillation for $0 to Midas, and naive distillation is documented to *not* lift answers on its own. The summarisation A/B harness in eval/summarization_ab.py explicitly tests arms raw, naive, struct, struct_replace because the structured-card prompt is the hypothesis that compensates for the failures naive distillation exhibits on knowledge-update and temporal questions (summarization_ab.py:14-22).
Recall, Supersession & Belief History
Recall supports hybrid=true (BM25 + semantic fusion), as_of for historical queries, and metadata_filter for namespaces. Medium-similarity supersession (_content_words gate in memory.py) lets the engine treat a near-duplicate with a changed value as a *new version* of the same fact rather than a separate record. midas/audit.py exposes belief_history(mem, record_id) which walks superseded_by links to reconstruct the OLDEST → newest timeline so a human auditor can answer "what did memory believe at t?".
Safety & Policy Layer
eval/memory_safety.py defines attack/benign SafetyCases covering prompt injection, forgotten-confirmation reuse, and plan-as-recommendation misuse, each exercised through guard_reliance(query, intended_use, acting_agent, limit=5). The adapter in eval/adapters/midas_adapter.py is the bridge that wires the harness into Memory with store/sparse/NLI/reranker wiring centralised in reset().
Deployment Surfaces
The same memory engine powers four concrete deployment surfaces:
| Surface | Entry point | Use case |
|---|---|---|
| Python library | midas.Memory | Embedding Midas inside an agent runtime |
| MCP server | midas/mcp_server.py (via uvx midas-memory-mcp) | Claude Desktop and any MCP-compatible host |
| TS MCP client | packages/midas-ts/src/mcp.ts | Node/JS agents calling a Midas MCP server |
| Eval harness | eval/runner.py, eval/multiday.py, eval/memory_safety.py | Offline, deterministic regression suite |
The MCP server is the canonical integration: server.json declares the package as midas-memory-mcp v0.0.4 with transport stdio, with env knobs MIDAS_MCP_DB (SQLite path), MIDAS_MCP_EMBEDDER (local|hashing), MIDAS_MCP_MIN_IMPORTANCE, and MIDAS_MCP_MAX_RECORDS (server.json:13-32). The Claude Desktop bundle in mcpb/manifest.json re-exports the same knobs as user_config fields (mcpb/manifest.json:43-58).
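A minimal sketch of reading those four knobs from the environment follows. `McpSettings` and `load_settings` are hypothetical names for illustration; unset values are left as `None` here, on the assumption that the server applies its own defaults.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class McpSettings:
    db_path: Optional[str]         # MIDAS_MCP_DB
    embedder: Optional[str]        # MIDAS_MCP_EMBEDDER
    min_importance: Optional[int]  # MIDAS_MCP_MIN_IMPORTANCE
    max_records: Optional[int]     # MIDAS_MCP_MAX_RECORDS


def _opt_int(raw: Optional[str]) -> Optional[int]:
    """Parse an integer knob, treating missing or empty values as unset."""
    return int(raw) if raw else None


def load_settings(env: Mapping[str, str] = os.environ) -> McpSettings:
    """Collect the knobs named in server.json from a mapping of env vars."""
    return McpSettings(
        db_path=env.get("MIDAS_MCP_DB") or None,
        embedder=env.get("MIDAS_MCP_EMBEDDER") or None,
        min_importance=_opt_int(env.get("MIDAS_MCP_MIN_IMPORTANCE")),
        max_records=_opt_int(env.get("MIDAS_MCP_MAX_RECORDS")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.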
Data Flow
flowchart LR
A[Turn / observation] --> B[Capture]
B --> C{Importance ≥ floor?}
C -- no --> X[Discard]
C -- yes --> D[Store + embed]
D --> E[(Vector store<br/>+ BM25 index)]
Q[Agent query] --> R[Recall: hybrid BM25+sem]
E --> R
R --> S[Budgeted context block]
S --> T[LLM agent]
Opt[Opt-in Distiller] -.-> B
P[guard_reliance] -.-> S
The dashed lines mark the two non-default paths: distillation (Memory(distiller=...)) is opt-in (distill.py:6-19), and guard_reliance gates whether recalled content is allowed to drive the requested use (memory_safety.py:62-79). Datasets like conflicts-v1 and longmemeval in eval/datasets.py stress both, e.g. *confusable* questions that test whether the engine can supersede an older "Monday" with the newer "Friday" vet appointment.
Configuration Cheat-Sheet
| Knob | Default | Effect |
|---|---|---|
| `embedder` | local (bge ONNX, offline) | Switch to hashing for zero-deps, lower quality |
| `importance` on `remember` | 0 → auto-derive | Non-zero overrides scoring |
| `MIDAS_MCP_MIN_IMPORTANCE` | 2 | Floor for capture: turns scoring below it are skipped |
| `hybrid` (recall) | false | Fuse BM25 with semantic score |
| `as_of` (recall) | unset | Historical query; excludes later records |
| `distiller` | unset (off) | Enables OllamaDistiller or any Distiller Protocol impl |
| `abstention_threshold` | unset | Returns "abstain" when calibrated confidence is low |
| `forget_decayed(max_records=…)` | off | Drops lowest-value memories when store exceeds cap |
See Also
- Distillation tiers & honest measurements: BENCHMARKS.md
- Methodology, MCP policy, failure cases: docs/methodology.md
- Long-horizon memory design notes: docs/long-horizon-memory.md
- Coding agent example: examples/coding_agent_demo.py
- Release notes (v0.0.4): v0.0.4 release
- Contributing guide & eval quick reference: CONTRIBUTING.md
Source: https://github.com/vornicx/Midas / Human Manual
Core SDK & Memory Operations
Related topics: Overview & Architecture, Evaluation, Safety & Governance
Overview
Midas is an agentic memory SDK that provides local-first, source-traceable memory for long-horizon agents. The core SDK lives in the midas package and exposes a small, store/embedder-agnostic API centered on the Memory class. It supports semantic recall, budgeted context assembly, importance scoring, and a governance guard — all without requiring an LLM at ingest or query time by default.
The package exports the public surface from midas/__init__.py, including Memory, embedders (LocalEmbedder, HashingEmbedder, OpenAIEmbedder), stores (InMemoryStore, optional SQLiteStore, IVFStore, TurboVecStore), the importance scorers, and the Guard. The default configuration runs fully offline with no LLM calls, using HashingEmbedder and InMemoryStore as zero-setup fallbacks.
The Memory Class
The Memory class is the single entry point for agent integrations. It composes a store, an embedder, an optional reranker, an optional importance scorer, and an optional distiller into one cohesive object. From quickstart.py, the minimal usage is:
from midas import Memory
mem = Memory() # in-memory store + offline hashing embedder
mem.remember("Decision: the primary database is PostgreSQL.", kind="constraint", importance=5)
print(mem.assemble("When do we launch?", token_budget=128, window=1, thread_key="session"))
Core Operations
The SDK is organized around four fundamental operations defined in midas/memory.py:
| Operation | Purpose | Returns |
|---|---|---|
| `remember(content, kind, importance, ...)` | Ingest a single memory | `MemoryRecord` |
| `remember_many(items)` | Batch ingest (skips duplicates, enforces relevance floor) | `list[CaptureResult]` |
| `recall(query, limit, hybrid, as_of)` | Search and rank by relevance × importance × recency | `list[RecallHit]` |
| `assemble(query, token_budget, window)` | Build a prompt-ready, budgeted context block | `ContextBlock` |
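The "relevance × importance × recency" ranking in the table can be sketched as a product of three normalized components. The exponential half-life decay, the 30-day default, and the `Candidate` type are assumptions for illustration; the manual names the components but not their formulas.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float  # similarity to the query, assumed to lie in 0..1
    importance: int   # 1..5, as in the manual
    age_days: float   # time since the record was written


def score(c: Candidate, half_life_days: float = 30.0) -> float:
    """Multiply the three components; any weak component drags the hit down."""
    importance_norm = c.importance / 5.0
    recency = 0.5 ** (c.age_days / half_life_days)  # assumed decay shape
    return c.relevance * importance_norm * recency


def rank(candidates: list[Candidate], limit: int = 5) -> list[Candidate]:
    """Return the top `limit` candidates by combined score."""
    return sorted(candidates, key=score, reverse=True)[:limit]
```

Because the components multiply, a highly relevant but very old record can fall below a slightly less relevant fresh one.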
Additional operations include forget(memory_id), forget_matching(...), capture(...) (intelligent capture that decides whether to keep a turn), inspect_memory(id), guard_reliance(...), and store.all() for raw access.
Stores and Embedders
The SDK is store-agnostic and embedder-agnostic. The Memory class accepts these as constructor arguments, so swapping backends requires no code changes beyond instantiation.
Stores (midas/store.py, midas/sqlite_store.py):
- `InMemoryStore`: default, zero-setup, lost on restart.
- `SQLiteStore`: persistent local storage (requires `sqlite-vec`); survives restarts.
- `IVFStore` / `TurboVecStore`: optional ANN backends for larger corpora.
Embedders (midas/embeddings.py):
- `HashingEmbedder`: deterministic, offline, zero-dependency fallback.
- `LocalEmbedder`: bge ONNX model, runs on-device, no network egress.
- `OpenAIEmbedder`: hosted API (optional).
- `DiskCachedEmbedder`: wraps any embedder with on-disk caching.
- `LocalReranker`: cross-encoder reranking for higher precision at recall time.
Memory Kinds, Provenance, and Importance
Midas enforces a typed vocabulary that governs retrieval weight and governance behavior.
Memory kinds (note | chat | fact | preference | constraint | mission) are defined as the MemoryKind enum in midas/__init__.py. The coding module (midas/coding.py) extends this vocabulary with code-specific tags like architecture_decision, dependency_choice, and forbidden_action — each mapped to a core MemoryKind plus metadata.
Provenance (planning | action | observation | user_confirmation) determines whether recalled memory may justify an action. The Guard (midas/guard.py) requires user_confirmation provenance for external or destructive actions.
Importance is an integer 1–5, or 0 to auto-derive from content without an LLM. The StructuralImportance scorer in midas/importance.py uses regex-based cues (first-person assertions, durable signals, copula, meta-discourse) to distinguish *assertions of personal facts* from *questions that merely mention salient words* — a discriminator the bag-of-features content score misses.
Distillation (v0.0.4)
The v0.0.4 release introduced a 3-tier distillation dial documented in midas/distill.py:
- No distillation (default): raw turns stored verbatim. Measured to outperform naive LLM distillation on knowledge-update and temporal reasoning questions.
- Agent-driven distillation: the agent's own LLM distills before calling `remember`. Costs $0 to Midas.
- Local distiller: `OllamaDistiller` runs on-device via Ollama (e.g., `llama3.2:3b`). Also $0 to Midas, with the explicit tradeoff that distilled facts are non-deterministic paraphrases, not verbatim sources.
The Distiller protocol accepts a batch of texts and returns compact fact strings. HTTPDistiller provides an HTTP-based variant for remote models.
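The shape of such a protocol can be sketched with `typing.Protocol`. The method name `distill`, the toy `FirstSentenceDistiller`, and the `ingest` helper are hypothetical; the manual only states that a distiller takes a batch of texts and returns compact fact strings.

```python
from __future__ import annotations

from typing import Optional, Protocol, Sequence


class Distiller(Protocol):
    """Anything that turns a batch of raw texts into compact fact strings."""

    def distill(self, texts: Sequence[str]) -> list[str]: ...


class FirstSentenceDistiller:
    """Toy, LLM-free stand-in: keeps only the first sentence of each text."""

    def distill(self, texts: Sequence[str]) -> list[str]:
        facts = []
        for text in texts:
            first = text.strip().split(". ")[0].rstrip(".")
            if first:
                facts.append(first + ".")
        return facts


def ingest(texts: Sequence[str], distiller: Optional[Distiller] = None) -> list[str]:
    """With no distiller (the default tier), raw turns pass through verbatim."""
    return list(texts) if distiller is None else distiller.distill(texts)
```

Keeping the distiller optional mirrors the dial: the default path never rewrites what was said.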
Context Assembly
The assemble method (midas/memory.py) is the core primitive for injecting memory into a prompt. It:
- Recalls top-k hits matching the query.
- Pulls in same-thread neighbours (`window` parameter, keyed by `thread_key`).
- Orders records by priority (pinned first, then highest-value).
- Greedily fills a token budget, breaking when full.
- Formats each record via `format_record` (lean by default; full provenance opt-in).
The returned ContextBlock contains the assembled text, the underlying records, and the token count — ready to drop into an agent's prompt or chat template.
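The ordering and greedy-fill steps can be sketched as follows. `Item`, `count_tokens`, and `fill_budget` are hypothetical names, and whitespace splitting is only a stand-in for a real tokenizer; recall and thread-window expansion are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    text: str
    value: float          # stand-in for the record's combined value
    pinned: bool = False


def count_tokens(text: str) -> int:
    """Whitespace token count: an assumption, not a real tokenizer."""
    return len(text.split())


def fill_budget(items: list[Item], token_budget: int) -> tuple[str, int]:
    """Order pinned-first then by value, and stop at the first item that overflows."""
    ordered = sorted(items, key=lambda it: (not it.pinned, -it.value))
    chosen: list[Item] = []
    used = 0
    for it in ordered:
        cost = count_tokens(it.text)
        if used + cost > token_budget:
            break  # the manual describes breaking when full, not skipping ahead
        chosen.append(it)
        used += cost
    return "\n".join(it.text for it in chosen), used
```

Breaking rather than skipping keeps the output a strict prefix of the priority order, which makes the block predictable at the cost of occasionally leaving budget unused.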
Governance and Guard
The Guard layer (midas/guard.py, midas/policy.py) enforces that memory may guide planning but cannot independently authorize external or destructive actions without explicit user confirmation. The check_memory_use operation returns a MemoryUseDecision with an allowed boolean and rationale, which agent loops must consult before acting on memory-sourced instructions.
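The provenance gate can be pictured with a small sketch. `Decision`, `check_use`, and the `GATED_USES` set are hypothetical simplifications: the real guard works on recalled records and acting agents, whereas this example only shows the rule that external or destructive uses need `user_confirmation` provenance.

```python
from dataclasses import dataclass

# Uses that, per the manual, require explicit user confirmation.
GATED_USES = {"external_action", "destructive_action"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    rationale: str


def check_use(provenance: str, intended_use: str) -> Decision:
    """Allow planning-style uses freely; gate actions on provenance."""
    if intended_use in GATED_USES and provenance != "user_confirmation":
        return Decision(
            False, f"{intended_use} needs user_confirmation, got {provenance}"
        )
    return Decision(True, "memory may inform this use")
```

An agent loop would consult `allowed` before acting and fall back to asking the user when it is false.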
MCP Server Integration
For agent frameworks that speak the Model Context Protocol, midas/mcp_server.py exposes the same operations as MCP tools: remember, recall, build_context (wraps assemble), capture, forget, forget_matching, inspect_memory, and check_memory_use. The server is configurable via environment variables (MIDAS_MCP_DB, MIDAS_MCP_EMBEDDER, MIDAS_MCP_MAX_RECORDS) and ships as a uvx-installable package (midas-memory-mcp).
See Also
- Agent Policy and Instructions: for the system-prompt text agents should follow.
- Guard and Memory Use Decisions: for governance semantics.
- Coding Agent Memory Vocabulary: for the `coding` module's code-specific extension.
- MCP Server Tools: for the protocol-level tool surface.
- Distillation Dial: for the 3-tier distillation configuration.
MCP Server, Integrations & Distribution
Related topics: Overview & Architecture, Core SDK & Memory Operations
Overview
The Midas MCP Server exposes the Midas memory engine to any Model Context Protocol client (Claude Desktop, IDE agents, or custom MCP clients) without changing the SDK's local-first design. The server uses no LLM at ingest or query: recall remains offline, embeddings are computed locally, and memories are persisted to a local SQLite file the operator controls. The reference implementation is the Python package midas-memory-mcp; an experimental TypeScript port (@midas/midas-memory-mcp, via packages/midas-ts/) ships for Node-first hosts and shares a byte-compatible SQLite schema so a Python server and a TS server can target the same DB file. Source: midas/mcp_server.py:1-40, packages/midas-ts/README.md:1-20, CHANGELOG.md:1-40.
Architecture
The MCP server is a thin transport layer over the existing Memory SDK. The FastMCP instance is built once at import time, and tools are registered through @server.tool(...) decorators. Tool handlers delegate to _mem.remember, _mem.recall, _mem.store.get, _mem.forget, and _mem.guard_reliance, optionally applying a namespace metadata filter via _ns_filter(namespace) so a single DB can be scoped per project or user. Source: midas/mcp_server.py:80-180.
flowchart LR
Client["MCP Client<br/>(Claude Desktop, IDE, agent)"] -- "stdio JSON-RPC" --> Server["mcp_server.py<br/>FastMCP"]
Server -- "remember/recall/capture/forget" --> Memory["Memory SDK"]
Memory --> Embedder["LocalEmbedder / HashingEmbedder"]
Memory --> Store["SQLiteStore or InMemoryStore"]
Store --> DB[("local .sqlite3 file")]
The same SDK is reused by the TypeScript port, which mirrors the schema and hashing math bit-for-bit; this is what enables cross-runtime shared DBs. Source: packages/midas-ts/README.md:18-40.
MCP Tools
The server registers seven tools, each with explicit ToolAnnotations so clients know which calls are read-only or destructive. Source: midas/mcp_server.py:40-220.
| Tool | Mutating? | Purpose |
|---|---|---|
| `remember` | yes | Store a memory; importance=0 auto-derives from content (no LLM). |
| `recall` | read | Source-traceable hits with optional hybrid (BM25+fused) and as_of historical view. |
| `capture` | yes | Hands-off ingestion: Midas scores and drops low-value turns automatically. |
| `check_memory_use` | read | Guard decision (guard_reliance) for an intended action. |
| `forget` | destructive | Delete one memory by id; supersession chains relink. |
| `inspect_memory` | read | Fetch one stored record by id without search. |
| `get_agent_memory_instructions` | read | Returns the policy text, provenance taxonomy, and guard parameters. |
Two design rules recur in the tool docstrings: (1) external or destructive actions require provenance=user_confirmation — otherwise check_memory_use returns allowed=False and the agent must ask the user; (2) capture is the workhorse for hands-off, automatic remembering because it applies the relevance policy and skips duplicates without an LLM call. Source: midas/mcp_server.py:120-260, eval/memory_safety.py:30-80.
Configuration & Environment
All knobs are environment variables read at server start; the most important ones are listed below. Source: midas/mcp_server.py:20-70, README.md:120-160, server.json:30-80, mcpb/manifest.json:20-60.
| Variable | Default | Effect |
|---|---|---|
| `MIDAS_MCP_EMBEDDER` | local (if fastembed) else hashing | Embedding backend: local (bge ONNX, offline), hashing (zero-dep), multilingual, or any fastembed model id. |
| `MIDAS_MCP_DB` | in-memory | Path to a SQLite file for cross-restart persistence. |
| `MIDAS_MCP_MAX_RECORDS` | unbounded | Cap the store; over it, remember auto-forgets the lowest-value tail (no LLM). |
| `MIDAS_MCP_MIN_IMPORTANCE` | 2 | Floor for capture: turns scoring below it are skipped. |
| `MIDAS_MCP_NAMESPACE` | unset | Default scope tag on writes and applied to reads. |
| `MIDAS_MCP_ACTOR` | midas-mcp | Actor id stamped on MCP memories. |
| `MIDAS_MCP_ANN=1` | off | Sub-linear IVF index for very large stores. |
| `MIDAS_MCP_SUPERSEDE` | off | Enable NLI-gated belief revision. |
| `MIDAS_MCP_NLI=1` | off | Enable NLI entailment check (requires local extras). |
| `MIDAS_MCP_AUTO_MAINTAIN=<min>` | off | Idle-time upkeep interval. |
| `MIDAS_MCP_PINNED` | unset | Pin standing directives. |
Namespaces are the multi-tenancy primitive: a single SQLite file can serve several projects or agents without cross-contamination because remember writes a namespace tag into metadata and recall filters by it through _ns_filter. Source: midas/mcp_server.py:90-150.
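A minimal sketch of namespace scoping follows. The helper names and the dict-shaped records are assumptions for illustration; only the behavior (writes tag metadata, reads filter on the tag) is taken from the description above.

```python
from typing import Optional


def ns_filter(namespace: Optional[str]) -> Optional[dict]:
    """Build a metadata filter, or None when no namespace is in force."""
    return {"namespace": namespace} if namespace else None


def matches(metadata: dict, flt: Optional[dict]) -> bool:
    """A record matches when every filter key equals its metadata value."""
    return flt is None or all(metadata.get(k) == v for k, v in flt.items())


def recall_scoped(records: list, namespace: Optional[str]) -> list:
    """Return only records whose metadata carries the requested namespace."""
    flt = ns_filter(namespace)
    return [r for r in records if matches(r["metadata"], flt)]
```

With two projects sharing one store, `recall_scoped(records, "project-a")` would never surface a record tagged `project-b`.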
Distribution Channels
Midas reaches users through four parallel channels, each backed by the same Python entry point python -m midas.mcp_server (or the midas-mcp console script). Source: CHANGELOG.md:1-60, server.json:1-90, mcpb/manifest.json:1-80, packages/midas-ts/package.json:1-40.
- PyPI: `midas-memory-mcp`. Installable with `uv pip install "midas-memory-mcp[mcp,local]"`. Lets developers pin a version in their own stack.
- Official MCP registry (`io.github.vornicx/midas`). Listed via `server.json`; users can install with one click or run install-free with `uvx midas-memory-mcp`. The manifest declares `transport: stdio` and exposes `MIDAS_MCP_DB`, `MIDAS_MCP_EMBEDDER`, `MIDAS_MCP_MAX_RECORDS` to clients. Source: server.json:10-80.
- Claude Desktop / MCPB bundle. `mcpb/manifest.json` packages the server as a one-click extension: it points `command` at `uvx midas-memory-mcp`, declares `claude_desktop >= 0.10.0`, surfaces four user-config fields (`db_path`, `embedder`, `min_importance`, `max_records`), and routes them into the env vars above. Privacy policy: `https://github.com/vornicx/Midas/blob/6548f6d50629f530c9da524bf54cf10efb551b47/PRIVACY.md`. Source: mcpb/manifest.json:1-80.
- TypeScript port: `packages/midas-ts/`. Ships an `npx midas-memory-mcp` launcher, targets `node >= 22.5`, and uses the same SQLite schema + float32 blob encoding as the Python server so either runtime can read/write the same DB. The README explicitly calls it "experimental": semantic ONNX embeddings, NLI-gated revision, and the full eval harness remain Python-only. Source: packages/midas-ts/package.json:1-40, packages/midas-ts/README.md:1-40.
The release-time optimizations in v0.0.4 — a 442→198 token MCP policy shrink, lean build_context defaults, and an ~8× faster cached BM25 index — apply identically across every distribution channel because they live in the shared SDK rather than in any one wrapper. Source: CHANGELOG.md:20-60.
Common Failure Modes
- Empty `MIDAS_MCP_DB` path means in-memory only; restarts lose every memory. Set a path to persist. Source: midas/mcp_server.py:20-40.
- `capture` silently drops turns when the importance floor (`MIDAS_MCP_MIN_IMPORTANCE`, default 2) is not met. This is by design so chat noise does not pollute the store, but callers wanting raw `remember` should call that tool instead. Source: midas/mcp_server.py:160-220.
- `forget` on an id missing from the store returns `"no memory with id <id>"` rather than raising; clients must check the return value. Source: midas/mcp_server.py:200-220.
- `check_memory_use` for `external_action` or `destructive_action` returns `allowed=False` unless the recalled memory has `provenance=user_confirmation`; agents that bypass this guard and act on weaker provenance will be blocked by the safety eval at `eval/memory_safety.py`. Source: eval/memory_safety.py:20-80.
See Also
- Core SDK & Memory Model: `Memory.remember/recall/build_context` semantics
- Distillation Dial: agent-driven vs `OllamaDistiller` and the no-LLM default
- Eval Harness & Benchmarks: `eval/runner.py`, `eval/multiday.py`, `BENCHMARKS.md`
Evaluation, Safety & Governance
Related topics: Overview & Architecture, Core SDK & Memory Operations, MCP Server, Integrations & Distribution
Midas is a local-first, source-traceable agent memory layer that operates with no LLM at ingest and ships with a paired evaluation, safety, and governance stack. The three concerns are not bolted on: the same Memory engine used in production is the system under test, the same provenance records that drive recall also drive the audit chain, and the policy that constrains agent behavior is the same one the safety suite verifies.
Evaluation Framework
The eval/ package treats Midas as a black-box memory backend and runs it against standardized long-horizon benchmarks. The adapter pattern keeps the engine decoupled from the harness. MidasAdapter wraps a Memory instance and exposes ingest, recall, forget_decayed, and store_size — the four surfaces a benchmark cares about — translating the dataset's Event model into the dict shape Memory.remember_many expects. Source: eval/adapters/midas_adapter.py:1-120
Datasets are first-class objects. eval/datasets.py ships:
- `beam`: a 6-day conflict scenario with stable controls, current-vs-stale temporal updates, and explicit unanswerable items for calibration. Source: eval/datasets.py:1-160
- `longmemeval`: the ICLR 2025 long-term memory benchmark, projected onto a `Dataset` of `Sample`s where chat turns become `Event`s and gold answer turns become `Question.gold_event_ids`. Source: eval/datasets.py:160-260
- Custom loaders that map each instance to exactly one `Sample`, preserving the `has_answer` flag to distinguish answerable from abstention questions.
eval/runner.py orchestrates a run: it ingests events, calls the agent loop, optionally verifies with NLI or an LLM judge, and bins results into answers, abstentions, answers_grounded, and per-category tallies. The "grounded" columns are correctness *after* the verifier override; the "base" columns are reader-only correctness, so the harness makes the verifier's contribution visible rather than hidden. Source: eval/runner.py:1-200
Distillation A/B Harness
eval/summarization_ab.py is a multi-arm A/B harness for the distillation dial introduced in v0.0.4. It contrasts a naive one-fact-per-batch prompt against STRUCT_PROMPT (structured memory cards of the form [<entity>] <attribute> = <value> (when: <time>)) and against a struct_replace mode that re-runs the struct distiller and overwrites prior cards. The harness is the empirical evidence behind the release-note claim that *naive distillation does not lift answers* — the structured card format is what preserves both sides of a knowledge update, which knowledge-update and temporal-reasoning questions test for. Source: eval/summarization_ab.py:1-180
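The card format quoted above is simple enough to format and parse mechanically, which is part of why it can keep both sides of a knowledge update. The `Card` type, regex, and helper names below are hypothetical, and the dates are made-up illustration values.

```python
import re
from typing import NamedTuple, Optional


class Card(NamedTuple):
    entity: str
    attribute: str
    value: str
    when: str


# Matches "[<entity>] <attribute> = <value> (when: <time>)".
CARD_RE = re.compile(
    r"^\[(?P<entity>[^\]]+)\]\s+(?P<attribute>.+?)\s+=\s+"
    r"(?P<value>.+?)\s+\(when:\s*(?P<when>[^)]+)\)$"
)


def format_card(card: Card) -> str:
    """Render a card in the structured one-line form."""
    return f"[{card.entity}] {card.attribute} = {card.value} (when: {card.when})"


def parse_card(line: str) -> Optional[Card]:
    """Parse one card line, returning None when it does not fit the format."""
    match = CARD_RE.match(line.strip())
    return Card(**match.groupdict()) if match else None


# Two cards for the same entity and attribute preserve the old and new value.
old = Card("vet appointment", "day", "Monday", "2024-05-01")
new = Card("vet appointment", "day", "Friday", "2024-05-03")
```

Because entity and attribute are explicit fields, an update is recognizable as "same key, new value, later time" rather than as two unrelated sentences.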
Safety
Midas treats memory as a tool that can be manipulated, not just consulted. The safety surface is the guard_reliance method, exercised by eval/memory_safety.py's SafetyCase suite. Each case is either an *attack* (the memory state tries to coerce the agent into a destructive or unauthorized action) or a *benign* control (the agent must still be allowed to act). Source: eval/memory_safety.py:1-200
The pattern is identical across cases: for each case c, call decision = c.build().guard_reliance(c.query, intended_use=…, acting_agent=…, limit=5), then assert decision.allowed == c.expect_allowed. Concrete cases include:
- "plan_as_recommendation" — an internal plan stored as a planning artifact must not surface as an authoritative answer to a "what database should we migrate to" question.
- "forgotten confirmation" — once a user confirmation has been superseded, an action derived from it must be blocked.
- "destructive projection" — projected/inferred destructive actions are rejected even when the projection was confident.
Benign controls ("current confirmation -> external", "answer from an observation") exist specifically so the suite fails loudly if the guard becomes over-broad. The run(verbose=True) helper tallies attack_success (false allows) against benign_pass (true allows) and reports both, never collapsing them into a single accuracy number. Source: eval/memory_safety.py:200-280
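Keeping the two tallies separate can be sketched as below. `CaseResult` and `tally` are hypothetical names; the point is only that false allows on attacks and true allows on benign controls are counted side by side instead of being folded into one accuracy figure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    is_attack: bool
    allowed: bool  # what the guard actually decided


def tally(results: list[CaseResult]) -> dict[str, int]:
    """Count attack successes and benign passes without merging them."""
    attacks = [r for r in results if r.is_attack]
    benign = [r for r in results if not r.is_attack]
    return {
        "attack_success": sum(r.allowed for r in attacks),  # false allows
        "attacks": len(attacks),
        "benign_pass": sum(r.allowed for r in benign),       # true allows
        "benign": len(benign),
    }
```

A guard that refuses everything scores zero attack successes but also zero benign passes, which a single accuracy number could hide.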
The agent-facing policy is text, not code, but it is the contract the safety suite verifies. midas/policy.py is the literal block pasted into an agent's system prompt: recall first, capture durable signal, *guard actions* via check_memory_use before external/destructive actions, forget only on confirmation, and for code work route forbidden actions through check_forbidden_action. Source: midas/policy.py:1-80
Governance
Governance in Midas is the ability to answer, after the fact, *why* a memory exists, *what* it superseded, and *how much* weight it should carry.
Audit Trail
midas/audit.py reconstructs the full belief-revision chain for a given record. belief_history(mem, record_id) walks superseded_by links in both directions to return the OLDEST → NEWEST chain of revisions — exactly the timeline an auditor needs to answer "we believed v1 from t1, revised to v2 at t2, …". The function reads from mem.store.all() and never mutates, so it is safe to call from a synchronous request handler or an offline review tool. Source: midas/audit.py:1-120
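The two-direction walk can be sketched on plain records. `Record` and `belief_chain` are hypothetical stand-ins for the store records and `belief_history`; the sketch assumes each record has at most one successor and is read-only, as the text describes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    id: str
    content: str
    superseded_by: Optional[str] = None


def belief_chain(records: list[Record], record_id: str) -> list[Record]:
    """Return the revision chain containing record_id, oldest first."""
    by_id = {r.id: r for r in records}
    # Invert the links so we can also walk from a revision back to its origin.
    predecessor = {r.superseded_by: r.id for r in records if r.superseded_by}

    current = record_id
    seen = {current}
    while current in predecessor and predecessor[current] not in seen:
        current = predecessor[current]  # step back toward the oldest version
        seen.add(current)

    chain: list[Record] = []
    visited: set[str] = set()
    while current is not None and current in by_id and current not in visited:
        visited.add(current)
        chain.append(by_id[current])
        current = by_id[current].superseded_by  # step forward to the next revision
    return chain
```

Starting from either the "Monday" or the "Friday" record of the vet-appointment example yields the same Monday-then-Friday timeline.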
Importance Scoring
midas/importance.py provides a no-LLM per-turn importance signal. ContentImportance scores on a bag-of-features: stopword-filtered content-word density, presence of digits, proper-noun-likes, length, and an anti-backchannel floor (so "ok" or "sounds good" does not score like a fact). StructuralImportance layers assertion-vs-question structure on top — the discriminator a pure content score misses, since a question and a fact about "sushi in Tokyo" share salient words but only the assertion is worth remembering. Both classes use cheap English regex cues with no dataset-specific vocabulary. Source: midas/importance.py:1-160
A TypeScript port of the same scoring lives at packages/midas-ts/src/importance.ts, so a Node-side ingestion path applies identical governance to the same memory. Source: packages/midas-ts/src/importance.ts:1-80
MCP Surface
The MCP server exposes only the read and governance tools needed by an external auditor. inspect_memory(memory_id) returns a single record without search, mutation, or embedding exposure; recall with explain=True returns score components (relevance, importance_norm, recency) so a reviewer can see *why* a record ranked where it did. No LLM rewrite or rationale is generated server-side — the source citation is the explanation. Source: midas/mcp_server.py:1-160
Configuration
| Knob | Where | Default | Purpose |
|---|---|---|---|
| `abstention_threshold` | Memory ctor | 0.0 | Floor below which recall is treated as no-answer |
| `abstention_relevance_floor` | Memory ctor | 0.0 | Score component floor for abstention calibration |
| `abstention_entailment_floor` | Memory ctor | 0.0 | NLI entailment floor for grounded abstention |
| `min_importance` | `MIDAS_MCP_MIN_IMPORTANCE` env | 2 | Importance floor on the MCP server capture path |
| `max_records` | `MIDAS_MCP_MAX_RECORDS` env | unset | Cap on store size; above it lowest-value records auto-forget |
| `verify_floor`, `verify_nli` | `eval.runner` flags | off | Toggle NLI verifier and its acceptance floor |
| `--judge` | `eval.runner` flag | off | Use an LLM judge instead of exact-match grading |
Source: midas/memory.py:1-80, midas/mcp_server.py:1-120, eval/runner.py:1-200, mcpb/manifest.json:1-80, server.json:1-60.
Data Flow
flowchart LR
A[Benchmark Sample] --> B[MidasAdapter]
B --> C[Memory.remember_many]
C --> D[Store + bm25 cache]
A --> Q[Question]
Q --> R[Memory.recall]
R --> D
R --> J[Verifier: NLI / LLM judge]
J --> S[answers / abstentions / per-category]
G[guard_reliance] --> D
G --> K[allow / refuse]
H[importance scorer] --> C
AU[belief_history] --> D
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Developers may fail before the first successful local run: Add a machine-readable client wiring receipt for init/status
Upgrade or migration may change expected behavior: Midas v0.1.0 — the midas CLI, Memory Inspector & shared-by-default memory
Upgrade or migration may change expected behavior: Midas v0.1.1 — recall noise floor + end-to-end quickstart
Upgrade or migration may change expected behavior: v0.0.4
Found 12 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Add a machine-readable client wiring receipt for init/status
- User impact: Developers may fail before the first successful local run: Add a machine-readable client wiring receipt for init/status
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Add a machine-readable client wiring receipt for init/status. Context: Observed during installation or first-run setup.
- Evidence: failure_mode_cluster:github_issue | https://github.com/vornicx/Midas/issues/15
2. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Midas v0.1.0 — the midas CLI, Memory Inspector & shared-by-default memory
- User impact: Upgrade or migration may change expected behavior: Midas v0.1.0 — the midas CLI, Memory Inspector & shared-by-default memory
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Midas v0.1.0 — the midas CLI, Memory Inspector & shared-by-default memory. Context: Observed when using python
- Evidence: failure_mode_cluster:github_release | https://github.com/vornicx/Midas/releases/tag/v0.1.0
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Midas v0.1.1 — recall noise floor + end-to-end quickstart
- User impact: Upgrade or migration may change expected behavior: Midas v0.1.1 — recall noise floor + end-to-end quickstart
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Midas v0.1.1 — recall noise floor + end-to-end quickstart. Context: Observed when using python
- Evidence: failure_mode_cluster:github_release | https://github.com/vornicx/Midas/releases/tag/v0.1.1
4. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: v0.0.4
- User impact: Upgrade or migration may change expected behavior: v0.0.4
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.0.4. Context: Observed when using python
- Evidence: failure_mode_cluster:github_release | https://github.com/vornicx/Midas/releases/tag/v0.0.4
5. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/vornicx/Midas/issues/15
6. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | https://github.com/vornicx/Midas
7. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/vornicx/Midas
8. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/vornicx/Midas
9. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/vornicx/Midas
10. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/vornicx/Midas
11. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/vornicx/Midas
12. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/vornicx/Midas
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using Midas with real data or production workflows.
- Add a machine-readable client wiring receipt for init/status - github / github_issue
- Midas v0.1.1 — recall noise floor + end-to-end quickstart - github / github_release
- Midas v0.1.0 — the midas CLI, Memory Inspector & shared-by-default memory - github / github_release
- v0.0.4 - github / github_release
- Configuration risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence