# https://github.com/dfrostar/neuralmind Project Manual

Generated at: 2026-06-21 22:15:17 UTC

## Table of Contents

- [Overview and System Architecture](#page-1)
- [Core Features: Language Extractors, Synapse Memory, and Self-Improvement](#page-2)
- [Installation, Deployment, Backends, and Operations](#page-3)
- [Benchmarks, Evaluation, MCP Tools, and Public Methodology](#page-4)

<a id='page-1'></a>

## Overview and System Architecture

### Related Pages

Related topics: [Core Features: Language Extractors, Synapse Memory, and Self-Improvement](#page-2), [Installation, Deployment, Backends, and Operations](#page-3), [Benchmarks, Evaluation, MCP Tools, and Public Methodology](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [demo_data/sample_project/README.md](https://github.com/dfrostar/neuralmind/blob/main/demo_data/sample_project/README.md)
- [demo_data/sample_project/api/routes.py](https://github.com/dfrostar/neuralmind/blob/main/demo_data/sample_project/api/routes.py)
</details>

# Overview and System Architecture

## Purpose and Scope

NeuralMind is a semantic code retrieval and knowledge-graph system designed to compress, index, and query codebases. Its primary role is to provide fast, semantically meaningful search across a project's source code by combining static analysis (tree-sitter–based language extraction), embedding-based retrieval, and a graph store of cross-file relationships.

The project exposes both a CLI and an MCP (Model Context Protocol) server, so it can be embedded directly into AI agent workflows. NeuralMind also publishes a public benchmark (`neuralmind benchmark --public`) for reproducible comparison against alternatives, including a live head-to-head run against `codebase-memory-mcp`.

Source: [demo_data/sample_project/README.md:1-15]()

## High-Level Architecture

NeuralMind follows a three-stage pipeline: **extract → embed/index → retrieve**. Source files are parsed by language-specific extractors behind a tree-sitter seam, the resulting symbols and relationships are stored in a knowledge graph, and embeddings are written to a vector store for semantic retrieval.

The CLI orchestrates this pipeline via subcommands such as `neuralmind build <project>` and `neuralmind benchmark`. The MCP server (`mcp_server.py`) exposes the indexed knowledge graph and retrieval to external agents.

The fixture project's `README.md` documents the canonical regeneration flow, which mirrors the production pipeline:

```bash
pip install graphifyy
cd tests/fixtures/sample_project
graphify update .
cd ../../..
neuralmind build tests/fixtures/sample_project --force
```

Source: [demo_data/sample_project/README.md:21-29]()

```mermaid
flowchart LR
    A[Source Code] --> B[Tree-sitter Extractors]
    B --> C[Symbol + Edge IR]
    C --> D[Knowledge Graph]
    C --> E[Vector Store]
    D --> F[CLI / MCP Server]
    E --> F
    F --> G[Query / Agent]
```
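The three-stage pipeline described above can be illustrated with a toy, standard-library-only sketch. It is not NeuralMind's implementation: Python's `ast` module stands in for the tree-sitter extractors, a bag-of-words counter stands in for real embeddings, and every name (`extract`, `embed`, `retrieve`, `SOURCE`) is hypothetical.

```python
import ast
import math
import re
from collections import Counter


def extract(source: str) -> list:
    """Stage 1: pull function symbols out of Python source (stand-in for tree-sitter)."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node) or ""
            symbols.append({"name": node.name, "text": f"{node.name} {doc}"})
    return symbols


def embed(text: str) -> Counter:
    """Stage 2: toy bag-of-words vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list, k: int = 1) -> list:
    """Stage 3: rank indexed symbols by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [symbol["name"] for symbol, _ in ranked[:k]]


# Hypothetical two-function module used as the corpus.
SOURCE = '''
def create_invoice(user_id):
    """Create a billing invoice for a user."""

def verify_session(token):
    """Check that a login session token is valid."""
'''

index = [(symbol, embed(symbol["text"])) for symbol in extract(SOURCE)]
```

Calling `retrieve("billing invoice", index)` ranks `create_invoice` first because only that symbol shares terms with the query.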

## Knowledge Graph and Indexing

The system uses a graph representation (built via the `graphifyy` tool) to capture cross-file relationships — imports, function calls, route-to-handler bindings, and module dependencies. The fixture's `routes.py` illustrates the kind of wiring that the indexer must capture: HTTP routes reference handler functions across `auth`, `billing`, and `users` modules, requiring the graph to follow those edges to produce useful retrieval results.

Source: [demo_data/sample_project/api/routes.py:1-13](), [demo_data/sample_project/README.md:7-15]()
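To make "following those edges" concrete, here is a minimal sketch of a breadth-first walk over a dependency graph. The edge list is illustrative only (it is modelled loosely on the fixture's module names and is not the graph the indexer actually produces), and `reachable` is a hypothetical helper.

```python
from collections import deque

# Illustrative edges only; the real graph is produced by the indexer.
EDGES = {
    "api.routes": ["auth.handlers", "billing.invoices", "users.crud"],
    "auth.handlers": ["users.crud"],
    "billing.invoices": ["users.crud"],
    "users.crud": [],
}


def reachable(graph: dict, start: str) -> list:
    """Breadth-first walk returning every node reachable from `start`, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order
```

A query that lands on `api.routes` can then pull in the handlers it depends on, rather than returning the routing file alone.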

The `neuralmind build` command consumes this graph and the parsed symbols, writes embeddings, and produces the artifacts queried at runtime. CI regenerates these artifacts automatically; local regeneration is only needed for faster iteration.

Source: [demo_data/sample_project/README.md:21-29]()

## Language Coverage Behind the Tree-sitter Seam

NeuralMind uses tree-sitter as the canonical parsing backend. As of v0.37.0, eight languages are supported behind this seam. The table lists the releases that added languages, together with the v0.29.0 backend change that landed in the same release series:

| Version | Change | Reference |
|---|---|---|
| v0.27.0 | Rust extractor | ([#245](https://github.com/dfrostar/neuralmind/issues/245)) |
| v0.28.0 | Java extractor | ([#246](https://github.com/dfrostar/neuralmind/issues/246)) |
| v0.29.0 | ChromaDB-free default (turbovec/ONNX); not a language addition | ([#251](https://github.com/dfrostar/neuralmind/issues/251)) |
| v0.32.0 | C and C++ extractors | ([#257](https://github.com/dfrostar/neuralmind/issues/257)) |
| v0.37.0 | C# extractor (eighth language) | ([#267](https://github.com/dfrostar/neuralmind/issues/267)) |

This breadth matters because the tree-sitter seam is the single integration point for new languages — adding an extractor does not require changes to the embedding, indexing, or retrieval layers.

## Retrieval and Learning Signal

NeuralMind's retrieval is hybrid: vector similarity over embeddings (default backend is `turbovec` / ONNX, ChromaDB-free since v0.29.0) plus graph-aware filtering from the knowledge store. A "synapse" layer acts as the single learning signal; the legacy `learned_patterns` reranker was retired in v0.25.0 in favor of this unified approach.

Source: ([v0.25.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.25.0)), ([v0.29.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.29.0))
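A rough sketch of the hybrid idea, under stated assumptions: vectors are plain Python lists, the "graph-aware filter" simply restricts candidates to an anchor node and its direct neighbours, and all names (`hybrid_search`, `VECTORS`, `GRAPH`) are hypothetical. The actual filtering rules NeuralMind applies are not documented here.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec, vectors, graph, anchor, k=2):
    """Rank nodes by cosine similarity, keeping only the anchor and its graph neighbours."""
    allowed = {anchor, *graph.get(anchor, [])}
    scored = [(cosine(query_vec, vec), node) for node, vec in vectors.items() if node in allowed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored[:k]]


VECTORS = {
    "auth.handlers": [0.9, 0.1],
    "users.crud": [0.8, 0.3],
    "billing.invoices": [0.95, 0.05],  # similar vector, but not linked to the anchor
}
GRAPH = {"auth.handlers": ["users.crud"]}
```

Here `billing.invoices` has the highest raw similarity to the query `[1.0, 0.0]`, yet it is dropped because the graph does not connect it to the anchor.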

The self-improvement engine (phases 1–2, v0.26.0) uses the synapse signal to auto-tune the selector stage, reducing manual configuration over time.

Source: ([v0.26.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.26.0))

## Benchmarking and Community Use

The public benchmark (`neuralmind benchmark --public`, introduced in v0.31.0) is designed to be honest and reproducible — it runs NeuralMind head-to-head against alternatives on a fixed corpus. v0.33.0 added a live `codebase-memory-mcp` arm, and v0.34.0 added an opt-in LLM-judged answerability evaluation. These arms are useful when evaluating whether NeuralMind's retrieval quality justifies adopting it over a competing codebase-memory tool.

Source: ([v0.31.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.31.0)), ([v0.33.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.33.0)), ([v0.34.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.34.0))

## Common Failure Modes

- **Stale graph after edits**: if you modify fixture (or real) source files without rerunning `graphify update` followed by `neuralmind build --force`, retrieval will return stale results. CI handles this automatically; local iteration requires manual regeneration.
- **Embedding backend mismatch**: switching between `turbovec`/ONNX and ChromaDB requires a full rebuild, since the two backends are not drop-in compatible (see v0.29.0).
- **Language extractor gaps**: Rust was unsupported behind the tree-sitter seam before v0.27.0, and Java before v0.28.0; on older releases, queries against those files would silently fall back to generic parsing.
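The stale-graph failure mode above can be guarded against with a simple timestamp check before querying. This is a hedged sketch: `is_stale` is a hypothetical helper, and the location of NeuralMind's build artifacts is not specified in this manual, so the artifact path is left to the caller.

```python
from pathlib import Path


def is_stale(sources: list, artifact: Path) -> bool:
    """Return True when the artifact is missing or older than any source file."""
    if not artifact.exists():
        return True
    built_at = artifact.stat().st_mtime
    return any(Path(src).stat().st_mtime > built_at for src in sources)
```

When the check returns `True`, rerun `graphify update` followed by `neuralmind build --force` as described above.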

## See Also

- Public benchmark corpus and reproducibility notes: `neuralmind benchmark --public`
- Knowledge graph regeneration: `graphifyy` toolchain
- Synapse-based learning signal (replaces `learned_patterns` reranker as of v0.25.0)

---

<a id='page-2'></a>

## Core Features: Language Extractors, Synapse Memory, and Self-Improvement

### Related Pages

Related topics: [Overview and System Architecture](#page-1), [Benchmarks, Evaluation, MCP Tools, and Public Methodology](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py)
- [neuralmind/synapses.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/synapses.py)
- [neuralmind/synapse_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/synapse_memory.py)
- [neuralmind/namespaces.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/namespaces.py)
- [neuralmind/team_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/team_memory.py)
- [neuralmind/self_improve.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/self_improve.py)
</details>

# Core Features: Language Extractors, Synapse Memory, and Self-Improvement

NeuralMind is a semantic code-graph retrieval system. Three subsystems form its core: a multi-language **extractor seam** that turns source code into a knowledge graph, a **synapse memory layer** that records how the graph was useful (or not), and a **self-improvement engine** that uses that signal to retune the retrieval selector automatically. Together they let NeuralMind index code, answer questions about it, and get measurably better over time without re-training a model.

This page summarizes how these three subsystems fit together, what they currently support, and where their boundaries lie.

## High-Level Architecture

The extractor ingests source files, the synapse layer records per-query feedback, and the self-improvement engine closes the loop by adjusting the retrieval selector.

```mermaid
flowchart LR
    A[Source Files] --> B[Language Extractors<br/>tree-sitter seam]
    B --> C[Knowledge Graph<br/>graphgen]
    C --> D[Retrieval Selector]
    D --> E[Answer]
    E --> F[Synapse Memory<br/>feedback signal]
    F --> G[Self-Improvement Engine<br/>selector auto-tuning]
    G --> D
```

## Language Extractors (the tree-sitter seam)

NeuralMind exposes a single "tree-sitter seam" — a uniform extractor interface — and ships multiple language implementations behind it. New languages are added by registering another extractor against the seam rather than by forking the rest of the pipeline.

| Version | Release | Language added | Source |
|---------|---------|----------------|--------|
| v0.27.0 | 2026-06-18 | Rust | [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py) |
| v0.28.0 | 2026-06-18 | Java | [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py) |
| v0.32.0 | 2026-06-19 | C and C++ | [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py) |
| v0.37.0 | 2026-06-20 | C# (eighth language) | [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py) |

The seam abstraction means that each of the Rust, Java, C, C++, and C# extractors could be added as a new grammar binding, without changes to the graph builder, selector, or memory layer. Each extractor is responsible for mapping its concrete syntax tree to the canonical node/edge representation consumed by [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py).
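The "register another extractor against the seam" pattern can be sketched as a small registry keyed by file extension. Everything here is hypothetical (the `register` decorator, the `_EXTRACTORS` table, and the line-based toy extractors that stand in for tree-sitter grammars); it only illustrates why a new language does not touch the rest of the pipeline.

```python
from pathlib import Path

# Hypothetical registry: file extension -> extractor callable.
_EXTRACTORS = {}


def register(*extensions: str):
    """Decorator that registers an extractor for one or more file extensions."""
    def decorator(func):
        for ext in extensions:
            _EXTRACTORS[ext] = func
        return func
    return decorator


@register(".py")
def extract_python(source: str) -> list:
    # Toy extractor: names of top-level `def` statements.
    return [line.split("(")[0][4:] for line in source.splitlines() if line.startswith("def ")]


@register(".rs")
def extract_rust(source: str) -> list:
    # Toy extractor: names of top-level `fn` items.
    return [line.split("(")[0][3:] for line in source.splitlines() if line.startswith("fn ")]


def extract(path: str, source: str) -> list:
    """Dispatch on file extension; unknown extensions yield no symbols."""
    extractor = _EXTRACTORS.get(Path(path).suffix)
    return extractor(source) if extractor else []
```

Adding a language in this sketch means adding one decorated function; `extract` and its callers stay unchanged.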

For hermetic testing of the extractors, NeuralMind ships a fixture project under `demo_data/sample_project/` that exercises the cross-file relationships (auth, billing, users, API) without runtime dependencies.

## Synapse Memory

Synapse memory is NeuralMind's single learning signal. In v0.25.0 the prior `learned_patterns` reranker was retired in favor of the synapse layer, which now carries the full feedback loop on its own. Source: [neuralmind/synapse_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/synapse_memory.py).

The synapse layer is responsible for:

- Recording which graph nodes contributed to a successful (or unsuccessful) answer.
- Storing per-query and aggregate signals in a way the self-improvement engine can consume.
- Cooperating with the namespace and team-memory layers so per-project, per-team, and global signals do not bleed into each other. Source: [neuralmind/namespaces.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/namespaces.py) and [neuralmind/team_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/team_memory.py).
- Acting as the *only* learning signal — no parallel reranker competes with it. Source: [neuralmind/synapses.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/synapses.py).

This consolidation matters: there is exactly one feedback channel to inspect, version, and benchmark, which keeps the public benchmark (`neuralmind benchmark --public`, v0.31.0) honest and reproducible.
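As a rough illustration of the responsibilities listed above, the sketch below keeps a per-namespace tally of which graph nodes helped or hurt an answer. `FeedbackStore` and its methods are hypothetical names, not the API of `neuralmind/synapse_memory.py`, and the real storage format is not described in this manual.

```python
from collections import defaultdict
from typing import Optional


class FeedbackStore:
    """Hypothetical per-namespace tally of how often a node helped or hurt an answer."""

    def __init__(self) -> None:
        # namespace -> node -> [helpful_count, unhelpful_count]
        self._counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, namespace: str, nodes: list, success: bool) -> None:
        """Credit (or debit) every node that contributed to one answer."""
        slot = 0 if success else 1
        for node in nodes:
            self._counts[namespace][node][slot] += 1

    def success_rate(self, namespace: str, node: str) -> Optional[float]:
        """Share of helpful outcomes for a node, or None if it was never seen here."""
        if namespace not in self._counts or node not in self._counts[namespace]:
            return None
        helpful, unhelpful = self._counts[namespace][node]
        return helpful / (helpful + unhelpful)
```

Because every lookup is keyed by namespace first, feedback recorded for one team is invisible to queries scoped to another.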

## Self-Improvement Engine

The self-improvement engine consumes the synapse signal and uses it to auto-tune the retrieval selector. Phases 1 and 2 of the engine landed in v0.26.0 and cover selector auto-tuning from the synapse signal. Source: [neuralmind/self_improve.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/self_improve.py).

In practice, the engine:

1. Reads recent synapse feedback.
2. Adjusts selector weights / thresholds that determine which graph nodes are surfaced for a given query.
3. Writes the adjusted selector back so subsequent retrievals benefit immediately — no model retraining, no manual intervention.

Because the engine only touches the selector (and not the extractor or graph shape), changes are bounded and reviewable. The public benchmark arm introduced in v0.31.0 provides a reproducible before/after measurement, and the opt-in LLM-judged answerability arm in v0.34.0 layers a quality signal on top of the existing metrics.
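The read, adjust, write-back loop can be sketched as a bounded weight update. This is an assumption-laden illustration: the additive update rule, the `learning_rate`, and the `floor`/`ceiling` clamps are invented for the example and are not taken from `neuralmind/self_improve.py`.

```python
def tune_weights(weights, feedback, learning_rate=0.1, floor=0.0, ceiling=2.0):
    """Return new selector weights nudged by feedback.

    `feedback` maps a node id to +1 (helped) or -1 (did not help).
    Nodes without feedback keep their current weight; the input dict is not mutated.
    """
    tuned = dict(weights)
    for node, signal in feedback.items():
        current = tuned.get(node, 1.0)  # unseen nodes start at a neutral weight
        tuned[node] = min(ceiling, max(floor, current + learning_rate * signal))
    return tuned
```

Clamping keeps each step bounded, which matches the point above that selector changes should stay small and reviewable.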

## Configuration and Extensibility

- **Default install is ChromaDB-free** (v0.29.0): the default embedding/vector path is `turbovec/ONNX`, with ChromaDB available as an opt-in for users who want it. This keeps the synapse memory and self-improvement loop light by default.
- **Adding a new language** is a matter of writing a new extractor against the tree-sitter seam; the rest of the pipeline inherits support automatically.
- **Team and namespace scoping** lets multiple teams share one NeuralMind deployment without their synapse signals cross-contaminating. Source: [neuralmind/team_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/team_memory.py).

## Common Failure Modes

- **Mixed-language repos**: a repo that mixes, say, Rust and Python is fully indexed only if both extractors are enabled. Extractors are selected per language by file extension; there is no content-based language detection.
- **Stale synapse signal**: if the self-improvement engine is paused or its output is ignored, the selector will drift back toward its defaults; the synapse layer keeps the signal but the loop is broken.
- **Cross-team contamination**: forgetting to scope a query by namespace/team can let one project's feedback tune the selector for another. Use the namespace layer explicitly.

## See Also

- [neuralmind/graphgen.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/graphgen.py) — knowledge graph construction
- [neuralmind/synapses.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/synapses.py) — synapse primitives
- [neuralmind/synapse_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/synapse_memory.py) — feedback memory
- [neuralmind/namespaces.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/namespaces.py) — namespace scoping
- [neuralmind/team_memory.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/team_memory.py) — team-scoped memory
- [neuralmind/self_improve.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/self_improve.py) — self-improvement engine
- [Release v0.37.0](https://github.com/dfrostar/neuralmind/releases/tag/v0.37.0) — latest C# extractor release
- [Release v0.26.0](https://github.com/dfrostar/neuralmind/releases/tag/v0.26.0) — self-improvement engine phases 1-2
- [Release v0.25.0](https://github.com/dfrostar/neuralmind/releases/tag/v0.25.0) — synapse layer becomes the single learning signal

---

<a id='page-3'></a>

## Installation, Deployment, Backends, and Operations

### Related Pages

Related topics: [Overview and System Architecture](#page-1), [Core Features: Language Extractors, Synapse Memory, and Self-Improvement](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [pyproject.toml](https://github.com/dfrostar/neuralmind/blob/main/pyproject.toml)
- [Dockerfile](https://github.com/dfrostar/neuralmind/blob/main/Dockerfile)
- [neuralmind/config.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/config.py)
- [neuralmind/doctor.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/doctor.py)
- [neuralmind/embedding_backend.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/embedding_backend.py)
- [neuralmind/turbovec_backend.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/turbovec_backend.py)
- [neuralmind/cli.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/cli.py)
- [neuralmind/build.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/build.py)
- [neuralmind/demo_data/sample_project/README.md](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/demo_data/sample_project/README.md)
</details>

# Installation, Deployment, Backends, and Operations

## Overview

NeuralMind is distributed as a standard Python package and is designed to be operated as a local-first code-intelligence service with a small set of swappable backends. The installation, deployment, and operations surface is intentionally narrow: a `pip install` of the package, an optional container image, a `neuralmind build` indexer, a `neuralmind doctor` health check, and a family of language-aware and embedding backends that can be selected without changing application code. This page documents each of these surfaces, based on the package metadata, the build/doctor CLI entry points, and the backend modules exposed by the project.

## Installation

The package is published as a normal Python distribution declared in `pyproject.toml`. Source: [pyproject.toml:1-40](). Because the project's design splits "always-on" dependencies from optional backend stacks, the package has two install profiles, and a separate tool covers graph regeneration:

| Profile | Command | Purpose |
| --- | --- | --- |
| Default (ChromaDB-free) | `pip install neuralmind` | Ships with the built-in tree-sitter parser and the `turbovec` / ONNX embedding backend introduced in v0.29.0. Source: [v0.29.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.29.0). |
| Full / legacy | `pip install "neuralmind[chroma]"` | Restores the ChromaDB vector store for users migrating from earlier releases. |
| Graph regeneration tooling | `pip install graphifyy` | Required only when rebuilding the knowledge graph fixture used by the benchmark. Source: [neuralmind/demo_data/sample_project/README.md:18-26](). |

The default install is intentionally lightweight: a ChromaDB-free stack keeps cold-start time low and avoids requiring a separate vector-store server in CI. Source: [v0.29.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.29.0).

## Backends

NeuralMind organises its pluggable subsystems around two seams: language extractors and embedding stores. Both are reached through thin abstractions so that adding a backend does not require touching the CLI or the indexing pipeline.

### Language extractors (tree-sitter seam)

The built-in tree-sitter backend has grown incrementally across releases and now covers eight languages. Source: [v0.37.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.37.0). The currently supported languages, with the release that introduced built-in support for each, are:

- Rust — v0.27.0. Source: [v0.27.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.27.0).
- Java — v0.28.0. Source: [v0.28.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.28.0).
- C and C++ — v0.32.0. Source: [v0.32.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.32.0).
- C# — v0.37.0 (eighth language). Source: [v0.37.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.37.0).

Earlier releases established the seam itself (the Python, TypeScript, Go, and Ruby extractors), so the tree-sitter backend is now the single language-loading path for the CLI.

### Embedding backends

Two modules make up the embedding layer:

- `neuralmind/turbovec_backend.py` — the default ONNX-based embedding backend, active out of the box since v0.29.0. Source: [neuralmind/turbovec_backend.py:1-40]().
- `neuralmind/embedding_backend.py` — the abstraction that selects between the turbovec path and a ChromaDB-backed path at runtime, based on configuration. Source: [neuralmind/embedding_backend.py:1-40]().

Selection between the two is driven by `neuralmind/config.py`, which reads optional dependency presence and environment variables rather than requiring an explicit `--backend` flag for common cases. Source: [neuralmind/config.py:1-60]().
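A minimal sketch of this kind of selection logic follows. The environment variable name `NEURALMIND_BACKEND`, the backend labels, and the precedence order are assumptions made for illustration; only `importlib.util.find_spec` and `os.environ` are real standard-library APIs, and the actual rules live in `neuralmind/config.py`.

```python
import importlib.util
import os


def choose_backend(env=None, chroma_available=None) -> str:
    """Pick an embedding backend name.

    Precedence (an assumption for this sketch): explicit environment variable,
    then optional-dependency detection, then the default.
    """
    env = os.environ if env is None else env
    if chroma_available is None:
        # True when the optional `chromadb` package can be imported.
        chroma_available = importlib.util.find_spec("chromadb") is not None

    requested = env.get("NEURALMIND_BACKEND")  # hypothetical variable name
    if requested == "chroma" and not chroma_available:
        raise RuntimeError("ChromaDB backend requested but chromadb is not installed")
    if requested in {"chroma", "turbovec"}:
        return requested
    return "chroma" if chroma_available else "turbovec"
```

Passing `env` and `chroma_available` explicitly keeps the function testable without touching the real environment.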

```mermaid
flowchart LR
    A[neuralmind build / query] --> B{config.py}
    B -->|ChromaDB installed| C[embedding_backend.py<br/>ChromaDB path]
    B -->|Default| D[turbovec_backend.py<br/>ONNX]
    A --> E[Tree-sitter extractor]
    E --> E1[Python]
    E --> E2[TS / JS]
    E --> E3[Go]
    E --> E4[Ruby]
    E --> E5[Rust v0.27]
    E --> E6[Java v0.28]
    E --> E7[C / C++ v0.32]
    E --> E8[C# v0.37]
```

## Deployment

The repository ships a `Dockerfile` that wraps the default pip-install profile and exposes the CLI as the container entry point. Source: [Dockerfile:1-40](). Typical local deployment is:

```bash
docker build -t neuralmind .
docker run --rm -v "$PWD:/repo" neuralmind build /repo --force
```

For containerised deployments, the only required mount is the source tree being indexed; no external services are required for the default (ChromaDB-free) profile. Source: [v0.29.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.29.0).

The indexer is invoked via `neuralmind build <path> --force`, which rebuilds the index artifacts from the knowledge graph and the parsed symbols. When working with the bundled benchmark fixture, regenerate the graph with `graphify update` (from the `graphifyy` package) before rerunning `neuralmind build`. Source: [neuralmind/demo_data/sample_project/README.md:18-30]().

## Operations

A small set of operational concerns are handled explicitly by the project:

- **UTF-8 I/O.** Since v0.27.0, stdout and stderr are forced to UTF-8 so that non-ASCII identifiers and log messages render correctly on Windows and minimal containers. Source: [v0.27.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.27.0).
- **Health check.** `neuralmind doctor` reports installed backend versions, available tree-sitter grammars, and the active embedding backend, making it the recommended first step when troubleshooting a fresh install. Source: [neuralmind/doctor.py:1-60]().
- **Self-improvement.** Phases 1–2 of the self-improvement engine auto-tune the selector from the synapse signal; deployments that enable this should expect occasional re-indexing behaviour. Source: [v0.26.0 release notes](https://github.com/dfrostar/neuralmind/releases/tag/v0.26.0).
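For the UTF-8 I/O item above, one common way to force a stream's encoding in Python is `TextIOWrapper.reconfigure` (available since Python 3.7). Whether NeuralMind uses this exact call is an assumption; `force_utf8` is a hypothetical helper shown on an in-memory stream.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports in-place reconfiguration."""
    reconfigure = getattr(stream, "reconfigure", None)
    if callable(reconfigure):
        reconfigure(encoding="utf-8", errors="replace")
    return stream


# Demonstrated on an in-memory stream so the example has no side effects;
# a CLI would apply the same call to sys.stdout and sys.stderr at startup.
demo = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
force_utf8(demo)
```

The `getattr` guard matters because stdout may have been replaced by an object without `reconfigure`, for example under some test runners.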

Common failure modes to check first when something "doesn't work after install": missing optional dependency for the ChromaDB path, a non-UTF-8 locale on the host (now mitigated), or a stale knowledge graph that needs `graphify update` followed by `neuralmind build --force`. Source: [neuralmind/demo_data/sample_project/README.md:18-30]().

## See Also

- [Public Benchmark and Reproducibility](#page-4)
- [Self-Improvement Engine and Synapse Signal](#page-2)
- [Language Extractors and Tree-sitter Backend](#page-2)

---

<a id='page-4'></a>

## Benchmarks, Evaluation, MCP Tools, and Public Methodology

### Related Pages

Related topics: [Overview and System Architecture](#page-1), [Core Features: Language Extractors, Synapse Memory, and Self-Improvement](#page-2), [Installation, Deployment, Backends, and Operations](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [benchmark.py](https://github.com/dfrostar/neuralmind/blob/main/benchmark.py)
- [benchmark_turbovec.py](https://github.com/dfrostar/neuralmind/blob/main/benchmark_turbovec.py)
- [report_turbovec.py](https://github.com/dfrostar/neuralmind/blob/main/report_turbovec.py)
- [BENCHMARK_TURBOVEC.md](https://github.com/dfrostar/neuralmind/blob/main/BENCHMARK_TURBOVEC.md)
- [neuralmind/quality.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/quality.py)
- [neuralmind/precision.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/precision.py)
- [neuralmind/demo_data/sample_project/README.md](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/demo_data/sample_project/README.md)
- [neuralmind/demo_data/sample_project/api/routes.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/demo_data/sample_project/api/routes.py)
</details>

# Benchmarks, Evaluation, MCP Tools, and Public Methodology

## Overview

NeuralMind ships an end-to-end evaluation story that is intentionally **public, reproducible, and comparative**. The intent is to make every claim about retrieval quality, compression, or speed falsifiable on a fixed corpus with a fixed procedure. Three pillars make this possible:

1. A reproducible CLI command — `neuralmind benchmark --public` — added in v0.31.0 ([#254](https://github.com/dfrostar/neuralmind/issues/254)) that runs NeuralMind head-to-head against alternative tools and writes a machine-readable report.
2. A growing library of **MCP (Model Context Protocol) tools** that let external agents (including alternative indexers) participate in the same benchmark protocol, enabling live head-to-head comparisons (v0.33.0, [#259](https://github.com/dfrostar/neuralmind/issues/259)).
3. An opt-in **LLM-judged answerability** arm (v0.34.0, [#264](https://github.com/dfrostar/neuralmind/issues/264)) that augments the deterministic score with a qualitative judgement of whether an answer would actually help a developer.

A hermetic fixture, `neuralmind/demo_data/sample_project`, anchors all runs so that CI, local developers, and external reviewers see exactly the same codebase.

## The Public Benchmark Command

The canonical entry point is the `--public` flag on the benchmark subcommand, introduced in v0.31.0. The harness lives in the project-root benchmark modules and is described in source as an "honest, reproducible benchmark vs alternatives" ([commit f8eca9b](https://github.com/dfrostar/neuralmind/commit/f8eca9bd7a651941f4f6d55f16c31d446339fdf2)): the run is designed so that a competing system can be plugged in without modifying the harness. A turbovec-specific companion harness lives in `benchmark_turbovec.py` and writes its reports through `report_turbovec.py`. Turbovec has been the default backend since v0.29.0, when ChromaDB stopped being installed by default.

Conceptually the public benchmark performs four stages:

```mermaid
flowchart LR
    A[Fixture Corpus<br/>sample_project] --> B[NeuralMind Index]
    A --> C[Alternative Index<br/>via MCP tool]
    B --> D[Deterministic Scorer<br/>quality.py / precision.py]
    C --> D
    D --> E[Opt-in LLM Judge<br/>answerability arm]
    D --> F[report_turbovec.py<br/>public report]
    E --> F
```

A typical invocation rebuilds the fixture's knowledge graph, re-indexes NeuralMind against it, runs the deterministic scorer over a question set, and (if opted in) queries an LLM to grade the surfaced answers for answerability.

## MCP Tools and Head-to-Head Protocol

The MCP tooling is the integration surface that lets NeuralMind compete against external codebase-memory servers on equal footing. With v0.33.0 the public benchmark includes a live head-to-head mode in which the harness speaks to a `codebase-memory-mcp` server and compares results against NeuralMind's own index ([#259](https://github.com/dfrostar/neuralmind/issues/259)).

Because MCP is a standardized protocol, alternative tools do not need a NeuralMind-specific adapter — they simply need to expose the same `query` and (optionally) `index` operations. This is what makes the comparison honest: the harness never reaches inside the competitor; it talks to it as an MCP client would.
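The "talk to every system through the same interface" idea can be sketched with a structural type. This is not the MCP wire protocol and not the benchmark harness itself: `Retriever`, `StaticRetriever`, `head_to_head`, and the file paths in the usage note are hypothetical, and the hit-rate metric is a deliberately simple stand-in.

```python
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical minimal interface the harness needs from each system."""

    def query(self, question: str, k: int) -> list: ...


class StaticRetriever:
    """Stand-in system that returns canned results (for illustration only)."""

    def __init__(self, answers: dict) -> None:
        self._answers = answers

    def query(self, question: str, k: int) -> list:
        return self._answers.get(question, [])[:k]


def head_to_head(systems: dict, questions: dict, k: int = 3) -> dict:
    """Hit rate per system: share of questions whose expected file is in the top k."""
    scores = {}
    for name, system in systems.items():
        hits = sum(expected in system.query(q, k) for q, expected in questions.items())
        scores[name] = hits / len(questions)
    return scores
```

The harness only ever calls `query`, so any system that satisfies the interface can be compared without the harness knowing how it indexes code.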

The fixture project is designed to exercise this protocol meaningfully. As noted in the fixture README, it is "large enough to have cross-file relationships (auth depends on users + JWT, billing depends on users, api wires everything together)" while remaining small enough that every CI run stays fast (~500 lines across ~10 files) ([neuralmind/demo_data/sample_project/README.md](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/demo_data/sample_project/README.md)).

## Evaluation Layers

NeuralMind separates evaluation into two layers so that the deterministic and qualitative scores are never confused.

### Deterministic Layer

The deterministic layer lives in `neuralmind/quality.py` and `neuralmind/precision.py`. It computes objective metrics over the top-k results returned by each indexed system: whether the expected files/symbols are present, whether the ranking order is preserved, and how compact the returned context is relative to the gold answer. These metrics are reproducible byte-for-byte across runs because the corpus is static.
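The kinds of top-k checks described above correspond to standard information-retrieval metrics. The sketch below shows precision@k, recall@k, and reciprocal rank as generic examples; it is an assumption that `quality.py` and `precision.py` compute anything of this exact shape, and the function names are hypothetical.

```python
def precision_at_k(results: list, expected: set, k: int) -> float:
    """Fraction of the top-k slots filled by expected items."""
    if k <= 0:
        return 0.0
    return sum(item in expected for item in results[:k]) / k


def recall_at_k(results: list, expected: set, k: int) -> float:
    """Fraction of expected items that appear in the top-k results."""
    if not expected:
        return 0.0
    return len(expected.intersection(results[:k])) / len(expected)


def reciprocal_rank(results: list, expected: set) -> float:
    """1 / rank of the first expected item, or 0.0 if none is returned."""
    for rank, item in enumerate(results, start=1):
        if item in expected:
            return 1.0 / rank
    return 0.0
```

All three are pure functions of the result list and the gold set, which is what makes such scores repeatable on a static corpus.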

### LLM-Judged Answerability Layer

Added in v0.34.0 ([#264](https://github.com/dfrostar/neuralmind/issues/264)), this opt-in layer takes the top results from each system and asks an LLM whether the surfaced context would let a developer answer the original question. The arm is deliberately **opt-in** so that the default public reports stay free of stochastic noise and so that a reviewer can compare deterministic numbers without needing API credentials.

The combination of these two layers is what the README of the public benchmark refers to as "honest": a reviewer can either trust only the deterministic numbers, or — if they have credentials and a tolerance for variance — opt into the qualitative judgement.
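The separation between the two layers can be sketched as an evaluation function where the judge is an optional callable. The names (`evaluate`, `judge`) and the report keys are hypothetical; the point is only that the deterministic result is always produced and the qualitative one appears solely when a judge is supplied.

```python
from typing import Callable, Optional


def evaluate(results: list, expected: set, judge: Optional[Callable] = None) -> dict:
    """Always compute the deterministic hit; call the judge only when one is supplied."""
    report = {"hit": any(item in expected for item in results)}
    if judge is not None:
        # Opt-in, potentially non-deterministic layer kept under its own key.
        report["answerable"] = judge(results)
    return report
```

Keeping the two values under separate keys means a reviewer without API credentials still gets a complete deterministic report.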

## The Fixture Corpus and Why It Matters

All public runs operate on `neuralmind/demo_data/sample_project`. The fixture mimics a small but realistic Python web application with five cohesive packages: `auth`, `billing`, `users`, `api`, and `db`. The `api/routes.py` module alone demonstrates the cross-file relationships that the benchmark exists to measure: routes import from `..auth.handlers`, `..billing.invoices`, `..billing.stripe_client`, and `..users.crud`, then wire them to HTTP verbs and paths ([neuralmind/demo_data/sample_project/api/routes.py](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/demo_data/sample_project/api/routes.py)).

A retrieval system that only matches on file names will perform poorly here; a system that understands the import graph and the `verify_session` → `get_user` call chain across `auth` and `users` will perform well. That asymmetry is precisely what the public benchmark is designed to surface.
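To show what "understanding the import graph" involves at the smallest scale, the sketch below uses Python's standard `ast` module to list relative imports in a module. It is a stand-in for the tree-sitter extraction NeuralMind performs, and `SAMPLE` is a hypothetical excerpt: which module actually defines `verify_session` or `get_user` in the fixture is assumed here.

```python
import ast


def relative_imports(source: str) -> list:
    """List (level, module, imported names) for each relative `from ... import ...`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            names = [alias.name for alias in node.names]
            found.append((node.level, node.module or "", names))
    return found


# Hypothetical excerpt shaped like the imports described above.
SAMPLE = """
from ..auth.handlers import verify_session
from ..users.crud import get_user
import os
"""
```

Each tuple is a candidate graph edge from the importing module to the imported one; the absolute `import os` is skipped because it does not point inside the project.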

To regenerate the knowledge graph after editing the fixture locally:

```bash
pip install graphifyy
cd tests/fixtures/sample_project
graphify update .
cd ../../..
neuralmind build tests/fixtures/sample_project --force
```

CI runs the same sequence automatically, so contributors do not need to publish a new graph for their PR to be benchmarked.

## Language Coverage and the Tree-Sitter Seam

The benchmark's fairness depends on the fixture staying representative as the project grows new extractors. Recent releases have widened the built-in tree-sitter backend: Rust in v0.27.0 ([#245](https://github.com/dfrostar/neuralmind/issues/245)), Java in v0.28.0 ([#246](https://github.com/dfrostar/neuralmind/issues/246)), C and C++ in v0.32.0 ([#257](https://github.com/dfrostar/neuralmind/issues/257)), and C# in v0.37.0 ([#267](https://github.com/dfrostar/neuralmind/issues/267)). Because all of these plug in behind a single tree-sitter seam, the benchmark harness does not need per-language branching — it indexes the fixture (or any contributed corpus) through the same extractor interface.

## Common Failure Modes

| Symptom | Likely cause | Mitigation |
| --- | --- | --- |
| Report shows stale metrics after editing the fixture | Local knowledge graph not regenerated | Re-run `graphify update` + `neuralmind build … --force` |
| MCP head-to-head returns identical scores for both systems | Competitor MCP server not started before the harness | Start the server and confirm the `codebase-memory-mcp` endpoint is reachable |
| Deterministic and LLM-judged numbers disagree strongly | LLM judge is hallucinating file paths from context | Re-run with deterministic only; inspect top-k for spurious matches |

## See Also

- [NeuralMind Benchmark Fixture](https://github.com/dfrostar/neuralmind/blob/main/neuralmind/demo_data/sample_project/README.md)
- [Public benchmark announcement (v0.31.0)](https://github.com/dfrostar/neuralmind/releases/tag/v0.31.0)
- [Live codebase-memory-mcp head-to-head (v0.33.0)](https://github.com/dfrostar/neuralmind/releases/tag/v0.33.0)
- [LLM-judged answerability arm (v0.34.0)](https://github.com/dfrostar/neuralmind/releases/tag/v0.34.0)
- [C# extractor added (v0.37.0)](https://github.com/dfrostar/neuralmind/releases/tag/v0.37.0)

---


## Pitfall Log

Project: dfrostar/neuralmind

Summary: Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.

## 1. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/dfrostar/neuralmind

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/dfrostar/neuralmind

## 3. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/dfrostar/neuralmind

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/dfrostar/neuralmind

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/dfrostar/neuralmind

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/dfrostar/neuralmind

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/dfrostar/neuralmind

<!-- canonical_name: dfrostar/neuralmind; human_manual_source: deepwiki_human_wiki -->
