# https://github.com/tobi/qmd Project Manual

Generated at: 2026-06-22 14:52:59 UTC

## Table of Contents

- [Overview and Getting Started](#page-1)
- [Architecture, Search Pipeline, and AI Model Integration](#page-2)
- [CLI Commands, SDK Usage, and MCP Server](#page-3)
- [Deployment, Cross-Platform Support, and Extensibility](#page-4)

<a id='page-1'></a>

## Overview and Getting Started

### Related Pages

Related topics: [Architecture, Search Pipeline, and AI Model Integration](#page-2), [CLI Commands, SDK Usage, and MCP Server](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [package.json](https://github.com/tobi/qmd/blob/main/package.json)
- [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)
- [src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)
- [src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts)
- [finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)
- [src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts)
- [src/bench/types.ts](https://github.com/tobi/qmd/blob/main/src/bench/types.ts)
</details>

# Overview and Getting Started

QMD ("Query Markup Documents") is an on-device hybrid search engine for markdown files. It combines BM25 keyword search, vector similarity, HyDE (Hypothetical Document Embeddings), and LLM-based reranking to retrieve relevant passages from local note collections. As stated in the package metadata, QMD provides "On-device hybrid search for markdown files with BM25, vector search, and LLM reranking." Source: [package.json](https://github.com/tobi/qmd/blob/main/package.json)

The current release is **v2.5.3**, distributed as the `@tobilu/qmd` npm package with a `qmd` CLI binary. Source: [package.json](https://github.com/tobi/qmd/blob/main/package.json)

## What QMD Does

QMD indexes a directory of markdown files into a local SQLite database (via `better-sqlite3` and `sqlite-vec`), then exposes two interaction surfaces:

1. A **CLI** (`bin/qmd`) for interactive terminal search and document retrieval.
2. An **MCP server** that registers QMD's search and document-fetch tools with any Model Context Protocol–compatible client (Claude Desktop, IDE plugins, etc.). Source: [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)

The project is written in TypeScript and targets Node.js or Bun runtimes. Core dependencies include `node-llama-cpp` for running local GGUF models, `tree-sitter-*` for code-aware chunking, and `zod` for tool-input validation. Source: [package.json](https://github.com/tobi/qmd/blob/main/package.json)

## Architecture

QMD's runtime is organized around a central `QMDStore` that wraps the SQLite database and exposes a unified API for indexing, querying, and retrieval. The CLI and MCP server both consume this store.

```mermaid
flowchart LR
    User[User / AI Assistant] -->|CLI| CLI[qmd CLI]
    User -->|MCP protocol| MCP[MCP Server]
    CLI --> Store[QMDStore<br/>SQLite + sqlite-vec]
    MCP --> Store
    Store --> Lex[BM25 lex]
    Store --> Vec[Vector vec]
    Store --> Hyde[HyDE]
    Store --> Rerank[LLM Rerank]
    Store -->|reads/writes| FS[(Markdown files)]
```

The MCP server registers four primary tools (`query`, `get`, `multi_get`, and `status`) and exposes individual documents as MCP resources via `qmd://{+path}` URIs. It supports both `stdio` and Streamable HTTP transports; the HTTP variant binds to localhost only. Source: [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)

## Core Capabilities

### Hybrid Search Grammar

The CLI accepts a typed query document: each line prefixed with `lex:`, `vec:`, or `hyde:`. The grammar is explicitly enforced and mirrors `docs/SYNTAX.md`:

```text
query_document = [ intent_line ] { typed_line }
intent_line    = "intent:" text newline
typed_line     = type ":" text newline
type           = "lex" | "vec" | "hyde"
```

Source: [src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)
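
The grammar above can be illustrated with a small parser. This is a minimal sketch, not the parser in `src/cli/qmd.ts`: the names `parseQueryDocument`, `QueryDocument`, and `TypedLine` are hypothetical, and the error messages and whitespace handling are assumptions.

```typescript
// Hypothetical sketch of the typed query-document grammar; not QMD's own parser.
type SearchType = "lex" | "vec" | "hyde";

interface TypedLine {
  type: SearchType;
  text: string;
}

interface QueryDocument {
  intent?: string;
  searches: TypedLine[];
}

const SEARCH_TYPES: readonly string[] = ["lex", "vec", "hyde"];

function parseQueryDocument(input: string): QueryDocument {
  const doc: QueryDocument = { searches: [] };
  // Assumption: blank lines and surrounding whitespace are ignored.
  const lines = input
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  lines.forEach((line, index) => {
    const colon = line.indexOf(":");
    if (colon < 0) throw new Error(`Line ${index + 1}: missing "type:" prefix`);
    const prefix = line.slice(0, colon).trim();
    const text = line.slice(colon + 1).trim();
    if (text.length === 0) throw new Error(`Line ${index + 1}: empty text`);

    if (prefix === "intent") {
      // The grammar allows a single optional intent line, before any typed line.
      if (index !== 0) throw new Error(`Line ${index + 1}: intent must come first`);
      doc.intent = text;
    } else if (SEARCH_TYPES.includes(prefix)) {
      doc.searches.push({ type: prefix as SearchType, text });
    } else {
      throw new Error(`Line ${index + 1}: unknown type "${prefix}"`);
    }
  });

  return doc;
}
```

Because only the first colon is treated as the separator, a `lex:` line may itself contain colons in its text.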

The first sub-search receives a 2× weight when results are fused. Source: [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)

### Query Expansion via Fine-Tuned Model

A small Qwen3-1.7B model (fine-tuned via LoRA, SFT-only in production) expands raw user queries into structured `lex:`/`vec:`/`hyde:` lines before retrieval. The training pipeline lives in `finetune/` and achieves 92.0% average score on the held-out evaluation set. Source: [finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)

### Output Formats

Search results and documents can be rendered as `cli`, `json`, `csv`, `xml`, `md`, or `files`. The CLI defaults to a human-readable layout; `qmd get` and `qmd multi-get` are line-numbered by default and print the `#docid` and `qmd://` URI. Source: [src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts)

### Line-Range Suffix

`qmd get` accepts a `:from:count` suffix on a path or docid (e.g. `qmd get "#abc123:120:40"` reads 40 lines starting at line 120). Explicit `--from` / `-l` flags still override the suffix. The MCP `get` tool accepts the same syntax. Source: [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)
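
One way to picture the suffix handling is the sketch below. It is an illustration only: `parseLineRangeSuffix`, `resolveRange`, and `sliceLines` are hypothetical names, the `from`/`count` flag fields stand in for the real CLI flags, and 1-based line numbering is an assumption.

```typescript
// Hypothetical sketch of ":from:count" suffix handling; not QMD's implementation.
interface LineRange {
  ref: string;
  from: number | undefined;
  count: number | undefined;
}

function parseLineRangeSuffix(target: string): LineRange {
  // Only a trailing ":<digits>:<digits>" is treated as a range suffix.
  const match = /^(.*):(\d+):(\d+)$/.exec(target);
  if (!match) return { ref: target, from: undefined, count: undefined };
  return { ref: match[1] ?? "", from: Number(match[2]), count: Number(match[3]) };
}

// Explicit flags win over the suffix, as described above.
function resolveRange(
  target: string,
  flags: { from?: number; count?: number } = {},
): LineRange {
  const parsed = parseLineRangeSuffix(target);
  return {
    ref: parsed.ref,
    from: flags.from ?? parsed.from,
    count: flags.count ?? parsed.count,
  };
}

// Assumption: line numbers are 1-based.
function sliceLines(body: string, from: number, count: number): string {
  const lines = body.split("\n");
  return lines.slice(from - 1, from - 1 + count).join("\n");
}
```

With these definitions, `resolveRange("#abc123:120:40")` yields the docid `#abc123` with `from` 120 and `count` 40, and passing `{ from: 5 }` replaces only the start line.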

### Benchmarking

A dedicated benchmark harness (`src/bench/`) measures precision@k, recall, MRR, and F1 across multiple search backends against fixture queries, enabling regression tracking for retrieval quality. Source: [src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts), [src/bench/types.ts](https://github.com/tobi/qmd/blob/main/src/bench/types.ts)
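
For readers unfamiliar with these metrics, the sketch below computes them for a single query. It is not the code in `src/bench/score.ts`: `scoreRanking` and `meanReciprocalRank` are hypothetical names, and computing recall over the top-k window (rather than the full ranking) is an assumption.

```typescript
// Hypothetical sketch of the retrieval metrics named above.
interface RetrievalScores {
  precisionAtK: number;
  recall: number;
  reciprocalRank: number;
  f1: number;
}

// Assumes `ranked` contains unique document ids, best match first.
function scoreRanking(ranked: string[], relevant: Set<string>, k: number): RetrievalScores {
  const topK = ranked.slice(0, k);
  const hits = topK.filter((id) => relevant.has(id)).length;

  const precisionAtK = k > 0 ? hits / k : 0;
  const recall = relevant.size > 0 ? hits / relevant.size : 0;

  // Reciprocal rank of the first relevant result anywhere in the ranking.
  const firstHit = ranked.findIndex((id) => relevant.has(id));
  const reciprocalRank = firstHit >= 0 ? 1 / (firstHit + 1) : 0;

  const denominator = precisionAtK + recall;
  const f1 = denominator > 0 ? (2 * precisionAtK * recall) / denominator : 0;

  return { precisionAtK, recall, reciprocalRank, f1 };
}

// MRR is the mean of per-query reciprocal ranks.
function meanReciprocalRank(reciprocalRanks: number[]): number {
  if (reciprocalRanks.length === 0) return 0;
  return reciprocalRanks.reduce((sum, rr) => sum + rr, 0) / reciprocalRanks.length;
}
```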

## Getting Started

### Installation

```bash
# Install via npm
npm install -g @tobilu/qmd

# Or install globally with Bun (also supported)
bun add -g @tobilu/qmd
```

The CLI binary is published as `qmd` and the package ships pre-built `dist/` plus `skills/` resources. Source: [package.json](https://github.com/tobi/qmd/blob/main/package.json)

### First-Run Workflow

```bash
# 1. Index a directory of markdown files
qmd index ~/notes

# 2. Generate embeddings (required for vec/hyde search)
qmd embed

# 3. Query — natural language (auto-expanded) or typed grammar
qmd query "how does authentication work"
qmd query $'lex: "connection pool"\nvec: database connection timeout'

# 4. Retrieve a specific document or docid
qmd get "#abc123:100:50"

# 5. Start the MCP server for an AI assistant
qmd mcp
```

### Configuration

QMD reads collection definitions and per-collection context from a config file resolved via `getConfigPath()`. If the config exists, the MCP server loads it at startup; otherwise the default database path is used. Source: [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)

## Known Limitations

Several open issues in the community tracker document constraints relevant to new users:

- **Windows compatibility**: Multiple Unix-specific assumptions (path handling, process signals) block usage on Windows 11 with Bun 1.3.6. Source: community issue #31
- **Local-only models**: Embeddings, generation, and reranking are wired through `node-llama-cpp` and local GGUF files; there is no first-class support for OpenAI-compatible or LiteLLM-routed API providers. Source: community issues #114, #521, #620
- **Maintenance status**: A community thread has raised concerns about project maintenance, citing a large open PR/issue backlog. Source: community issue #516

Users on Windows or those needing remote-model backends should evaluate these constraints before adopting QMD in production pipelines.

## See Also

- Query Syntax & Grammar
- MCP Server Tools Reference
- Embedding & Reranking Configuration
- Benchmark Harness Guide

---

<a id='page-2'></a>

## Architecture, Search Pipeline, and AI Model Integration

### Related Pages

Related topics: [Overview and Getting Started](#page-1), [Deployment, Cross-Platform Support, and Extensibility](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)
- [src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)
- [src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts)
- [src/bench/bench.ts](https://github.com/tobi/qmd/blob/main/src/bench/bench.ts)
- [src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts)
- [src/bench/types.ts](https://github.com/tobi/qmd/blob/main/src/bench/types.ts)
- [package.json](https://github.com/tobi/qmd/blob/main/package.json)
- [finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)
</details>

# Architecture, Search Pipeline, and AI Model Integration

## Overview

QMD ("Query Markup Documents") is an on-device hybrid search engine for local markdown collections. The package, published as `@tobilu/qmd` v2.5.3, combines BM25 full-text search, vector similarity, and LLM-based reranking into a single retrieval pipeline that runs entirely on the user's machine. The system is exposed both as a command-line tool (`qmd`) and as a Model Context Protocol (MCP) server, the latter making the knowledge base directly consumable by AI agents. Source: [package.json:1-20]()

The architecture is intentionally local-first. Embedding, generation, and reranking models are loaded through `node-llama-cpp`, which is pinned at `3.18.1` and ships with optional native binaries for `darwin-arm64`, `darwin-x64`, `linux-arm64`, `linux-x64`, and `windows-x64`. Vectors are stored using `sqlite-vec` (`0.1.9`) on top of `better-sqlite3` (`12.10.0`). Source: [package.json:60-78]()

## System Architecture

The codebase is organised around three layers: a CLI surface, an MCP server surface, and a shared store that owns the SQLite database, FTS indexes, vector indexes, and the llama.cpp model session.

```mermaid
flowchart LR
    User[User / Agent] --> CLI[qmd CLI<br/>src/cli/qmd.ts]
    User --> MCP[MCP Server<br/>src/mcp/server.ts]
    CLI --> Store[QMDStore<br/>src/store.ts]
    MCP --> Store
    Store --> SQLite[(SQLite<br/>+ sqlite-vec)]
    Store --> LLama[node-llama-cpp<br/>local GGUF]
    Store --> FTS[BM25 FTS5 Index]
    MCP -->|qmd:// URI| Client[MCP Client]
```

The MCP server (`src/mcp/server.ts`) registers the `qmd://{+path}` resource template, decoding URL-encoded paths and delegating to `store.get(...)`, which returns the markdown body with line numbers by default. Source: [src/mcp/server.ts:42-65]() It also exposes a `query` tool that accepts an array of typed sub-queries (`lex`, `vec`, `hyde`) and an optional `intent` string. The first sub-query is given a 2× weight, and the schema is validated with `zod`. Source: [src/mcp/server.ts:80-160]()

The store layer is the single source of truth: it creates collections, runs hybrid search, manages the llama.cpp session, and exposes typed result objects (`SearchResult`, `MultiGetResult`, `DocumentResult`) that the formatter then renders as CLI, JSON, CSV, XML, or Markdown. Source: [src/cli/formatter.ts:1-40]()

## Search Pipeline

QMD combines three complementary retrieval strategies, whose results are then fused and optionally reranked:

1. **BM25 lexical search** over a SQLite FTS5 index. It supports quoted phrases and `-negation` operators, and it is the fastest option for keyword lookups. Source: [src/mcp/server.ts:70-78]()
2. **Vector similarity search** against chunked document embeddings. Each chunk is embedded and stored as a fixed-dimension float vector; queries are embedded at runtime and matched by cosine distance. Source: [src/bench/bench.ts:10-25]()
3. **HyDE (Hypothetical Document Embeddings)**: an LLM writes a passage that *looks like* a plausible answer, which is then embedded and used as a vector query. This is most useful for semantic or cross-domain recall where vocabulary mismatch is high. Source: [finetune/README.md:8-18]()
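
The vector-matching step in item 2 can be sketched in plain TypeScript. This only illustrates ranking by cosine distance; in QMD the comparison is delegated to `sqlite-vec`, and the names `cosineDistance`, `nearestChunks`, and `EmbeddedChunk` are hypothetical.

```typescript
// Hypothetical sketch: ranking embedded chunks by cosine distance to a query vector.
interface EmbeddedChunk {
  id: string;
  vector: readonly number[];
}

function cosineDistance(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Assumption: a zero vector is treated as maximally dissimilar.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function nearestChunks(
  query: readonly number[],
  chunks: EmbeddedChunk[],
  limit: number,
): Array<{ id: string; distance: number }> {
  return chunks
    .map((chunk) => ({ id: chunk.id, distance: cosineDistance(query, chunk.vector) }))
    .sort((left, right) => left.distance - right.distance)
    .slice(0, limit);
}
```

A HyDE query follows the same path: the generated passage is embedded and passed as `query` in place of the embedded user text.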

A small Qwen3-1.7B model fine-tuned via LoRA generates the structured `lex:` / `vec:` / `hyde:` expansions from a raw query. Training is two-stage: SFT on ~2,290 examples for 5 epochs at LR `2e-4` (cosine schedule, LoRA r=16, α=32, effective batch size 16), optionally followed by GRPO. Source: [finetune/README.md:48-62]() The expansion grammar is enforced both in the CLI grammar and in the MCP tool schema, so clients always send typed sub-queries. Source: [src/cli/qmd.ts:220-240]()

Results are merged with reciprocal-rank fusion, optionally reranked by a local cross-encoder, and finally post-processed by `extractSnippet(...)` to produce a 300-character window anchored on the best chunk. Source: [src/cli/qmd.ts:130-160]()
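
A weighted form of reciprocal-rank fusion might look like the sketch below. How QMD applies the 2× weight internally is not shown in the sources cited here, so multiplying each list's contribution by its weight is an assumption, as are the conventional constant `k = 60` and the name `weightedRrf`.

```typescript
// Hypothetical sketch of weighted reciprocal-rank fusion (RRF).
function weightedRrf(
  lists: string[][],
  weights: number[],
  k = 60, // assumed smoothing constant; QMD's actual value is not confirmed here
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();

  lists.forEach((list, listIndex) => {
    const weight = weights[listIndex] ?? 1;
    list.forEach((id, rank) => {
      // rank is 0-based, so the best result contributes weight / (k + 1).
      const contribution = weight / (k + rank + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  });

  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((left, right) => right.score - left.score);
}
```

Passing weights such as `[2, 1, 1]` gives the first sub-search twice the influence of the others, which matches the 2× weighting described for the first sub-query.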

## AI Model Integration

All models run locally through `node-llama-cpp`. The CLI exposes helpers such as `disposeDefaultLlamaCpp`, `getDefaultLlamaCpp`, `setDefaultLlamaCpp`, `pullModels`, and resolvers for embed / generate / rerank roles (`resolveEmbedModel`, `resolveGenerateModel`, `resolveRerankModel`, `resolveModels`). GGUF files can be inspected via `inspectGgufFile`, and Metal acceleration is detected with `isDarwinMetal`. Source: [src/cli/qmd.ts:60-80]()

The default models are exported as constants: `DEFAULT_EMBED_MODEL`, `DEFAULT_RERANK_MODEL`, `DEFAULT_QUERY_MODEL`. Embedding runs are batched with `DEFAULT_EMBED_MAX_BATCH_BYTES` and `DEFAULT_EMBED_MAX_DOCS_PER_BATCH`, and indexed tokens are produced by `chunkDocumentByTokens(...)`. Source: [src/cli/qmd.ts:30-50]()

A status check on the MCP `query` tool warns the agent when no vector index exists or when `needsEmbedding > 0`, prompting a `qmd embed` run. Source: [src/mcp/server.ts:120-140]() The benchmark harness exercises the pipeline end-to-end across `bm25`, `vector`, `hybrid`, and `full` backends, reporting precision@k, recall, MRR, F1, and latency. Source: [src/bench/types.ts:1-40](), [src/bench/score.ts:1-40]()
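
The status check described above could be expressed as follows. This is a sketch under assumptions: the field names follow the status fields mentioned in this manual, while `embeddingWarning`, the `IndexStatusLike` type, and the message wording are hypothetical.

```typescript
// Hypothetical sketch of the "run qmd embed" warning logic.
interface IndexStatusLike {
  totalDocuments: number;
  hasVectorIndex: boolean;
  needsEmbedding: number;
}

function embeddingWarning(status: IndexStatusLike): string | null {
  if (!status.hasVectorIndex) {
    return "No vector index found. Run `qmd embed` to enable vec and hyde searches.";
  }
  if (status.needsEmbedding > 0) {
    return `${status.needsEmbedding} document(s) still need embedding. Run \`qmd embed\`.`;
  }
  return null; // index is complete; no warning needed
}
```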

## Known Limitations and Community Demand

The strict local-only posture is the project's defining feature and its biggest source of community friction. Issue #114 requests LiteLLM Client SDK support for Ollama, LM Studio, OpenRouter, DeepInfra, and LiteLLM Proxy across embeddings, LLM generation, OCR, and reranking. Issue #521 asks specifically for OpenAI-compatible / Ollama-compatible embedding endpoints, and issue #620 frames the same ask for all three roles. Source: Community Issues #114, #521, #620

Windows compatibility is the other recurring pain point (#31): Unix-specific assumptions in shebangs, build scripts, and the `optionalDependencies` matrix complicate installation on Windows 11 even with Bun 1.3.6, despite the presence of a `sqlite-vec-windows-x64` binary. Source: [package.json:68-77](), Community Issue #31

Together, these signals suggest the most natural next architectural step is a pluggable model provider layer that keeps the local-first default but lets advanced users route any of the three roles — embed, generate, rerank — to an OpenAI-compatible endpoint, while the core BM25 + sqlite-vec + FTS5 stack continues to work offline.

## See Also

- [Query Syntax and Search Grammar](https://github.com/tobi/qmd/blob/main/docs/SYNTAX.md)
- [Collections and Configuration](https://github.com/tobi/qmd/blob/main/docs/COLLECTIONS.md)
- [MCP Server Reference](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)
- [Benchmark Harness](https://github.com/tobi/qmd/blob/main/src/bench/bench.ts)

---

<a id='page-3'></a>

## CLI Commands, SDK Usage, and MCP Server

### Related Pages

Related topics: [Overview and Getting Started](#page-1), [Deployment, Cross-Platform Support, and Extensibility](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)
- [src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts)
- [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)
- [package.json](https://github.com/tobi/qmd/blob/main/package.json)
- [finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)
- [src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts)
</details>

# CLI Commands, SDK Usage, and MCP Server

## Overview

QMD ("Query Markup Documents") exposes a single on-device hybrid search engine for markdown collections. The project ships three primary integration surfaces: a command-line interface (CLI), a programmatic SDK consumable from Node/Bun, and a Model Context Protocol (MCP) server for tool-aware agents. All three share a single core module (`src/store.ts` and friends) that wraps `better-sqlite3`, `sqlite-vec`, and `node-llama-cpp` for BM25, vector, and reranking workloads. Source: [package.json:2-4]().

The CLI is the canonical user interface, the SDK is used by tests and integrations, and the MCP server is the surface most LLM-driven workflows depend on. They are designed to be functionally equivalent — anything the CLI can do, an MCP tool can do, and the same store types appear in the SDK.

## CLI Command Surface

The CLI entry point is `src/cli/qmd.ts`. When the module is invoked as the main script it calls `enableProductionMode()` so that store defaults resolve to the user-level database path, and then dispatches to a subcommand parser. Source: [src/cli/qmd.ts]() (the `isMain` block at the bottom of the file).

### Query grammar

The `query` subcommand accepts a small formal grammar. Source: [src/cli/qmd.ts]() (the `grammar` array printed by `--help`):

```text
query          = expand_query | query_document
expand_query   = text | "expand:" text
query_document = [ "intent:" text newline ] { type ":" text newline }
type           = "lex" | "vec" | "hyde"
text           = quoted_phrase | plain_text
```

A bare natural-language string is treated as an implicit `expand:` call: QMD runs the default query-expansion model to produce a typed query document. A multi-line query document lets the caller hand-pick `lex:`, `vec:`, and `hyde:` lines and optionally prefix an `intent:` line for snippet disambiguation. Quoted phrases and `-negation` are supported in `lex:` lines.

Typical invocations printed by the CLI help:

```bash
qmd query "how does auth work"
qmd query $'lex: CAP theorem\nvec: consistency'
qmd query $'lex: "exact matches" sports -baseball'
qmd query $'hyde: Hypothetical answer text'
```
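
The top-level choice between `expand_query` and `query_document` can be sketched as a small classifier. This is an illustration rather than the CLI's own dispatch code; `classifyQuery` and `ParsedQuery` are hypothetical names, and treating any input whose non-empty lines all carry a typed prefix as a query document is an assumption.

```typescript
// Hypothetical sketch: deciding whether input is an expand query or a typed query document.
type ParsedQuery =
  | { kind: "expand"; text: string }
  | { kind: "document"; raw: string };

const TYPED_PREFIX = /^(intent|lex|vec|hyde):/;

function classifyQuery(input: string): ParsedQuery {
  const trimmed = input.trim();

  // Explicit "expand:" prefix.
  if (trimmed.startsWith("expand:")) {
    return { kind: "expand", text: trimmed.slice("expand:".length).trim() };
  }

  // Every non-empty line carries a typed prefix: treat it as a query document.
  const lines = trimmed.split("\n").filter((line) => line.trim().length > 0);
  if (lines.length > 0 && lines.every((line) => TYPED_PREFIX.test(line.trim()))) {
    return { kind: "document", raw: trimmed };
  }

  // Anything else is a bare string, i.e. an implicit expand call.
  return { kind: "expand", text: trimmed };
}
```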

### Retrieval and line-range suffix

`qmd get <path>` and `qmd multi-get <pattern>` retrieve full documents. As of v2.5.3, `get` accepts a `:from:count` suffix on a path or `#docid`; for example, `qmd get "#abc123:120:40"` returns 40 lines starting at line 120. Explicit `--from` / `-l` flags still override the suffix. The MCP `get` tool accepts the same suffix. Both `get` and `multi-get` print line-numbered output by default along with the document's `#docid` and `qmd://` URI. Source: the v2.5.3 changelog entry; the same logic is implemented in [src/cli/qmd.ts]() and in the MCP server, [src/mcp/server.ts]().

### Other subcommands

`status` reports collection health, `embed` (re)builds the vector index, `pull` downloads GGUF models, and configuration is read from `qmd.yml` / `~/.config/qmd/config.yml` via `getConfigPath`. Source: [src/cli/qmd.ts]() (the imports list) and [src/mcp/server.ts]() (which imports `getConfigPath` and `getDefaultDbPath`).

## SDK Usage

The SDK is the same module the CLI imports. Its main entry is `dist/index.js` (built from `src/index.ts`) and is published as `@tobilu/qmd` per [package.json:2-9](). Programmatic consumers create a `QMDStore` via `createStore(path)`, then call methods such as `store.search()`, `store.get(path, { includeBody: true })`, `store.status()`, and `store.embed()`.

The MCP server is itself a thin SDK consumer — it imports `createStore`, `extractSnippet`, `addLineNumbers`, `getDefaultDbPath`, and the `QMDStore`/`ExpandedQuery`/`IndexStatus` types. Source: [src/mcp/server.ts]() (imports block near the top of the file). That means the SDK is the source of truth: the CLI, the MCP server, and any external Node/Bun integration all go through `createStore` and the same store types.

## MCP Server Interface

The MCP server follows MCP spec 2025-06-18 and is built with `@modelcontextprotocol/sdk@1.29.0`. Source: [package.json:38]() and [src/mcp/server.ts]() (the file-level JSDoc and `McpServer` import).

### Resources

A single resource template is registered:

| Resource | URI template | MIME type | Purpose |
|----------|--------------|-----------|---------|
| `document` | `qmd://{+path}` | `text/markdown` | Fetch a markdown document by collection-relative path; the body is line-numbered by default. |

When a client requests `qmd://journals/2025-05-12.md`, the server decodes the URI, calls `store.get(path, { includeBody: true })`, prepends any folder context as an HTML comment, and returns the body. If the document is missing it returns a `Document not found: <path>` text body. Source: [src/mcp/server.ts]() (the `registerResource("document", …)` block).
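
The read path for the `document` resource can be sketched with an in-memory map standing in for the store. Everything here is illustrative: `pathFromQmdUri`, `readDocumentResource`, and `StoredDocument` are hypothetical names, the exact shape of the context comment is an assumption, and line numbering is omitted for brevity.

```typescript
// Hypothetical sketch of resolving a qmd:// URI to a document body.
interface StoredDocument {
  body: string;
  context?: string; // folder-level context, if any
}

function pathFromQmdUri(uri: string): string {
  const scheme = "qmd://";
  if (!uri.startsWith(scheme)) throw new Error(`Not a qmd:// URI: ${uri}`);
  // Paths may arrive URL-encoded, e.g. "my%20notes/a.md".
  return decodeURIComponent(uri.slice(scheme.length));
}

function readDocumentResource(uri: string, docs: Map<string, StoredDocument>): string {
  const path = pathFromQmdUri(uri);
  const doc = docs.get(path);
  if (!doc) return `Document not found: ${path}`;

  // Assumption: context is emitted as a leading HTML comment; the real format may differ.
  const header = doc.context ? `<!-- Context: ${doc.context} -->\n\n` : "";
  return header + doc.body;
}
```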

### Tools

The primary tool is `query`. Its input is a `searches` array of typed sub-queries, each of which is a Zod-validated object with a `type` of `"lex" | "vec" | "hyde"` and a `query` string. The first sub-query receives 2× weight. Optional fields include `limit` (default 10), `minScore` (default 0), and a global `intent` that disambiguates snippet extraction. Source: [src/mcp/server.ts]() (the `subSearchSchema` and tool registration).
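
To make the input shape concrete, here is a dependency-free validator that mirrors the fields described above. The server itself uses Zod; this sketch deliberately avoids that dependency, and `parseQueryInput`, `QueryInput`, and the rule that `searches` must be non-empty are assumptions.

```typescript
// Hypothetical, dependency-free sketch of the `query` tool input shape.
type SearchType = "lex" | "vec" | "hyde";

interface SubSearch {
  type: SearchType;
  query: string;
}

interface QueryInput {
  searches: SubSearch[];
  limit: number;
  minScore: number;
  intent?: string;
}

function isSearchType(value: unknown): value is SearchType {
  return value === "lex" || value === "vec" || value === "hyde";
}

function parseQueryInput(raw: unknown): QueryInput {
  if (typeof raw !== "object" || raw === null) throw new Error("input must be an object");
  const input = raw as Record<string, unknown>;

  const rawSearches = input["searches"];
  if (!Array.isArray(rawSearches) || rawSearches.length === 0) {
    throw new Error("searches must be a non-empty array"); // assumption
  }

  const searches: SubSearch[] = rawSearches.map((entry: unknown, i: number) => {
    const item = (typeof entry === "object" && entry !== null ? entry : {}) as Record<string, unknown>;
    const type = item["type"];
    const query = item["query"];
    if (!isSearchType(type)) throw new Error(`searches[${i}].type must be lex, vec, or hyde`);
    if (typeof query !== "string" || query.length === 0) {
      throw new Error(`searches[${i}].query must be a non-empty string`);
    }
    return { type, query };
  });

  const limit = input["limit"];
  const minScore = input["minScore"];
  const intent = input["intent"];

  // Defaults follow the text above: limit 10, minScore 0.
  const result: QueryInput = {
    searches,
    limit: typeof limit === "number" ? limit : 10,
    minScore: typeof minScore === "number" ? minScore : 0,
  };
  if (typeof intent === "string") result.intent = intent;
  return result;
}
```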

```mermaid
flowchart LR
  A[Client] -->|typed searches| B[MCP query tool]
  B --> C[store.search]
  C --> D[(BM25 + sqlite-vec)]
  D --> E[Rerank via node-llama-cpp]
  E --> F[search results: docid, file, title, score, context, line, snippet]
  F --> A
  A -->|qmd:// URI| G[document resource]
  G --> H[store.get with line-range suffix]
  H --> A
```

Supporting tools include `get` and `multi_get` (mirroring the CLI), `status` (total documents, `hasVectorIndex`, `needsEmbedding`, per-collection counts), and `collections`. The server also exposes a guidance prompt on first contact that lists the available tools, the lex/vec/hyde semantics, and a reminder to run `qmd embed` if the vector index is empty. Source: [src/mcp/server.ts]() (the prompt text beginning with `${status.totalDocuments} markdown documents`).

### Transports

The server supports stdio (`StdioServerTransport`) and a Web-Standard streamable HTTP transport, plus a fallback `createServer` from `node:http` for streamable sessions. Source: [src/mcp/server.ts]() (imports of `@modelcontextprotocol/sdk/server/stdio.js` and `webStandardStreamableHttp.js`).

## Output Formatting

`src/cli/formatter.ts` provides a single `formatSearchResults(results, format, opts)` function that dispatches to `json`, `csv`, `files`, `md`, `xml`, or `cli` backends. Document output has a parallel `formatDocument(doc, format)` that supports `json`, `md`, and `xml`. Helper utilities include `addLineNumbers(text, startLine=1)`, `getDocid(hash)` (first six hex chars of the content hash), `escapeCSV`, and `escapeXml`. Source: [src/cli/formatter.ts]() (the `OutputFormat` union, the `formatDocument` switch, and the helper block).
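
The helper utilities are small enough to sketch. The functions below are illustrative reimplementations, not the code in `src/cli/formatter.ts`: the `Sketch` suffix marks them as hypothetical, and the `N: ` line prefix and the exact escaping rules are assumptions.

```typescript
// Hypothetical reimplementations of the formatter helpers described above.

// Prefix each line with its line number, starting at `startLine`.
function addLineNumbersSketch(text: string, startLine = 1): string {
  return text
    .split("\n")
    .map((line, index) => `${startLine + index}: ${line}`) // "N: " prefix is assumed
    .join("\n");
}

// A docid is described as the first six hex characters of the content hash.
function getDocidSketch(hash: string): string {
  return hash.slice(0, 6);
}

// Quote a CSV field when it contains a quote, comma, or line break.
function escapeCsvSketch(value: string): string {
  return /[",\n\r]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Escape the five XML special characters; "&" must be replaced first.
function escapeXmlSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}
```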

The CLI defaults to line-numbered Markdown and prints the `#docid` and `qmd://` URI for each result. Non-CLI formats disable color and produce machine-readable output suitable for piping into agents or CI. Source: [src/cli/formatter.ts]() (`FormatOptions.useColor` default and the per-format branches).

## Community Notes and Known Limitations

- **API-backed models are not yet supported.** Multiple issues (#114, #521, #620) request LiteLLM / OpenAI-compatible backends for embeddings, generation, and reranking. Today, QMD is centered on local GGUF models through `node-llama-cpp`, so users wanting API-based providers must run their own model servers.
- **Windows compatibility is incomplete.** Issue #31 documents multiple blocking failures on Windows 11 with Bun 1.3.6, primarily around Unix-specific assumptions in scripts and the `sqlite-vec` native binding. The `package.json` lists `sqlite-vec-windows-x64` as an *optional* dependency, so the Windows native module is only loaded if explicitly available.
- **Project status.** Issue #516 notes a high open-PR and open-issue count; the project is not archived and continues to ship releases (v2.5.3 as of the latest log).

## See Also

- Architecture and store internals — `src/store.ts`
- Configuration and collections — `src/collections.ts`
- Query-expansion fine-tuning pipeline — [finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)
- Benchmark scoring utilities — [src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts)

---

<a id='page-4'></a>

## Deployment, Cross-Platform Support, and Extensibility

### Related Pages

Related topics: [Architecture, Search Pipeline, and AI Model Integration](#page-2), [CLI Commands, SDK Usage, and MCP Server](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [package.json](https://github.com/tobi/qmd/blob/main/package.json)
- [src/mcp/server.ts](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)
- [src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)
- [src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts)
- [src/bench/bench.ts](https://github.com/tobi/qmd/blob/main/src/bench/bench.ts)
- [src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts)
- [src/bench/types.ts](https://github.com/tobi/qmd/blob/main/src/bench/types.ts)
- [finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)
</details>

# Deployment, Cross-Platform Support, and Extensibility

QMD is a Node/Bun application that ships as a single `qmd` binary plus an embeddable MCP (Model Context Protocol) server. Its deployment model, native dependency footprint, and extension surfaces are concentrated in a small number of files, and each one directly shapes how the tool can be installed, ported, and extended.

## Deployment and Packaging

The package is published as `@tobilu/qmd` ([package.json:2](https://github.com/tobi/qmd/blob/main/package.json)) with an executable entrypoint declared under `bin.qmd` ([package.json:10](https://github.com/tobi/qmd/blob/main/package.json)) and a TypeScript build target at `dist/index.js` ([package.json:5-8](https://github.com/tobi/qmd/blob/main/package.json)). The `files` array ([package.json:12-23](https://github.com/tobi/qmd/blob/main/package.json)) lists the shipped assets: `bin/`, `dist/`, the `skills/` directory, the build/check/test scripts, the license, and the changelog. Build and verification are driven by `scripts/build.mjs`, `scripts/check-package-grammars.mjs`, `scripts/package-smoke.mjs`, and `scripts/test-all.mjs`, all referenced from the `files` whitelist and `scripts` block ([package.json:24-37](https://github.com/tobi/qmd/blob/main/package.json)). The repository also ships a Nix flake (referenced in the file listing) for reproducible environments on NixOS and macOS/Linux hosts.

Two transports are provided for the embedded MCP server ([src/mcp/server.ts:1-30](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)): `StdioServerTransport` for local editor/agent integration, and `WebStandardStreamableHTTPServerTransport` for remote/hosted deployments. The HTTP transport is wired to a Node `http.createServer` and an `isInitializeRequest` handshake ([src/mcp/server.ts:8-16](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)), and `enableProductionMode` is called so that store defaults resolve to the user-level database path ([src/mcp/server.ts:24-28](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)).

## Cross-Platform Support

Native dependencies are the primary portability constraint. The package relies on `better-sqlite3` and the `sqlite-vec` extension for vector indexing, plus `node-llama-cpp` for local GGUF inference ([package.json:40-55](https://github.com/tobi/qmd/blob/main/package.json)). Because these have architecture-specific binaries, they are split into a hard `dependencies` block (the Node bindings) and an `optionalDependencies` block with one entry per supported platform ([package.json:57-64](https://github.com/tobi/qmd/blob/main/package.json)).

| Platform | Optional binary package |
| --- | --- |
| macOS (Apple Silicon) | `sqlite-vec-darwin-arm64` |
| macOS (Intel) | `sqlite-vec-darwin-x64` |
| Linux (ARM64) | `sqlite-vec-linux-arm64` |
| Linux (x64) | `sqlite-vec-linux-x64` |
| Windows (x64) | `sqlite-vec-windows-x64` |

The CLI itself is pure ESM/TypeScript and runs under either Node or Bun ([src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)). In practice, Windows users report multiple blocking issues: path handling, shell-quoting expectations in the embedded Bash grammar, and the absence of fully tested prebuilt `sqlite-vec` and `node-llama-cpp` binaries for some Windows toolchains (community issue #31). Linux and macOS are the first-class supported platforms; Windows works only when both the optional native package and the underlying build prerequisites are present.
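
The platform table above implies a simple naming scheme, which the following sketch makes explicit. It is not how the package manager or QMD selects the binary; `sqliteVecPackage` is a hypothetical helper, and the inputs are assumed to be values as reported by Node's `process.platform` and `process.arch`.

```typescript
// Hypothetical sketch: mapping a platform/arch pair to its optional sqlite-vec package.
const SUPPORTED_SQLITE_VEC_PACKAGES = new Set<string>([
  "sqlite-vec-darwin-arm64",
  "sqlite-vec-darwin-x64",
  "sqlite-vec-linux-arm64",
  "sqlite-vec-linux-x64",
  "sqlite-vec-windows-x64",
]);

function sqliteVecPackage(platform: string, arch: string): string | undefined {
  // Node reports Windows as "win32", while the package name uses "windows".
  const os = platform === "win32" ? "windows" : platform;
  const name = `sqlite-vec-${os}-${arch}`;
  return SUPPORTED_SQLITE_VEC_PACKAGES.has(name) ? name : undefined;
}
```

Calling `sqliteVecPackage(process.platform, process.arch)` on an unsupported host returns `undefined`, which corresponds to a platform with no prebuilt entry in the table.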

## Extensibility Surfaces

QMD exposes three composable extension points: the MCP server interface, the output formatter pipeline, and the benchmark/finetuning harnesses.

**MCP server.** The server registers a `document` resource via `ResourceTemplate("qmd://{+path}", ...)` ([src/mcp/server.ts:30-60](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)) and a set of typed tools, notably `query`, which accepts an array of `lex`/`vec`/`hyde` sub-queries described with Zod schemas ([src/mcp/server.ts:65-110](https://github.com/tobi/qmd/blob/main/src/mcp/server.ts)). Adding a new capability (a tool, a resource, or a new search backend) is therefore a matter of extending the Zod input schema, the handler, and the `searches` array; the rest of the protocol plumbing is reused.

**Output formatters.** The CLI supports six output formats (`cli`, `csv`, `md`, `xml`, `files`, `json`) declared in [src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts). Each format is a small pure function over a `SearchResult` or `DocumentResult`, and a universal `formatSearchResults` dispatcher routes to the right encoder ([src/cli/formatter.ts](https://github.com/tobi/qmd/blob/main/src/cli/formatter.ts)). New formats (e.g. HTML, NDJSON) can be added by extending the `OutputFormat` union and adding a single `case` arm.
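
The extension pattern described for formatters can be sketched with a reduced union. This is not the real `formatSearchResults`: `formatResultsSketch`, `ResultRow`, and the `ndjson` member are hypothetical, and only two formats are shown.

```typescript
// Hypothetical sketch: adding an output format by extending a union and a switch.
type OutputFormatSketch = "json" | "ndjson"; // "ndjson" is the newly added member

interface ResultRow {
  docid: string;
  file: string;
  score: number;
}

function formatResultsSketch(results: ResultRow[], format: OutputFormatSketch): string {
  switch (format) {
    case "json":
      return JSON.stringify(results, null, 2);
    case "ndjson":
      // One JSON object per line, convenient for streaming consumers.
      return results.map((row) => JSON.stringify(row)).join("\n");
    default: {
      // Exhaustiveness check: the compiler flags any union member without a case arm.
      const unreachable: never = format;
      throw new Error(`Unsupported format: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is a design choice worth keeping: a format added to the union without a matching `case` becomes a compile-time error rather than a silent runtime fallthrough.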

**Model providers and the HyDE/query-expansion layer.** The current default model paths (`DEFAULT_EMBED_MODEL`, `DEFAULT_QUERY_MODEL`, `DEFAULT_RERANK_MODEL`) are imported by the CLI from `store.ts` and resolved through the `resolveEmbedModel`, `resolveGenerateModel`, `resolveRerankModel`, and `resolveModels` helpers ([src/cli/qmd.ts](https://github.com/tobi/qmd/blob/main/src/cli/qmd.ts)). All of them currently target `node-llama-cpp`. Community issues #114, #521, and #620 explicitly request swapping these for OpenAI-/Ollama-/LiteLLM-compatible HTTP backends for embeddings, query expansion, and reranking; the request structure is already aligned with the resolver naming, suggesting a provider-pluggable rewrite is the natural extension path.

**Benchmark harness and fine-tuning.** The `bench` module ([src/bench/bench.ts](https://github.com/tobi/qmd/blob/main/src/bench/bench.ts)) loads a JSON fixture of `BenchmarkQuery` entries ([src/bench/types.ts](https://github.com/tobi/qmd/blob/main/src/bench/types.ts)) and runs each query through pluggable backends (`bm25`, `vector`, `hybrid`, `full`). Scoring is implemented as a standalone module ([src/bench/score.ts](https://github.com/tobi/qmd/blob/main/src/bench/score.ts)) computing precision@k, recall, MRR, and F1. The `finetune/` directory ([finetune/README.md](https://github.com/tobi/qmd/blob/main/finetune/README.md)) provides an SFT pipeline that trains a Qwen3-1.7B model to emit `lex:`/`vec:`/`hyde:` expansions, giving operators a path to customize the query-expansion stage without forking the runtime.

```mermaid
flowchart LR
  A[qmd CLI] --> B[QMDStore<br/>SQLite + sqlite-vec]
  B --> C[Search backends<br/>lex · vec · hyde]
  C --> D[Model resolvers<br/>resolveEmbedModel / resolveGenerateModel / resolveRerankModel]
  D --> E[node-llama-cpp<br/>local GGUF]
  D -.future.-> F[OpenAI-compatible<br/>HTTP backends]
  A --> G[MCP server<br/>stdio + WebStreamable HTTP]
  G --> H[Agents / Editors]
  A --> I[Bench harness]
  I --> J[Scoring module]
  A --> K[Finetune pipeline<br/>Qwen3-1.7B SFT]
```

## See Also

- Project Overview and Architecture
- Search Pipeline and Query Grammar
- MCP Server and Agent Integration
- Benchmarking and Evaluation
- Fine-Tuning the Query Expansion Model

---


## Pitfall Log

Project: tobi/qmd

Summary: Found 14 structured pitfall item(s), including 4 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/735

## 2. Configuration risk - Configuration risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/645

## 3. Maintenance risk - Maintenance risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/717

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/699

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/739

## 6. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/736

## 7. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/tobi/qmd

## 8. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/tobi/qmd/issues/738

## 9. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/tobi/qmd

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/tobi/qmd

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/tobi/qmd

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/tobi/qmd

## 13. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/tobi/qmd

## 14. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/tobi/qmd
