chunkhound Manual Preview

Doramagic Project Pack · Human Manual

chunkhound

Local first codebase intelligence

Architecture and System Overview

Related topics: Search, Research, and Code Mapping, MCP Integration and Deployment, Parsers, Providers, and Extensibility


ChunkHound is positioned as a "code research" tool that augments AI assistants with semantic understanding of a codebase. The README frames the project this way: *"Your AI assistant searches code but doesn't understand it. ChunkHound researches your codebase—extracting architecture, patterns, and institutional knowledge at any scale."* Source: README.md:1-3. The same file enumerates the primary capabilities: the cAST chunking algorithm, Multi-Hop Semantic Search, regex search (no API key required), support for 32 languages via Tree-sitter plus custom text-based parsers, MCP integration, and real-time indexing with explicit backend selection (watchdog, watchman, or polling). Source: README.md:11-29.

This page describes the moving parts that compose those capabilities and how they connect at a system level.

High-Level System Architecture

The repository follows a layered structure: a thin CLI/API surface, a domain-model core, infrastructure adapters for parsing/embedding/storage, and integration endpoints (MCP and editors). The CLI commands research, code_mapper, and autodoc each act as orchestrators that wire these layers together.

flowchart LR
    subgraph Clients["Clients"]
        MCP["MCP Servers<br/>(Claude, Cursor, Zed, …)"]
        Editor["Editor configs<br/>(.chunkhound.json)"]
    end

    subgraph CLI["ChunkHound CLI"]
        Research["research command"]
        Mapper["code_mapper command"]
        Autodoc["autodoc command"]
    end

    subgraph Core["Domain Core"]
        File["File model"]
        Chunk["Chunk model"]
        Emb["Embedding model"]
    end

    subgraph Infra["Infrastructure"]
        Parse["Tree-sitter parsers"]
        Embed["Embedding providers<br/>(VoyageAI / OpenAI / Ollama)"]
        Store["Persistence layer"]
    end

    MCP --> Research
    Editor --> CLI
    Research --> Chunk
    Research --> Embed
    Mapper --> Chunk
    Mapper --> Embed
    Autodoc --> Mapper
    Chunk --- File
    Embed --- Chunk
    Parse --> Chunk
    Embed --> Store

The CLI's progress and rendering utilities (RichOutputFormatter, TreeProgressDisplay) are shared across commands, keeping terminal behavior consistent. Source: rich_output.py:1-19, tree_progress.py:9-19.

Core Domain Models

The heart of the system is three immutable dataclasses: File, Chunk, and Embedding. The package module chunkhound/core/models/__init__.py:1-26 re-exports them together with EmbeddingResult.

| Model | Purpose | Key Fields | Defined In |
| --- | --- | --- | --- |
| `File` | Metadata for an indexed source file | `path`, `mtime`, `language`, `size_bytes`, `id`, `content_hash` | file.py:1-50 |
| `Chunk` | A semantically meaningful slice of a file | `symbol`, `start_line`, `end_line`, `code`, `chunk_type`, `file_id`, `language`, `start_byte`, `end_byte`, `metadata` | chunk.py:1-79 |
| `Embedding` | A vector embedding of a chunk | `chunk_id`, `provider`, `model`, `dims`, `vector`, `created_at` | embedding.py:1-44 |

Chunk distinguishes code from documentation via is_code_chunk() / is_documentation_chunk() helpers and exposes structural predicates such as contains_line(), overlaps_with(), and size checks (is_small_chunk, is_large_chunk). Source: chunk.py:50-110. The models validate themselves in __post_init__, raising ValidationError for invariants like positive start_line / end_line. Source: chunk.py:60-78, embedding.py:38-44. This validation-in-construction pattern means callers can rely on the core objects being well-formed the moment they are created.
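The validation-in-construction pattern can be illustrated with a minimal sketch. This is not the actual Chunk class: ChunkSketch, its reduced field set, and the local ValidationError are hypothetical stand-ins, and the rule that end_line must not precede start_line is an assumption added for the example.

```python
from dataclasses import dataclass, field


class ValidationError(ValueError):
    """Hypothetical stand-in for the validation error named in the text."""


@dataclass(frozen=True)
class ChunkSketch:
    """Illustrative subset of the Chunk fields; not the real class."""

    symbol: str
    start_line: int
    end_line: int
    code: str
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Invariants are checked while the object is being constructed,
        # so callers never observe an invalid instance.
        if self.start_line < 1 or self.end_line < 1:
            raise ValidationError("line numbers must be positive")
        # Assumption for this sketch: a chunk cannot end before it starts.
        if self.end_line < self.start_line:
            raise ValidationError("end_line must not precede start_line")

    def contains_line(self, line: int) -> bool:
        """True when the 1-based line falls inside the chunk's range."""
        return self.start_line <= line <= self.end_line

    def overlaps_with(self, other: "ChunkSketch") -> bool:
        """True when the two inclusive line ranges share at least one line."""
        return self.start_line <= other.end_line and other.start_line <= self.end_line
```

Because the dataclass is frozen and validated in __post_init__, any code that receives a ChunkSketch can call contains_line() or overlaps_with() without re-checking the line range first.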

CLI, Research, and Code-Mapping Pipelines

The CLI exposes three top-level operations, each a long-running pipeline that streams progress through the tree-based display.

Research — wires EmbeddingManager and an optional LLMManager to the deep-research implementation; missing or invalid embedding configuration is surfaced early with explicit remediation messages. Source: research.py:21-59. Its parser accepts git diff/commit-range arguments plus common and config-specific argument groups (database, embedding, llm, research). Source: research_parser.py:1-19.
Code Mapper — runs a two-phase pipeline: a shallow research call that plans Points of Interest (the count keyed off a comprehensiveness setting), then a dedicated deep-research pass per PoI that is assembled into a single document. Source: code_mapper.py:1-15. The orchestrator supports concurrency via a -j/--jobs argument with env-var fallback to CH_CODE_MAPPER_*. Source: code_mapper_parser.py:22-78. Output directory handling warns when targeting a git-tracked path. Source: autodoc.py:31-38.
AutoDoc — generates an Astro documentation site from Code Mapper outputs, with prompts/overrides for non-interactive runs (e.g. --force, --assets-only). Source: autodoc.py:13-38, operations/README.md:1-19.
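The two-phase Code Mapper flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not ChunkHound's implementation: run_code_mapper_sketch, plan, and research are hypothetical names, and a thread pool stands in for whatever concurrency mechanism the real orchestrator uses behind -j/--jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_code_mapper_sketch(
    scope: str,
    plan: Callable[[str], List[str]],
    research: Callable[[str], str],
    jobs: int = 2,
) -> str:
    """Plan points of interest once, then research each one concurrently."""
    # Phase 1: a single shallow call decides which points of interest to cover.
    points_of_interest = plan(scope)

    # Phase 2: one deep pass per point of interest, capped at `jobs` workers.
    # Executor.map() yields results in input order, so the assembled
    # document keeps the order chosen by the planner.
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        sections = list(pool.map(research, points_of_interest))

    # Assemble the per-topic sections into a single document.
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Toy callables stand in for the LLM-backed planning and research steps.
    document = run_code_mapper_sketch(
        "src/",
        plan=lambda scope: ["auth", "storage"],
        research=lambda topic: f"## {topic}\n(findings for {topic})",
    )
    print(document)
```

Keeping the planner and the per-topic research as injected callables makes the concurrency cap the only scheduling decision in the orchestrator, which is the role the text assigns to the jobs setting.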

Both research and code_mapper rely on verify_database_exists before proceeding, mirroring the database-as-prerequisite architecture used by the indexer. Source: research.py:1-19, code_mapper.py:1-15.

Integration Points and Known Constraints

ChunkHound is editor- and assistant-agnostic at the integration boundary. The README lists MCP integrations with Claude, VS Code, Cursor, Windsurf, and Zed, and the hero terminal demo on the marketing site previews a chunkhound research "authentication architecture" invocation that emits a synthesized report. Source: README.md:25-27, hero-terminal.ts:1-13. Embedding and LLM providers are pluggable: VoyageAI (recommended), OpenAI, and local Ollama for embeddings; Claude Code CLI, Codex CLI, Anthropic, OpenAI, and Grok for LLMs. Source: README.md:33-39.

Several community discussions highlight the boundaries of this plug-in architecture and inform the design choices visible in the source:

Concurrent MCP instances: Issue #53 reports that running multiple Claude Code windows with stdio-mode chunkhound MCP crashes, while HTTP-mode semantic search returns empty results. The parser and command layers currently assume single-tenant process state, which constrains concurrent editors to a single configurable transport.
Embedding model compatibility: Issue #41 concerns the Ollama model dengcao/Qwen3-Embedding-8B:Q5_K_M, which is constructed through the EmbeddingProviderFactory used by setup_embedding_llm. Source: research.py:26-32. Custom or quantized embedding models require provider-level support; the abstraction is in place, but coverage is defined per provider.
Language coverage: Ruby support is requested in #35, consistent with the language list bundled with the parser registry; new languages require both a parser hookup and embedding-model mapping, since the Chunk model carries the resolved language enum into storage. Source: chunk.py:42-50.
New LLM providers: OpenCode (Issue #113) has been requested as a CLI-flavored provider alongside Claude Code; the LLM manager pattern is symmetric to the embedding-manager pattern in research.py, so adapter additions slot in at that seam. Source: research.py:33-58.
Worktree efficiency: Issue #83 reports successful use inside monorepo worktrees; the architecture supports this via real-time indexers with explicit watchdog/watchman/polling backend selection. Source: README.md:25-29.

The release notes for v5.1.0 note that MCP search responses now return lean markdown with similarity percentages instead of JSON, reducing token overhead for downstream models—a refinement to the same integration layer. Source: release notes excerpt in community context.

Search, Research, and Code Mapping

Related topics: Architecture and System Overview, MCP Integration and Deployment


ChunkHound exposes three progressively richer ways of querying an indexed codebase: lightweight search, multi-step research, and architectural code mapping. All three share the same domain models (File, Chunk, Embedding) and the same database backing, but they differ in depth, output shape, and the infrastructure they depend on.

Domain Models Backing Every Query

Every query path ultimately operates on the immutable domain models declared in chunkhound/core/models/. A File records path, mtime, language, size, and a content_hash used for change detection. Source: [chunkhound/core/models/file.py:18-39]. A Chunk carries symbol, start_line, end_line, code, chunk_type, language, optional byte offsets, and language-specific metadata; helpers such as contains_line(), overlaps_with(), and is_code_chunk() make range and type checks trivial. Source: [chunkhound/core/models/chunk.py:39-110]. Embedding couples a chunk to a provider, model, dimension count, and vector, with validation enforcing non-empty provider and model strings. Source: [chunkhound/core/models/embedding.py:25-52]. These three models are re-exported from chunkhound.core.models for typed use across the system. Source: [chunkhound/core/models/__init__.py:23-30].

Search

Search is the entry point advertised in the README: regex lookup works without any API key, while semantic lookup requires an embedding provider such as VoyageAI, OpenAI, or local Ollama. Source: [README.md:31-39]. As of v5.1.0, MCP search responses are returned as lean markdown (syntax-highlighted code fences with similarity percentages) instead of verbose JSON, which reduces token usage on the client side. Source: [README.md]. Search relies on the File/Chunk model pair and on Embedding when a semantic query is requested; community report #53 notes that running multiple MCP clients against the same database can produce empty semantic results, a known limitation around concurrent access rather than a defect in the Embedding model itself.
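To make the "lean markdown" idea concrete, here is a small sketch of rendering search hits as fenced code with a similarity percentage. The Hit class, render_hits, and the exact header layout are assumptions for illustration; the actual response format produced by ChunkHound may differ.

```python
from dataclasses import dataclass
from typing import List

# Built programmatically so this example can itself live inside a fenced block.
FENCE = "`" * 3


@dataclass
class Hit:
    """Hypothetical search hit; field names are illustrative only."""

    path: str
    start_line: int
    end_line: int
    language: str
    code: str
    similarity: float  # assumed to be a score in the range 0.0 to 1.0


def render_hits(hits: List[Hit]) -> str:
    """Render each hit as a one-line header followed by a fenced code block."""
    blocks = []
    for hit in hits:
        header = (
            f"{hit.path}:{hit.start_line}-{hit.end_line} "
            f"({hit.similarity:.0%} similar)"
        )
        blocks.append(f"{header}\n{FENCE}{hit.language}\n{hit.code}\n{FENCE}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_hits([Hit("auth.py", 10, 12, "python", "def login(): ...", 0.873)]))
```

Compared with a JSON payload, this layout drops key names, quoting, and escaping around the code, which is where the token savings mentioned above would come from.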

Research

Research elevates search from "find code matching X" to "explain how the codebase does X." The CLI subparser is registered by add_research_subparser, which attaches git diff/commit-range arguments, common arguments, and database/embedding/LLM/research configuration groups. Source: [chunkhound/api/cli/parsers/research_parser.py]. At runtime, research_command constructs an EmbeddingManager, registers a provider via EmbeddingProviderFactory, and conditionally builds an LLMManager before delegating to deep_research_impl. Source: [chunkhound/api/cli/commands/research.py:26-58].

Long-running research sessions are visualized with a hierarchical TreeProgressDisplay that streams ProgressEvent records (node_start, search_semantic, llm_call, node_complete, etc.) using box-drawing prefixes and relative timestamps. Source: [chunkhound/api/cli/utils/tree_progress.py:24-87]. Pretty output is rendered by RichOutputFormatter, which detects terminal capability and supports a quiet mode (CHUNKHOUND_QUICKRESEARCH_QUIET) that redirects console output to stderr so captured payloads stay clean. Source: [chunkhound/api/cli/utils/rich_output.py:20-42]. The system currently supports Claude Code CLI, Codex CLI, Anthropic, OpenAI, and xAI Grok as LLM providers. Source: [README.md:36-39]. Community issue #113 requests OpenCode as an additional LLM provider, and the embedding provider list has known gaps (e.g., Ollama's dengcao/Qwen3-Embedding-8B:Q5_K_M per issue #41).
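A tree-style progress line with a box-drawing prefix and a relative timestamp might look like the sketch below. ProgressEventSketch, its fields, and the line layout are hypothetical; only the event kind names are taken from the text, and the real TreeProgressDisplay is likely richer.

```python
from dataclasses import dataclass


@dataclass
class ProgressEventSketch:
    """Hypothetical progress record; not the real ProgressEvent."""

    kind: str       # e.g. "node_start", "search_semantic", "llm_call"
    message: str
    depth: int      # nesting level of the node within the research tree
    elapsed: float  # seconds since the session started


def format_event(event: ProgressEventSketch, is_last: bool = False) -> str:
    """Format one event as '[+elapsed] <tree prefix><kind>: <message>'."""
    # One vertical bar per ancestor level, then a branch for this node.
    indent = "│  " * event.depth
    branch = "└─ " if is_last else "├─ "
    return f"[+{event.elapsed:6.1f}s] {indent}{branch}{event.kind}: {event.message}"


if __name__ == "__main__":
    events = [
        ProgressEventSketch("node_start", "authentication architecture", 0, 0.0),
        ProgressEventSketch("search_semantic", "token validation", 1, 1.4),
        ProgressEventSketch("llm_call", "synthesize findings", 1, 12.34),
    ]
    for index, event in enumerate(events):
        print(format_event(event, is_last=index == len(events) - 1))
```

Printing such lines to stderr rather than stdout is what keeps a captured report clean, which matches the quiet-mode behavior described above.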

Code Mapping

Code mapping produces scoped architectural/operational documentation for a directory. The add_map_subparser defines a mandatory --out directory, an optional --combined markdown flag (falling back to CH_CODE_MAPPER_WRITE_COMBINED=1), a -j/--jobs concurrency cap, and prompt-only/plan-only modes. Source: [chunkhound/api/cli/parsers/code_mapper_parser.py]. The code_mapper_command runs a two-phase pipeline: a shallow deep-research pass to identify points of interest, then a dedicated deep-research pass per POI, with results assembled into per-topic markdown files plus an optional combined document. Source: [chunkhound/api/cli/commands/code_mapper.py:11-19]. Coverage statistics (referenced vs. unreferenced files) are computed via compute_unreferenced_scope_files and embedded in the generation stats. Source: [chunkhound/api/cli/commands/code_mapper.py:23-30].
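The flag-with-environment-fallback behavior can be sketched with argparse. The flag names and the CH_CODE_MAPPER_WRITE_COMBINED variable come from the text above; build_map_parser_sketch, the program name, and the way defaults are wired are assumptions, not the contents of code_mapper_parser.py.

```python
import argparse
import os
from typing import Mapping, Optional


def build_map_parser_sketch(
    env: Optional[Mapping[str, str]] = None,
) -> argparse.ArgumentParser:
    """Build a parser where --combined defaults to an environment setting."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="map-sketch")

    # Mandatory output directory.
    parser.add_argument("--out", required=True, help="output directory")

    # The flag always wins when present; otherwise the env var decides.
    parser.add_argument(
        "--combined",
        action="store_true",
        default=env.get("CH_CODE_MAPPER_WRITE_COMBINED") == "1",
        help="also write a single combined markdown document",
    )

    # Concurrency cap; None means "let the orchestrator choose".
    parser.add_argument("-j", "--jobs", type=int, default=None)
    return parser


if __name__ == "__main__":
    args = build_map_parser_sketch().parse_args(["--out", "docs", "-j", "4"])
    print(args.out, args.combined, args.jobs)
```

Reading the environment when the parser is built, rather than after parsing, keeps the precedence rule in one place: an explicit flag overrides the environment, and the environment overrides the built-in default.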

Configuration flows from .chunkhound.json and CHUNKHOUND_* environment variables; workspace overrides let multiple projects share a single index under /workspaces/.chunkhound.json. Source: [operations/README.md]. A dedicated HyDE planning provider/model/effort can be set via map_hyde_* keys or CHUNKHOUND_LLM_MAP_HYDE_* env vars; if unset, Code Mapper falls back to the synthesis provider. Source: [operations/README.md]. Output is then handed to the autodoc command, which writes Astro docs assets to the configured output directory and can be re-run with --assets-only to refresh site assets without regenerating content. Source: [chunkhound/api/cli/commands/autodoc.py:24-48].
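The planning-provider fallback can be expressed as a short resolution chain. In this sketch, resolve_planning_provider and the placeholder provider names are hypothetical, the configured value is passed in directly rather than read from a specific JSON key, and the precedence of the environment variable over the JSON value is an assumption. Only the CHUNKHOUND_LLM_MAP_HYDE_PROVIDER variable name comes from this manual.

```python
import os
from typing import Mapping, Optional


def resolve_planning_provider(
    configured: Optional[str],
    env: Optional[Mapping[str, str]],
    synthesis_provider: str,
) -> str:
    """Pick the HyDE planning provider, falling back to the synthesis one."""
    env = os.environ if env is None else env
    # Assumed precedence: environment override, then the JSON config value,
    # then the provider already used for synthesis.
    return (
        env.get("CHUNKHOUND_LLM_MAP_HYDE_PROVIDER")
        or configured
        or synthesis_provider
    )


if __name__ == "__main__":
    # "provider-a" and "provider-b" are placeholders, not real provider ids.
    print(resolve_planning_provider(None, {}, "provider-a"))
    print(resolve_planning_provider("provider-b", {}, "provider-a"))
```

Using `or` treats an empty string the same as an unset value, which is usually the desired behavior for environment variables that are exported but left blank.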

End-to-End Flow

flowchart LR
    Repo[Source Files] --> Index[chunkhound index]
    Index --> DB[(ChunkHound DB<br/>File / Chunk / Embedding)]
    DB --> Search[search<br/>regex + semantic]
    DB --> Research[research<br/>deep_research_impl]
    DB --> Mapper[map<br/>2-phase POI pipeline]
    Research --> Out1[Markdown report + tree progress]
    Mapper --> Out2[Per-topic + combined markdown]
    Out2 --> AutoDoc[autodoc → Astro site]

MCP Integration and Deployment

Related topics: Architecture and System Overview, Search, Research, and Code Mapping, Parsers, Providers, and Extensibility


Overview

ChunkHound exposes its codebase intelligence (regex search, semantic search, file/chunk/embedding queries, and LLM-driven research) through the Model Context Protocol (MCP), allowing AI editors such as Claude Code, VS Code, Cursor, Windsurf, Zed, and Roo to call ChunkHound as a tool provider (README.md).

The MCP surface is launched via the chunkhound mcp subcommand. Source: chunkhound/api/cli/commands/mcp.py:1-10. The implementation is configurable across three orthogonal axes:

Transport: stdio (default) vs. daemon-coordinated IPC.
Coordination: single-process vs. multi-client daemon.
Write mode: read-write vs. read-only (--read-only / database.read_only).

These axes are decoded in chunkhound/api/cli/commands/mcp.py where the entry point sets CHUNKHOUND_MCP_MODE=1, pre-imports numpy for DuckDB threading safety, and decides whether to route through the ClientProxy daemon or stay in-process.

Runtime Modes and Transport Selection

The MCP CLI supports several run modes. Mode selection is driven by CLI flags, JSON configuration, and environment variables:

| Flag / Setting | Behavior |
| --- | --- |
| `chunkhound mcp` (default) | Spawns or attaches to the daemon, allowing multiple MCP clients to share one indexer |
| `--stdio` | Forces legacy single-process stdio mode (backwards compatible) |
| `--no-daemon` | Disables daemonization; runs in-process |
| `--read-only` (or `database.read_only=true`) | Forces single-process stdio, disables watcher/indexing (DuckDB only) |
| `CHUNKHOUND_DAEMON_MODE=false` | Environment override equivalent to `--no-daemon` |

Source: chunkhound/api/cli/commands/mcp.py:30-60.

The relationship between these flags can be visualized as:

flowchart TD
    A["chunkhound mcp invoked"] --> B{"read_only or --stdio?"}
    B -- yes --> S["Single-process stdio server<br/>(StdioMCPServer)"]
    B -- no --> C{"--no-daemon or<br/>CHUNKHOUND_DAEMON_MODE=false?"}
    C -- yes --> S
    C -- no --> D["Route via ClientProxy<br/>to daemon"]
    D --> E["Shared daemon process<br/>(Unix socket / TCP)"]
    E --> F["Multiple MCP clients<br/>share one indexer"]

When read-only mode is requested and the user has not otherwise opted out of the daemon, the CLI warns that it is falling back to a single process: "read-only mode forces single-process stdio (daemon coordination is for writers)." Source: chunkhound/api/cli/commands/mcp.py:45-55. The --show-setup flag short-circuits the start-up path: it displays per-editor configuration snippets and exits. Source: chunkhound/api/cli/commands/mcp.py:18-22.
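The decision flow above reduces to a small pure function. This sketch mirrors the flowchart only; select_mcp_mode and its return values are hypothetical names, and the case-insensitive comparison of the environment value is an assumption.

```python
import os
from typing import Mapping, Optional


def select_mcp_mode(
    read_only: bool = False,
    stdio: bool = False,
    no_daemon: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "stdio" for the single-process server or "daemon" otherwise."""
    env = os.environ if env is None else env

    # Read-only mode and --stdio both force the single-process server.
    if read_only or stdio:
        return "stdio"

    # Explicit opt-out, either by flag or by environment variable.
    if no_daemon or env.get("CHUNKHOUND_DAEMON_MODE", "").lower() == "false":
        return "stdio"

    # Default: route through the shared daemon.
    return "daemon"


if __name__ == "__main__":
    print(select_mcp_mode(env={}))                  # default path
    print(select_mcp_mode(read_only=True, env={}))  # forced single process
```

Passing the environment in as a mapping keeps the function free of global state, so each branch of the flowchart can be checked in isolation.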

Configuration Surface

MCP behavior is composed from three layers: JSON config (.chunkhound.json), environment variables, and CLI flags. The unified factory in chunkhound/api/cli/utils/config_factory.py builds and validates a Config instance for the mcp command, returning any validation errors. When validation fails, _fallback_config constructs a minimal Config so the server can still start. Source: chunkhound/api/cli/utils/config_factory.py:18-50.

The MCP command parser exposes the path positional argument (default: .), the daemon/read-only toggles, and delegates standard --database, --embedding, --indexing, --llm, and --mcp argument groups. Source: chunkhound/api/cli/parsers/mcp_parser.py:14-60.

Key runtime configuration values (defined under database, embedding, llm, and mcp blocks) propagate into the MCP server. The --read-only flag sets database.read_only, and the daemon-mode decision also reads CHUNKHOUND_DAEMON_MODE. The shared MCP mode flag CHUNKHOUND_MCP_MODE=1 is set as the very first action of mcp_command. Source: chunkhound/api/cli/commands/mcp.py:23-25.

Domain Models Exposed Over MCP

The MCP tools return the same immutable domain models used throughout ChunkHound. Three models back nearly every response payload:

File — path, mtime, language, size, content hash, and timestamps. Source: chunkhound/core/models/file.py:30-50.
Chunk — symbol, line/byte ranges, code, chunk_type, language, optional parent_header, and metadata. Source: chunkhound/core/models/chunk.py:30-60.
Embedding — chunk_id, provider, model, dimensions, vector, and creation timestamp. Source: chunkhound/core/models/embedding.py:30-50.

All three are exported via chunkhound/core/models/__init__.py and are constructed as frozen dataclasses. Source: chunkhound/core/models/__init__.py:20-30. Starting in v5.1.0, search MCP responses are emitted as lean, syntax-highlighted Markdown fences with similarity percentages instead of verbose JSON, reducing token cost for LLM clients.

Deployment: Editor Integration

The marketing site ships a "configurator" that materializes per-editor MCP snippets. Source: site/src/components/configurator-data.ts:1-50. Each editor expects a different JSON file location and shape, but all invoke chunkhound mcp (optionally with --stdio for single-client setups). Project-local files (.mcp.json, .vscode/mcp.json) do not need a path argument; global configs (~/.claude/) require an absolute path. Source: chunkhound/api/cli/commands/mcp.py:90-120.
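A per-editor snippet can be generated rather than hand-written. The sketch below assumes the common mcpServers JSON shape used by project-local files such as .mcp.json; that shape, the "chunkhound" server key, and the mcp_server_entry helper are assumptions to verify against your editor's documentation and the output of chunkhound mcp --show-setup.

```python
import json
from pathlib import Path
from typing import Optional


def mcp_server_entry(project_path: Optional[Path] = None, stdio: bool = False) -> dict:
    """Build an MCP server entry that launches `chunkhound mcp`."""
    args = ["mcp"]
    if stdio:
        # Single-client setups can opt into the legacy stdio mode.
        args.append("--stdio")
    if project_path is not None:
        # Global configs need an absolute path; project-local ones can omit it.
        args.append(str(project_path.resolve()))
    return {"mcpServers": {"chunkhound": {"command": "chunkhound", "args": args}}}


if __name__ == "__main__":
    # Project-local variant: no path argument.
    print(json.dumps(mcp_server_entry(), indent=2))
    # Global variant: absolute path to the project.
    print(json.dumps(mcp_server_entry(Path("."), stdio=True), indent=2))
```

Resolving the path at generation time avoids the most common mistake with global configs, which is a relative path that is interpreted against the editor's working directory instead of the project.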

The CLI can also print these instructions interactively via chunkhound mcp --show-setup and copy the Claude Code variant to the clipboard through pyperclip if installed. Source: chunkhound/api/cli/commands/mcp.py:100-130. For Python consumers, ChunkHound is installed as a uv tool:

uv tool install chunkhound
chunkhound mcp

Source: README.md:55-65.

Common Failure Modes

The community has surfaced three patterns worth noting during deployment:

Multiple concurrent stdio clients on the same project. Running several Claude Code windows each with chunkhound mcp in stdio mode against a shared directory causes crashes, because each stdio instance tries to acquire exclusive DuckDB locks. The daemon mode is the supported mitigation, but HTTP semantic search across instances has historically returned empty results. Source: community issue #53. The implementation distinguishes these cases by routing through ClientProxy whenever the daemon path is enabled. Source: chunkhound/api/cli/commands/mcp.py:35-55.

Embedding provider incompatibility. Embedding features silently degrade if the configured provider/model pair is not supported. For example, dengcao/Qwen3-Embedding-8B:Q5_K_M on Ollama has been reported as non-functional (community issue #41). Because chunkhound mcp still starts in this case, users should check embeddings_disabled propagation from --no-embeddings and the validated Config returned by the factory. Source: chunkhound/api/cli/utils/config_factory.py:45-55.

Read-only mode requires single-process. Setting database.read_only=true (or --read-only) disables the daemon and forces single-process stdio; users expecting to share an index across clients will find that no indexing happens. The CLI prints a warning to make this explicit. Source: chunkhound/api/cli/commands/mcp.py:45-55.

Parsers, Providers, and Extensibility

Related topics: Architecture and System Overview, MCP Integration and Deployment


ChunkHound is built around three orthogonal extension points: parsers (turning source files into semantic Chunk records), embedding providers (vectorizing chunks for semantic search), and LLM providers (driving deep-research synthesis). Each follows a factory-driven, plug-in style so the system can grow without rewiring the CLI, MCP server, or persistence layer. This page documents the contracts you implement when adding a new parser, embedding backend, or LLM backend, and links each extension point to the surrounding wiring.

1. The Parser Subsystem

Parsers convert raw file bytes into typed Chunk objects that the indexer persists. The canonical entry point is chunkhound/parsers/universal_parser.py, which dispatches to language-specific implementations based on the File.language attribute resolved by chunkhound/core/detection.py. Selection is centralized in chunkhound/parsers/parser_factory.py, so adding a language means registering a new parser class with the factory rather than touching call sites. Source: chunkhound/parsers/parser_factory.py.

The output contract is the frozen Chunk dataclass in chunkhound/core/models/chunk.py, which carries symbol, start_line, end_line, code, chunk_type, file_id, language, byte offsets, timestamps, and a free-form metadata dict for language-specific properties such as visibility or mutability. Validators run in __post_init__ and raise ValidationError for invalid line numbers or empty symbols, which the indexer surfaces as recoverable errors. Source: chunkhound/core/models/chunk.py:1-120.

Community requests frequently target this layer: issue #35 ("Ruby Support") asks for a new language parser, and issue #83 ("Efficient Worktree Support?") leans on parser change-detection to skip unchanged content quickly. Adding a parser therefore typically means (a) implementing the parser class, (b) extending parser_factory, and (c) ensuring the File.language detector recognizes the new extension.
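The three steps (implement, register, detect) can be sketched with a tiny registry. ParserRegistrySketch and toy_ruby_parser are hypothetical and only illustrate the factory-plus-detector shape; the real parser_factory and language detector have their own interfaces, and a real parser would return Chunk objects rather than strings.

```python
import os
from typing import Callable, Dict, List

ParserFn = Callable[[str], List[str]]


class ParserRegistrySketch:
    """Maps file extensions to a language and languages to a parser."""

    def __init__(self) -> None:
        self._parsers: Dict[str, ParserFn] = {}
        self._extensions: Dict[str, str] = {}

    def register(self, language: str, extensions: List[str], parser: ParserFn) -> None:
        # Steps (b) and (c): one call wires the parser and its extensions.
        self._parsers[language] = parser
        for ext in extensions:
            self._extensions[ext.lower()] = language

    def detect_language(self, path: str) -> str:
        ext = os.path.splitext(path)[1].lower()
        if ext not in self._extensions:
            raise LookupError(f"no parser registered for {ext!r}")
        return self._extensions[ext]

    def parse(self, path: str, source: str) -> List[str]:
        # Call sites resolve by path only; they never name a parser class.
        return self._parsers[self.detect_language(path)](source)


def toy_ruby_parser(source: str) -> List[str]:
    """Step (a), as a toy: return the token after 'def' on each method line."""
    return [
        line.split()[1]
        for line in source.splitlines()
        if line.strip().startswith("def ")
    ]


if __name__ == "__main__":
    registry = ParserRegistrySketch()
    registry.register("ruby", [".rb"], toy_ruby_parser)
    print(registry.parse("app/models/user.rb", "class User\n  def name\n  end\nend\n"))
```

Because callers go through parse() with a path, adding a language changes only the registration call, which is the property the factory design is meant to provide.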

2. Embedding Providers

Embedding providers implement the contract satisfied by chunkhound/providers/embeddings/openai_provider.py and its peers. They are wired through EmbeddingProviderFactory and registered with EmbeddingManager at CLI startup. The CLI research command in chunkhound/api/cli/commands/research.py illustrates the pattern:

provider = EmbeddingProviderFactory.create_provider(config.embedding)
embedding_manager.register_provider(provider, set_default=True)

Source: chunkhound/api/cli/commands/research.py:21-45. Failures during provider construction are translated into friendly stderr messages via RichOutputFormatter, so a misconfigured provider does not silently disable semantic search.

The persistence-side counterpart is the Embedding model in chunkhound/core/models/embedding.py, which stores chunk_id, provider, model, dims, and the raw vector. Validators reject empty provider names and dimension/vector mismatches, so providers must keep dims consistent with the returned vector length. Source: chunkhound/core/models/embedding.py:1-60.

Issue #41 ("dengcao/Qwen3-Embedding-8B:Q5_K_M model doesn't support embedding") demonstrates a recurring integration problem: local Ollama models sometimes return payloads in a shape the OpenAI-compatible client cannot parse, so providers typically need defensive response parsing or a thin shim that normalizes Ollama output before it reaches the vector store.
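Defensive response parsing of the kind described above might look like this sketch. It assumes three payload shapes: an OpenAI-style "data" list of objects with an "embedding" field, a batch "embeddings" list, and a single "embedding" list. The extract_vectors helper and its error messages are hypothetical and are not taken from ChunkHound's providers.

```python
from typing import Any, List, Optional


def extract_vectors(payload: Any, expected_dims: Optional[int] = None) -> List[List[float]]:
    """Normalize several embedding response shapes into a list of vectors."""
    if not isinstance(payload, dict):
        raise ValueError("embedding response is not a JSON object")

    if isinstance(payload.get("data"), list):
        # OpenAI-style: {"data": [{"embedding": [...]}, ...]}
        vectors = [
            item.get("embedding") if isinstance(item, dict) else None
            for item in payload["data"]
        ]
    elif isinstance(payload.get("embeddings"), list):
        # Batch shape: {"embeddings": [[...], [...]]}
        vectors = payload["embeddings"]
    elif isinstance(payload.get("embedding"), list):
        # Single-vector shape: {"embedding": [...]}
        vectors = [payload["embedding"]]
    else:
        raise ValueError("no embedding vectors found in response")

    if not vectors or any(not isinstance(v, list) or not v for v in vectors):
        raise ValueError("response contains an empty or malformed vector")

    # Keep the declared dimension count consistent with the vector length.
    if expected_dims is not None and any(len(v) != expected_dims for v in vectors):
        raise ValueError(f"expected vectors of length {expected_dims}")

    return [[float(x) for x in v] for v in vectors]
```

Failing loudly on an unrecognized shape is a deliberate choice in this sketch: a clear error at the provider boundary is easier to diagnose than vectors that are silently missing from the store.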

3. LLM Providers and the CLI Contract

LLM providers power deep-research synthesis and the Code Mapper/AutoDoc cleanup passes. The Claude Code CLI provider (chunkhound/providers/llm/claude_code_cli_provider.py) and the OpenCode CLI provider (chunkhound/providers/llm/opencode_cli_provider.py) both shell out to a local binary rather than calling an HTTP API, which means auth is delegated to the user's local CLI session. This design is what issue #113 ("OpenCode LLM provider") requests: an OpenCode-backed provider following the same shape as the Claude Code CLI provider.

Providers are surfaced through LLMManager, which the research command resolves after embeddings:

if config.llm is not None:
    llm_manager = LLMManager()
    llm_manager.register_provider(...)

Source: chunkhound/api/cli/commands/research.py:46-65. Reasoning-effort and model overrides are exposed through environment variables (e.g., CHUNKHOUND_LLM_MAP_HYDE_PROVIDER, CHUNKHOUND_LLM_MAP_HYDE_MODEL, CHUNKHOUND_LLM_MAP_HYDE_REASONING_EFFORT) so AutoDoc and Code Mapper can use a cheaper planning model than the synthesis model.

The CLI surface for these providers is built compositionally in chunkhound/api/cli/parsers/. The research_parser registers database, embedding, LLM, and research options together via add_config_arguments, while code_mapper_parser and autodoc_parser add their own flags (such as --combined, --map-comprehensiveness, and --audience) on top of the shared add_common_arguments. Sources: chunkhound/api/cli/parsers/research_parser.py, chunkhound/api/cli/parsers/code_mapper_parser.py, chunkhound/api/cli/parsers/main_parser.py.

4. Extensibility Patterns at a Glance

flowchart LR
    CLI[CLI / MCP entrypoint] --> CMDS[commands/*.py]
    CMDS --> Factory[Parser / Provider factories]
    Factory --> Parsers[Parsers<br/>universal_parser + lang plugins]
    Factory --> Emb[Embedding providers<br/>openai, voyageai, ollama]
    Factory --> LLM[LLM providers<br/>claude-code-cli, opencode-cli]
    Parsers --> Models[core/models<br/>File, Chunk, Embedding]
    Emb --> Models
    LLM --> Output[RichOutputFormatter<br/>TreeProgressDisplay]

Three patterns repeat across all three extension points:

Factory + registry. ParserFactory, EmbeddingProviderFactory, and LLMManager all keep an internal map keyed by name; you register once and the rest of the system resolves by string.
Frozen dataclass contracts. Adding a parser or provider never requires schema migration because the output conforms to existing Chunk / Embedding models with metadata as an escape hatch. Source: chunkhound/core/models/__init__.py.
Rich, terminal-aware output. Long-running operations emit structured events through TreeProgressDisplay and RichOutputFormatter, both of which detect terminal capability and redirect to stderr when stdout is captured (e.g., for MCP stdio). Sources: chunkhound/api/cli/utils/rich_output.py:1-60, chunkhound/api/cli/utils/tree_progress.py:1-80.

| Extension point | Where to register | Output contract | Community signal |
| --- | --- | --- | --- |
| New language | `parsers/parser_factory.py` | `Chunk` list | #35 Ruby, #83 worktrees |
| New embedding backend | `EmbeddingProviderFactory` | `Embedding` records | #41 Qwen3/Ollama |
| New LLM backend | `LLMManager` | Plain text / structured completion | #113 OpenCode |

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Found 10 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/chunkhound/chunkhound/issues/352

2. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.host_targets | https://github.com/chunkhound/chunkhound

3. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | https://github.com/chunkhound/chunkhound

4. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/chunkhound/chunkhound

5. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | https://github.com/chunkhound/chunkhound

6. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | https://github.com/chunkhound/chunkhound

7. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/chunkhound/chunkhound/issues/349

8. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/chunkhound/chunkhound/issues/315

9. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/chunkhound/chunkhound

10. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/chunkhound/chunkhound

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 project-level external discussion links are exposed on this manual page. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using chunkhound with real data or production workflows.

bug: <|endoftext|> in scanned files causes failure in generating embeddi - github / github_issue
Makefile cAST chunks exceed max_chunk_size (1350) — test_all_parsers_res - github / github_issue
ruff check (per CONTRIBUTING) surfaces ~820 pre-existing violations, i - github / github_issue
ChunkHound v5.1.0 - github / github_release
ChunkHound v5.0.0 - github / github_release
ChunkHound v4.1.0b1 (Beta) - github / github_release
v4.0.1 - github / github_release
ChunkHound v4.0.0 - github / github_release
ChunkHound v3.3.1 - github / github_release
v3.3.0 - github / github_release
ChunkHound v3.2.0 - github / github_release
v3.1.0 - github / github_release

Source: Project Pack community evidence and pitfall evidence
