# https://github.com/Inferensys/contextful Project Documentation

Generated: 2026-05-16 06:05:31 UTC

## Table of Contents

- [Project Introduction](#project-introduction)
- [Quick Start Guide](#quick-start)
- [High-Level Architecture](#high-level-architecture)
- [Runtime Components](#runtime-components)
- [Search Engine](#search-engine)
- [Context Packs](#context-packs)
- [Memory Ledger](#memory-ledger)
- [Graph Traversal and Analysis](#graph-traversal)
- [SQLite Database Schema](#sqlite-database)
- [Workspace Indexing System](#indexing-system)

<a id='project-introduction'></a>

## Project Introduction

### Related Pages

Related topics: [High-Level Architecture](#high-level-architecture), [Quick Start Guide](#quick-start)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)
- [package.json](https://github.com/Inferensys/contextful/blob/main/package.json)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/mcp-server.ts](https://github.com/Inferensys/contextful/blob/main/src/mcp-server.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
</details>


Contextful is an intelligent code context management system designed to provide AI agents with compact, evidence-backed information for codebase navigation and understanding. The project serves as a bridge between large codebases and AI-powered development tools by indexing source code, extracting symbols, tracking dependencies, and generating token-budgeted evidence packs for queries.

## Purpose and Scope

Contextful solves the fundamental problem that AI coding assistants face when working with large repositories: excessive context requirements that lead to token waste and degraded performance. Instead of forcing agents to read dozens of random files, Contextful enables targeted, cited, and ranked context retrieval that maximizes the value of each token spent.

The system operates in three primary modes:

1. **Indexing Mode** - Scans and indexes source code, extracting symbols, dependencies, and semantic chunks
2. **Query Mode** - Creates evidence packs for natural language queries with token budgets
3. **Search Mode** - Provides lightweight search across code, docs, symbols, and memory without full evidence compilation

Sources: [README.md:1-15]()

## Architecture Overview

The Contextful system consists of several interconnected components that work together to provide context management capabilities.

```mermaid
graph TD
    A[Source Code] --> B[Indexing Engine]
    B --> C[SQLite Kernel DB]
    C --> D[Search Module]
    C --> E[Graph Analysis]
    C --> F[Memory Ledger]
    
    G[CLI / MCP Server] --> D
    G --> E
    G --> F
    
    D --> H[Evidence Pack]
    E --> H
    F --> H
    
    H --> I[AI Agent / User]
```

### Component Responsibilities

| Component | File | Responsibility |
|-----------|------|----------------|
| Indexing Engine | `src/extract.ts` | Parse source files, extract symbols and dependencies |
| Search Module | `src/search.ts` | Full-text search, intent classification, ranking |
| Graph Analysis | `src/search.ts` | Trace dependencies and code paths |
| Memory Ledger | `src/memory.ts` | Store evidence-backed lessons across sessions |
| CLI Interface | `src/cli.ts` | Command-line interface for all operations |
| MCP Server | `src/mcp-server.ts` | Model Context Protocol stdio server |

Sources: [src/extract.ts:1-50](), [src/search.ts:1-30](), [src/cli.ts:1-40]()

## Supported Languages and File Types

Contextful supports multiple programming languages through pattern-based extraction. The indexing engine recognizes language-specific syntax for symbols and dependencies.

### Language Support Matrix

| Language | Functions | Classes / Structs | Types | Imports |
|----------|-----------|---------|-------|---------|
| TypeScript/JavaScript | ✓ | ✓ | ✓ | ✓ |
| Python | ✓ | ✓ | - | ✓ |
| Go | ✓ | ✓ | ✓ | ✓ |
| Rust | ✓ | ✓ | ✓ | ✓ |
| Markdown | - | - | Headings | - |
| JSON | - | - | Config keys | - |

Sources: [src/extract.ts:15-80]()

## Core MCP Tools

Contextful exposes its capabilities through the Model Context Protocol (MCP), giving AI agents a standardized tool interface. The tool set is deliberately small so the agent-facing surface stays compact.

```mermaid
graph LR
    A[Agent] -->|context_pack| B[Evidence Pack Generator]
    A -->|search_code| C[Code Search]
    A -->|trace_path| D[Graph Traversal]
    A -->|impact_analysis| E[Dependency Analyzer]
    A -->|why_changed| F[Git History]
    A -->|recall_memory| G[Memory Search]
    A -->|write_lesson| H[Lesson Writer]
```

### Tool Descriptions

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `context_pack` | Returns ranked, cited, token-budgeted context bundles | `query`, `budget`, `scope` |
| `search_code` | Search across code, docs, symbols, and memory | `query`, `mode`, `filters` |
| `trace_path` | Graph traversal across files, symbols, modules, and config | `from`, `to`, `edge_types` |
| `impact_analysis` | Reverse dependencies and likely tests | `symbol_or_file` |
| `why_changed` | Current evidence plus git history | `symbol_or_file` |
| `recall_memory` | Search session learnings and durable lessons | `query`, `scope` |
| `write_lesson` | Store evidence-backed lessons | `claim`, `evidence_refs`, `confidence` |

Sources: [README.md:25-45](), [src/mcp-server.ts:1-80]()
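To make the tool interface concrete, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send to invoke `context_pack`. The helper `buildToolCall` and the example values are hypothetical; only the tool name and the `query`/`budget` parameter names come from the table above.

```typescript
// Minimal sketch: build a JSON-RPC 2.0 `tools/call` request for an MCP tool.
// `buildToolCall` is a hypothetical helper, not part of Contextful.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Example: ask for a context pack with a 2000-token budget.
const packRequest = buildToolCall(1, "context_pack", {
  query: "where is user auth handled",
  budget: 2000,
});

// Serialized form that would be written to the server's stdin.
const wireMessage = JSON.stringify(packRequest);
```

In practice an MCP client library would construct and send this message; the sketch only shows the shape of the payload.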

## CLI Interface

Contextful provides a command-line interface through the `cxf` binary (with `contextful` as a readable alias). The CLI supports both one-shot operations and daemon mode for continuous indexing.

### Command Reference

| Command | Description | Key Options |
|---------|-------------|-------------|
| `index` | Index a workspace | `--workspace`, `--watch` |
| `daemon` | Run local indexing daemon | `--workspace` |
| `query` | Create evidence pack for query | `--workspace`, `--budget`, `--json` |
| `search` | Search without full evidence pack | `--workspace`, `--limit`, `--kind` |
| `report` | Generate context report | `--workspace`, `--format` |
| `memory add` | Store evidence-backed lesson | `--claim`, `--evidence`, `--scope`, `--confidence` |
| `server` | Run MCP stdio server | - |

Sources: [src/cli.ts:40-120](), [README.md:15-35]()

### Example Usage

```bash
# Index a workspace
npx @inferensys/contextful index --workspace .

# Query with token budget
npx @inferensys/contextful query "where is user auth handled" --workspace . --budget 2000

# Run as MCP server
npx @inferensys/contextful server
```

Sources: [README.md:8-15]()

## Data Models

### Evidence Pack Structure

The `EvidencePack` is the core data structure returned by query operations. It contains all necessary context for an agent to answer a query.

```typescript
interface EvidencePack {
  id: string;                    // Unique pack identifier
  query: string;                 // Original query
  scope: string;                 // Scope of the context
  intent: SearchIntent;          // Classified query intent
  summary: string;               // Human-readable summary
  citations: SearchHit[];        // Ranked evidence items
  files: FileContext[];          // Grouped file references
  symbols: SymbolRecord[];       // Relevant symbols
  graphPaths: GraphPath[];       // Dependency paths
  memoryHits: SearchHit[];       // Memory matches
  confidence: number;            // Confidence score (0.1-0.92)
  tokenEstimate: number;         // Estimated token count
  budget: number;                // Token budget
  createdAt: string;             // ISO timestamp
}
```

Sources: [src/search.ts:200-250]()
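As an illustration of how a caller might consume an evidence pack, the sketch below models a subset of the `EvidencePack` fields and renders a short digest. The names `PackSummary`, `digestPack`, and `withinBudget` are hypothetical, and the sketch assumes the JSON output mirrors the interface shown above.

```typescript
// Sketch: consume the JSON form of an evidence pack.
// Only a subset of the EvidencePack fields is modeled; all names here are illustrative.
interface PackCitation {
  ref: string;
  path: string;
  excerpt: string;
}

interface PackSummary {
  query: string;
  citations: PackCitation[];
  confidence: number;
  tokenEstimate: number;
  budget: number;
}

// Render a short plain-text digest: one header line, then one line per citation.
function digestPack(pack: PackSummary): string {
  const header =
    `${pack.query} (confidence ${pack.confidence.toFixed(2)}, ` +
    `${pack.tokenEstimate}/${pack.budget} tokens)`;
  const lines = pack.citations.map((c, i) => `[${i + 1}] ${c.ref}`);
  return [header, ...lines].join("\n");
}

// True when the pack's estimated size stayed within the requested budget.
function withinBudget(pack: PackSummary): boolean {
  return pack.tokenEstimate <= pack.budget;
}
```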

### Search Hit Structure

Each search result is represented as a `SearchHit` with relevance ranking and excerpt information.

| Field | Type | Description |
|-------|------|-------------|
| `ref` | string | Reference identifier (e.g., `file:src/auth.ts:1-20`) |
| `path` | string | File path |
| `title` | string | Display title |
| `excerpt` | string | Relevant text snippet |
| `kind` | string | Type: `code`, `doc`, `symbol`, `memory` |
| `rank` | number | BM25 relevance score |

Sources: [src/search.ts:50-80]()
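The `ref` field can be split into its parts with a small parser. The sketch below assumes the `scheme:path:start-end` shape inferred from the single example `file:src/auth.ts:1-20`; the real grammar may allow other forms, and `parseRef` is a hypothetical helper.

```typescript
// Sketch: parse refs shaped like `file:src/auth.ts:1-20`.
// The grammar is inferred from one example in the table above and is an assumption.
interface ParsedRef {
  scheme: string;
  path: string;
  startLine: number;
  endLine: number;
}

function parseRef(ref: string): ParsedRef | null {
  // scheme ":" path ":" start "-" end; the greedy path group tolerates colons in paths.
  const match = /^([a-z]+):(.+):(\d+)-(\d+)$/.exec(ref);
  if (!match) return null;
  const [, scheme, path, start, end] = match;
  const startLine = Number(start);
  const endLine = Number(end);
  // Reject empty or inverted line ranges.
  if (startLine < 1 || endLine < startLine) return null;
  return { scheme, path, startLine, endLine };
}
```

Returning `null` for malformed input lets callers reject bad references before they reach the memory ledger.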

## Dependencies and Technology Stack

Contextful depends on the following packages for indexing, storage, parsing, and its CLI and MCP interfaces.

| Dependency | Version | Purpose |
|------------|---------|---------|
| `@modelcontextprotocol/sdk` | ^1.29.0 | MCP protocol implementation |
| `better-sqlite3` | ^12.10.0 | SQLite database for indexing |
| `commander` | ^14.0.3 | CLI argument parsing |
| `fast-glob` | ^3.3.3 | File pattern matching |
| `tree-sitter-wasms` | ^0.1.13 | Syntax parsing |
| `web-tree-sitter` | ^0.20.8 | Tree-sitter bindings |
| `zod` | ^4.4.3 | Schema validation |

Sources: [package.json:20-40]()

### System Requirements

- **Node.js**: >= 20
- **License**: MIT
- **Repository**: [inferensys/contextful](https://github.com/Inferensys/contextful)

Sources: [package.json:45-55]()

## Supported IDE Integration

Contextful is designed to integrate with a wide range of AI-powered development tools:

| IDE/Extension | Status |
|---------------|--------|
| GitHub Copilot | Supported |
| VS Code | Supported |
| Cursor | Supported |
| Windsurf | Supported |
| Cline | Supported |
| Roo Code | Supported |
| Continue | Supported |
| Zed | Supported |

Sources: [package.json:10-20]()

## Workflow: From Indexing to Query

The workflow below shows how Contextful goes from indexing a workspace to answering a query and storing a lesson.

```mermaid
sequenceDiagram
    participant U as User/Agent
    participant CLI as CLI/MCP Server
    participant IDX as Indexer
    participant DB as SQLite Kernel
    participant SRCH as Search Engine
    participant MEM as Memory Ledger

    U->>CLI: index --workspace ./project
    CLI->>IDX: Extract symbols & dependencies
    IDX->>DB: Store in chunks_fts, symbols, edges
    DB-->>CLI: Index complete

    U->>CLI: query "how is auth handled"
    CLI->>SRCH: classifyQuery() intent=exact
    SRCH->>DB: FTS + BM25 search
    DB-->>SRCH: Ranked hits
    SRCH->>MEM: Check memory ledger
    MEM-->>SRCH: Related lessons
    CLI-->>U: EvidencePack (token-budgeted)

    U->>CLI: write_lesson --claim "Auth pattern" --evidence file:...
    CLI->>MEM: Store lesson with confidence
    MEM-->>CLI: Lesson saved
```

Sources: [src/search.ts:100-150](), [src/report.ts:80-120]()

## Next Steps

To continue exploring Contextful:

1. **Installation Guide** - Set up Contextful in your development environment
2. **CLI Reference** - Detailed documentation of all CLI commands
3. **MCP Tools API** - Complete reference for MCP tool interfaces
4. **Configuration** - Workspace configuration and tuning options
5. **Memory System** - Using the evidence-backed lesson system

---

<a id='quick-start'></a>

## Quick Start Guide

### Related Pages

Related topics: [Project Introduction](#project-introduction)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/mcp-server.ts](https://github.com/Inferensys/contextful/blob/main/src/mcp-server.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
</details>


## Overview

Contextful is a contextual indexing and search system designed to help AI agents efficiently retrieve relevant code evidence. Instead of forcing agents to perform dozens of random file reads, Contextful returns compact, ranked, and cited evidence packs that fit within a token budget.

Sources: [README.md:1-10]()

## Installation

Install Contextful using npm. The package provides both the `cxf` binary and the full `contextful` alias.

```bash
npm install -g @inferensys/contextful
```

Alternatively, run commands directly via `npx`:

```bash
npx @inferensys/contextful index --workspace .
```

Sources: [README.md:11-14]()

## CLI Commands

Contextful provides a command-line interface with the following primary commands:

| Command | Description |
|---------|-------------|
| `cxf index` | Index a workspace for search |
| `cxf daemon` | Run a local indexing daemon |
| `cxf query` | Create an evidence pack for a query |
| `cxf search` | Search indexed context |
| `cxf report` | Generate a context report |
| `cxf memory add` | Store an evidence-backed lesson |
| `cxf server` | Run the MCP stdio server |

Sources: [README.md:23-32]()

## Basic Workflow

### Step 1: Index Your Workspace

Before searching, you must index your codebase. This creates the searchable database:

```bash
cxf index --workspace .
```

For continuous indexing as files change, use the daemon mode:

```bash
cxf daemon --workspace .
```

Sources: [src/cli.ts:1-20]()

### Step 2: Query for Context

Once indexed, ask questions about your codebase:

```bash
cxf query "where is user auth handled" --workspace . --budget 2000
```

The `query` command returns a ranked evidence pack with citations and file references.

#### Query Options

| Option | Description | Default |
|--------|-------------|---------|
| `--workspace <path>` | Workspace path | Current directory |
| `--budget <tokens>` | Approximate token budget | 2000 |
| `--json` | Output as JSON instead of Markdown | false |

Sources: [src/cli.ts:22-30]()

### Step 3: Search Without Building Evidence Packs

For quick lookups without compiling full evidence packs, use `search`:

```bash
cxf search "authentication middleware" --workspace . --limit 10 --kind code
```

#### Search Options

| Option | Description | Default |
|--------|-------------|---------|
| `--workspace <path>` | Workspace path | Current directory |
| `--limit <count>` | Maximum hits | 10 |
| `--kind` | Filter: `all`, `code`, `docs`, `symbols`, `memory` | `all` |

Sources: [src/cli.ts:32-42]()

### Step 4: Generate Reports

Generate comprehensive context reports in various formats:

```bash
cxf report --workspace . --format markdown
cxf report --workspace . --format json
cxf report --workspace . --format html
```

Sources: [src/cli.ts:44-48]()

## MCP Server Integration

Contextful can run as a Model Context Protocol (MCP) server, providing tools directly to AI agents.

```bash
cxf server
```

### Available MCP Tools

| Tool | Purpose |
|------|---------|
| `context_pack` | Returns ranked, cited, token-budgeted evidence bundles |
| `search_code` | Code, docs, symbol, and memory search |
| `trace_path` | Graph traversal across files, symbols, modules, and config |
| `impact_analysis` | Reverse dependencies and likely tests |
| `why_changed` | Current evidence plus git history |
| `recall_memory` | Search session learnings and durable project lessons |
| `write_lesson` | Store evidence-backed lessons for future sessions |

Sources: [README.md:40-48]()

### MCP Tool Parameters

#### context_pack

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `query` | string | Yes | Query to answer from indexed context |
| `budget` | number | No | Token budget for the response |
| `scope` | string | No | Search scope |

Sources: [src/mcp-server.ts:1-25]()

#### search_code

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `query` | string | Yes | Search query |
| `mode` | string | No | Search mode |
| `filters` | object | No | Search filters |
| `workspace` | string | No | Workspace path |
| `limit` | number | No | Maximum results |

Sources: [src/mcp-server.ts:26-40]()

#### write_lesson

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `claim` | string | Yes | Lesson claim |
| `evidence_refs` | array | Yes | Evidence references (e.g., `file:src/auth.ts:1-20`) |
| `scope` | string | No | Memory scope |
| `confidence` | number | No | Confidence from 0 to 1 |
| `supersedes` | string | No | Previous lesson ID to supersede |

Sources: [src/mcp-server.ts:65-80]()
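The constraints in the `write_lesson` table can be checked before a call is sent. The sketch below is a plain-TypeScript validator under the assumption that the table is complete; `WriteLessonArgs` and `validateLesson` are hypothetical names, and the server's own validation may differ.

```typescript
// Sketch: client-side validation of `write_lesson` arguments.
// Field names follow the parameter table above; the rules are assumptions drawn from it.
interface WriteLessonArgs {
  claim: string;
  evidence_refs: string[];
  scope?: string;
  confidence?: number;
  supersedes?: string;
}

// Returns a list of human-readable problems; an empty list means the arguments look valid.
function validateLesson(args: WriteLessonArgs): string[] {
  const errors: string[] = [];
  if (args.claim.trim().length === 0) {
    errors.push("claim must not be empty");
  }
  if (args.evidence_refs.length === 0) {
    errors.push("at least one evidence ref is required");
  }
  // The negated range check also rejects NaN.
  if (args.confidence !== undefined && !(args.confidence >= 0 && args.confidence <= 1)) {
    errors.push("confidence must be between 0 and 1");
  }
  return errors;
}
```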

## Memory System

Contextful includes an evidence-backed memory system for storing lessons across sessions.

### Adding a Lesson

```bash
cxf memory add \
  --claim "Always validate tokens in middleware" \
  --evidence "file:src/auth.ts:1-20" \
  --workspace . \
  --confidence 0.8
```

#### Memory Command Options

| Option | Required | Description |
|--------|----------|-------------|
| `--claim <text>` | Yes | The lesson or claim |
| `--evidence <ref...>` | Yes | Evidence references |
| `--workspace <path>` | No | Workspace path |
| `--scope <scope>` | No | Memory scope (default: `repo`) |
| `--confidence <number>` | No | Confidence from 0 to 1 (default: 0.7) |

Sources: [src/cli.ts:50-75]()

## Output Formats

### Markdown Output (Default)

```bash
cxf query "where is auth handled" --workspace .
```

Returns a formatted Markdown document with citations and graph paths.

### JSON Output

```bash
cxf query "where is auth handled" --workspace . --json
```

Returns structured JSON data suitable for programmatic processing.

Sources: [src/cli.ts:22-30]()

### Report Formats

| Format | Description |
|--------|-------------|
| `markdown` | Human-readable Markdown report |
| `json` | Structured JSON data |
| `html` | Standalone HTML page |

Sources: [src/cli.ts:44-48]()

## Architecture Overview

```mermaid
graph TD
    A[CLI / MCP Server] --> B[Workspace Indexer]
    B --> C[SQLite Kernel DB]
    C --> D[Full-Text Search]
    C --> E[Symbol Index]
    C --> F[Graph Edges]
    G[Query Request] --> H[Search Context]
    H --> I[Evidence Pack Builder]
    I --> D
    I --> E
    I --> F
    I --> J[Memory Ledger]
    I --> K[Evidence Pack Output]
```

## Common Usage Patterns

### Pattern 1: Initial Setup

```bash
# Index the workspace
cxf index --workspace /path/to/project --watch

# Generate initial report
cxf report --workspace /path/to/project --format html > report.html
```

### Pattern 2: Interactive Exploration

```bash
# Run as MCP server
cxf server

# Or use CLI directly
cxf query "how does the cache work" --workspace . --budget 3000
```

### Pattern 3: Agent Memory Persistence

```bash
# Store learned lessons
cxf memory add --claim "Config validation happens in validate.ts" --evidence "file:src/config/validate.ts:1-50"

# Recall past lessons
# Via MCP: recall_memory(query="config validation")
```

## Next Steps

- Explore the [High-Level Architecture](#high-level-architecture) page for a deep dive into indexing and search internals
- Learn about the [Memory Ledger](#memory-ledger) for evidence-backed knowledge persistence
- Review [API Reference](api) for programmatic integration

---

<a id='high-level-architecture'></a>

## High-Level Architecture

### Related Pages

Related topics: [Runtime Components](#runtime-components), [Search Engine](#search-engine), [SQLite Database Schema](#sqlite-database)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/indexer.ts](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
- [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)
</details>


Contextful is a local-only indexing and context management tool designed to help AI coding assistants retrieve compact, evidence-backed context from workspace codebases. The system operates without external embedding APIs, instead relying on SQLite FTS5 full-text search, graph-based dependency tracking, and intent-classified query routing. Sources: [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)

## System Overview

Contextful functions as a local daemon that continuously indexes workspace files, extracts code symbols and import relationships, and provides a structured context pack API to agents. The architecture follows a three-layer design:

1. **Indexing Layer** - File parsing, symbol extraction, edge detection
2. **Storage Layer** - SQLite kernel with FTS5 search and graph tables
3. **Query Layer** - Intent classification, ranked search, evidence pack assembly

Sources: [src/indexer.ts](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts)

## Component Architecture

```mermaid
graph TD
    A[Workspace Files] --> B[Indexer]
    B --> C[Symbol Extraction]
    B --> D[Edge Detection]
    B --> E[Chunk Generation]
    C --> F[SQLite Kernel DB]
    D --> F
    E --> F
    G[CLI / MCP Server] --> H[Search Module]
    H --> F
    H --> I[Context Pack Assembly]
    I --> J[Evidence Pack Output]
```

### Core Components

| Component | File | Responsibility |
|-----------|------|----------------|
| Indexer | `src/indexer.ts` | Recursively walks workspace, triggers file processing |
| Extractor | `src/extract.ts` | Parses symbols, edges, and code chunks per file |
| Search | `src/search.ts` | FTS5 queries, intent classification, ranking |
| CLI | `src/cli.ts` | Command-line interface and MCP server entry point |
| Report | `src/report.ts` | Generates workspace context reports |

Sources: [src/indexer.ts](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts), [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts), [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Indexing Pipeline

The indexing pipeline processes workspace files through multiple extraction stages. Each source file is read, classified by language, and passed through specialized extractors that produce structured records.

```mermaid
graph LR
    A[File Content] --> B[Language Detection]
    B --> C[Symbol Extraction]
    B --> D[Edge Extraction]
    B --> E[Chunk Extraction]
    C --> F[symbols table]
    D --> G[edges table]
    E --> H[chunks_fts table]
```

### Symbol Extraction

The `extractSymbols` function identifies named code entities based on language-specific patterns:

| Language | Supported Symbols |
|----------|-------------------|
| TypeScript/JavaScript | functions, classes, interfaces, types, const arrow functions |
| Python | functions, classes |
| Go | functions, structs, interfaces |
| Rust | functions, structs, enums, traits, impl blocks |
| Markdown | headings |
| JSON | config keys |

Sources: [src/extract.ts:1-80](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
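Pattern-based symbol extraction of the kind described above can be sketched with a few line-anchored regular expressions. This is an illustrative approximation for TypeScript sources only: `extractSymbolsSketch`, `SymbolSketch`, and the patterns are hypothetical and are not taken from `extractSymbols` in `src/extract.ts`.

```typescript
// Sketch: line-by-line, regex-based symbol extraction for TypeScript sources.
// The patterns are illustrative assumptions and miss cases such as `export default`.
interface SymbolSketch {
  name: string;
  kind: "function" | "class" | "interface" | "type";
  line: number;
  exported: boolean;
}

const PATTERNS: Array<{ kind: SymbolSketch["kind"]; regex: RegExp }> = [
  { kind: "function", regex: /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/ },
  { kind: "class", regex: /^\s*(export\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/ },
  { kind: "interface", regex: /^\s*(export\s+)?interface\s+([A-Za-z_$][\w$]*)/ },
  { kind: "type", regex: /^\s*(export\s+)?type\s+([A-Za-z_$][\w$]*)\s*=/ },
];

function extractSymbolsSketch(source: string): SymbolSketch[] {
  const symbols: SymbolSketch[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { kind, regex } of PATTERNS) {
      const match = regex.exec(text);
      if (match) {
        // Group 1 is the optional `export` keyword, group 2 the symbol name.
        symbols.push({
          name: match[2],
          kind,
          line: index + 1,
          exported: match[1] !== undefined,
        });
        break; // at most one symbol per line in this sketch
      }
    }
  });
  return symbols;
}
```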

### Edge Detection

Import relationships are tracked as directed edges between modules. The `extractEdges` function processes different import syntaxes per language:

- **TypeScript/JavaScript**: ES6 `import` and `require()` statements
- **Python**: `from ... import` and `import` statements
- **Go**: Import strings within double quotes
- **Rust**: `use` and `mod` declarations
- **JSON**: Top-level keys in configuration files

Sources: [src/extract.ts:100-160](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
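For the TypeScript/JavaScript case, import edges can be approximated with two regular expressions, one for ES module `import` statements and one for `require()` calls. The sketch below is a simplified illustration: `extractImportEdges` and `ImportEdge` are hypothetical names, and it only handles single-line statements.

```typescript
// Sketch: detect ES `import` and CommonJS `require()` targets, one line at a time.
// Multi-line import statements are not handled; names and patterns are assumptions.
interface ImportEdge {
  target: string;
  line: number;
  edgeType: "import" | "require";
}

function extractImportEdges(source: string): ImportEdge[] {
  const edges: ImportEdge[] = [];
  // Matches `import x from "m"`, `import { a } from 'm'`, and bare `import "m"`.
  const importRe = /^\s*import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/;
  // Matches `require("m")` anywhere on the line.
  const requireRe = /require\(\s*['"]([^'"]+)['"]\s*\)/;

  source.split("\n").forEach((text, index) => {
    const imp = importRe.exec(text);
    if (imp) {
      edges.push({ target: imp[1], line: index + 1, edgeType: "import" });
      return;
    }
    const req = requireRe.exec(text);
    if (req) {
      edges.push({ target: req[1], line: index + 1, edgeType: "require" });
    }
  });
  return edges;
}
```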

### Chunk Generation

Code files are split into semantic chunks for full-text search. The `codeChunks` function segments content into logical blocks based on:
- Empty line boundaries
- Token budget (target: ~300 tokens per chunk)
- Language-specific token estimation via `estimateTokens`

Sources: [src/extract.ts:180-220](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
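The chunking rules above (blank-line boundaries plus a token target) can be sketched as a greedy merge of blank-line-separated blocks. The four-characters-per-token estimate is an assumption standing in for `estimateTokens`, and `chunkByBlankLines` is a hypothetical name rather than the project's `codeChunks`.

```typescript
// Sketch: merge blank-line-separated blocks into chunks of roughly `targetTokens`.
// Assumption: about four characters per token; the real estimator may differ.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

function chunkByBlankLines(source: string, targetTokens = 300): string[] {
  // Split on one or more blank lines and drop empty blocks.
  const blocks = source.split(/\n\s*\n/).filter((b) => b.trim().length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const block of blocks) {
    const candidate = current ? `${current}\n\n${block}` : block;
    if (current && estimateTokensSketch(candidate) > targetTokens) {
      // Adding this block would exceed the target: close the current chunk.
      chunks.push(current);
      current = block;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A single block larger than the target is kept whole in this sketch; a production chunker would likely split it further.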

## Storage Layer

### SQLite Kernel Schema

The kernel database uses SQLite with several specialized tables:

| Table | Purpose | Key Columns |
|-------|---------|-------------|
| `files` | Tracked workspace files | `path`, `language`, `hash`, `indexed_at` |
| `symbols` | Extracted code symbols | `ref`, `name`, `kind`, `file_path`, `line`, `signature`, `exported` |
| `edges` | Import/dependency graph | `source_file`, `target_name`, `target_type`, `edge_type`, `line` |
| `chunks_fts` | FTS5 virtual table for full-text search | `ref`, `path`, `title`, `text`, `kind` |
| `memory` | Evidence-backed lessons | `id`, `claim`, `scope`, `confidence`, `created_at` |

Sources: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts), [src/indexer.ts](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts)

## Query and Search System

### Intent Classification

Queries are classified into intents to optimize search strategy:

| Intent | Trigger Keywords | Search Focus |
|--------|------------------|--------------|
| `code` | `function`, `class`, `implementation` | Symbol and code chunks |
| `memory` | `memory`, `lesson`, `session` | Memory ledger |
| `impact` | `impact`, `depends on`, `blast radius` | Dependency graph |
| `historical` | `why`, `changed`, `commit` | Git history |
| `architectural` | `architecture`, `flow`, `path`, `trace` | Graph traversal |
| `docs` | `documentation`, `readme`, `guide` | Markdown chunks |
| `exact` | symbols, paths, line references | Precise symbol matching |
| `vague` | Default fallback | Broad FTS search |

Sources: [src/search.ts:1-50](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
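A keyword-driven classifier along the lines of the table above might look like the following sketch. The keyword lists are copied from the table; the precedence order, the `exact` heuristic, and the name `classifyQuerySketch` are assumptions and may not match the behavior of the real classifier.

```typescript
// Sketch: keyword-based intent classification using the trigger words from the table.
// Precedence order and the `exact` heuristic are assumptions.
type Intent =
  | "code" | "memory" | "impact" | "historical"
  | "architectural" | "docs" | "exact" | "vague";

const KEYWORDS: Array<[Intent, string[]]> = [
  ["memory", ["memory", "lesson", "session"]],
  ["impact", ["impact", "depends on", "blast radius"]],
  ["historical", ["why", "changed", "commit"]],
  ["architectural", ["architecture", "flow", "path", "trace"]],
  ["docs", ["documentation", "readme", "guide"]],
  ["code", ["function", "class", "implementation"]],
];

function classifyQuerySketch(query: string): Intent {
  const q = query.toLowerCase();
  // File paths with an extension or `:line` suffixes suggest an exact lookup.
  if (/[\w/.-]+\.[a-z]{1,4}\b|:\d+/.test(q)) return "exact";
  for (const [intent, words] of KEYWORDS) {
    // Whole-word match so that "classify" does not trigger "class".
    if (words.some((w) => new RegExp("\\b" + w + "\\b").test(q))) return intent;
  }
  return "vague"; // default fallback
}
```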

### Context Pack Assembly

The `createContextPack` function orchestrates the evidence gathering:

1. Classify query intent
2. Execute FTS5 search across chunks
3. Apply query expansion with domain-specific term additions
4. Score and rank hits using BM25 with intent-based bonuses
5. Select hits within token budget
6. Load related symbols and graph paths
7. Assemble and return `EvidencePack`

Sources: [src/search.ts:200-280](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
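Steps 4 and 5 of the list above (ranking with bonuses, then selecting within the budget) can be sketched as a sort followed by a greedy fill. SQLite FTS5's `bm25()` returns smaller values for better matches, so the sketch negates it; the additive bonus of 1.5, the `RankedHit` shape, and the function names are illustrative assumptions.

```typescript
// Sketch: rank hits by negated BM25 plus an intent bonus, then fill a token budget greedily.
// The bonus value and the data shape are assumptions, not the project's actual scoring.
interface RankedHit {
  ref: string;
  kind: string;
  bm25: number;   // raw FTS5 bm25() value: lower means a better match
  tokens: number; // estimated token cost of including this hit
}

function scoreHit(hit: RankedHit, preferredKind: string, bonus = 1.5): number {
  const base = -hit.bm25; // flip sign so that higher is better
  return hit.kind === preferredKind ? base + bonus : base;
}

function selectWithinBudget(
  hits: RankedHit[],
  preferredKind: string,
  budget: number,
): RankedHit[] {
  const ranked = [...hits].sort(
    (a, b) => scoreHit(b, preferredKind) - scoreHit(a, preferredKind),
  );
  const selected: RankedHit[] = [];
  let used = 0;
  for (const hit of ranked) {
    if (used + hit.tokens > budget) continue; // skip hits that would overflow the budget
    selected.push(hit);
    used += hit.tokens;
  }
  return selected;
}
```

Skipping an oversized hit instead of stopping lets smaller, lower-ranked hits still use the remaining budget.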

## CLI and MCP Integration

### Command Structure

| Command | Purpose | Key Options |
|---------|---------|-------------|
| `index` | Initial workspace indexing | `--workspace`, `--watch` |
| `daemon` | Continuous indexing with file watching | `--workspace` |
| `query` | Generate evidence pack | `--workspace`, `--budget`, `--json` |
| `search` | Direct search without packing | `--workspace`, `--limit`, `--kind` |
| `report` | Generate context report | `--workspace`, `--format` |
| `memory add` | Store evidence-backed lessons | `--claim`, `--evidence`, `--scope` |
| `server` | Start MCP stdio server | (none) |

Sources: [src/cli.ts:20-100](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)

### MCP Server Tools

The MCP server exposes standardized tools for agent integration:

- `context_pack(query, budget, scope)` - Primary tool, returning ranked, cited evidence
- `search_code(query, mode, filters)` - Code, docs, symbol, and memory search
- `trace_path(from, to, edge_types)` - Graph traversal across the codebase
- `impact_analysis(symbol_or_file)` - Reverse dependency analysis
- `why_changed(symbol_or_file)` - Git history with current evidence
- `recall_memory(query, scope)` - Search persistent lessons
- `write_lesson(claim, evidence_refs, scope)` - Store new memories

Sources: [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)

## Report Generation

The report system aggregates workspace statistics and warnings:

```mermaid
graph TD
    A[generateReport] --> B[Index Status Check]
    B --> C[File Statistics]
    B --> D[Symbol Statistics]
    B --> E[Edge Statistics]
    B --> F[Warning Collection]
    C --> G[renderMarkdown / renderHtml]
    D --> G
    E --> G
    F --> G
```

Reports support three output formats:
- **markdown** - Plain text with markdown headings
- **json** - Structured JSON with all report fields
- **html** - Self-contained HTML document with styling

Sources: [src/report.ts:1-80](https://github.com/Inferensys/contextful/blob/main/src/report.ts)

## Privacy and Security

Contextful operates entirely locally with no external API calls:

- No embedding API calls for vector search
- No source code uploads
- No file editing or auto-fixes
- No dependency installation in target workspace

Evidence references are validated and stale references are rejected to maintain integrity of the memory system.

Sources: [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)

## Data Flow Summary

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI/MCP Server
    participant Indexer
    participant Extractor
    participant Search
    participant Kernel as SQLite Kernel
    
    User->>CLI: index --workspace .
    CLI->>Indexer: indexWorkspace()
    Indexer->>Extractor: extractFile()
    Extractor->>Kernel: Insert symbols, edges, chunks
    Kernel-->>Indexer: Confirmation
    
    User->>CLI: query "where is auth handled"
    CLI->>Search: searchContext()
    Search->>Kernel: FTS5 query
    Search->>Kernel: Graph traversal
    Search->>Kernel: Memory search
    Kernel-->>Search: Ranked hits
    Search-->>CLI: EvidencePack
    CLI-->>User: Compact context output
```

## Key Design Decisions

| Decision | Rationale |
|----------|-----------|
| SQLite FTS5 over vector embeddings | Local-only operation, no external API dependencies |
| Intent-based query routing | Optimizes search strategy based on query semantics |
| BM25 scoring with bonuses | Balances relevance with domain-specific priorities |
| Token-budgeted evidence packs | Prevents context overflow in LLM contexts |
| Evidence refs as first-class citizens | Enables verifiable, traceable AI responses |

Sources: [src/search.ts:50-150](https://github.com/Inferensys/contextful/blob/main/src/search.ts), [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)

---

<a id='runtime-components'></a>

## Runtime Components

### Related Pages

Related topics: [High-Level Architecture](#high-level-architecture)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/indexer.ts](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
- [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)
</details>


## Overview

The **Runtime Components** in Contextful encompass the services, daemons, and server processes that enable real-time code indexing, search, and context-aware information retrieval. These components operate as the execution layer of the application, providing persistent indexing, live workspace monitoring, and MCP (Model Context Protocol) server capabilities for AI agent integration.

The runtime layer bridges the gap between static code analysis and dynamic query resolution, allowing users and AI agents to query indexed repositories with token-budgeted evidence packs.

---

## Core Runtime Services

### Indexing Daemon

The **Indexing Daemon** provides continuous workspace monitoring and automatic re-indexing when file changes are detected.

#### Architecture

```mermaid
graph TD
    A[File System] -->|fs.watch| B[Debounce Timer]
    B --> C{500ms elapsed?}
    C -->|Yes| D[indexWorkspace]
    D --> E[Kernel DB Update]
    C -->|No| B
    A -->|Initial| F[First Index]
    F --> E
```

#### Key Functions

| Function | Purpose | Location |
|----------|---------|----------|
| `watchWorkspace` | Monitors filesystem changes and triggers re-indexing | `src/indexer.ts:1-15` |
| `indexWorkspace` | Performs full or incremental workspace indexing | `src/indexer.ts` |

#### Implementation Details

The daemon uses Node.js `fs.watch()` with a 500ms debounce timer to batch rapid file changes into single indexing operations. This prevents excessive CPU usage during bulk file operations like git checkouts or build processes.

```typescript
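// src/indexer.ts - Watch implementation pattern (excerpt).
// `resolved`, `onIndex`, and `indexWorkspace` are defined elsewhere in the module.
import fs from "node:fs";

// Declared here so the excerpt reads on its own; the original declaration site is not shown.
let timer: ReturnType<typeof setTimeout> | undefined;

fs.watch(resolved, { recursive: true }, () => {
  // Restart the 500ms debounce window on every filesystem event.
  if (timer) clearTimeout(timer);
  timer = setTimeout(async () => {
    // Re-index once the workspace has been quiet for 500ms.
    onIndex(await indexWorkspace({ workspace: resolved }));
  }, 500);
});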

The daemon outputs index results as JSON to stdout, making it suitable for IPC communication with parent processes.

---

### MCP Server (stdio Mode)

The **MCP Server** exposes Contextful's capabilities through the Model Context Protocol standard, enabling integration with AI coding assistants.

#### Supported MCP Tools

| Tool Name | Purpose | Input Parameters |
|-----------|---------|------------------|
| `context_pack` | Returns token-budgeted evidence bundle | `query`, `budget`, `scope` |
| `search_code` | Code, docs, symbol, and memory search | `query`, `mode`, `filters` |
| `trace_path` | Graph traversal across codebase | `from`, `to`, `edge_types` |
| `impact_analysis` | Reverse dependency analysis | `symbol_or_file` |
| `why_changed` | Git history with current evidence | `symbol_or_file` |
| `recall_memory` | Search project lessons and sessions | `query`, `scope` |
| `write_lesson` | Store evidence-backed lessons | `claim`, `evidence`, `scope`, `confidence` |

Source: [README.md:1-30](https://github.com/Inferensys/contextful/blob/main/README.md)

#### Server Execution

```bash
# Run as MCP stdio server
npx @inferensys/contextful server
```

The server operates in stdio mode: it reads JSON-RPC requests from stdin and writes JSON-RPC responses to stdout.
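As a rough illustration of the stdio exchange, the sketch below frames one request that invokes the `search_code` tool through MCP's `tools/call` method. The `buildToolCall` helper, the `id` value, and the query text are illustrative assumptions; the argument names come from the tool table above.

```typescript
// Minimal sketch: frame an MCP tool invocation as a single JSON-RPC 2.0 line.
// `buildToolCall` is a hypothetical helper, not part of Contextful.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // The stdio transport sends one JSON message per line, so no pretty-printing here.
  return `${JSON.stringify(request)}\n`;
}

const requestLine = buildToolCall(1, "search_code", {
  query: "where are MCP tools registered",
  mode: "code",
});
```

A real client would first complete the MCP `initialize` handshake before sending tool calls.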

---

## CLI Runtime Commands

The CLI provides multiple entry points for runtime operations.

### Command Reference

| Command | Description | Key Options |
|---------|-------------|-------------|
| `cxf daemon` | Run local indexing daemon | `--workspace <path>` |
| `cxf query` | Create evidence pack for query | `--workspace`, `--budget`, `--json` |
| `cxf search` | Search without evidence pack | `--workspace`, `--limit`, `--kind` |
| `cxf report` | Generate context report | `--workspace`, `--format` |
| `cxf server` | Run MCP stdio server | - |
| `cxf memory add` | Store evidence-backed lesson | `--claim`, `--evidence`, `--scope`, `--confidence` |

Source: [src/cli.ts:1-80](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)

### Daemon Mode

```typescript
// src/cli.ts - Daemon command registration
program
  .command("daemon")
  .description("Run the local indexing daemon for a workspace.")
  .option("--workspace <path>", "Workspace path.", process.cwd())
  .action(async (options: { workspace: string }) => {
    await watchWorkspace(options.workspace, (result) => {
      process.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
    });
  });
```

### Query Command

The query command compiles an evidence pack based on a natural language query and token budget:

```typescript
// src/cli.ts - Query command
program
  .command("query")
  .description("Create an evidence pack for a query.")
  .argument("<query>", "Query to answer from indexed context.")
  .option("--workspace <path>", "Workspace path.", process.cwd())
  .option("--budget <tokens>", "Approximate token budget.", parseInteger, 2000)
  .option("--json", "Print JSON instead of Markdown.")
  .action(async (query: string, options) => {
    const pack = await createContextPack({ workspace: options.workspace, query, budget: options.budget });
    process.stdout.write(options.json ? `${JSON.stringify(pack, null, 2)}\n` : renderEvidencePackMarkdown(pack));
  });
```

---

## Evidence Pack System

### Pack Creation Flow

```mermaid
graph LR
    A[Query Input] --> B[classifyQuery]
    B --> C[searchContext]
    C --> D{Results Available?}
    D -->|Yes| E[Select & Rank Hits]
    D -->|No| F[Expand Search Terms]
    F --> C
    E --> G[Load Symbols & Graph]
    G --> H[Build EvidencePack]
    H --> I[Save to Kernel DB]
    I --> J[Return Pack]
```

### Pack Structure

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique pack identifier with `ctx_` prefix |
| `query` | string | Original query text |
| `intent` | SearchIntent | Classified query intent |
| `summary` | string | Natural language summary |
| `citations` | SearchHit[] | Ranked evidence items |
| `files` | FileInfo[] | Referenced files with reasons |
| `symbols` | SymbolRecord[] | Matched symbol definitions |
| `graphPaths` | GraphPath[] | Module/import relationships |
| `memoryHits` | SearchHit[] | Recallable memory matches |
| `confidence` | number | 0.1-0.92 confidence score |
| `tokenEstimate` | number | Estimated token count of the pack |
| `budget` | number | Maximum token budget |
| `createdAt` | string | ISO timestamp |

Source: [src/search.ts:150-200](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

### Confidence Calculation

The confidence score is computed using a clamped formula:

```
confidence = clamp(0.25 + hits * 0.05 + graphPaths * 0.02 + memoryHits * 0.05, 0.1, 0.92)
```

The upper bound caps confidence at 92%, so a pack never claims certainty. Because every term in the sum is non-negative, the lowest reachable value is the 0.25 base; the 0.1 lower bound acts only as a safeguard.

Source: [src/search.ts:80-82](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
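To make the arithmetic concrete, here is a small sketch of the formula with three sample inputs. The function name `confidence` and the sample counts are illustrative assumptions; only the coefficients and bounds come from the formula above.

```typescript
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

// Counts of hits, graph paths, and memory hits go in; a bounded score comes out.
function confidence(hits: number, graphPaths: number, memoryHits: number): number {
  return clamp(0.25 + hits * 0.05 + graphPaths * 0.02 + memoryHits * 0.05, 0.1, 0.92);
}

const modest = confidence(5, 2, 1);      // 0.25 + 0.25 + 0.04 + 0.05, about 0.59
const baseOnly = confidence(0, 0, 0);    // 0.25, the base term alone
const saturated = confidence(20, 20, 5); // raw value about 1.90, clamped to 0.92
```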

---

## Workspace Resolution

### Path Resolution Flow

```mermaid
graph TD
    A[CLI Input Path] --> B{Is Absolute?}
    B -->|No| C[Resolve relative to cwd]
    B -->|Yes| D[Use as-is]
    C --> E[validateWorkspace]
    D --> E
    E --> F{Valid Directory?}
    F -->|Yes| G[Load Kernel DB]
    F -->|No| H[Create New Index]
```

The `resolveWorkspace()` utility normalizes all workspace paths, while `ensureIndexed()` guarantees the workspace has been indexed before search operations proceed.

Source: [src/util.ts:1-20](https://github.com/Inferensys/contextful/blob/main/src/util.ts)

---

## Report Generation

The report system generates comprehensive context reports in multiple formats.

### Supported Formats

| Format | Renderer Function |
|--------|-------------------|
| `markdown` | `renderMarkdown()` |
| `json` | `JSON.stringify()` |
| `html` | `renderHtml()` |

### Report Contents

- **Summary**: Overview of indexed state
- **Statistics**: Token counts, file counts, index timestamps
- **Warnings**: Potential issues (up to 20)
- **Token Savings**: Estimated efficiency metrics

Source: [src/report.ts:1-50](https://github.com/Inferensys/contextful/blob/main/src/report.ts)

---

## Error Handling

### Workspace Validation

Runtime components validate workspace paths before operations:

1. The directory exists and is readable.
2. The Kernel DB can be opened or created.
3. The index state is consistent.

### Broken JSON Handling

When parsing `package.json` during indexing, broken JSON is handled gracefully:

```typescript
// src/extract.ts - JSON error handling
try {
  const parsed = JSON.parse(content) as Record<string, unknown>;
  // Process dependencies, devDependencies, scripts
} catch {
  // Broken JSON receives text chunks; syntax diagnostics out of scope
}
```

---

## Memory and Lessons

### Lesson Storage

Lessons are evidence-backed statements stored for recall during future queries:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `claim` | string | required | The lesson statement |
| `evidence` | string[] | required | File refs (e.g., `file:src/auth.ts:1-20`) |
| `scope` | string | "repo" | Memory scope (repo, global) |
| `confidence` | number | 0.7 | Confidence score (0-1) |

Source: [src/cli.ts:60-80](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
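The parameter table can be read as a small data shape. The sketch below is a hedged illustration only: `LessonInput`, `FileRef`, `parseFileRef`, and `withDefaults` are hypothetical names, and the ref grammar is inferred from the single example `file:src/auth.ts:1-20`.

```typescript
// Hypothetical shapes for illustration; the real types live in the Contextful source.
interface LessonInput {
  claim: string;
  evidence: string[];
  scope?: string;      // documented default: "repo"
  confidence?: number; // documented default: 0.7
}

interface FileRef {
  path: string;
  startLine: number;
  endLine: number;
}

// Parse an evidence ref such as "file:src/auth.ts:1-20"; returns null for other shapes.
function parseFileRef(ref: string): FileRef | null {
  const match = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (!match) return null;
  return { path: match[1], startLine: Number(match[2]), endLine: Number(match[3]) };
}

// Apply the documented defaults before a lesson is stored.
function withDefaults(input: LessonInput): Required<LessonInput> {
  return {
    claim: input.claim,
    evidence: input.evidence,
    scope: input.scope ?? "repo",
    confidence: input.confidence ?? 0.7,
  };
}

const lesson = withDefaults({
  claim: "JWT tokens should be validated on every protected route.",
  evidence: ["file:src/auth.ts:1-20"],
});
const firstRef = parseFileRef(lesson.evidence[0]);
```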

### Memory Recall

Memory hits feed into the confidence score of an evidence pack: each recalled lesson that matches the query adds 0.05, the same weight as a regular search hit.

---

## See Also

- [CLI Reference](../cli.md) - Complete CLI command documentation
- [Indexing System](../indexing.md) - Code analysis and symbol extraction
- [Search API](../search-api.md) - Query classification and ranking

---

<a id='search-engine'></a>

## Search Engine

### Related Pages

Related topics: [Context Packs](#context-packs), [SQLite Database Schema](#sqlite-database)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/mcp-server.ts](https://github.com/Inferensys/contextful/blob/main/src/mcp-server.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
</details>

# Search Engine

## Overview

The Search Engine is the core retrieval system in Contextful, designed to provide intelligent, evidence-backed context for agent queries. It combines full-text search (FTS), symbol indexing, dependency graph traversal, and memory recall to deliver ranked, cited results within a configurable token budget.

The system serves as the foundation for multiple interfaces: CLI commands (`query`, `search`), MCP server tools (`search_code`, `context_pack`), and report generation.

Source: [src/search.ts:1-50](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Architecture

```mermaid
graph TD
    A[Query Input] --> B[Query Classification]
    B --> C{Intent Type}
    C -->|code/docs| D[Full-Text Search]
    C -->|symbols| E[Symbol Lookup]
    C -->|memory| F[Memory Ledger Search]
    C -->|impact| G[Graph Traversal]
    C -->|historical| H[Git History + Search]
    D --> I[BM25 Ranking]
    E --> J[Symbol Index]
    F --> K[Memory DB]
    G --> L[Edge Database]
    H --> M[Git Operations]
    I --> N[Result Scoring]
    J --> N
    K --> N
    L --> N
    N --> O[Context Pack]
```

### Core Components

| Component | File | Responsibility |
|-----------|------|----------------|
| Search Kernel | `src/search.ts` | Core search logic and ranking |
| Query Classifier | `src/search.ts` | Intent detection |
| FTS Engine | `src/search.ts` | Full-text search using SQLite FTS5 |
| Graph Tracer | `src/search.ts` | Dependency graph traversal |
| Memory Store | `src/memory.ts` | Evidence-backed memory recall |

Source: [src/search.ts:50-120](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Query Classification

The search engine classifies each query into one of nine intent types to choose a retrieval strategy.

### SearchIntent Types

| Intent | Trigger Keywords | Search Strategy |
|--------|------------------|-----------------|
| `code` | `code`, `function`, `class`, `impl` | FTS + symbol lookup |
| `docs` | `resource`, `docs`, `readme`, `how to` | FTS on markdown/json |
| `symbols` | `define`, `interface`, `type`, `symbol` | Direct symbol index |
| `memory` | `remember`, `lesson`, `learned`, `session` | Memory ledger query |
| `impact` | `impact`, `affected`, `depends`, `blast radius` | Reverse dependency graph |
| `historical` | `why`, `changed`, `commit`, `history` | Git history + current search |
| `architectural` | `architecture`, `flow`, `trace`, `connects` | Graph path analysis |
| `exact` | Code patterns, paths, line refs | Direct file/symbol lookup |
| `vague` | Default | Broad FTS + graph |

```typescript
function classifyQuery(query: string): SearchIntent {
  const q = query.toLowerCase();
  if (/\b(code|function|class|implement|module)\b/.test(q)) return "code";
  if (/\b(define|interface|type|symbol)\b/.test(q)) return "symbols";
  if (/\b(memory|remember|lesson|learned|sessions?)\b/.test(q)) return "memory";
  // ... additional classifications
}
```

Source: [src/search.ts:1-30](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Search Flow

### Main Search Pipeline

```mermaid
sequenceDiagram
    participant CLI as CLI/MCP
    participant Search as searchContext()
    participant Kernel as Kernel DB
    participant FTS as FTS5 Engine
    participant Graph as Graph DB
    participant Memory as Memory Store

    CLI->>Search: query, workspace, limit
    Search->>Kernel: ensureIndexed()
    Search->>Kernel: addQuery()
    Search->>FTS: ftsQuery(expandedTerms)
    FTS-->>Search: ranked rows (BM25)
    Search->>Search: scoreFromRank()
    Search->>Graph: loadGraphPaths()
    Search-->>CLI: {intent, hits}
```

### Full-Text Search Query Builder

The `ftsQuery` function transforms user queries into FTS5-compatible search strings:

```typescript
function ftsQuery(query: string): string {
  const terms = expandedTerms(query);
  return Array.from(new Set(terms.map((term) => term.toLowerCase())))
    .filter((term) => !STOPWORDS.has(term))
    .slice(0, 14)
    .map((term) => `${term}*`)
    .join(" OR ");
}
```

Key behaviors:
- Expands terms based on keywords in the query (e.g., a query mentioning "tool" also searches for "server", "tools", and "callTool")
- Filters stopwords: `where`, `what`, `which`, `when`, `how`, `are`, `the`, `for`, `with`, `and`, `or`, `to`
- Limits to 14 terms maximum
- Appends wildcard `*` for prefix matching

Source: [src/search.ts:200-280](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
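The behaviors listed above can be traced on a concrete query. In the sketch below, `STOPWORDS` uses the list from this page, while the stand-in `expandedTerms` (whitespace tokenization plus only the tool-related expansion) is an assumption made to keep the example self-contained.

```typescript
// Stopwords as listed above.
const STOPWORDS = new Set([
  "where", "what", "which", "when", "how", "are", "the", "for", "with", "and", "or", "to",
]);

// Stand-in for the real expandedTerms: whitespace tokenization is an assumption.
function expandedTerms(query: string): string[] {
  const terms = query.split(/\s+/).filter(Boolean);
  const additions: string[] = [];
  if (/\b(tool|tools|registered|register)\b/.test(query.toLowerCase())) {
    additions.push("server", "tool", "tools", "callTool");
  }
  return [...terms, ...additions];
}

function ftsQuery(query: string): string {
  const terms = expandedTerms(query);
  return Array.from(new Set(terms.map((term) => term.toLowerCase())))
    .filter((term) => !STOPWORDS.has(term))
    .slice(0, 14)
    .map((term) => `${term}*`)
    .join(" OR ");
}

const ftsString = ftsQuery("where are tools registered");
// ftsString === "tools* OR registered* OR server* OR tool* OR calltool*"
```

Note that lowercasing happens before deduplication, so the added term `callTool` is searched as `calltool*`.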

## Scoring System

### Rank-to-Score Transformation

The `scoreFromRank` function converts a BM25 rank into a base score, adds domain-specific bonuses, and clamps the result to the range 0.1-10:

```typescript
function scoreFromRank(rank: number, query: string, corpus: string): number {
  const q = query.toLowerCase(); // keyword patterns below are matched case-insensitively
  const base = 10 / (1 + Math.abs(rank));
  let bonus = 0;

  // Domain-specific bonuses
  if (/\b(tool|tools|registered|register)\b/.test(q) && corpus.includes("server.tool(")) {
    bonus += 9;
  }
  if (/\bmcp\b/.test(q) && corpus.includes("mcp-server")) {
    bonus += 4;
  }

  return clamp(base + bonus, 0.1, 10);
}
```

### Scoring Bonuses Matrix

| Query Pattern | Content Match | Bonus |
|---------------|---------------|-------|
| `tool/tools/register` | `server.tool(` | +9 |
| `mcp` | `mcp-server` | +4 |
| `where registered` | `function runMcpServer` | +4 |
| `tool` query | `src/search.ts` | -8 |
| `memory` query | `src/memory.ts` | +5 |
| `memory` query | `src/search.ts` | -16 |

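The negative bonuses penalize hits from the search implementation itself (`src/search.ts`), which mentions these keywords without being the answer to such queries.

Source: [src/search.ts:240-320](https://github.com/Inferensys/contextful/blob/main/src/search.ts)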
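One way to read the bonus matrix is as a rule table applied on top of the base score. The sketch below is an illustrative restructuring, not the project's code: the `RULES` layout and the exact query patterns are assumptions, and the `where registered` row is left out because its query pattern is not shown on this page.

```typescript
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

// Each rule pairs a query pattern with a content marker and a bonus from the matrix.
const RULES: Array<{ query: RegExp; marker: string; bonus: number }> = [
  { query: /\b(tool|tools|registered|register)\b/, marker: "server.tool(", bonus: 9 },
  { query: /\bmcp\b/, marker: "mcp-server", bonus: 4 },
  { query: /\b(tool|tools)\b/, marker: "src/search.ts", bonus: -8 },
  { query: /\b(memory|memories|lesson|lessons)\b/, marker: "src/memory.ts", bonus: 5 },
  { query: /\b(memory|memories|lesson|lessons)\b/, marker: "src/search.ts", bonus: -16 },
];

function scoreFromRank(rank: number, query: string, corpus: string): number {
  const q = query.toLowerCase();
  const base = 10 / (1 + Math.abs(rank));
  // Sum the bonuses of every rule whose query pattern and content marker both match.
  const bonus = RULES
    .filter((rule) => rule.query.test(q) && corpus.includes(rule.marker))
    .reduce((sum, rule) => sum + rule.bonus, 0);
  return clamp(base + bonus, 0.1, 10);
}

// Base 5 plus bonus 9 gives 14, clamped to 10.
const boosted = scoreFromRank(-1, "where are tools registered", 'server.tool("search_code", ...)');
// Base 2 minus 16 gives -14, clamped to 0.1.
const penalized = scoreFromRank(-4, "memory lessons", "src/search.ts: function expandedTerms");
```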

## Term Expansion

The `expandedTerms` function appends related terms to the query's own terms whenever the query matches one of several keyword patterns:

```typescript
function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  // Assumption: the excerpt omits how `terms` is derived; simple word splitting stands in here
  const terms = query.split(/\W+/).filter(Boolean);
  const additions: string[] = [];

  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  // Note: the next two patterns are ungrouped, so \b anchors only the first and last alternatives
  if (/\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\bimpact|depends|dependents|uses\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }

  return [...terms, ...additions];
}
```

Source: [src/search.ts:320-380](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## CLI Commands

### Query Command

```bash
cxf query "<query>" --workspace <path> --budget <tokens> --json
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `query` | string | required | Query to answer from indexed context |
| `--workspace` | path | `cwd()` | Workspace path |
| `--budget` | number | 2000 | Approximate token budget |
| `--json` | flag | false | Output JSON instead of Markdown |

### Search Command

```bash
cxf search "<query>" --workspace <path> --limit <count> --kind <kind>
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `query` | string | required | Search query |
| `--workspace` | path | `cwd()` | Workspace path |
| `--limit` | number | 10 | Maximum hits |
| `--kind` | enum | `all` | Search category: `all\|code\|docs\|symbols\|memory` |

Source: [src/cli.ts:40-80](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)

## MCP Server Tools

The search engine exposes the following MCP tools:

### search_code

```typescript
server.tool("search_code", "Search indexed code, docs, symbols, and stored context", {
  query: z.string(),
  mode: z.enum(["all", "code", "docs", "symbols", "memory"]).optional(),
  limit: z.number().optional(),
  filters: z.record(z.string(), z.unknown()).optional()
});
```

### trace_path

```typescript
server.tool("trace_path", "Trace graph relationships between files, symbols, modules", {
  from: z.string(),
  to: z.string().optional(),
  edge_types: z.array(z.string()).optional(),
  limit: z.number().optional()
});
```

### impact_analysis

```typescript
server.tool("impact_analysis", "Find likely dependents and tests", {
  symbol_or_file: z.string(),
  limit: z.number().optional()
});
```

### why_changed

```typescript
server.tool("why_changed", "Explain why a file/symbol may have changed", {
  symbol_or_file: z.string(),
  limit: z.number().optional()
});
```

Source: [src/mcp-server.ts:1-80](https://github.com/Inferensys/contextful/blob/main/src/mcp-server.ts)

## Context Pack

The `createContextPack` function assembles comprehensive evidence bundles:

```typescript
export async function createContextPack(options: {
  workspace?: string;
  query: string;
  budget?: number;
  scope?: string;
}): Promise<EvidencePack>
```

### EvidencePack Structure

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique pack identifier (`ctx_<hash>`) |
| `query` | string | Original query |
| `scope` | string | Search scope (default: `repo`) |
| `intent` | SearchIntent | Classified intent |
| `summary` | string | Human-readable summary |
| `citations` | SearchHit[] | Ranked search results |
| `files` | FileContext[] | Grouped file references |
| `symbols` | SymbolRecord[] | Relevant symbols (≤20) |
| `graphPaths` | GraphPath[] | Dependency connections (≤20) |
| `memoryHits` | SearchHit[] | Memory matches |
| `confidence` | number | Confidence score (0.1-0.92) |
| `tokenEstimate` | number | Estimated token count |
| `budget` | number | Requested token budget |
| `createdAt` | string | ISO timestamp |

### Confidence Calculation

```typescript
function confidenceFor(hits: SearchHit[], graphPaths: GraphPath[], memoryHits: SearchHit[]): number {
  return clamp(
    0.25 + 
    hits.length * 0.05 + 
    graphPaths.length * 0.02 + 
    memoryHits.length * 0.05,
    0.1,
    0.92
  );
}
```

Source: [src/search.ts:400-480](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Graph Traversal

The `traceGraph` function performs dependency graph analysis:

```typescript
export async function traceGraph(options: {
  workspace?: string;
  from: string;
  to?: string;
  edgeTypes?: string[];
  limit?: number;
}): Promise<GraphPath[]>
```

### Edge Types

| Edge Type | Direction | Description |
|-----------|-----------|-------------|
| `IMPORTS` | File → Module | Import/require statements |
| `DEFINES` | File → Symbol | Symbol definitions |
| `CONFIGURES` | File → Config | Configuration keys |
| `TESTS` | Test → Source | Test file relationships |
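To show how typed edges can be followed from one node to another, here is a minimal breadth-first search over an in-memory edge list. This is a sketch under stated assumptions: `Edge`, `findPath`, and the sample edges are hypothetical, and the real `traceGraph` reads edges from the kernel database and may use a different traversal.

```typescript
type EdgeType = "IMPORTS" | "DEFINES" | "CONFIGURES" | "TESTS";

interface Edge {
  from: string;
  to: string;
  type: EdgeType;
}

// Breadth-first search for the shortest chain of edges from `from` to `to`,
// optionally restricted to the given edge types.
function findPath(edges: Edge[], from: string, to: string, edgeTypes?: EdgeType[]): Edge[] | null {
  const allowed = edges.filter((edge) => !edgeTypes || edgeTypes.includes(edge.type));
  const queue: Array<{ node: string; path: Edge[] }> = [{ node: from, path: [] }];
  const seen = new Set<string>([from]);
  while (queue.length > 0) {
    const { node, path } = queue.shift()!;
    if (node === to) return path;
    for (const edge of allowed) {
      if (edge.from === node && !seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push({ node: edge.to, path: [...path, edge] });
      }
    }
  }
  return null; // no chain of allowed edges connects the two nodes
}

// Sample data, invented for the example.
const sampleEdges: Edge[] = [
  { from: "src/middleware/auth.ts", to: "src/auth.ts", type: "IMPORTS" },
  { from: "src/auth.ts", to: "src/utils/jwt.ts", type: "IMPORTS" },
  { from: "test/auth.test.ts", to: "src/auth.ts", type: "TESTS" },
];

const importChain = findPath(sampleEdges, "src/middleware/auth.ts", "src/utils/jwt.ts", ["IMPORTS"]);
```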

### Impact Analysis

```typescript
export async function impactAnalysis(options: {
  workspace?: string;
  target: string;
  limit?: number;
}): Promise<{
  target: string;
  forward: string[];
  reverse: string[];
  tests: string[];
}>
```

Returns forward dependencies, reverse dependents, and likely test files for a given symbol or file.

Source: [src/search.ts:480-550](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
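The three result lists can be derived from the edge types in a few filters. The sketch below works on an in-memory edge list and is only an approximation of the idea: `impactOf`, `ImpactEdge`, and the sample edges are hypothetical, and treating `IMPORTS` as the only dependency edge is an assumption.

```typescript
interface ImpactEdge {
  from: string;
  to: string;
  type: "IMPORTS" | "DEFINES" | "CONFIGURES" | "TESTS";
}

interface Impact {
  target: string;
  forward: string[];
  reverse: string[];
  tests: string[];
}

function impactOf(edges: ImpactEdge[], target: string): Impact {
  const unique = (items: string[]): string[] => Array.from(new Set(items));
  return {
    target,
    // What the target depends on.
    forward: unique(edges.filter((e) => e.type === "IMPORTS" && e.from === target).map((e) => e.to)),
    // What depends on the target.
    reverse: unique(edges.filter((e) => e.type === "IMPORTS" && e.to === target).map((e) => e.from)),
    // Test files linked to the target by a TESTS edge.
    tests: unique(edges.filter((e) => e.type === "TESTS" && e.to === target).map((e) => e.from)),
  };
}

// Sample data, invented for the example.
const impactEdges: ImpactEdge[] = [
  { from: "src/middleware/auth.ts", to: "src/auth.ts", type: "IMPORTS" },
  { from: "src/auth.ts", to: "src/utils/jwt.ts", type: "IMPORTS" },
  { from: "test/auth.test.ts", to: "src/auth.ts", type: "TESTS" },
];

const authImpact = impactOf(impactEdges, "src/auth.ts");
```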

## Utility Functions

### lineRange

Extracts a specific line range from text:

```typescript
export function lineRange(text: string, startLine: number, endLine: number): string {
  const lines = text.split(/\r?\n/);
  return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}
```
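As a quick check of the indexing convention, the sketch below repeats the function and calls it on a four-line string: line numbers are 1-based and the end line is inclusive. The sample strings are made up for the example.

```typescript
function lineRange(text: string, startLine: number, endLine: number): string {
  const lines = text.split(/\r?\n/);
  return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}

const middleLines = lineRange("a\nb\nc\nd", 2, 3);  // "b\nc"
const tailLines = lineRange("a\nb\nc\nd", 3, 99);   // "c\nd": the end is capped at the last line
```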

### clamp

Constrains values within bounds:

```typescript
export function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}
```

### unique

Deduplicates arrays:

```typescript
export function unique<T>(items: T[]): T[] {
  return Array.from(new Set(items));
}
```

### isLikelyBinary

Detects binary files by checking for null bytes:

```typescript
export function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}
```

Source: [src/util.ts:1-50](https://github.com/Inferensys/contextful/blob/main/src/util.ts)

## Data Models

### SearchHit

```typescript
interface SearchHit {
  ref: string;        // Format: "file:path:start-end"
  path: string;       // File path
  kind: string;       // "chunk", "symbol", "memory", "doc"
  title: string;      // Display title
  text: string;       // Content snippet
  score: number;      // Relevance score
  line?: number;      // Starting line number
}
```

### SymbolRecord

```typescript
interface SymbolRecord {
  ref: string;
  name: string;
  kind: string;       // "function", "class", "interface", "type", etc.
  filePath: string;
  line: number;
  signature?: string;
  exported?: boolean;
}
```

Source: [src/search.ts:100-150](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Index Status

The `getIndexStatus` function returns workspace indexing metadata:

```typescript
export async function getIndexStatus(options: { workspace?: string }): Promise<IndexStatus>
```

### IndexStatus Structure

| Field | Type | Description |
|-------|------|-------------|
| `workspace` | string | Workspace path |
| `languageCounts` | Record<string, number> | File count per language |
| `warnings` | string[] | Index warnings |
| `lastIndexed` | string | ISO timestamp of last index |
| `totalChunks` | number | Total indexed chunks |

Source: [src/search.ts:550-600](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Summary

The Search Engine provides Contextful's intelligent retrieval capabilities through:

1. **Intent Classification** - Automatically routes queries to optimal search strategies
2. **Full-Text Search** - SQLite FTS5 with BM25 ranking and domain-specific scoring
3. **Symbol Index** - Fast lookup of code definitions across languages
4. **Graph Traversal** - Dependency analysis and impact tracking
5. **Memory Integration** - Recall of past lessons and evidence-backed claims
6. **Token Budgeting** - Constrains output to specified budget limits
7. **Confidence Scoring** - Quantifies result reliability

All search operations flow through a unified kernel database that combines FTS chunks, symbol records, and edge relationships for comprehensive context retrieval.

---

<a id='context-packs'></a>

## Context Packs

### Related Pages

Related topics: [Search Engine](#search-engine), [Memory Ledger](#memory-ledger)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/types.ts](https://github.com/Inferensys/contextful/blob/main/src/types.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
</details>

# Context Packs

Context Packs are the core output format of Contextful, providing AI agents with compact, ranked, and cited evidence bundles that fit within a specified token budget. Instead of forcing agents to read dozens of arbitrary files, Context Packs deliver precisely the evidence needed to answer a specific query.

## Overview

A Context Pack is a structured evidence package generated by the `context_pack()` MCP tool or the `cxf query` CLI command. It contains:

- Ranked code and documentation citations matching the query
- Related symbols (functions, classes, interfaces) from matching files
- Graph paths connecting related components
- Memory hits from evidence-backed lessons
- A confidence score and token budget accounting

The pack is designed to be consumed directly by an LLM agent, providing traceable citations and a clear summary of what evidence was found.

## Data Model

### EvidencePack Structure

| Field | Type | Description |
|-------|------|-------------|
| `id` | `string` | Unique identifier (format: `ctx_<hash>`) |
| `query` | `string` | The original search query |
| `scope` | `string` | Search scope (e.g., "repo") |
| `intent` | `SearchIntent` | Classified query intent |
| `summary` | `string` | Human-readable summary of findings |
| `citations` | `SearchHit[]` | Ranked evidence items |
| `files` | `FileContext[]` | Grouped file references with reasons |
| `symbols` | `SymbolRecord[]` | Relevant symbols from matched files |
| `graphPaths` | `GraphPath[]` | Graph traversals between components |
| `memoryHits` | `SearchHit[]` | Memory/lesson hits |
| `confidence` | `number` | Estimated confidence (0.1-0.92) |
| `tokenEstimate` | `number` | Estimated token count of pack |
| `budget` | `number` | Requested token budget |
| `createdAt` | `string` | ISO timestamp of creation |

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

### SearchHit Structure

| Field | Type | Description |
|-------|------|-------------|
| `ref` | `string` | Reference identifier (e.g., `file:src/auth.ts:1-20`) |
| `path` | `string` | File path |
| `title` | `string` | Display title |
| `kind` | `string` | Hit kind: code, doc, symbol, memory |
| `excerpt` | `string` | Relevant text excerpt |
| `score` | `number` | Relevance score |
| `rank` | `number` | BM25 rank |

### SearchIntent Enum

| Intent | Trigger Keywords |
|--------|-----------------|
| `exact` | Code patterns, paths, symbol names with special chars |
| `symbol` | Function names, class names, method calls |
| `test` | test, spec, mock, fixture, unit |
| `memory` | memory, lesson, learned, session |
| `impact` | impact, affected, depends, blast radius |
| `historical` | why, changed, commit, history, regression |
| `architectural` | architecture, flow, trace, connects, imports |
| `docs` | resource, docs, documentation, guide, readme |
| `vague` | Default for generic queries |

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Creation Flow

The `createContextPack` function orchestrates the entire pack creation process:

```mermaid
graph TD
    A[createContextPack] --> B[searchContext]
    B --> C[classifyQuery]
    C --> D[ftsQuery + expandedTerms]
    D --> E[FTS Search on chunks_fts]
    E --> F[scoreFromRank]
    F --> G[Select Hits within Budget]
    G --> H[loadSymbolsForPaths]
    G --> I[loadGraphPaths]
    G --> J[Filter memoryHits]
    H --> K[Build EvidencePack]
    I --> K
    J --> K
    K --> L[saveEvidencePack]
    L --> M[Return EvidencePack]
```

### Step 1: Search Context

The process begins by classifying the query intent and executing full-text search:

```typescript
const search = await searchContext({ workspace, query, limit: budget * 2 });
const selected = selectWithinBudget(search.hits, budget);
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

### Step 2: Budget-Aware Selection

Hits are selected greedily in rank order; selection stops at the first hit whose addition would reach or exceed the budget:

```typescript
function selectWithinBudget(hits: SearchHit[], budget: number): SearchHit[] {
  const selected: SearchHit[] = [];
  let tokenEstimate = 0;
  for (const hit of hits) {
    const est = estimateTokens(hit.excerpt || hit.title);
    if (tokenEstimate + est >= budget) break;
    selected.push(hit);
    tokenEstimate += est;
  }
  return selected;
}
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
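A short run makes the stopping rule visible. In this sketch the selection loop is the one shown above, while the four-characters-per-token `estimateTokens` and the sample hits are assumptions made for illustration; the `SearchHit` type is trimmed to the two fields the loop reads.

```typescript
interface SearchHit {
  title: string;
  excerpt?: string;
}

// Assumption: roughly four characters per token; the real estimator may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectWithinBudget(hits: SearchHit[], budget: number): SearchHit[] {
  const selected: SearchHit[] = [];
  let tokenEstimate = 0;
  for (const hit of hits) {
    const est = estimateTokens(hit.excerpt || hit.title);
    if (tokenEstimate + est >= budget) break;
    selected.push(hit);
    tokenEstimate += est;
  }
  return selected;
}

const rankedHits: SearchHit[] = [
  { title: "a", excerpt: "x".repeat(40) }, // 10 tokens
  { title: "b", excerpt: "x".repeat(80) }, // 20 tokens
  { title: "c", excerpt: "x".repeat(8) },  // 2 tokens
];

// Only "a" is kept: adding "b" would reach 30, and the loop stops there,
// so the small hit "c" is never considered even though it would fit.
const picked = selectWithinBudget(rankedHits, 25);
```

Because the loop breaks rather than skips, one oversized hit near the top of the ranking ends selection for everything ranked below it.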

### Step 3: Symbol Loading

For each selected file, related symbols are loaded (up to 20 total):

```typescript
const symbols = loadSymbolsForPaths(kernel.db, paths).slice(0, 20);
```

The symbols query reads from the `symbols` table, filtered by the paths of the selected hits:

```sql
SELECT ref, name, kind, file_path, line, signature, exported
FROM symbols
WHERE file_path IN (...)
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

### Step 4: Graph Path Loading

Graph paths connect files through import/dependency relationships:

```typescript
const graphPaths = loadGraphPaths(kernel.db, paths, 20);
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

### Step 5: Memory Hit Extraction

Memory hits are filtered from selected hits by kind:

```typescript
const memoryHits = selected.filter((hit) => hit.kind === "memory");
```

### Step 6: Confidence Calculation

Confidence is calculated using a clamped formula:

```typescript
function confidenceFor(hits, graphPaths, memoryHits): number {
  return clamp(
    0.25 + hits.length * 0.05 + graphPaths.length * 0.02 + memoryHits.length * 0.05,
    0.1,
    0.92
  );
}
```

- Base: 0.25
- Each hit: +0.05
- Each graph path: +0.02
- Each memory hit: +0.05
- Clamped to [0.1, 0.92]

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Query Classification

The `classifyQuery` function determines the search intent based on keywords:

```typescript
function classifyQuery(q: string): SearchIntent {
  const lower = q.toLowerCase();
  // The "exact" check uses the original casing so capitalized identifiers are detected
  if (/[`"'#.:/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
  // Keyword checks run on the lowercased query
  if (/\b(test|spec|mock|fixture)\b/.test(lower)) return "test";
  if (/\b(memory|lesson|learned|session|sessions)\b/.test(lower)) return "memory";
  if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(lower)) return "impact";
  if (/\b(why|changed|commit|history|regression|introduced)\b/.test(lower)) return "historical";
  if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(lower)) return "architectural";
  if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(lower)) return "docs";
  return "vague";
}
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Term Expansion

The `expandedTerms` function adds related terms to improve recall for specific domains:

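```typescript
function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  // Assumption: the excerpt omits how `terms` is derived; simple word splitting stands in here
  const terms = query.split(/\W+/).filter(Boolean);
  const additions: string[] = [];
  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  // Note: the next two patterns are ungrouped, so \b anchors only the first and last alternatives
  if (/\bmemory|memories|remember|remembers|lesson|lessons\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\bimpact|depends|dependents|uses\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }
  return [...terms, ...additions];
}
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)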

## Scoring Algorithm

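The `scoreFromRank` function calculates relevance scores. The excerpt below shows the memory-related bonuses, which are applied to the matched content (`corpus`) when the query mentions memory keywords:

```typescript
function scoreFromRank(rank: number, query: string, corpus: string): number {
  let bonus = 0;
  const q = query.toLowerCase();
  const lower = corpus.toLowerCase(); // matched content, checked for path and phrase markers

  if (/\bmemory|memories|remember|remembers|lesson|lessons|sessions\b/.test(q)) {
    if (lower.includes("memory ledger")) bonus += 7;
    if (lower.includes("src/memory.ts")) bonus += 5;
    if (lower.includes("readme.md")) bonus += 4;
    if (lower.includes("src/search.ts")) bonus -= 16;
  }
  if (/\b(where|how)\b/.test(q) && lower.includes("config-key")) bonus -= 2;

  // Clamped to 0.1-10, matching the Search Engine section
  return clamp(10 / (1 + Math.abs(rank)) + bonus, 0.1, 10);
}
```

Source: [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)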

## CLI Usage

The `query` command creates Context Packs via CLI:

```bash
cxf query "<query>" --workspace <path> --budget 2000 --json
```

### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--workspace` | `path` | `cwd` | Workspace path |
| `--budget` | `number` | `2000` | Approximate token budget |
| `--json` | `flag` | `false` | Output as JSON instead of Markdown |

### Example Output

```
# Context Pack ctx_abc123

Query: where is user auth handled
Intent: architectural
Confidence: 59%
Token estimate: 1850/2000

Found 5 evidence items for a architectural query, with 2 graph connections and 1 memory hit.

## Citations
- file:src/auth.ts:1-50 (auth module)
  Handles user authentication via JWT tokens...
- file:src/middleware/auth.ts:1-30 (auth middleware)
  Express middleware for auth validation...

## Graph Paths
- src/auth.ts --IMPORTS--> src/utils/jwt.ts (src/auth.ts:5)
- src/middleware/auth.ts --IMPORTS--> src/auth.ts (src/middleware/auth.ts:3)

## Memory Hits
- memory:lesson:1: JWT tokens should be validated on every protected route.
```

Source: [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)

## Rendering

Context Packs are rendered as Markdown by `renderEvidencePackMarkdown`; with the `--json` flag, the CLI prints the raw pack via `JSON.stringify` instead:

```typescript
export function renderEvidencePackMarkdown(pack: EvidencePack): string {
  const lines = [
    `# Context Pack ${pack.id}`,
    "",
    `Query: ${pack.query}`,
    `Intent: ${pack.intent}`,
    `Confidence: ${Math.round(pack.confidence * 100)}%`,
    `Token estimate: ${pack.tokenEstimate}/${pack.budget}`,
    "",
    pack.summary,
    "",
    "## Citations"
  ];
  // ... citations, graph paths, memory hits
}
```

Source: [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)

## Chunk Extraction

Contextual chunks are extracted during indexing for searchability:

```mermaid
graph LR
    A[Source File] --> B[Language Detection]
    B --> C[extractSymbols]
    B --> D[extractEdges]
    B --> E[extractChunks]
    C --> F[Symbol Table]
    D --> G[Edge Table]
    E --> H[Chunk Table]
```

### Supported Languages

| Language | Symbol Patterns |
|----------|-----------------|
| TypeScript/JavaScript | function, class, interface, type, const arrow |
| Python | def, class |
| Go | func, type struct/interface |
| Rust | fn, struct, enum, trait, impl |
| Markdown | headings (H1-H6) |
| JSON | top-level keys |

Sources: [src/extract.ts:extract.ts]()

### Chunking Strategy

- **Code files**: Divided into blocks of up to 60 lines; each block starts 50 lines after the previous one, so consecutive blocks overlap by 10 lines
- **Markdown files**: Split by headings, with the heading as the chunk title
- **Token estimation**: Used for both selection and budget accounting

```typescript
function codeChunks(relativePath: string, content: string): ChunkRecord[] {
  const lines = content.split(/\r?\n/);
  const chunks: ChunkRecord[] = [];
  // Split into ~60-line blocks with overlap
  for (let start = 1; start <= lines.length; start += 50) {
    const end = Math.min(start + 60 - 1, lines.length);
    const text = lineRange(content, start, end);
    chunks.push({
      ref: fileRef(relativePath, start, end),
      filePath: relativePath,
      startLine: start,
      endLine: end,
      kind: "file",
      title: `${relativePath}:${start}-${end}`,
      text,
      tokenEstimate: estimateTokens(text)
    });
  }
  return chunks;
}
```

Sources: [src/extract.ts:extract.ts]()
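To make the window arithmetic in `codeChunks` concrete, the sketch below computes only the line ranges, using the same constants (a 60-line window advancing 50 lines per step). The helper name `chunkWindows` is hypothetical.

```typescript
// Computes the 1-indexed [start, end] line ranges produced by a 60-line window
// that advances 50 lines per step (the constants used in codeChunks above).
function chunkWindows(totalLines: number, windowSize = 60, step = 50): Array<[number, number]> {
  const windows: Array<[number, number]> = [];
  for (let start = 1; start <= totalLines; start += step) {
    windows.push([start, Math.min(start + windowSize - 1, totalLines)]);
  }
  return windows;
}

const ranges = chunkWindows(130);
// ranges: [1, 60], [51, 110], [101, 130]
```

With these constants a 55-line file yields `[1, 55]` and `[51, 55]`, so the final chunk can be fully contained in the one before it.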

## Summary Generation

The `summarizePack` function generates human-readable summaries:

```typescript
function summarizePack(
  query: string,
  intent: SearchIntent,
  hits: SearchHit[],
  graphPaths: GraphPath[],
  memoryHits: SearchHit[]
): string {
  if (hits.length === 0) {
    return `No indexed evidence matched "${query}". Re-index or broaden the query.`;
  }
  return `Found ${hits.length} evidence item${hits.length === 1 ? "" : "s"} ` +
    `for a ${intent} query, with ${graphPaths.length} graph connection${graphPaths.length === 1 ? "" : "s"} ` +
    `and ${memoryHits.length} memory hit${memoryHits.length === 1 ? "" : "s"}.`;
}
```

Sources: [src/search.ts:search.ts]()

## Persistence

Evidence packs are saved to the kernel database for audit and retrieval:

```typescript
saveEvidencePack(kernel.db, { 
  id: pack.id, 
  query: pack.query, 
  tokenEstimate, 
  json: JSON.stringify(pack) 
});
```

Sources: [src/search.ts:search.ts]()

## Design Principles

1. **Token budget awareness**: Never exceed the requested budget; select the most relevant items first
2. **Cited evidence**: Every piece of information is traceable to a specific file and line range
3. **Intent-driven**: Query classification shapes what gets searched and how results are interpreted
4. **Graph connectivity**: Beyond matching files, show how they connect through imports and dependencies
5. **Memory integration**: Blend indexed content with evidence-backed lessons from prior sessions
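
The first principle can be illustrated with a greedy selection loop. This is a minimal sketch, not the project's selection code: the `RankedHit` shape and `selectWithinBudget` name are hypothetical, and whether Contextful skips an oversized item and continues (as here) or stops at the first overflow is not stated in the source.

```typescript
// Hypothetical hit shape; only the fields needed for budgeting are modelled.
interface RankedHit { ref: string; score: number; tokenEstimate: number; }

// Greedy selection: visit hits from highest to lowest score and keep each one
// that still fits in the remaining budget. The total never exceeds `budget`.
function selectWithinBudget(hits: RankedHit[], budget: number): { selected: RankedHit[]; used: number } {
  const selected: RankedHit[] = [];
  let used = 0;
  const ranked = [...hits].sort((a, b) => b.score - a.score);
  for (const hit of ranked) {
    if (used + hit.tokenEstimate <= budget) {
      selected.push(hit);
      used += hit.tokenEstimate;
    }
  }
  return { selected, used };
}
```

Skipping an item that does not fit, rather than stopping, lets smaller lower-ranked items use the leftover budget.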

---

<a id='memory-ledger'></a>

## Memory Ledger

### Related Pages

Related topics: [Context Packs](#context-packs), [Search Engine](#search-engine)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/mcp-server.ts](https://github.com/Inferensys/contextful/blob/main/src/mcp-server.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
- [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)
- [README.md](https://github.com/Inferensys/contextful/blob/main/README.md)
</details>

# Memory Ledger

The Memory Ledger is Contextful's evidence-backed persistent memory system that enables AI agents to retain and recall learned lessons across sessions. Unlike ephemeral context that disappears when a session ends, the Memory Ledger stores structured knowledge annotated with source evidence, allowing agents to build cumulative understanding of a codebase over time.

## Overview

The Memory Ledger solves a fundamental problem in AI-assisted development: knowledge gained during one session is lost in the next. When an agent discovers how authentication works, identifies a fragile dependency, or learns a non-obvious architectural pattern, that knowledge typically vanishes when the session ends.

Contextful's approach requires every stored memory to be anchored to concrete evidence—file references, code symbols, or prior context packs. This design prevents hallucinated or unsubstantiated memories from polluting the knowledge base and ensures that recalled lessons can be traced back to their source.

The system operates entirely locally with no external API calls, embedding services, or cloud dependencies. All memory data remains within the workspace's SQLite database.

## Architecture

```mermaid
graph TD
    A[Agent Session] -->|write_lesson| B[Memory Ledger]
    A -->|recall_memory| C[Memory Search]
    B -->|evidence refs| D[Evidence Pack]
    C -->|cited memories| A
    D -->|citations| E[Source Files]
    F[Workspace DB] -->|stores| B
    F -->|stores| C
```

### Core Components

| Component | Role | Source |
|-----------|------|--------|
| Memory Storage | SQLite-backed persistent storage for lessons | `src/db.ts` |
| Memory Search | FTS-enabled retrieval of memories by query | `src/search.ts` |
| Evidence Validation | Ensures evidence refs are valid before storage | `src/mcp-server.ts` |
| Confidence Scoring | Assigns credibility scores to stored memories | `src/cli.ts:85` |

## Data Model

### Memory Record Structure

Each memory in the ledger contains the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier (prefixed with `memory:`) |
| `claim` | string | The substantive lesson or observation |
| `scope` | string | Granularity level: `repo`, `file`, `symbol`, or `session` |
| `evidenceRefs` | string[] | Validated references to source evidence |
| `confidence` | number | Credibility score from 0.0 to 1.0 |
| `status` | string | Current state: `active`, `superseded`, or `stale` |
| `supersedes` | string? | ID of the memory this replaces (if any) |

### Evidence Reference Formats

Valid evidence references that can be attached to memories:

| Format | Example | Purpose |
|--------|---------|---------|
| File range | `file:src/auth.ts:10-40` | Reference specific lines in a file |
| Symbol | `symbol:src/auth.ts#AuthService:12` | Point to a specific code symbol |
| Context pack | `pack:ctx_abc123` | Reference a prior evidence pack |

Sources: [README.md:54-56]()

Evidence references must come from search results or context packs—arbitrary references are rejected. This prevents storing claims without verifiable backing.
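
The three reference formats in the table can be told apart with a small parser. The sketch below is an illustration only: `parseEvidenceRef` and the `EvidenceRef` union are hypothetical names, and the regular expressions are inferred from the table's examples rather than taken from the source.

```typescript
type EvidenceRef =
  | { kind: "file"; path: string; startLine: number; endLine: number }
  | { kind: "symbol"; path: string; name: string; line: number }
  | { kind: "pack"; id: string };

// Parses the three documented reference formats; returns null for anything else.
function parseEvidenceRef(ref: string): EvidenceRef | null {
  const file = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (file) {
    return { kind: "file", path: file[1], startLine: Number(file[2]), endLine: Number(file[3]) };
  }
  const symbol = /^symbol:(.+)#([^#:]+):(\d+)$/.exec(ref);
  if (symbol) {
    return { kind: "symbol", path: symbol[1], name: symbol[2], line: Number(symbol[3]) };
  }
  const pack = /^pack:(\S+)$/.exec(ref);
  if (pack) {
    return { kind: "pack", id: pack[1] };
  }
  return null;
}
```

A parser like this only checks the shape of a reference; confirming that the reference came from a search result or context pack is a separate step.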

## Memory Scopes

The scope field determines the durability and applicability of a memory:

| Scope | Description | Persistence |
|-------|-------------|-------------|
| `repo` | Project-wide lessons applicable across sessions | Permanent |
| `file` | File-specific knowledge | Permanent |
| `symbol` | Symbol-level lessons | Permanent |
| `session` | Ephemeral session-scoped learnings | Lost on session end |

The default scope is `repo`, reflecting the assumption that most valuable memories have project-wide relevance.

Sources: [src/cli.ts:73]()

## Writing Memories

### CLI Usage

```bash
cxf memory add \
  --claim "AuthService.validateToken() throws on expired tokens without catching" \
  --evidence "file:src/auth.ts:45-67" \
  --evidence "file:src/api/middleware.ts:12-20" \
  --confidence 0.85 \
  --scope repo
```

### MCP Tool Usage

```typescript
await server.callTool("write_lesson", {
  claim: "The payment module requires initialization before use",
  evidence_refs: ["file:src/payment/core.ts:10-30", "symbol:src/payment/core.ts#initialize:15"],
  scope: "repo",
  confidence: 0.9
});
```

Sources: [src/mcp-server.ts:79-94]()

### Validation Rules

Memories are subject to strict validation:

1. **Evidence required**: At least one valid evidence reference must be provided
2. **Evidence must be fresh**: References must originate from search results or context packs
3. **Claim must be substantive**: Empty or trivial claims are rejected
4. **Confidence in valid range**: Must be between 0.0 and 1.0

## Searching Memories

### Intent Classification

Contextful automatically classifies queries to determine when to search memories. The query classifier recognizes memory-related intents through keyword detection:

```typescript
const memoryPattern = /\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/;
```

When matched, the classifier returns `intent: "memory"` and the search system automatically queries the memories FTS index.

Sources: [src/search.ts:14-17]()
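
One detail of the quoted pattern is worth noting: in a regular expression, alternation binds more loosely than `\b`, so the boundaries apply only to the first alternative (`memory`) and the last (`sessions`), while the alternatives in between match as plain substrings. The sketch below contrasts the quoted pattern with a grouped variant. The grouped pattern and the `isMemoryQuery` helper are hypothetical and are shown only to illustrate the difference; they are not taken from the source.

```typescript
// Pattern as quoted above: `\b` binds only to the first and last alternatives.
const quotedPattern = /\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/;

// Hypothetical variant: a non-capturing group applies both boundaries to every alternative.
const groupedPattern = /\b(?:memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/;

function isMemoryQuery(query: string, pattern: RegExp = groupedPattern): boolean {
  return pattern.test(query.toLowerCase());
}

const substringHit = isMemoryQuery("obsession with types", quotedPattern); // true: "session" inside "obsession"
const groupedMiss = isMemoryQuery("obsession with types");                  // false
const groupedHit = isMemoryQuery("what lessons did we record");             // true
```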

### Query Expansion

Memory searches benefit from automatic term expansion. When a query mentions relevant concepts, additional search terms are added:

```typescript
if (/\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/.test(lower)) {
  additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
}
```

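This ensures that queries like "what have we learned about auth" retrieve memory results even if those exact words don't appear in the stored claims. The trigger word must appear in the query as written in the pattern: "learned" matches, while "learn" on its own does not.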

Sources: [src/search.ts:28-30]()

### Search Results

Memory hits in search results include:

| Field | Description |
|-------|-------------|
| `ref` | Memory reference in format `memory:<id>` |
| `kind` | Always `"memory"` for memory hits |
| `title` | Display title including scope |
| `excerpt` | Redacted claim text (secrets removed) |
| `evidence` | Original evidence references |
| `status` | Current memory status |
| `score` | Relevance score |

## Memory Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Active: write_lesson
    Active --> Superseded: write_lesson with supersedes
    Active --> Stale: Evidence becomes invalid
    Superseded --> [*]
    Stale --> [*]
    Active --> [*]: Deleted
```

### Status Transitions

**Active** → Default state for newly written memories. Active memories are returned in search results and can supersede other memories.

**Superseded** → When a newer, more accurate memory replaces an older one, the older memory keeps its ID and evidence but is excluded from search results. The newer memory's `supersedes` field holds the ID of the memory it replaces.

**Stale** → Memories become stale when their evidence references point to files or symbols that have changed significantly since the memory was written. The reporting system tracks stale memories for review.

Sources: [src/report.ts:54-58]()
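
The status transitions can be modelled as a small in-memory state machine. This sketch uses a `Map` in place of the SQLite-backed ledger; `writeLesson`, `markStale`, and `recallable` are hypothetical helper names, and the assumption that only `active` entries are recalled goes slightly beyond the text, which states this explicitly only for superseded memories.

```typescript
type MemoryStatus = "active" | "superseded" | "stale";

interface MemoryEntry {
  id: string;
  claim: string;
  status: MemoryStatus;
  supersedes?: string;
}

// Adds `next` to the ledger; if it names an active predecessor, that entry becomes superseded.
function writeLesson(ledger: Map<string, MemoryEntry>, next: Omit<MemoryEntry, "status">): void {
  if (next.supersedes !== undefined) {
    const previous = ledger.get(next.supersedes);
    if (previous && previous.status === "active") {
      ledger.set(previous.id, { ...previous, status: "superseded" });
    }
  }
  ledger.set(next.id, { ...next, status: "active" });
}

// Marks an active entry stale, e.g. after its evidence no longer resolves.
function markStale(ledger: Map<string, MemoryEntry>, id: string): void {
  const entry = ledger.get(id);
  if (entry && entry.status === "active") {
    ledger.set(id, { ...entry, status: "stale" });
  }
}

// Assumption of this sketch: only active entries are eligible for recall.
function recallable(ledger: Map<string, MemoryEntry>): MemoryEntry[] {
  return [...ledger.values()].filter((m) => m.status === "active");
}
```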

## Integration with Context Packs

The Memory Ledger integrates with Contextful's evidence pack system:

1. **Before writing**: Search context or create a context pack to get evidence references
2. **Writing lessons**: Use those evidence refs to anchor the memory claim
3. **Recalling**: Later sessions query the ledger, retrieving cited memories

```typescript
// During a session: create pack, identify lessons
const pack = await createContextPack({ query: "how is auth handled", budget: 2000 });

// Later session: recall what was learned
const result = await recallMemory({ query: "auth patterns", scope: "repo" });
```

This bidirectional relationship means memories enhance future context packs, and context packs provide evidence for future memories.

## Reporting

The `report` command includes memory statistics:

```bash
cxf report --workspace . --format markdown
```

Output includes a "Stale Memories" section listing memories whose evidence references may no longer be valid:

```
## Stale Memories
- memory_abc123: AuthService.validateToken() behavior changed in v2
- memory_def456: payment module initialization order is now reversed
```

Sources: [src/report.ts:54-58]()

## Configuration Options

| Option | CLI Flag | Default | Description |
|--------|----------|---------|-------------|
| Workspace | `--workspace` | `process.cwd()` | Path to workspace with memory database |
| Claim | `--claim` | required | The memory content |
| Evidence | `--evidence` | required | One or more evidence refs |
| Scope | `--scope` | `repo` | Memory scope level |
| Confidence | `--confidence` | `0.7` | Credibility score |

## Privacy Considerations

The Memory Ledger is designed with privacy as a core principle:

- **Local only**: No data leaves the workspace
- **No cloud sync**: Memories remain on the local machine
- **Evidence-linked**: Claims cannot be stored without verifiable source
- **Content redaction**: Secrets are automatically redacted from stored claims using pattern matching for emails, API keys, and tokens

Sources: [src/util.ts:12-18]()
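
Pattern-based redaction of this kind can be sketched as a list of regular expressions applied in sequence. The patterns and placeholder labels below are illustrative assumptions; the project's actual rules live in `src/util.ts` and may differ.

```typescript
// Illustrative patterns only; the project's actual rules may differ.
const REDACTION_PATTERNS: Array<[RegExp, string]> = [
  // Email addresses
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[REDACTED_EMAIL]"],
  // Prefixed keys such as sk_..., api-..., token_... followed by 16+ key characters
  [/\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9_-]{16,}\b/gi, "[REDACTED_KEY]"],
  // HTTP bearer tokens
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "[REDACTED_TOKEN]"],
];

// Applies each pattern in order and returns the redacted text.
function redactSecrets(text: string): string {
  return REDACTION_PATTERNS.reduce((out, [pattern, label]) => out.replace(pattern, label), text);
}
```

Pattern matching of this sort is best-effort: it catches common secret shapes but cannot guarantee that every secret is removed.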

## Related MCP Tools

| Tool | Purpose |
|------|---------|
| `recall_memory` | Search the memory ledger |
| `write_lesson` | Store a new evidence-backed memory |
| `context_pack` | Generate evidence packs that can feed into memories |

Sources: [README.md:35-40]()

---

<a id='graph-traversal'></a>

## Graph Traversal and Analysis

### Related Pages

Related topics: [Search Engine](#search-engine), [SQLite Database Schema](#sqlite-database)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/types.ts](https://github.com/Inferensys/contextful/blob/main/src/types.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/mcp-server.ts](https://github.com/Inferensys/contextful/blob/main/src/mcp-server.ts)
</details>

# Graph Traversal and Analysis

Graph Traversal and Analysis is a core feature of Contextful that builds and queries a dependency graph from source code. This system tracks relationships between files, symbols, modules, and configuration nodes, enabling sophisticated impact analysis, change tracing, and dependency exploration.

## Overview

Contextful extracts code relationships during indexing and stores them in a SQLite database as a traversable graph. This enables agents to answer questions like:

- "What depends on this module?"
- "What tests cover this file?"
- "How does this symbol connect to other parts of the codebase?"

Sources: [src/extract.ts:68-95]()

## Architecture

```mermaid
graph TD
    A[Source Files] --> B[extractEdges]
    B --> C[GraphEdge Records]
    C --> D[SQLite Kernel DB]
    E[CLI/MCP Query] --> F[searchContext]
    F --> G[traceGraph]
    G --> H[GraphPath Results]
    F --> I[impactAnalysis]
    I --> J[Impact Results]
    F --> K[whyChanged]
    K --> L[Git History + Evidence]
```

### Data Flow

1. **Extraction Phase**: During workspace indexing, `extractEdges()` parses source files to identify relationships. Sources: [src/extract.ts:52-95]()
2. **Storage Phase**: Edge data is stored in the `edges` table of the kernel SQLite database. Sources: [src/search.ts:1-30]()
3. **Query Phase**: CLI commands and MCP tools query the graph using traversal algorithms. Sources: [src/search.ts:180-220]()

## Graph Data Model

### Core Types

```typescript
interface GraphEdge {
  sourceType: "file" | "symbol";
  sourceName: string;
  targetType: "file" | "symbol" | "module" | "config";
  targetName: string;
  edgeType: EdgeType;
  filePath: string;
  line: number;
}

interface GraphPath {
  edges: Array<{
    sourceName: string;
    sourceType: string;
    edgeType: string;
    targetName: string;
    targetType: string;
  }>;
  totalHops: number;
}

interface GraphNode {
  name: string;
  type: "file" | "symbol" | "module" | "config";
  path?: string;
  kind?: string;
}
```

Sources: [src/types.ts:45-70]()

### Edge Types

| Edge Type | Description | Source Detection |
|-----------|-------------|------------------|
| `DEFINES` | File defines a symbol | Function/class declarations |
| `IMPORTS` | File imports a module | `import`, `require`, `from` statements |
| `CONFIGURES` | File/config references a key | JSON keys, package.json fields |
| `TESTS` | Test file exercises the modules it imports | Auto-generated for test files |

Sources: [src/extract.ts:75-100]()

### Language-Specific Detection

The extraction layer supports multiple languages:

| Language | Import Patterns | Symbol Patterns |
|----------|-----------------|-----------------|
| TypeScript/JavaScript | `from "module"`, `require("module")` | `export function/class/interface` |
| Python | `from module import` | `def`, `class` |
| Go | `"package"` | `func`, `type struct/interface` |
| Rust | `use module;`, `mod name;` | `fn`, `struct`, `enum`, `trait` |

Sources: [src/extract.ts:70-95]()
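
For the TypeScript/JavaScript row, regex-based import detection could look roughly like the sketch below. It is an assumption-laden illustration, not the project's `extractEdges`: the function name, the `ImportEdgeSketch` shape, and the exact pattern are hypothetical.

```typescript
interface ImportEdgeSketch {
  sourceName: string;
  targetName: string;
  edgeType: "IMPORTS";
  filePath: string;
  line: number;
}

// Line-by-line regex scan for `from "x"` and `require("x")`. This is not a real parser,
// so imports that appear inside comments or strings would also be picked up.
function extractImportEdges(filePath: string, content: string): ImportEdgeSketch[] {
  const edges: ImportEdgeSketch[] = [];
  const lines = content.split(/\r?\n/);
  for (let index = 0; index < lines.length; index++) {
    // A fresh global regex per line keeps lastIndex from leaking between lines.
    const pattern = /\bfrom\s+["']([^"']+)["']|\brequire\(\s*["']([^"']+)["']\s*\)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(lines[index])) !== null) {
      edges.push({
        sourceName: filePath,
        targetName: match[1] ?? match[2], // group 1: `from`, group 2: `require`
        edgeType: "IMPORTS",
        filePath,
        line: index + 1, // 1-indexed line number
      });
    }
  }
  return edges;
}
```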

## Graph Traversal API

### traceGraph

Performs graph traversal starting from a source node, optionally filtering by edge types and limiting results.

```typescript
export async function traceGraph(options: {
  workspace?: string;
  from: string;
  to?: string;
  edgeTypes?: string[];
  limit?: number;
}): Promise<GraphPath[]>
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `workspace` | `string` | No | Workspace path (defaults to CWD) |
| `from` | `string` | Yes | Starting node name |
| `to` | `string` | No | Target node for path finding |
| `edgeTypes` | `string[]` | No | Filter by specific edge types |
| `limit` | `number` | No | Maximum paths to return (default: 10) |

Sources: [src/search.ts:180-190]()
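
The source does not show which traversal algorithm `traceGraph` uses. As one plausible approach, a breadth-first search over an in-memory edge list finds a shortest-hop path between two nodes. Everything in this sketch (`findPath`, `EdgeSketch`, operating on an array instead of the database) is an assumption made for illustration.

```typescript
interface EdgeSketch { sourceName: string; targetName: string; edgeType: string; }

// Breadth-first search over directed edges. Returns a shortest-hop path
// from `from` to `to` as a list of edges, or null when no path exists.
function findPath(edges: EdgeSketch[], from: string, to: string, edgeTypes?: string[]): EdgeSketch[] | null {
  const allowed = edgeTypes ? new Set(edgeTypes) : null;
  const outgoing = new Map<string, EdgeSketch[]>();
  for (const edge of edges) {
    if (allowed && !allowed.has(edge.edgeType)) continue; // optional edge-type filter
    const list = outgoing.get(edge.sourceName) ?? [];
    list.push(edge);
    outgoing.set(edge.sourceName, list);
  }

  const cameFrom = new Map<string, EdgeSketch>(); // node -> edge used to reach it
  const visited = new Set<string>([from]);
  const queue: string[] = [from];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    if (node === to) {
      // Walk back from the target to rebuild the path in forward order.
      const path: EdgeSketch[] = [];
      let cursor = to;
      while (cursor !== from) {
        const edge = cameFrom.get(cursor) as EdgeSketch;
        path.unshift(edge);
        cursor = edge.sourceName;
      }
      return path;
    }
    for (const edge of outgoing.get(node) ?? []) {
      if (!visited.has(edge.targetName)) {
        visited.add(edge.targetName);
        cameFrom.set(edge.targetName, edge);
        queue.push(edge.targetName);
      }
    }
  }
  return null;
}
```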

### loadGraphPaths

Loads graph paths from the database for a set of file paths.

```typescript
function loadGraphPaths(
  db: Database,
  paths: string[],
  limit: number
): GraphPath[]
```

Sources: [src/search.ts:60-80]()

## Impact Analysis

Impact analysis identifies reverse dependencies—what depends on a given file or symbol—and finds relevant test coverage.

```mermaid
graph LR
    A[Target File/Symbol] --> B[Find All Edges Pointing TO Target]
    B --> C[Group by Source File]
    C --> D[Identify Test Files]
    D --> E[Return Impact Set]
```

### impactAnalysis Function

```typescript
export async function impactAnalysis(options: {
  workspace?: string;
  target: string;
  limit?: number;
}): Promise<ImpactResult>
```

#### Impact Result Structure

| Field | Type | Description |
|-------|------|-------------|
| `target` | `string` | The analyzed symbol or file |
| `dependents` | `DependentInfo[]` | Files/symbols that depend on target |
| `tests` | `SearchHit[]` | Related test files |

```typescript
interface DependentInfo {
  path: string;
  type: string;
  imports: string[];
}

interface ImpactResult {
  target: string;
  dependents: DependentInfo[];
  tests: SearchHit[];
}
```

Sources: [src/search.ts:130-175]()
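
The reverse-dependency step of the diagram (find edges pointing to the target, then group by source file) can be sketched over an in-memory edge list. The names `findDependents`, `DepEdge`, and `DependentSketch` are hypothetical, and matching the target by exact name is a simplification; the real implementation may resolve module specifiers differently.

```typescript
interface DepEdge { sourceName: string; targetName: string; edgeType: string; filePath: string; line: number; }
interface DependentSketch { path: string; lines: number[]; }

// Collects edges that point TO `target` and groups them by the file that declares them.
function findDependents(edges: DepEdge[], target: string): DependentSketch[] {
  const byFile = new Map<string, number[]>();
  for (const edge of edges) {
    if (edge.targetName !== target) continue; // exact-name match only in this sketch
    const lines = byFile.get(edge.filePath) ?? [];
    lines.push(edge.line);
    byFile.set(edge.filePath, lines);
  }
  // Map preserves insertion order, so dependents appear in first-seen order.
  return [...byFile.entries()].map(([path, lines]) => ({ path, lines }));
}
```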

### Test Detection Logic

Test files are identified either by an edge of type `TESTS` or by a file path that matches common test naming patterns:

```typescript
const testPaths = paths.filter(
  (path) => path.edgeType === "TESTS" || 
            /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./.test(path.filePath)
);
```

Sources: [src/search.ts:165-170]()
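
To see what the path pattern accepts, the sketch below applies the same regular expression to a few sample paths. The `isTestPath` helper and the sample paths are illustrative.

```typescript
// Same path pattern as in the snippet above.
const TEST_PATH_PATTERN = /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./;

function isTestPath(filePath: string): boolean {
  return TEST_PATH_PATTERN.test(filePath);
}

const samples = [
  "tests/auth.ts",           // top-level tests/ directory
  "src/__tests__/parser.ts", // __tests__ directory
  "src/auth.test.ts",        // ".test." infix
  "src/auth-spec.js",        // "-spec." infix
  "src/latest/auth.ts",      // "test" only as part of another word
];
const detected = samples.filter(isTestPath);
// detected contains the first four paths; "src/latest/auth.ts" is not matched
```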

## Change Analysis

### whyChanged

Combines current code evidence with git history to explain why a file or symbol may have changed.

```typescript
export async function whyChanged(options: {
  workspace?: string;
  target: string;
  limit?: number
}): Promise<{
  target: string;
  currentEvidence: SearchHit[];
  commits: Array<{
    hash: string;
    subject: string;
    date?: string;
    files: string[];
  }>;
}>
```

#### Workflow

```mermaid
graph TD
    A[whyChanged] --> B[searchContext for target]
    B --> C[Extract file paths from hits]
    C --> D[readGitHistory with file paths]
    D --> E[Combine evidence + commits]
    E --> F[Return structured result]
```

Sources: [src/search.ts:200-230]()

### Git History Integration

The system reads git history for affected files:

```typescript
function readGitHistory(
  workspace: string,
  filePaths: string[],
  limit: number
): Array<{
  hash: string;
  subject: string;
  date?: string;
  files: string[];
}>
```

Sources: [src/search.ts:85-100]()
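
How `readGitHistory` invokes and parses git is not shown in the source. The sketch below covers only the parsing half, under an assumed output layout: each commit starts with a header line carrying a `@@` marker and tab-separated hash, ISO date, and subject, followed by the changed file paths. Both the marker and the `git log` format string in the comment are choices made for this illustration.

```typescript
interface CommitSketch { hash: string; subject: string; date?: string; files: string[]; }

// Parses text produced by a command such as:
//   git log --name-only --pretty=format:@@%H%x09%aI%x09%s -- <paths>
// The "@@" marker and tab-separated layout are choices made for this sketch.
function parseGitLog(output: string): CommitSketch[] {
  const commits: CommitSketch[] = [];
  for (const raw of output.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue; // blank separators between commits
    if (line.startsWith("@@")) {
      // Header line: hash, date, then the subject (which may itself contain tabs).
      const [hash, date, ...subjectParts] = line.slice(2).split("\t");
      commits.push({ hash, date: date || undefined, subject: subjectParts.join("\t"), files: [] });
    } else if (commits.length > 0) {
      commits[commits.length - 1].files.push(line); // file path belonging to the latest commit
    }
  }
  return commits;
}
```

Using an explicit marker for header lines keeps the parser independent of where git places blank lines between the header and the file list.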

## CLI Commands

### trace Command

```bash
cxf trace --from <symbol_or_file> [--to <target>] [--edge-types <types>] [--limit <count>]
```

#### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--from` | `string` | Required | Starting node |
| `--to` | `string` | - | Target node |
| `--edge-types` | `string` | all | Comma-separated edge types |
| `--limit` | `number` | 10 | Maximum paths |
| `--workspace` | `string` | CWD | Workspace path |

Sources: [src/cli.ts:45-60]()

### report Command

Generates a comprehensive context report including graph statistics:

```bash
cxf report --workspace <path> --format markdown|json|html
```

#### Report Includes

- Index status with graph node/edge counts
- Top queries by intent type
- Stale memory detection
- Recent evidence packs

Sources: [src/cli.ts:70-85]()

## MCP Server Tools

Contextful exposes graph traversal as MCP tools for integration with AI coding assistants.

### trace_path

```json
{
  "name": "trace_path",
  "description": "Trace graph relationships between files, symbols, modules, and config nodes.",
  "inputSchema": {
    "from": "string",
    "to": "string (optional)",
    "edge_types": "string[] (optional)",
    "limit": "number (optional)"
  }
}
```

Sources: [src/mcp-server.ts:45-55]()

### impact_analysis

```json
{
  "name": "impact_analysis",
  "description": "Find likely dependents and tests for a file, symbol, or module.",
  "inputSchema": {
    "symbol_or_file": "string",
    "limit": "number (optional)"
  }
}
```

Sources: [src/mcp-server.ts:56-65]()

### why_changed

```json
{
  "name": "why_changed",
  "description": "Explain why a file or symbol may have changed by combining current evidence with git history.",
  "inputSchema": {
    "symbol_or_file": "string",
    "limit": "number (optional)"
  }
}
```

Sources: [src/mcp-server.ts:66-75]()

## Usage Examples

### Direct CLI Usage

```bash
# Trace dependencies of auth module
cxf trace --from src/auth.ts --edge-types IMPORTS

# Find what tests cover a file
cxf impact --target src/parser.ts

# Get change history for a symbol
cxf why --target AuthService
```

### MCP Integration

```json
{
  "mcpServers": {
    "contextful": {
      "command": "npx",
      "args": ["-y", "@inferensys/contextful", "server"]
    }
  }
}
```

```typescript
// In an MCP client
const result = await client.callTool("trace_path", {
  from: "src/auth.ts",
  to: "src/database.ts",
  edge_types: ["IMPORTS", "DEFINES"] // key name matches the trace_path input schema
});
```

## Query Intent Classification

Graph queries are automatically classified to route to appropriate traversal strategies:

| Intent | Keywords | Graph Relevance |
|--------|----------|-----------------|
| `architectural` | architecture, flow, path, connects, calls | High priority |
| `impact` | impact, affected, depends, blast radius | Direct edge query |
| `historical` | why, changed, history, regression | Graph + git history |
| `exact` | Symbol names, file paths | Symbol-level traversal |

Sources: [src/search.ts:115-130]()

## Limitations and Design Decisions

### Privacy Guarantees

- All processing is local-only
- No external embedding APIs used
- No source code upload
- No file editing capabilities

Sources: [README.md:45-50]()

### v1 Scope Boundaries

- Broken JSON during indexing produces warnings but continues processing
- Syntax diagnostics are intentionally out of scope
- Git history is read-only

Sources: [src/extract.ts:120-125]()

## Summary

The Graph Traversal and Analysis system in Contextful provides:

1. **Automatic Relationship Extraction** - Builds a dependency graph during indexing
2. **Multiple Query Entry Points** - CLI commands and MCP tools
3. **Path Finding** - Trace connections between any two nodes
4. **Impact Analysis** - Identify dependents and test coverage
5. **Change Attribution** - Combine current state with git history

This enables AI coding assistants to answer sophisticated questions about code relationships without requiring manual documentation or extensive file reading.

---

<a id='sqlite-database'></a>

## SQLite Database Schema

### Related Pages

Related topics: [Workspace Indexing System](#indexing-system), [Search Engine](#search-engine)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/db.ts](https://github.com/Inferensys/contextful/blob/main/src/db.ts)
- [src/types.ts](https://github.com/Inferensys/contextful/blob/main/src/types.ts)
- [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
</details>

# SQLite Database Schema

## Overview

Contextful uses SQLite as its primary storage engine for indexing codebase artifacts. The database schema is designed to support full-text search, symbol indexing, dependency graph traversal, and evidence pack generation for AI-assisted queries. All operations are managed through `better-sqlite3` for synchronous, high-performance access.

Sources: [src/db.ts:1-50]()

## Schema Tables

### Primary Storage Tables

#### `chunks`

Stores indexed code and documentation segments extracted from source files. As described under Chunking Strategy, code files are split into overlapping line blocks and Markdown files are split by heading.

| Column | Type | Description |
|--------|------|-------------|
| `ref` | TEXT | Unique reference identifier (format: `file:path:start-end`) |
| `file_path` | TEXT | Relative path to the source file |
| `start_line` | INTEGER | Starting line number (1-indexed) |
| `end_line` | INTEGER | Ending line number |
| `kind` | TEXT | Chunk classification: `code`, `doc`, `file` |
| `title` | TEXT | Display title for the chunk |
| `text` | TEXT | Full content of the chunk |
| `token_estimate` | INTEGER | Estimated token count (heuristic: about 4 characters per token, see Token Estimation) |

Sources: [src/db.ts:23-36]()

#### `symbols`

Captures programming constructs (functions, classes, interfaces, types) extracted from source files.

| Column | Type | Description |
|--------|------|-------------|
| `ref` | TEXT | Unique symbol reference |
| `name` | TEXT | Symbol name |
| `kind` | TEXT | Symbol type: `function`, `class`, `interface`, `type`, `struct`, `enum`, `trait`, `impl` |
| `file_path` | TEXT | Source file path |
| `line` | INTEGER | Line number where symbol is defined |
| `signature` | TEXT | First 160 characters of symbol declaration |
| `exported` | INTEGER | Boolean flag (1 = exported, 0 = local) |

Sources: [src/db.ts:47-60]()

#### `edges`

Represents relationships between code entities, including imports, module dependencies, and configuration references.

| Column | Type | Description |
|--------|------|-------------|
| `source_name` | TEXT | Name of the importing/configuring entity |
| `target_name` | TEXT | Name or path of the imported/dependency target |
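| `edge_type` | TEXT | Relationship type, such as `IMPORTS` or `CONFIGURES` (the Edge Types table under Graph Traversal and Analysis also lists `DEFINES` and `TESTS`) |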
| `file_path` | TEXT | File where the relationship is defined |
| `line` | INTEGER | Line number of the relationship definition |

Sources: [src/db.ts:38-45]()

### Full-Text Search Index

#### `chunks_fts`

Virtual FTS5 table providing fast full-text search across all indexed content. Mirrors core chunk data for BM25-ranked retrieval.

| Column | Type | Description |
|--------|------|-------------|
| `ref` | TEXT | Chunk reference |
| `path` | TEXT | File path for filtering |
| `title` | TEXT | Searchable title field |
| `text` | TEXT | Full searchable content |

Sources: [src/db.ts:37-42]()

The FTS table is queried using BM25 ranking in search operations:

```sql
SELECT ref, path, title, text, bm25(chunks_fts) AS rank 
FROM chunks_fts WHERE chunks_fts MATCH ?
```

Sources: [src/search.ts:45-47]()

### Graph and Metadata Tables

#### `nodes`

Represents graph vertices for dependency analysis and traversal operations.

| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER | Auto-incrementing primary key |
| `ref` | TEXT | Node reference |
| `kind` | TEXT | Node classification: `file`, `symbol`, `chunk`, `module`, `config` |
| `name` | TEXT | Display name |
| `file_path` | TEXT | Associated file path (nullable) |

Sources: [src/db.ts:12-22]()

#### `files`

Stores metadata about indexed source files.

| Column | Type | Description |
|--------|------|-------------|
| `absolute_path` | TEXT | Full absolute file path |
| `language` | TEXT | Detected programming language |
| `hash` | TEXT | SHA-based content hash for change detection |
| `size` | TEXT | File size in bytes |

Sources: [src/db.ts:13-17]()

#### `fingerprints`

Stores content fingerprints for deduplication and incremental indexing.

| Column | Type | Description |
|--------|------|-------------|
| `ref` | TEXT | Reference to the content chunk |
| `kind` | TEXT | Content type |
| `fingerprint` | TEXT | Hash of the content |

#### `evidence_packs`

Persists generated evidence packs for audit and replay.

| Column | Type | Description |
|--------|------|-------------|
| `id` | TEXT | Unique pack identifier |
| `query` | TEXT | Original search query |
| `token_estimate` | INTEGER | Total token count |
| `json` | TEXT | Serialized pack data |

#### `query_log`

Records search history for analysis and debugging.

| Column | Type | Description |
|--------|------|-------------|
| `query` | TEXT | Search query text |
| `intent` | TEXT | Classified search intent |
| `timestamp` | TEXT | ISO timestamp |

Sources: [src/db.ts:1-10]()

## Data Flow Architecture

```mermaid
graph TD
    A[Source Files] --> B[extractSymbols]
    A --> C[extractEdges]
    A --> D[extractChunks]
    
    B --> E[symbols table]
    C --> F[edges table]
    D --> G[chunks table]
    D --> H[chunks_fts index]
    
    G --> I[Full-Text Search]
    E --> J[Symbol Lookup]
    F --> K[Graph Traversal]
    
    I --> L[searchContext]
    J --> L
    K --> L
    
    L --> M[Evidence Pack]
    M --> N[evidence_packs]
```

Sources: [src/extract.ts:1-150]()

## Supported Symbol Kinds

The indexer extracts and classifies symbols based on language-specific patterns:

| Language | Supported Kinds |
|----------|-----------------|
| TypeScript/JavaScript | `function`, `class`, `interface`, `type` |
| Python | `function`, `class` |
| Go | `function`, `struct`, `interface` |
| Rust | `function`, `struct`, `enum`, `trait`, `impl` |

Sources: [src/extract.ts:30-60]()

## Supported Edge Types

| Edge Type | Description | Example |
|-----------|-------------|---------|
| `IMPORTS` | Module/dependency import | `import { foo } from './bar'` |
| `CONFIGURES` | Configuration key reference | `"dependencies": { ... }` in package.json |

The `CONFIGURES` edge type is specifically generated for package.json dependency sections and JSON configuration keys.

Sources: [src/extract.ts:70-120]()

## Query Classification and Intent

The search system classifies queries into intent categories that influence result ranking:

| Intent | Trigger Keywords | Purpose |
|--------|-----------------|---------|
| `symbol` | Class/function names, exact identifiers | Find symbol definitions |
| `code` | Code-related terms | Locate implementation |
| `memory` | memory, lessons, session | Search evidence-backed memory |
| `impact` | depends, affected, blast radius | Reverse dependency analysis |
| `historical` | why, changed, history, commit | Git history queries |
| `architectural` | architecture, flow, imports | Dependency tracing |
| `docs` | docs, documentation, readme | Documentation lookup |
| `exact` | File paths, line refs, symbols | Precise file/line access |
| `vague` | Default fallback | Broad search |

Sources: [src/search.ts:15-30]()
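
Keyword-driven classification of this kind can be sketched as an ordered list of rules where the first match wins. The keywords below come from the table; the rule ordering, the file-and-line heuristic for `exact`, and the omission of the `symbol` and `code` intents (whose triggers are not keyword lists) are assumptions of this sketch, not the project's classifier.

```typescript
type IntentSketch = "memory" | "impact" | "historical" | "architectural" | "docs" | "exact" | "vague";

// Ordered rules: the first matching pattern wins. Keywords come from the table above;
// the ordering and the exact-reference heuristic are assumptions of this sketch.
const INTENT_RULES: Array<[IntentSketch, RegExp]> = [
  ["exact", /\b[\w./-]+\.\w+:\d+\b/], // file reference with a line number, e.g. src/auth.ts:42
  ["memory", /\b(?:memory|memories|lessons?|sessions?)\b/],
  ["impact", /\b(?:depends|affected|blast radius)\b/],
  ["historical", /\b(?:why|changed|history|commit)\b/],
  ["architectural", /\b(?:architecture|flow|imports)\b/],
  ["docs", /\b(?:docs|documentation|readme)\b/],
];

// Lower-cases the query and returns the first matching intent, or "vague" as the fallback.
function classifyIntent(query: string): IntentSketch {
  const lower = query.toLowerCase();
  for (const [intent, pattern] of INTENT_RULES) {
    if (pattern.test(lower)) return intent;
  }
  return "vague";
}
```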

## Token Estimation

Token counts are estimated using a heuristic approximation:

```typescript
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

This provides a rough approximation where 1 token ≈ 4 characters, suitable for budget management in evidence pack generation.

Sources: [src/util.ts:1-10]()
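
A short worked example of the heuristic and its use in budget accounting follows. The `fitsBudget` helper is hypothetical; the budget figures reuse those from the example pack (1850 of 2000 tokens used).

```typescript
// Same heuristic as above: roughly four characters per token, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns true when adding `text` keeps the running total within `budget`.
function fitsBudget(usedTokens: number, text: string, budget: number): boolean {
  return usedTokens + estimateTokens(text) <= budget;
}

const short = estimateTokens("abcd");   // 4 characters -> 1 token
const longer = estimateTokens("abcde"); // 5 characters -> 2 tokens (rounded up)
const stillFits = fitsBudget(1850, "x".repeat(600), 2000); // 600 chars -> 150 tokens, 1850 + 150 = 2000
```

Because the estimate is based on `text.length` (UTF-16 code units), it is language-agnostic but may diverge from a real tokenizer, particularly for non-ASCII text.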

## Key Database Operations

### Chunk Insertion

```typescript
db.prepare(`
  INSERT INTO chunks (ref, file_path, start_line, end_line, kind, title, text, token_estimate)
  VALUES (?, ?, ?, ?, ?, ?, ?, ?)
`).run(chunk.ref, chunk.filePath, chunk.startLine, chunk.endLine, chunk.kind, chunk.title, chunk.text, chunk.tokenEstimate);
```

Each insert is written to both the `chunks` table and the `chunks_fts` FTS index.

### Symbol Loading

```typescript
db.prepare(`SELECT ref, name, kind, file_path, line, signature, exported 
FROM symbols WHERE file_path IN (${paths.map(() => "?").join(",")})`)
  .all(...paths)
```

Source: [src/db.ts:23-42](), [src/search.ts:180-195]()

## Schema Version and Metadata

The database stores schema version and workspace metadata:

| Key | Description |
|-----|-------------|
| `schema_version` | Current schema version number |
| `workspace` | Workspace root path |
| `indexed_at` | Last indexing timestamp |
| `parser_backend` | Parser backend description |
| `warnings` | Last 50 indexing warnings |

Source: [src/indexer.ts:80-90]()

## Conclusion

The SQLite schema in Contextful provides a normalized, queryable representation of source code structure and content. The dual-table approach for chunks (storage + FTS index) enables both efficient storage and fast full-text retrieval. The edges and symbols tables together support graph traversal for dependency analysis, while the evidence pack system enables persistent, ranked context generation for AI queries.

---

<a id='indexing-system'></a>

## Workspace Indexing System

### Related Pages

Related topics: [SQLite Database Schema](#sqlite-database), [Search Engine](#search-engine)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/indexer.ts](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts)
- [src/extract.ts](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
- [src/cli.ts](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)
- [src/search.ts](https://github.com/Inferensys/contextful/blob/main/src/search.ts)
- [src/report.ts](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
- [src/util.ts](https://github.com/Inferensys/contextful/blob/main/src/util.ts)
</details>

# Workspace Indexing System

## Overview

The Workspace Indexing System is the core indexing engine of Contextful. It scans, parses, and stores representations of source code files from a workspace into a local SQLite database, enabling semantic search, dependency graph traversal, and evidence-backed context retrieval.

**Primary responsibilities:**

| Responsibility | Description |
|----------------|-------------|
| File Discovery | Recursively traverse workspace directories, filtering by language and ignore rules |
| Symbol Extraction | Parse and catalog functions, classes, interfaces, types, enums, traits |
| Edge Extraction | Track import/export relationships between modules and dependencies |
| Content Chunking | Split large files into manageable, line-numbered chunks for retrieval |
| Watch Mode | Monitor file system changes and re-run indexing (debounced) when files are modified |

Source: [src/cli.ts:1-20](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)

## Architecture

```mermaid
graph TD
    A[Workspace Directory] --> B[File Discovery]
    B --> C[Language Detection]
    C --> D[Content Extraction]
    D --> E[Symbol Extraction]
    D --> F[Edge Extraction]
    D --> G[Chunk Generation]
    E --> H[SQLite DB]
    F --> H
    G --> H
    I[Search/Query] --> H
    J[Watch Mode] --> B
```

The system is built around a SQLite database that stores three core entities: symbols, edges, and chunks. The indexer processes files in a single pass, extracting all three data types simultaneously to minimize I/O overhead.

Source: [src/extract.ts:1-50](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)

## Supported Languages

The indexer natively supports symbol and edge extraction for the following languages:

| Language | Symbol Patterns | Import Patterns |
|----------|----------------|-----------------|
| TypeScript / JavaScript | `function`, `class`, `interface`, `type`, `const` arrow/function | `import from`, `require()` |
| Python | `def`, `class` | `from ... import`, `import` |
| Go | `func`, `type struct/interface` | `"..."` (quoted imports) |
| Rust | `fn`, `struct`, `enum`, `trait`, `impl` | `use`, `mod` |
| Markdown | Headings (`#{1,6}`) | N/A |
| JSON | Config keys (`"key":`) | N/A |

Source: [src/extract.ts:15-45](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)

## Indexing Process

### Phase 1: File Discovery

The indexer recursively scans the workspace directory, applying language-specific filtering and gitignore-style ignore rules. Binary files are detected and skipped using a simple heuristic: a file is treated as binary if a null byte appears in its first 4096 bytes.

```typescript
export function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}
```

Source: [src/util.ts:20-22](https://github.com/Inferensys/contextful/blob/main/src/util.ts)

### Phase 2: Symbol Extraction

Symbols are extracted using language-specific regular expression patterns. Each symbol record includes:

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Symbol identifier |
| `kind` | string | Category: function, class, interface, type, struct, enum, trait, impl |
| `line` | number | Declaration line number |
| `signature` | string | First 160 characters of the declaration line |
| `exported` | boolean | Whether the symbol is exported |

```typescript
const push = (name: string, kind: string, exported = false) =>
  symbols.push({ name, kind, line: lineNumber, signature: excerpt(line, 160), exported });
```

Source: [src/extract.ts:5-7](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)

For TypeScript and JavaScript, the extractor captures export modifiers:

```typescript
matchPush(line, /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/, push, "function");
matchPush(line, /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, push, "class");
```

Source: [src/extract.ts:12-15](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
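The excerpt calls a `matchPush` helper whose body is not shown. A plausible shape, assuming capture group 1 is the optional `export` modifier and capture group 2 is the identifier (as in the two patterns above), could look like the sketch below. The `ExtractedSymbol` and `Push` types and the boolean return value are illustrative assumptions, not the project's actual definitions.

```typescript
// Hypothetical sketch of the helper; the real implementation may differ.
interface ExtractedSymbol {
  name: string;
  kind: string;
  exported: boolean;
}

type Push = (name: string, kind: string, exported?: boolean) => void;

// Assumed convention: group 1 = optional `export` modifier, group 2 = name.
function matchPush(line: string, pattern: RegExp, push: Push, kind: string): boolean {
  const match = line.match(pattern);
  if (!match) return false;
  push(match[2], kind, Boolean(match[1]));
  return true;
}

// Collect symbols into a local array for demonstration.
const found: ExtractedSymbol[] = [];
const push: Push = (name, kind, exported = false) => {
  found.push({ name, kind, exported });
};

matchPush(
  "export async function loadIndex() {",
  /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/,
  push,
  "function",
);
matchPush("class Indexer {", /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, push, "class");
```

Under these assumptions, `found` ends up holding an exported `loadIndex` function and a non-exported `Indexer` class.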

### Phase 3: Edge Extraction

Edges represent dependency relationships between modules. The extractor identifies:

- **IMPORTS**: Direct import statements for each language
- **CONFIGURES**: Dependencies declared in configuration files (package.json, Cargo.toml, etc.)

```typescript
if (language === "typescript" || language === "javascript") {
  for (const match of line.matchAll(/(?:from\s+|import\s*)["']([^"']+)["']/g))
    addImport(match[1]);
  for (const match of line.matchAll(/require\(["']([^"']+)["']\)/g))
    addImport(match[1]);
}
```

Source: [src/extract.ts:67-72](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
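To see what those two patterns capture, the following self-contained sketch applies them line by line to a small source string. `extractJsImports` and `captureAll` are hypothetical names introduced for illustration. The sketch uses a `RegExp.exec` loop instead of `matchAll` so it runs under older compile targets, and it returns a plain list rather than calling the project's `addImport`.

```typescript
// Hypothetical sketch: collects import targets using the same two patterns
// as the excerpt above, without the project's addImport helper.
function captureAll(line: string, pattern: RegExp): string[] {
  const out: string[] = [];
  const re = new RegExp(pattern.source, "g"); // fresh global copy per line
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) out.push(m[1]);
  return out;
}

function extractJsImports(source: string): string[] {
  const targets: string[] = [];
  for (const line of source.split(/\r?\n/)) {
    // ES module forms: `import x from "mod"` and bare `import "mod"`.
    targets.push(...captureAll(line, /(?:from\s+|import\s*)["']([^"']+)["']/));
    // CommonJS form: `require("mod")`.
    targets.push(...captureAll(line, /require\(["']([^"']+)["']\)/));
  }
  return targets;
}

const sample = [
  "import { foo } from './bar';",
  'import "./side-effect";',
  'const fs = require("node:fs");',
].join("\n");

const importTargets = extractJsImports(sample);
```

Because the matching is purely line-based, an import statement whose `from` clause sits on a different line than the module string would be missed by this sketch.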

For `package.json`, dependencies and scripts are indexed as CONFIGURES edges:

```typescript
for (const section of ["dependencies", "devDependencies", "peerDependencies", "scripts"]) {
  const values = parsed[section];
  if (!values || typeof values !== "object") continue;
  for (const key of Object.keys(values)) {
    edges.push({ targetName: `${section}:${key}`, targetType: "config", edgeType: "CONFIGURES", line: 1 });
  }
}
```

Source: [src/extract.ts:105-114](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
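The loop above can be wrapped into a runnable function to show the resulting edge names. This is a sketch under assumptions: `extractPackageJsonEdges` and `ConfigEdge` are hypothetical names, and returning an empty list on invalid JSON is an illustrative choice, not confirmed project behavior.

```typescript
// Hypothetical wrapper around the section loop shown above.
interface ConfigEdge {
  targetName: string;
  targetType: "config";
  edgeType: "CONFIGURES";
  line: number;
}

const SECTIONS = ["dependencies", "devDependencies", "peerDependencies", "scripts"];

function extractPackageJsonEdges(text: string): ConfigEdge[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return []; // assumption: malformed JSON yields no edges
  }
  if (!parsed || typeof parsed !== "object") return [];
  const manifest = parsed as Record<string, unknown>;
  const edges: ConfigEdge[] = [];
  for (const section of SECTIONS) {
    const values = manifest[section];
    if (!values || typeof values !== "object") continue; // skip missing or malformed sections
    for (const key of Object.keys(values)) {
      edges.push({ targetName: `${section}:${key}`, targetType: "config", edgeType: "CONFIGURES", line: 1 });
    }
  }
  return edges;
}

const demoManifest = JSON.stringify({
  name: "demo",
  dependencies: { "left-pad": "^1.3.0" },
  devDependencies: "not-an-object",
  scripts: { build: "tsc" },
});
const configEdgeNames = extractPackageJsonEdges(demoManifest).map((edge) => edge.targetName);
```

Every edge carries `line: 1` because the keys come from the parsed object rather than from a line scan, so the original line position is not available.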

### Phase 4: Chunk Generation

Large files are split into overlapping chunks to enable granular retrieval. The system uses a sliding window approach with overlap between consecutive chunks:

```mermaid
graph LR
    A[File Lines 1-200] --> B[Chunk 1: 1-80]
    A --> C[Chunk 2: 60-140]
    A --> D[Chunk 3: 120-200]
    B --> E[Token Estimate]
    C --> E
    D --> E
```

Each chunk includes:

| Field | Description |
|-------|-------------|
| `ref` | Unique reference string (`file:path:start-end`) |
| `filePath` | Relative path to source file |
| `startLine` | Starting line number |
| `endLine` | Ending line number |
| `kind` | Chunk type: `code`, `doc`, `file` |
| `title` | Human-readable title |
| `tokenEstimate` | Estimated token count |

Source: [src/extract.ts:145-160](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
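A sliding window with overlap can be sketched as follows. The window size of 80 lines and overlap of 20 lines are assumptions loosely read off the diagram above, not values confirmed from the source, and `chunkLines` and `LineWindow` are hypothetical names. The token estimate reuses the 4-characters-per-token heuristic described elsewhere in this document.

```typescript
// Hypothetical sketch of overlapping line-window chunking.
interface LineWindow {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
  tokenEstimate: number;
}

function chunkLines(text: string, windowSize = 80, overlap = 20): LineWindow[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new Error("require windowSize > 0 and 0 <= overlap < windowSize");
  }
  const lines = text.split(/\r?\n/);
  const step = windowSize - overlap; // how far each new window advances
  const chunks: LineWindow[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + windowSize, lines.length);
    const body = lines.slice(start, end).join("\n");
    chunks.push({
      startLine: start + 1,
      endLine: end,
      text: body,
      tokenEstimate: Math.ceil(body.length / 4),
    });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```

With these defaults a 200-line file yields three windows (lines 1-80, 61-140, and 121-200), so each boundary region appears in two chunks and a match near a boundary keeps some surrounding context.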

### Phase 5: Markdown Document Chunking

Markdown files receive special treatment. Instead of fixed-size chunks, the indexer uses headings as natural section boundaries:

```typescript
lines.forEach((line, index) => {
  const match = line.match(/^(#{1,6})\s+(.+)$/);
  if (match) headings.push({ title: match[2].trim(), line: index + 1 });
});
return headings.map((heading, index) => {
  const next = headings[index + 1];
  const endLine = next ? next.line - 1 : lines.length;
  // ... create chunk for section
});
```

Source: [src/extract.ts:174-185](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)
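The excerpt elides the chunk construction. A complete, runnable version of the same idea might look like the sketch below, where `chunkMarkdown` and `DocSection` are hypothetical names and the returned fields are a simplified subset of the chunk record.

```typescript
// Hypothetical completion of the heading-based sectioning shown above.
interface DocSection {
  title: string;
  startLine: number; // 1-based line of the heading
  endLine: number;   // 1-based last line before the next heading
  text: string;
}

function chunkMarkdown(text: string): DocSection[] {
  const lines = text.split(/\r?\n/);
  const headings: Array<{ title: string; line: number }> = [];
  lines.forEach((line, index) => {
    const match = line.match(/^(#{1,6})\s+(.+)$/);
    if (match) headings.push({ title: match[2].trim(), line: index + 1 });
  });
  return headings.map((heading, index) => {
    const next = headings[index + 1];
    const endLine = next ? next.line - 1 : lines.length;
    return {
      title: heading.title,
      startLine: heading.line,
      endLine,
      text: lines.slice(heading.line - 1, endLine).join("\n"),
    };
  });
}

const docSections = chunkMarkdown("# Title\nintro\n\n## Usage\nrun it");
```

Two limitations of this sketch are worth noting: text before the first heading is not assigned to any section, and a line starting with `#` inside a fenced code block is treated as a heading.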

## Watch Mode

The indexer supports continuous monitoring via file system watchers:

```typescript
export async function watchWorkspace(workspace: string, onIndex: (result: IndexResult) => void): Promise<void> {
  const resolved = path.resolve(workspace);
  onIndex(await indexWorkspace({ workspace: resolved }));
  let timer: NodeJS.Timeout | undefined;
  fs.watch(resolved, { recursive: true }, () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(async () => {
      onIndex(await indexWorkspace({ workspace: resolved }));
    }, 500);
  });
}
```

Source: [src/indexer.ts:80-91](https://github.com/Inferensys/contextful/blob/main/src/indexer.ts)

Key characteristics:
- Debounces file change events with a 500ms delay to batch rapid successive changes
- Re-runs full indexing on each trigger
- Outputs JSON results to stdout for consumption by other processes

## CLI Commands

The indexing system exposes three primary CLI commands:

| Command | Description |
|---------|-------------|
| `cxf index --workspace <path> [--watch]` | Index a workspace; with `--watch`, re-index when files change |
| `cxf daemon --workspace <path>` | Run as a long-lived daemon that outputs index results on file changes |
| `cxf report --workspace <path> --format markdown\|json\|html` | Generate an index status report |

```bash
# Index a workspace
npx @inferensys/contextful index --workspace .

# Watch for changes and print results
npx @inferensys/contextful daemon --workspace .
```

Source: [src/cli.ts:22-35](https://github.com/Inferensys/contextful/blob/main/src/cli.ts)

## Search Integration

The indexing system powers Contextful's search capabilities. After indexing, users can query the database using natural language:

```typescript
export async function searchContext(options: SearchOptions): Promise<{ intent: SearchIntent; hits: SearchHit[] }> {
  const workspace = resolveWorkspace(options.workspace);
  await ensureIndexed(workspace);
  const intent = classifyQuery(options.query);
  // ... perform FTS and semantic search
}
```

Source: [src/search.ts:45-55](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

Query intents are automatically classified to optimize search behavior:

| Intent | Trigger Keywords | Description |
|--------|-----------------|-------------|
| `code` | function names, variable names | Code and implementation search |
| `exact` | Backticks, quotes, `#`, file paths | Literal symbol/identifier lookup |
| `impact` | impact, affected, depends, blast radius | Dependency and change analysis |
| `historical` | why, changed, commit, history | Git history and regression tracking |
| `architectural` | architecture, flow, trace, connects | Dependency graph traversal |
| `docs` | resource, documentation, guide, how to | Documentation and README search |
| `memory` | remember, session, lesson, learned | Agent memory recall |

Source: [src/search.ts:5-18](https://github.com/Inferensys/contextful/blob/main/src/search.ts)

## Token Estimation

Every chunk and evidence pack includes a token estimate for budget management:

```typescript
export function packTokenCount(text: string): number {
  return estimateTokens(text);
}
```

The system uses this estimate to enforce budget limits when building context packs for LLM consumption, ensuring responses stay within token budgets.

Source: [src/report.ts:50-52](https://github.com/Inferensys/contextful/blob/main/src/report.ts)
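Budget enforcement of this kind can be illustrated with a greedy selection over ranked chunks. This is a sketch under assumptions: `selectWithinBudget`, `RankedChunk`, and the score-first greedy policy are hypothetical and not taken from the project's pack builder. Only the `estimateTokens` heuristic (1 token ≈ 4 characters) comes from the document.

```typescript
// Heuristic from the document: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical shape of a ranked search hit.
interface RankedChunk {
  ref: string;
  text: string;
  score: number;
}

// Greedy sketch: take chunks in descending score order, skipping any chunk
// that would push the running total over the budget.
function selectWithinBudget(chunks: RankedChunk[], budget: number): { refs: string[]; used: number } {
  const ordered = chunks.slice().sort((a, b) => b.score - a.score);
  const refs: string[] = [];
  let used = 0;
  for (const chunk of ordered) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > budget) continue; // too large, try the next smaller hit
    refs.push(chunk.ref);
    used += cost;
  }
  return { refs, used };
}

const selection = selectWithinBudget(
  [
    { ref: "a", text: "x".repeat(40), score: 0.9 },  // 10 tokens
    { ref: "b", text: "x".repeat(100), score: 0.8 }, // 25 tokens
    { ref: "c", text: "x".repeat(20), score: 0.5 },  // 5 tokens
  ],
  16,
);
```

In this example chunk `b` is skipped because it would exceed the 16-token budget, while the lower-ranked but smaller chunk `c` still fits, for a total of 15 estimated tokens.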

## Data Models

### Symbol Record

```typescript
interface SymbolRecord {
  ref: string;
  name: string;
  kind: "function" | "class" | "interface" | "type" | "struct" | "enum" | "trait" | "impl";
  filePath: string;
  line: number;
  signature: string;
  exported: boolean;
}
```

### Edge Record

```typescript
interface RawEdge {
  targetName: string;
  targetType: "module" | "config" | "symbol";
  edgeType: "IMPORTS" | "CONFIGURES" | "DEFINES";
  line: number;
}
```

### Chunk Record

```typescript
interface ChunkRecord {
  ref: string;
  filePath: string;
  startLine: number;
  endLine: number;
  kind: "code" | "doc" | "file";
  title: string;
  text: string;
  tokenEstimate: number;
}
```

## Extension Points

### Adding New Language Support

To add support for a new language:

1. Add language detection in the file scanner
2. Implement symbol extraction patterns in `extractSymbols()`
3. Implement edge extraction patterns in `extractEdges()`
4. Update the chunking logic if special handling is needed

Example pattern structure:

```typescript
} else if (language === "newlang") {
  matchPush(line, /^\s*(pub\s+)?fn\s+([A-Za-z_][\w]*)/, push, "function");
  const use = line.match(/^\s*use\s+([^;]+);/);
  if (use) addImport(use[1].trim());
}
```

Source: [src/extract.ts:35-44](https://github.com/Inferensys/contextful/blob/main/src/extract.ts)

---

## Doramagic Pitfall Log

Project: Inferensys/contextful

Summary: Found 7 potential pitfall items; 0 are high/blocking. Highest priority: configuration - may modify host AI configuration.

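## 1. configuration · May modify host AI configuration

- Severity: medium
- Evidence strength: source_linked
- Finding: The project targets hosts such as Claude/Cursor/Codex/Gemini/OpenCode, or its install commands touch user configuration directories.
- User impact: Installation may change the behavior of local AI tools; users need to know where files are written and how to roll back.
- Suggested check: List the configuration files and directories that are written, plus the uninstall/rollback steps.
- Guardrail action: When host configuration directories are involved, a rollback path must be provided, not just the install command.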
- Evidence: capability.host_targets | github_repo:1240001007 | https://github.com/Inferensys/contextful | host_targets=claude, claude_code

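## 2. capability · Capability assessment depends on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream validation checklist.
- Guardrail action: Assumptions must be converted into validation items; they cannot be stated as fact before validation results exist.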
- Evidence: capability.assumptions | github_repo:1240001007 | https://github.com/Inferensys/contextful | README/documentation is current enough for a first validation pass.

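## 3. maintenance · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: New, abandoned, and active projects get lumped together, which lowers confidence in the recommendation.
- Suggested check: Add signals for recent GitHub commits, releases, and issue/PR responsiveness.
- Guardrail action: When maintenance activity is unknown, the recommendation must not be marked as high-trust.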
- Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | last_activity_observed missing

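## 4. security_permissions · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream validation has already requested a review; this must not be downplayed on the page.
- Suggested check: Route the item to the security/permissions governance review queue.
- Guardrail action: While downstream risk exists, the review/recommendation downgrade must remain in place.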
- Evidence: downstream_validation.risk_items | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium

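## 5. security_permissions · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk in the boundary card and confirm whether manual review is needed.
- Guardrail action: Scoring risks must be recorded in the boundary card, not kept only as an internal score.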
- Evidence: risks.scoring_risks | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium

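## 6. maintenance · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will maintain the project when they run into problems.
- Suggested check: Sample recent issues/PRs to determine whether they go unhandled for long periods.
- Guardrail action: When issue/PR responsiveness is unknown, a maintenance risk must be flagged.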
- Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | issue_or_pr_quality=unknown

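## 7. maintenance · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, raising the chance that users hit problems.
- Suggested check: Confirm that the latest release/tag matches the README install commands.
- Guardrail action: When release cadence is unknown or stale, install instructions must note possible drift.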
- Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | release_recency=unknown

<!-- canonical_name: Inferensys/contextful; human_manual_source: deepwiki_human_wiki -->