Doramagic Project Pack · Human Manual

contextful

Project Introduction

Related topics: High-Level Architecture, Quick Start Guide

Contextful is an intelligent code context management system designed to provide AI agents with compact, evidence-backed information for codebase navigation and understanding. The project serves as a bridge between large codebases and AI-powered development tools by indexing source code, extracting symbols, tracking dependencies, and generating token-budgeted evidence packs for queries.

Purpose and Scope

Contextful solves the fundamental problem that AI coding assistants face when working with large repositories: excessive context requirements that lead to token waste and degraded performance. Instead of forcing agents to read dozens of random files, Contextful enables targeted, cited, and ranked context retrieval that maximizes the value of each token spent.

The system operates in three primary modes:

  1. Indexing Mode - Scans and indexes source code, extracting symbols, dependencies, and semantic chunks
  2. Query Mode - Creates evidence packs for natural language queries with token budgets
  3. Search Mode - Provides lightweight search across code, docs, symbols, and memory without full evidence compilation

Sources: README.md:1-15

Architecture Overview

The Contextful system consists of several interconnected components that work together to provide context management capabilities.

graph TD
    A[Source Code] --> B[Indexing Engine]
    B --> C[SQLite Kernel DB]
    C --> D[Search Module]
    C --> E[Graph Analysis]
    C --> F[Memory Ledger]
    
    G[CLI / MCP Server] --> D
    G --> E
    G --> F
    
    D --> H[Evidence Pack]
    E --> H
    F --> H
    
    H --> I[AI Agent / User]

Component Responsibilities

| Component | File | Responsibility |
| --- | --- | --- |
| Indexing Engine | src/extract.ts | Parse source files, extract symbols and dependencies |
| Search Module | src/search.ts | Full-text search, intent classification, ranking |
| Graph Analysis | src/search.ts | Trace dependencies and code paths |
| Memory Ledger | src/memory.ts | Store evidence-backed lessons across sessions |
| CLI Interface | src/cli.ts | Command-line interface for all operations |
| MCP Server | src/mcp-server.ts | Model Context Protocol stdio server |

Sources: src/extract.ts:1-50, src/search.ts:1-30, src/cli.ts:1-40

Supported Languages and File Types

Contextful supports multiple programming languages through pattern-based extraction. The indexing engine recognizes language-specific syntax for symbols and dependencies.

Language Support Matrix

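| Language | Functions | Classes | Types | Imports |
| --- | --- | --- | --- | --- |
| TypeScript/JavaScript | Yes | Yes | Yes | Yes |
| Python | Yes | Yes | - | Yes |
| Go | Yes | Yes | Yes | Yes |
| Rust | Yes | Yes | Yes | Yes |
| Markdown | - | - | Headings | - |
| JSON | - | - | Config keys | - |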

Sources: src/extract.ts:15-80

Core MCP Tools

Contextful exposes its capabilities through the Model Context Protocol (MCP), providing AI agents with a standardized tool interface. The primary tools are designed to keep the agent surface small while providing maximum utility.

graph LR
    A[Agent] -->|context_pack| B[Evidence Pack Generator]
    A -->|search_code| C[Code Search]
    A -->|trace_path| D[Graph Traversal]
    A -->|impact_analysis| E[Dependency Analyzer]
    A -->|why_changed| F[Git History]
    A -->|recall_memory| G[Memory Search]
    A -->|write_lesson| H[Lesson Writer]

Tool Descriptions

| Tool | Purpose | Key Parameters |
| --- | --- | --- |
| context_pack | Returns ranked, cited, token-budgeted context bundles | query, budget, scope |
| search_code | Search across code, docs, symbols, and memory | query, mode, filters |
| trace_path | Graph traversal across files, symbols, modules, and config | from, to, edge_types |
| impact_analysis | Reverse dependencies and likely tests | symbol_or_file |
| why_changed | Current evidence plus git history | symbol_or_file |
| recall_memory | Search session learnings and durable lessons | query, scope |
| write_lesson | Store evidence-backed lessons | claim, evidence_refs, confidence |

Sources: README.md:25-45, src/mcp-server.ts:1-80

CLI Interface

Contextful provides a command-line interface through the cxf binary (with contextful as a readable alias). The CLI supports both one-shot operations and daemon mode for continuous indexing.

Command Reference

| Command | Description | Key Options |
| --- | --- | --- |
| index | Index a workspace | --workspace, --watch |
| daemon | Run local indexing daemon | --workspace |
| query | Create evidence pack for query | --workspace, --budget, --json |
| search | Search without full evidence pack | --workspace, --limit, --kind |
| report | Generate context report | --workspace, --format |
| memory add | Store evidence-backed lesson | --claim, --evidence, --scope, --confidence |
| server | Run MCP stdio server | - |

Sources: src/cli.ts:40-120, README.md:15-35

Example Usage

# Index a workspace
npx @inferensys/contextful index --workspace .

# Query with token budget
npx @inferensys/contextful query "where is user auth handled" --workspace . --budget 2000

# Run as MCP server
npx @inferensys/contextful server

Sources: README.md:8-15

Data Models

Evidence Pack Structure

The EvidencePack is the core data structure returned by query operations. It contains all necessary context for an agent to answer a query.

interface EvidencePack {
  id: string;                    // Unique pack identifier
  query: string;                 // Original query
  scope: string;                 // Scope of the context
  intent: SearchIntent;          // Classified query intent
  summary: string;               // Human-readable summary
  citations: SearchHit[];        // Ranked evidence items
  files: FileContext[];          // Grouped file references
  symbols: SymbolRecord[];       // Relevant symbols
  graphPaths: GraphPath[];       // Dependency paths
  memoryHits: SearchHit[];       // Memory matches
  confidence: number;            // Confidence score (0.1-0.92)
  tokenEstimate: number;         // Estimated token count
  budget: number;                // Token budget
  createdAt: string;             // ISO timestamp
}

Sources: src/search.ts:200-250

Search Hit Structure

Each search result is represented as a SearchHit with relevance ranking and excerpt information.

| Field | Type | Description |
| --- | --- | --- |
| ref | string | Reference identifier (e.g., file:src/auth.ts:1-20) |
| path | string | File path |
| title | string | Display title |
| excerpt | string | Relevant text snippet |
| kind | string | Type: code, doc, symbol, memory |
| rank | number | BM25 relevance score |

Sources: src/search.ts:50-80
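As a minimal sketch, the fields above map onto a TypeScript interface like the one below. The `parseFileRef` helper and the `FileRef` type are hypothetical and not part of Contextful; they assume refs follow the `file:<path>:<start>-<end>` shape of the example ref in the table.

```typescript
// Shape of a search hit, transcribed from the field table above.
interface SearchHit {
  ref: string;
  path: string;
  title: string;
  excerpt: string;
  kind: string; // "code", "doc", "symbol", or "memory"
  rank: number;
}

// Hypothetical helper type (not part of Contextful): parsed form of a file ref.
interface FileRef {
  path: string;
  startLine: number;
  endLine: number;
}

// Assumes refs look like "file:<path>:<start>-<end>"; returns null otherwise.
function parseFileRef(ref: string): FileRef | null {
  const match = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (!match) return null;
  const [, path = "", start = "0", end = "0"] = match;
  return { path, startLine: Number(start), endLine: Number(end) };
}

// Illustrative values only; not real Contextful output.
const exampleHit: SearchHit = {
  ref: "file:src/auth.ts:1-20",
  path: "src/auth.ts",
  title: "auth.ts",
  excerpt: "export function verifyToken(token: string) { /* ... */ }",
  kind: "code",
  rank: -1.2,
};

console.log(parseFileRef(exampleHit.ref));
```

Splitting the ref this way lets a consumer open the cited line range directly instead of re-reading the whole file.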

Dependencies and Technology Stack

Contextful is built on a carefully selected set of dependencies that enable efficient code indexing and search.

| Dependency | Version | Purpose |
| --- | --- | --- |
| @modelcontextprotocol/sdk | ^1.29.0 | MCP protocol implementation |
| better-sqlite3 | ^12.10.0 | SQLite database for indexing |
| commander | ^14.0.3 | CLI argument parsing |
| fast-glob | ^3.3.3 | File pattern matching |
| tree-sitter-wasms | ^0.1.13 | Syntax parsing |
| web-tree-sitter | ^0.20.8 | Tree-sitter bindings |
| zod | ^4.4.3 | Schema validation |

Sources: package.json:20-40

System Requirements

Sources: package.json:45-55

Supported IDE Integration

Contextful is designed to integrate with a wide range of AI-powered development tools:

| IDE/Extension | Status |
| --- | --- |
| GitHub Copilot | Supported |
| VS Code | Supported |
| Cursor | Supported |
| Windsurf | Supported |
| Cline | Supported |
| Roo Code | Supported |
| Continue | Supported |
| Zed | Supported |

Sources: package.json:10-20

Workflow: From Indexing to Query

The complete workflow demonstrates how Contextful transforms raw source code into actionable intelligence for AI agents.

sequenceDiagram
    participant U as User/Agent
    participant CLI as CLI/MCP Server
    participant IDX as Indexer
    participant DB as SQLite Kernel
    participant SRCH as Search Engine
    participant MEM as Memory Ledger

    U->>CLI: index --workspace ./project
    CLI->>IDX: Extract symbols & dependencies
    IDX->>DB: Store in chunks_fts, symbols, edges
    DB-->>CLI: Index complete

    U->>CLI: query "how is auth handled"
    CLI->>SRCH: classifyQuery() intent=exact
    SRCH->>DB: FTS + BM25 search
    DB-->>SRCH: Ranked hits
    SRCH->>MEM: Check memory ledger
    MEM-->>SRCH: Related lessons
    CLI-->>U: EvidencePack (token-budgeted)

    U->>CLI: write_lesson --claim "Auth pattern" --evidence file:...
    CLI->>MEM: Store lesson with confidence
    MEM-->>CLI: Lesson saved

Sources: src/search.ts:100-150, src/report.ts:80-120

Next Steps

To continue exploring Contextful:

  1. Installation Guide - Set up Contextful in your development environment
  2. CLI Reference - Detailed documentation of all CLI commands
  3. MCP Tools API - Complete reference for MCP tool interfaces
  4. Configuration - Workspace configuration and tuning options
  5. Memory System - Using the evidence-backed lesson system

Sources: README.md:1-15

Quick Start Guide

Related topics: Project Introduction

Overview

Contextful is a contextual indexing and search system designed to help AI agents efficiently retrieve relevant code evidence. Instead of forcing agents to perform dozens of random file reads, Contextful returns compact, ranked, and cited evidence packs that fit within a token budget.

Sources: README.md:1-10

Installation

Install Contextful using npm. The package provides both the cxf binary and the full contextful alias.

npm install -g @inferensys/contextful

Alternatively, run commands directly via npx:

npx @inferensys/contextful index --workspace .

Sources: README.md:11-14

CLI Commands

Contextful provides a command-line interface with the following primary commands:

| Command | Description |
| --- | --- |
| cxf index | Index a workspace for search |
| cxf daemon | Run a local indexing daemon |
| cxf query | Create an evidence pack for a query |
| cxf search | Search indexed context |
| cxf report | Generate a context report |
| cxf memory add | Store an evidence-backed lesson |
| cxf server | Run the MCP stdio server |

Sources: README.md:23-32

Basic Workflow

Step 1: Index Your Workspace

Before searching, you must index your codebase. This creates the searchable database:

cxf index --workspace .

For continuous indexing as files change, use the daemon mode:

cxf daemon --workspace .

Sources: src/cli.ts:1-20

Step 2: Query for Context

Once indexed, ask questions about your codebase:

cxf query "where is user auth handled" --workspace . --budget 2000

The query command returns a ranked evidence pack with citations and file references.

#### Query Options

| Option | Description | Default |
| --- | --- | --- |
| `--workspace <path>` | Workspace path | Current directory |
| `--budget <tokens>` | Approximate token budget | 2000 |
| `--json` | Output as JSON instead of Markdown | false |

Sources: src/cli.ts:22-30

Step 3: Search Without Building Evidence Packs

For quick lookups without compiling full evidence packs, use search:

cxf search "authentication middleware" --workspace . --limit 10 --kind code

#### Search Options

| Option | Description | Default |
| --- | --- | --- |
| `--workspace <path>` | Workspace path | Current directory |
| `--limit <count>` | Maximum hits | 10 |
| `--kind` | Filter: all, code, docs, symbols, memory | all |

Sources: src/cli.ts:32-42

Step 4: Generate Reports

Generate comprehensive context reports in various formats:

cxf report --workspace . --format markdown
cxf report --workspace . --format json
cxf report --workspace . --format html

Sources: src/cli.ts:44-48

MCP Server Integration

Contextful can run as a Model Context Protocol (MCP) server, providing tools directly to AI agents.

cxf server

Available MCP Tools

| Tool | Purpose |
| --- | --- |
| context_pack | Returns ranked, cited, token-budgeted evidence bundles |
| search_code | Code, docs, symbol, and memory search |
| trace_path | Graph traversal across files, symbols, modules, and config |
| impact_analysis | Reverse dependencies and likely tests |
| why_changed | Current evidence plus git history |
| recall_memory | Search session learnings and durable project lessons |
| write_lesson | Store evidence-backed lessons for future sessions |

Sources: README.md:40-48

MCP Tool Parameters

#### context_pack

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Query to answer from indexed context |
| budget | number | No | Token budget for the response |
| scope | string | No | Search scope |

Sources: src/mcp-server.ts:1-25

#### search_code

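| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query |
| mode | string | No | Search mode |
| filters | object | No | Search filters |
| workspace | string | No | Workspace path |
| limit | number | No | Maximum results |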

Sources: src/mcp-server.ts:26-40

#### write_lesson

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| claim | string | Yes | Lesson claim |
| evidence_refs | array | Yes | Evidence references (e.g., file:src/auth.ts:1-20) |
| scope | string | No | Memory scope |
| confidence | number | No | Confidence from 0 to 1 |
| supersedes | string | No | Previous lesson ID to supersede |

Sources: src/mcp-server.ts:65-80
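The parameter table above can be read as a small input contract. The sketch below is a hypothetical client-side pre-flight check, not Contextful's own validation: the `LessonInput` type and `validateLessonInput` function are illustrative names, and the only rules enforced are the ones the table states (required claim, required evidence references, confidence between 0 and 1).

```typescript
// Input shape transcribed from the write_lesson parameter table above.
interface LessonInput {
  claim: string;
  evidence_refs: string[];
  scope?: string;
  confidence?: number;
  supersedes?: string;
}

// Hypothetical pre-flight check; returns a list of problems (empty when valid).
function validateLessonInput(input: LessonInput): string[] {
  const problems: string[] = [];
  if (input.claim.trim() === "") {
    problems.push("claim must not be empty");
  }
  if (input.evidence_refs.length === 0) {
    problems.push("evidence_refs needs at least one reference");
  }
  const c = input.confidence;
  // The negated range check also rejects NaN.
  if (c !== undefined && !(c >= 0 && c <= 1)) {
    problems.push("confidence must be between 0 and 1");
  }
  return problems;
}

console.log(
  validateLessonInput({
    claim: "Always validate tokens in middleware",
    evidence_refs: ["file:src/auth.ts:1-20"],
    confidence: 0.8,
  })
);
```

Checking inputs before calling the tool keeps malformed lessons from reaching the memory ledger at all; the server may apply further checks that this sketch does not model.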

Memory System

Contextful includes an evidence-backed memory system for storing lessons across sessions.

Adding a Lesson

cxf memory add \
  --claim "Always validate tokens in middleware" \
  --evidence "file:src/auth.ts:1-20" \
  --workspace . \
  --confidence 0.8

#### Memory Command Options

| Option | Required | Description |
| --- | --- | --- |
| `--claim <text>` | Yes | The lesson or claim |
| `--evidence <ref...>` | Yes | Evidence references |
| `--workspace <path>` | No | Workspace path |
| `--scope <scope>` | No | Memory scope (default: repo) |
| `--confidence <number>` | No | Confidence from 0 to 1 (default: 0.7) |

Sources: src/cli.ts:50-75

Output Formats

Markdown Output (Default)

cxf query "where is auth handled" --workspace .

Returns a formatted Markdown document with citations and graph paths.

JSON Output

cxf query "where is auth handled" --workspace . --json

Returns structured JSON data suitable for programmatic processing.

Sources: src/cli.ts:22-30

Report Formats

| Format | Description |
| --- | --- |
| markdown | Human-readable Markdown report |
| json | Structured JSON data |
| html | Standalone HTML page |

Sources: src/cli.ts:44-48

Architecture Overview

graph TD
    A[CLI / MCP Server] --> B[Workspace Indexer]
    B --> C[SQLite Kernel DB]
    C --> D[Full-Text Search]
    C --> E[Symbol Index]
    C --> F[Graph Edges]
    G[Query Request] --> H[Search Context]
    H --> I[Evidence Pack Builder]
    I --> D
    I --> E
    I --> F
    I --> J[Memory Ledger]
    I --> K[Evidence Pack Output]
    J --> J

Common Usage Patterns

Pattern 1: Initial Setup

# Index the workspace
cxf index --workspace /path/to/project --watch

# Generate initial report
cxf report --workspace /path/to/project --format html > report.html

Pattern 2: Interactive Exploration

# Run as MCP server
cxf server

# Or use CLI directly
cxf query "how does the cache work" --workspace . --budget 3000

Pattern 3: Agent Memory Persistence

# Store learned lessons
cxf memory add --claim "Config validation happens in validate.ts" --evidence "file:src/config/validate.ts:1-50"

# Recall past lessons
# Via MCP: recall_memory(query="config validation")

Next Steps

Sources: README.md:1-10

High-Level Architecture

Related topics: Runtime Components, Search Engine, SQLite Database Schema

Contextful is a local-only indexing and context management tool designed to help AI coding assistants retrieve compact, evidence-backed context from workspace codebases. The system operates without external embedding APIs, instead relying on SQLite FTS5 full-text search, graph-based dependency tracking, and intent-classified query routing.

Sources: README.md

System Overview

Contextful functions as a local daemon that continuously indexes workspace files, extracts code symbols and import relationships, and provides a structured context pack API to agents. The architecture follows a three-layer design:

  1. Indexing Layer - File parsing, symbol extraction, edge detection
  2. Storage Layer - SQLite kernel with FTS5 search and graph tables
  3. Query Layer - Intent classification, ranked search, evidence pack assembly

Sources: src/indexer.ts

Component Architecture

graph TD
    A[Workspace Files] --> B[Indexer]
    B --> C[Symbol Extraction]
    B --> D[Edge Detection]
    B --> E[Chunk Generation]
    C --> F[SQLite Kernel DB]
    D --> F
    E --> F
    G[CLI / MCP Server] --> H[Search Module]
    H --> F
    H --> I[Context Pack Assembly]
    I --> J[Evidence Pack Output]

Core Components

| Component | File | Responsibility |
| --- | --- | --- |
| Indexer | src/indexer.ts | Recursively walks workspace, triggers file processing |
| Extractor | src/extract.ts | Parses symbols, edges, and code chunks per file |
| Search | src/search.ts | FTS5 queries, intent classification, ranking |
| CLI | src/cli.ts | Command-line interface and MCP server entry point |
| Report | src/report.ts | Generates workspace context reports |

Sources: src/indexer.ts, src/extract.ts, src/search.ts

Indexing Pipeline

The indexing pipeline processes workspace files through multiple extraction stages. Each source file is read, classified by language, and passed through specialized extractors that produce structured records.

graph LR
    A[File Content] --> B[Language Detection]
    B --> C[Symbol Extraction]
    B --> D[Edge Extraction]
    B --> E[Chunk Extraction]
    C --> F[symbols table]
    D --> G[edges table]
    E --> H[chunks_fts table]

Symbol Extraction

The extractSymbols function identifies named code entities based on language-specific patterns:

| Language | Supported Symbols |
| --- | --- |
| TypeScript/JavaScript | functions, classes, interfaces, types, const arrow functions |
| Python | functions, classes |
| Go | functions, structs, interfaces |
| Rust | functions, structs, enums, traits, impl blocks |
| Markdown | headings |
| JSON | config keys |

Sources: src/extract.ts:1-80
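To make pattern-based extraction concrete, here is a minimal sketch for the TypeScript row of the table. The regular expressions, the `SymbolSketch` type, and the `extractTsSymbols` function are illustrative assumptions; the patterns in src/extract.ts may differ.

```typescript
interface SymbolSketch {
  name: string;
  kind: string;
  line: number; // 1-based line number
}

// Hypothetical line-based patterns; capture group 1 is the symbol name.
const TS_SYMBOL_PATTERNS: Array<[string, RegExp]> = [
  ["function", /^\s*(?:export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/],
  ["class", /^\s*(?:export\s+)?class\s+([A-Za-z_$][\w$]*)/],
  ["interface", /^\s*(?:export\s+)?interface\s+([A-Za-z_$][\w$]*)/],
  ["type", /^\s*(?:export\s+)?type\s+([A-Za-z_$][\w$]*)\s*=/],
  // const arrow functions; also matches parenthesized values, a known limit of this sketch
  ["function", /^\s*(?:export\s+)?const\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\(/],
];

function extractTsSymbols(source: string): SymbolSketch[] {
  const symbols: SymbolSketch[] = [];
  source.split("\n").forEach((text, index) => {
    for (const [kind, pattern] of TS_SYMBOL_PATTERNS) {
      const name = pattern.exec(text)?.[1];
      if (name) {
        symbols.push({ name, kind, line: index + 1 });
        break; // first matching pattern wins for this line
      }
    }
  });
  return symbols;
}

console.log(extractTsSymbols("export function login() {}\nclass Session {}"));
```

Line-based patterns are cheap and need no parser, at the cost of missing multi-line declarations and occasionally matching non-function constants.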

Edge Detection

Import relationships are tracked as directed edges between modules. The extractEdges function processes different import syntaxes per language:

  • TypeScript/JavaScript: ES6 import and require() statements
  • Python: from ... import and import statements
  • Go: Import strings within double quotes
  • Rust: use and mod declarations
  • JSON: Top-level keys in configuration files

Sources: src/extract.ts:100-160
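The import handling described above can be sketched as follows for two of the listed languages. This is an illustration under stated assumptions: the `ImportEdge` type, the `IMPORT_PATTERNS` table, and `extractImportEdges` are hypothetical names, and the regular expressions are simplified stand-ins for whatever extractEdges actually uses.

```typescript
interface ImportEdge {
  sourceFile: string;
  targetName: string;
  edgeType: "IMPORTS";
  line: number; // 1-based line number
}

// Hypothetical per-language patterns; capture group 1 is the imported module.
const IMPORT_PATTERNS: Record<string, RegExp[]> = {
  typescript: [
    /\bfrom\s+["']([^"']+)["']/, // import x from "mod"
    /^\s*import\s+["']([^"']+)["']/, // import "mod"
    /\brequire\(\s*["']([^"']+)["']\s*\)/, // require("mod")
  ],
  python: [/^\s*from\s+([\w.]+)\s+import\b/, /^\s*import\s+([\w.]+)/],
};

function extractImportEdges(sourceFile: string, language: string, source: string): ImportEdge[] {
  const patterns = IMPORT_PATTERNS[language] ?? [];
  const edges: ImportEdge[] = [];
  source.split("\n").forEach((text, index) => {
    for (const pattern of patterns) {
      const targetName = pattern.exec(text)?.[1];
      if (targetName) {
        edges.push({ sourceFile, targetName, edgeType: "IMPORTS", line: index + 1 });
        break; // one edge per line is enough for this sketch
      }
    }
  });
  return edges;
}

console.log(extractImportEdges("src/app.ts", "typescript", 'import { login } from "./auth";'));
```

Each edge points from the importing file to the imported module name, which is the direction a reverse-dependency query later walks backwards.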

Chunk Generation

Code files are split into semantic chunks for full-text search. The codeChunks function segments content into logical blocks based on:

  • Empty line boundaries
  • Token budget (target: ~300 tokens per chunk)
  • Language-specific token estimation via estimateTokens

Sources: src/extract.ts:180-220
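A minimal sketch of blank-line chunking with a token target is shown below. It is not the codeChunks implementation: `chunkByBlankLines` and `estimateTokensSketch` are hypothetical names, and the four-characters-per-token estimate is an assumption standing in for estimateTokens.

```typescript
// Assumed heuristic: roughly four characters per token.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily merges blank-line-separated blocks until the target would be exceeded.
// A single block larger than the target is kept whole as one oversized chunk.
function chunkByBlankLines(source: string, targetTokens = 300): string[] {
  const blocks = source.split(/\n\s*\n/).filter((block) => block.trim().length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const block of blocks) {
    const candidate = current ? `${current}\n\n${block}` : block;
    if (current && estimateTokensSketch(candidate) > targetTokens) {
      chunks.push(current); // close the current chunk and start a new one
      current = block;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const demoSource = ["function a() {}", "function b() {}", "function c() {}"].join("\n\n");
console.log(chunkByBlankLines(demoSource, 8).length);
```

Breaking only at blank lines keeps each chunk aligned with the visual blocks a developer wrote, which tends to keep a function and its comments together.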

Storage Layer

SQLite Kernel Schema

The kernel database uses SQLite with several specialized tables:

| Table | Purpose | Key Columns |
| --- | --- | --- |
| files | Tracked workspace files | path, language, hash, indexed_at |
| symbols | Extracted code symbols | ref, name, kind, file_path, line, signature, exported |
| edges | Import/dependency graph | source_file, target_name, target_type, edge_type, line |
| chunks_fts | FTS5 virtual table for full-text search | ref, path, title, text, kind |
| memory | Evidence-backed lessons | id, claim, scope, confidence, created_at |

Sources: src/search.ts, src/indexer.ts
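To show how rows shaped like the edges table can answer a reverse-dependency question, the sketch below works on in-memory rows rather than SQLite. The `EdgeRow` type uses a subset of the columns listed above; `reverseDependents` is a hypothetical helper, and matching target_name by exact string (with no path resolution) is a simplifying assumption.

```typescript
// Subset of the edges columns listed in the schema table above.
interface EdgeRow {
  source_file: string;
  target_name: string;
  edge_type: string;
  line: number;
}

// Hypothetical helper: files that import the given target, deduplicated and sorted.
function reverseDependents(edges: EdgeRow[], target: string): string[] {
  const dependents = new Set<string>();
  for (const edge of edges) {
    if (edge.edge_type === "IMPORTS" && edge.target_name === target) {
      dependents.add(edge.source_file);
    }
  }
  return Array.from(dependents).sort();
}

// Illustrative rows only.
const demoEdges: EdgeRow[] = [
  { source_file: "src/routes.ts", target_name: "./auth", edge_type: "IMPORTS", line: 1 },
  { source_file: "src/cli.ts", target_name: "./auth", edge_type: "IMPORTS", line: 3 },
  { source_file: "src/cli.ts", target_name: "./db", edge_type: "IMPORTS", line: 4 },
];

console.log(reverseDependents(demoEdges, "./auth"));
```

Storing edges as flat rows means the same table serves both forward traversal (what does this file import) and reverse lookup (what imports this module).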

Query and Search System

Intent Classification

Queries are classified into intents to optimize search strategy:

| Intent | Trigger Keywords | Search Focus |
| --- | --- | --- |
| code | function, class, implementation | Symbol and code chunks |
| memory | memory, lesson, session | Memory ledger |
| impact | impact, depends on, blast radius | Dependency graph |
| historical | why, changed, commit | Git history |
| architectural | architecture, flow, path, trace | Graph traversal |
| docs | documentation, readme, guide | Markdown chunks |
| exact | symbols, paths, line references | Precise symbol matching |
| vague | Default fallback | Broad FTS search |

Sources: src/search.ts:1-50

Context Pack Assembly

The createContextPack function orchestrates the evidence gathering:

  1. Classify query intent
  2. Execute FTS5 search across chunks
  3. Apply query expansion with domain-specific term additions
  4. Score and rank hits using BM25 with intent-based bonuses
  5. Select hits within token budget
  6. Load related symbols and graph paths
  7. Assemble and return EvidencePack

Sources: src/search.ts:200-280
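Step 5 above, selecting hits within the token budget, can be sketched as a greedy pass over scored hits. This is an assumption about the general approach, not the createContextPack code: `ScoredHit` and `selectWithinBudget` are hypothetical names, and token cost is estimated at roughly four characters per token.

```typescript
interface ScoredHit {
  ref: string;
  excerpt: string;
  score: number;
}

// Greedy selection: take hits in descending score order while they still fit.
function selectWithinBudget(
  hits: ScoredHit[],
  budget: number
): { selected: ScoredHit[]; tokenEstimate: number } {
  const sorted = [...hits].sort((a, b) => b.score - a.score);
  const selected: ScoredHit[] = [];
  let tokenEstimate = 0;
  for (const hit of sorted) {
    const cost = Math.ceil(hit.excerpt.length / 4); // assumed token estimate
    if (tokenEstimate + cost > budget) continue; // skip hits that do not fit
    selected.push(hit);
    tokenEstimate += cost;
  }
  return { selected, tokenEstimate };
}

// Illustrative hits only.
const demoHits: ScoredHit[] = [
  { ref: "file:src/auth.ts:1-20", excerpt: "x".repeat(40), score: 9 },
  { ref: "file:src/session.ts:5-30", excerpt: "y".repeat(80), score: 5 },
  { ref: "file:src/routes.ts:1-10", excerpt: "z".repeat(40), score: 7 },
];

console.log(selectWithinBudget(demoHits, 25).selected.map((hit) => hit.ref));
```

Skipping an oversized hit instead of stopping lets smaller, lower-ranked hits still use the remaining budget.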

CLI and MCP Integration

Command Structure

| Command | Purpose | Key Options |
| --- | --- | --- |
| index | Initial workspace indexing | --workspace, --watch |
| daemon | Continuous indexing with file watching | --workspace |
| query | Generate evidence pack | --workspace, --budget, --json |
| search | Direct search without packing | --workspace, --limit, --kind |
| report | Generate context report | --workspace, --format |
| memory add | Store evidence-backed lessons | --claim, --evidence, --scope |
| server | Start MCP stdio server | (none) |

Sources: src/cli.ts:20-100

MCP Server Tools

The MCP server exposes standardized tools for agent integration:

  • context_pack(query, budget, scope) - Primary tool, returning ranked, cited evidence
  • search_code(query, mode, filters) - Code, docs, symbol, and memory search
  • trace_path(from, to, edge_types) - Graph traversal across the codebase
  • impact_analysis(symbol_or_file) - Reverse dependency analysis
  • why_changed(symbol_or_file) - Git history with current evidence
  • recall_memory(query, scope) - Search persistent lessons
  • write_lesson(claim, evidence_refs, scope) - Store new memories

Sources: README.md

Report Generation

The report system aggregates workspace statistics and warnings:

graph TD
    A[generateReport] --> B[Index Status Check]
    B --> C[File Statistics]
    B --> D[Symbol Statistics]
    B --> E[Edge Statistics]
    B --> F[Warning Collection]
    C --> G[renderMarkdown / renderHtml]
    D --> G
    E --> G
    F --> G

Reports support three output formats:

  • markdown - Plain text with markdown headings
  • json - Structured JSON with all report fields
  • html - Self-contained HTML document with styling

Sources: src/report.ts:1-80

Privacy and Security

Contextful operates entirely locally with no external API calls:

  • No embedding API calls for vector search
  • No source code uploads
  • No file editing or auto-fixes
  • No dependency installation in target workspace

Evidence references are validated and stale references are rejected to maintain integrity of the memory system.

Sources: README.md

Data Flow Summary

sequenceDiagram
    participant User
    participant CLI as CLI/MCP Server
    participant Indexer
    participant Extractor
    participant Search
    participant Kernel as SQLite Kernel
    
    User->>CLI: index --workspace .
    CLI->>Indexer: indexWorkspace()
    Indexer->>Extractor: extractFile()
    Extractor->>Kernel: Insert symbols, edges, chunks
    Kernel-->>Indexer: Confirmation
    
    User->>CLI: query "where is auth handled"
    CLI->>Search: searchContext()
    Search->>Kernel: FTS5 query
    Search->>Kernel: Graph traversal
    Search->>Kernel: Memory search
    Kernel-->>Search: Ranked hits
    Search-->>CLI: EvidencePack
    CLI-->>User: Compact context output

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| SQLite FTS5 over vector embeddings | Local-only operation, no external API dependencies |
| Intent-based query routing | Optimizes search strategy based on query semantics |
| BM25 scoring with bonuses | Balances relevance with domain-specific priorities |
| Token-budgeted evidence packs | Prevents context overflow in LLM contexts |
| Evidence refs as first-class citizens | Enables verifiable, traceable AI responses |

Sources: src/search.ts:50-150, src/util.ts

Sources: src/indexer.ts

Runtime Components

Related topics: High-Level Architecture

Overview

The Runtime Components in Contextful encompass the services, daemons, and server processes that enable real-time code indexing, search, and context-aware information retrieval. These components operate as the execution layer of the application, providing persistent indexing, live workspace monitoring, and MCP (Model Context Protocol) server capabilities for AI agent integration.

The runtime layer bridges the gap between static code analysis and dynamic query resolution, allowing users and AI agents to query indexed repositories with token-budgeted evidence packs.

Source: https://github.com/Inferensys/contextful / Human Manual

Search Engine

Related topics: Context Packs, SQLite Database Schema

Overview

The Search Engine is the core retrieval system in Contextful, designed to provide intelligent, evidence-backed context for agent queries. It combines full-text search (FTS), symbol indexing, dependency graph traversal, and memory recall to deliver ranked, cited results within a configurable token budget.

The system serves as the foundation for multiple interfaces: CLI commands (query, search), MCP server tools (search_code, context_pack), and report generation.

Sources: src/search.ts:1-50

Architecture

graph TD
    A[Query Input] --> B[Query Classification]
    B --> C{Intent Type}
    C -->|code/docs| D[Full-Text Search]
    C -->|symbols| E[Symbol Lookup]
    C -->|memory| F[Memory Ledger Search]
    C -->|impact| G[Graph Traversal]
    C -->|historical| H[Git History + Search]
    D --> I[BM25 Ranking]
    E --> J[Symbol Index]
    F --> K[Memory DB]
    G --> L[Edge Database]
    H --> M[Git Operations]
    I --> N[Result Scoring]
    J --> N
    K --> N
    L --> N
    N --> O[Context Pack]

Core Components

| Component | File | Responsibility |
| --- | --- | --- |
| Search Kernel | src/search.ts | Core search logic and ranking |
| Query Classifier | src/search.ts | Intent detection |
| FTS Engine | src/search.ts | Full-text search using SQLite FTS5 |
| Graph Tracer | src/search.ts | Dependency graph traversal |
| Memory Store | src/memory.ts | Evidence-backed memory recall |

Sources: src/search.ts:50-120

Query Classification

The search engine classifies each query into one of nine intent types to optimize retrieval strategy.

SearchIntent Types

| Intent | Trigger Keywords | Search Strategy |
| --- | --- | --- |
| code | code, function, class, implement, module | FTS + symbol lookup |
| docs | resource, docs, readme, how to | FTS on markdown/json |
| symbols | define, interface, type, symbol | Direct symbol index |
| memory | memory, remember, lesson, learned, session(s) | Memory ledger query |
| impact | impact, affected, depends, blast radius | Reverse dependency graph |
| historical | why, changed, commit, history | Git history + current search |
| architectural | architecture, flow, trace, connects | Graph path analysis |
| exact | Code patterns, paths, line refs | Direct file/symbol lookup |
| vague | Default | Broad FTS + graph |

A partial excerpt of the classifier:

function classifyQuery(query: string): SearchIntent {
  const q = query.toLowerCase();
  // Checks run in order, so the first matching pattern decides the intent.
  if (/\b(code|function|class|implement|module)\b/.test(q)) return "code";
  if (/\b(define|interface|type|symbol)\b/.test(q)) return "symbols";
  if (/\b(memory|remember|lesson|learned|sessions?)\b/.test(q)) return "memory";
  // ... additional classifications
  return "vague"; // default fallback, per the table above
}

Sources: src/search.ts:1-30

Search Flow

Main Search Pipeline

sequenceDiagram
    participant CLI as CLI/MCP
    participant Search as searchContext()
    participant Kernel as Kernel DB
    participant FTS as FTS5 Engine
    participant Graph as Graph DB
    participant Memory as Memory Store

    CLI->>Search: query, workspace, limit
    Search->>Kernel: ensureIndexed()
    Search->>Kernel: addQuery()
    Search->>FTS: ftsQuery(expandedTerms)
    FTS-->>Search: ranked rows (BM25)
    Search->>Search: scoreFromRank()
    Search->>Graph: loadGraphPaths()
    Search-->>CLI: {intent, hits}

Full-Text Search Query Builder

The ftsQuery function transforms user queries into FTS5-compatible search strings:

function ftsQuery(query: string): string {
  const terms = expandedTerms(query);
  return Array.from(new Set(terms.map((term) => term.toLowerCase())))
    .filter((term) => !STOPWORDS.has(term))
    .slice(0, 14)
    .map((term) => `${term}*`)
    .join(" OR ");
}

Key behaviors:

  • Expands terms based on query context (e.g., "tool" → "server", "tool", "callTool")
  • Filters stopwords: where, what, which, when, how, are, the, for, with, and, or, to
  • Limits to 14 terms maximum
  • Appends wildcard * for prefix matching

Sources: src/search.ts:200-280
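To see what the key behaviors above produce for a concrete query, here is a self-contained sketch. `STOPWORDS_SKETCH` uses the stopword list quoted above; `expandedTermsSketch` is a simplified stand-in that tokenizes on non-word characters (an assumption) and applies only the "tool" expansion; `ftsQuerySketch` mirrors the ftsQuery excerpt.

```typescript
// Stopwords listed in the text above.
const STOPWORDS_SKETCH = new Set([
  "where", "what", "which", "when", "how", "are",
  "the", "for", "with", "and", "or", "to",
]);

// Simplified stand-in for expandedTerms: assumed tokenizer, "tool" expansion only.
function expandedTermsSketch(query: string): string[] {
  const lower = query.toLowerCase();
  const terms = lower.split(/[^a-z0-9_]+/).filter(Boolean);
  const additions: string[] = /\b(tool|tools|registered|register)\b/.test(lower)
    ? ["server", "tool", "tools", "callTool"]
    : [];
  return [...terms, ...additions];
}

// Mirrors the ftsQuery excerpt: dedupe, drop stopwords, cap at 14, prefix-match, OR-join.
function ftsQuerySketch(query: string): string {
  return Array.from(new Set(expandedTermsSketch(query).map((term) => term.toLowerCase())))
    .filter((term) => !STOPWORDS_SKETCH.has(term))
    .slice(0, 14)
    .map((term) => `${term}*`)
    .join(" OR ");
}

console.log(ftsQuerySketch("where are tools registered"));
// tools* OR registered* OR server* OR tool* OR calltool*
```

The stopwords "where" and "are" drop out, the duplicate "tools" collapses, and "callTool" is lowercased before the prefix wildcard is appended.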

Scoring System

Rank-to-Score Transformation

The scoreFromRank function converts a BM25 rank into a relevance score clamped to the range 0.1-10, adding domain-specific bonuses:

function scoreFromRank(rank: number, query: string, corpus: string): number {
  // Assumed: patterns are tested against the lowercased query, as in classifyQuery.
  const q = query.toLowerCase();
  // Base score shrinks as the absolute rank grows.
  const base = 10 / (1 + Math.abs(rank));
  let bonus = 0;

  // Domain-specific bonuses
  if (/\b(tool|tools|registered|register)\b/.test(q) && corpus.includes("server.tool(")) {
    bonus += 9;
  }
  if (/\bmcp\b/.test(q) && corpus.includes("mcp-server")) {
    bonus += 4;
  }

  return clamp(base + bonus, 0.1, 10);
}

Scoring Bonuses Matrix

| Query Pattern | Content Match | Bonus |
| --- | --- | --- |
| tool/tools/register | server.tool( | +9 |
| mcp | mcp-server | +4 |
| where registered | function runMcpServer | +4 |
| tool query | src/search.ts | -8 |
| memory query | src/memory.ts | +5 |
| memory query | src/search.ts | -16 |

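The negative entries act as an anti-gaming mechanism: they penalize hits from the search implementation itself (src/search.ts) on tool and memory queries, where it would otherwise match on its own keyword patterns without being relevant.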

Sources: src/search.ts:240-320
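A short worked example makes the arithmetic concrete. The sketch below isolates the formula from the excerpt, base = 10 / (1 + |rank|) plus a bonus, clamped to 0.1-10; `clampSketch` and `scoreSketch` are hypothetical names, and the clamp is assumed to be an ordinary min/max bound.

```typescript
// Assumed clamp semantics: bound value to the inclusive range [min, max].
function clampSketch(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// base = 10 / (1 + |rank|), then add the bonus and clamp to [0.1, 10].
function scoreSketch(rank: number, bonus: number): number {
  return clampSketch(10 / (1 + Math.abs(rank)) + bonus, 0.1, 10);
}

console.log(scoreSketch(-1.5, 0)); // 4   (10 / 2.5, no bonus)
console.log(scoreSketch(-1.5, 9)); // 10  (4 + 9 = 13, clamped to the upper bound)
console.log(scoreSketch(-4, -16)); // 0.1 (2 - 16 = -14, clamped to the lower bound)
```

Because the bonuses are large relative to the base score, a +9 bonus saturates the scale for almost any rank, and a -16 penalty pins the score to the floor.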

Term Expansion

The expandedTerms function adds related terms when the query contains certain trigger keywords:

function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  // Assumed tokenization: this excerpt does not show how `terms` is derived.
  const terms = lower.split(/[^a-z0-9_]+/).filter(Boolean);
  const additions: string[] = [];

  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  // Alternatives are grouped so that \b applies to every keyword, not just the first and last.
  if (/\b(memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\b(impact|depends|dependents|uses)\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }

  return [...terms, ...additions];
}

Sources: src/search.ts:320-380

CLI Commands

Query Command

cxf query "<query>" --workspace <path> --budget <tokens> --json

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Query to answer from indexed context |
| `--workspace` | path | cwd() | Workspace path |
| `--budget` | number | 2000 | Approximate token budget |
| `--json` | flag | false | Output JSON instead of Markdown |

Search Command

cxf search "<query>" --workspace <path> --limit <count> --kind <kind>

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query |
| `--workspace` | path | cwd() | Workspace path |
| `--limit` | number | 10 | Maximum hits |
| `--kind` | enum | all | Search category: all, code, docs, symbols, or memory |

Sources: src/cli.ts:40-80

MCP Server Tools

The search engine exposes the following MCP tools:

search_code

server.tool("search_code", "Search indexed code, docs, symbols, and stored context", {
  query: z.string(),
  mode: z.enum(["all", "code", "docs", "symbols", "memory"]).optional(),
  limit: z.number().optional(),
  filters: z.record(z.string(), z.unknown()).optional()
});

trace_path

server.tool("trace_path", "Trace graph relationships between files, symbols, modules", {
  from: z.string(),
  to: z.string().optional(),
  edge_types: z.array(z.string()).optional(),
  limit: z.number().optional()
});

impact_analysis

server.tool("impact_analysis", "Find likely dependents and tests", {
  symbol_or_file: z.string(),
  limit: z.number().optional()
});

why_changed

server.tool("why_changed", "Explain why a file/symbol may have changed", {
  symbol_or_file: z.string(),
  limit: z.number().optional()
});

Sources: src/mcp-server.ts:1-80

Context Pack

The createContextPack function assembles comprehensive evidence bundles:

export async function createContextPack(options: {
  workspace?: string;
  query: string;
  budget?: number;
  scope?: string;
}): Promise<EvidencePack>

EvidencePack Structure

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique pack identifier (`ctx_<hash>`) |
| query | string | Original query |
| scope | string | Search scope (default: repo) |
| intent | SearchIntent | Classified intent |
| summary | string | Human-readable summary |
| citations | SearchHit[] | Ranked search results |
| files | FileContext[] | Grouped file references |
| symbols | SymbolRecord[] | Relevant symbols (≤20) |
| graphPaths | GraphPath[] | Dependency connections (≤20) |
| memoryHits | SearchHit[] | Memory matches |
| confidence | number | Confidence score (0.1-0.92) |
| tokenEstimate | number | Estimated token count |
| budget | number | Token budget used |
| createdAt | string | ISO timestamp |

Confidence Calculation

function confidenceFor(hits: SearchHit[], graphPaths: GraphPath[], memoryHits: SearchHit[]): number {
  return clamp(
    0.25 + 
    hits.length * 0.05 + 
    graphPaths.length * 0.02 + 
    memoryHits.length * 0.05,
    0.1,
    0.92
  );
}

Sources: src/search.ts:400-480
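
A short worked example makes the formula's range concrete. The sketch below re-implements the same weights over plain counts; `confidenceFromCounts` is a hypothetical helper introduced only for illustration. Because every term is non-negative, an empty result scores the 0.25 base, so the 0.1 lower bound is never reached under this formula, and eleven or more hits alone already saturate at 0.92.

```typescript
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

// Same weights as confidenceFor above, taking counts instead of arrays.
function confidenceFromCounts(hits: number, graphPaths: number, memoryHits: number): number {
  return clamp(0.25 + hits * 0.05 + graphPaths * 0.02 + memoryHits * 0.05, 0.1, 0.92);
}

const noEvidence = confidenceFromCounts(0, 0, 0);  // base only
const typical = confidenceFromCounts(5, 2, 1);     // 0.25 + 0.25 + 0.04 + 0.05
const saturated = confidenceFromCounts(20, 0, 0);  // 1.25 before clamping

console.log(noEvidence, typical.toFixed(2), saturated);
```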

Graph Traversal

The traceGraph function performs dependency graph analysis:

export async function traceGraph(options: {
  workspace?: string;
  from: string;
  to?: string;
  edgeTypes?: string[];
  limit?: number;
}): Promise<GraphPath[]>

Edge Types

| Edge Type | Direction | Description |
| --- | --- | --- |
| IMPORTS | File → Module | Import/require statements |
| DEFINES | File → Symbol | Symbol definitions |
| CONFIGURES | File → Config | Configuration keys |
| TESTS | Test → Source | Test file relationships |

Impact Analysis

export async function impactAnalysis(options: {
  workspace?: string;
  target: string;
  limit?: number;
}): Promise<{
  target: string;
  forward: string[];
  reverse: string[];
  tests: string[];
}>

Returns forward dependencies, reverse dependents, and likely test files for a given symbol or file.

Sources: src/search.ts:480-550

Utility Functions

lineRange

Extracts a specific line range from text:

export function lineRange(text: string, startLine: number, endLine: number): string {
  const lines = text.split(/\r?\n/);
  return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}

clamp

Constrains values within bounds:

export function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

unique

Deduplicates arrays:

export function unique<T>(items: T[]): T[] {
  return Array.from(new Set(items));
}

isLikelyBinary

Detects binary files by checking for null bytes:

export function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}

Sources: src/util.ts:1-50
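
A brief usage sketch shows the edge behavior of two of these helpers. The function bodies are copied from above so the block runs on its own; the sample inputs are made up for illustration. `lineRange` takes 1-based, inclusive line numbers, accepts both `\n` and `\r\n` endings, and caps an out-of-range end line.

```typescript
function lineRange(text: string, startLine: number, endLine: number): string {
  const lines = text.split(/\r?\n/);
  return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}

function unique<T>(items: T[]): T[] {
  return Array.from(new Set(items));
}

const sampleText = "alpha\r\nbeta\ngamma\ndelta";
const middleLines = lineRange(sampleText, 2, 3);   // lines 2 through 3, inclusive
const overshoot = lineRange(sampleText, 3, 99);    // end line is capped at the last line
const dedupedPaths = unique(["a.ts", "b.ts", "a.ts"]);

console.log(JSON.stringify(middleLines), JSON.stringify(overshoot), dedupedPaths);
```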

Data Models

SearchHit

interface SearchHit {
  ref: string;        // Format: "file:path:start-end"
  path: string;       // File path
  kind: string;       // "chunk", "symbol", "memory", "doc"
  title: string;      // Display title
  text: string;       // Content snippet
  score: number;      // Relevance score
  line?: number;      // Starting line number
}

SymbolRecord

interface SymbolRecord {
  ref: string;
  name: string;
  kind: string;       // "function", "class", "interface", "type", etc.
  filePath: string;
  line: number;
  signature?: string;
  exported?: boolean;
}

Sources: src/search.ts:100-150

Index Status

The getIndexStatus function returns workspace indexing metadata:

export async function getIndexStatus(options: { workspace?: string }): Promise<IndexStatus>

IndexStatus Structure

| Field | Type | Description |
| --- | --- | --- |
| workspace | string | Workspace path |
| languageCounts | `Record<string, number>` | File count per language |
| warnings | string[] | Index warnings |
| lastIndexed | string | ISO timestamp of last index |
| totalChunks | number | Total indexed chunks |

Sources: src/search.ts:550-600

Summary

The Search Engine provides Contextful's intelligent retrieval capabilities through:

  1. Intent Classification - Automatically routes queries to optimal search strategies
  2. Full-Text Search - SQLite FTS5 with BM25 ranking and domain-specific scoring
  3. Symbol Index - Fast lookup of code definitions across languages
  4. Graph Traversal - Dependency analysis and impact tracking
  5. Memory Integration - Recall of past lessons and evidence-backed claims
  6. Token Budgeting - Constrains output to specified budget limits
  7. Confidence Scoring - Quantifies result reliability

All search operations flow through a unified kernel database that combines FTS chunks, symbol records, and edge relationships for comprehensive context retrieval.

Sources: src/search.ts:1-50

Context Packs

Related topics: Search Engine, Memory Ledger

Context Packs are the core output format of Contextful, providing AI agents with compact, ranked, and cited evidence bundles that fit within a specified token budget. Instead of forcing agents to read dozens of arbitrary files, Context Packs deliver precisely the evidence needed to answer a specific query.

Overview

A Context Pack is a structured evidence package generated by the context_pack() MCP tool or the cxf query CLI command. It contains:

  • Ranked code and documentation citations matching the query
  • Related symbols (functions, classes, interfaces) from matching files
  • Graph paths connecting related components
  • Memory hits from evidence-backed lessons
  • A confidence score and token budget accounting

The pack is designed to be consumed directly by an LLM agent, providing traceable citations and a clear summary of what evidence was found.

Data Model

EvidencePack Structure

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier (format: `ctx_<hash>`) |
| query | string | The original search query |
| scope | string | Search scope (e.g., "repo") |
| intent | SearchIntent | Classified query intent |
| summary | string | Human-readable summary of findings |
| citations | SearchHit[] | Ranked evidence items |
| files | FileContext[] | Grouped file references with reasons |
| symbols | SymbolRecord[] | Relevant symbols from matched files |
| graphPaths | GraphPath[] | Graph traversals between components |
| memoryHits | SearchHit[] | Memory/lesson hits |
| confidence | number | Estimated confidence (0.1-0.92) |
| tokenEstimate | number | Estimated token count of pack |
| budget | number | Requested token budget |
| createdAt | string | ISO timestamp of creation |

Sources: src/search.ts:search.ts

SearchHit Structure

| Field | Type | Description |
| --- | --- | --- |
| ref | string | Reference identifier (e.g., file:src/auth.ts:1-20) |
| path | string | File path |
| title | string | Display title |
| kind | string | Hit kind: code, doc, symbol, memory |
| excerpt | string | Relevant text excerpt |
| score | number | Relevance score |
| rank | number | BM25 rank |

SearchIntent Enum

| Intent | Trigger Keywords |
| --- | --- |
| exact | Code patterns, paths, symbol names with special chars |
| symbol | Function names, class names, method calls |
| test | test, spec, mock, fixture, unit |
| memory | memory, lesson, learned, session |
| impact | impact, affected, depends, blast radius |
| historical | why, changed, commit, history, regression |
| architectural | architecture, flow, trace, connects, imports |
| docs | resource, docs, documentation, guide, readme |
| vague | Default for generic queries |

Sources: src/search.ts:search.ts

Creation Flow

The createContextPack function orchestrates the entire pack creation process:

graph TD
    A[createContextPack] --> B[searchContext]
    B --> C[classifyQuery]
    C --> D[ftsQuery + expandedTerms]
    D --> E[FTS Search on chunks_fts]
    E --> F[scoreFromRank]
    F --> G[Select Hits within Budget]
    G --> H[loadSymbolsForPaths]
    G --> I[loadGraphPaths]
    G --> J[Filter memoryHits]
    H --> K[Build EvidencePack]
    I --> K
    J --> K
    K --> L[saveEvidencePack]
    L --> M[Return EvidencePack]

Step 1: Search Context

The process begins by classifying the query intent and executing full-text search:

const search = await searchContext({ workspace, query, limit: budget * 2 });
const selected = selectWithinBudget(search.hits, budget);

Sources: src/search.ts:search.ts

Step 2: Budget-Aware Selection

Hits are added greedily in rank order; selection stops at the first hit that would bring the running token estimate to or above the budget:

function selectWithinBudget(hits: SearchHit[], budget: number): SearchHit[] {
  const selected: SearchHit[] = [];
  let tokenEstimate = 0;
  for (const hit of hits) {
    const est = estimateTokens(hit.excerpt || hit.title);
    if (tokenEstimate + est >= budget) break;
    selected.push(hit);
    tokenEstimate += est;
  }
  return selected;
}

Sources: src/search.ts:search.ts
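
To see the stopping behavior, the selection loop can be run against a few synthetic hits. This is a sketch under stated assumptions: `estimateTokens` is approximated as four characters per token (the real estimator is not shown in this section), and the `Hit` shape is reduced to the two fields the loop reads.

```typescript
interface Hit { title: string; excerpt?: string }

// Assumption: roughly four characters per token; the real estimator is not shown.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectWithinBudget(hits: Hit[], budget: number): Hit[] {
  const selected: Hit[] = [];
  let used = 0;
  for (const hit of hits) {
    const cost = estimateTokens(hit.excerpt || hit.title);
    if (used + cost >= budget) break; // stop at the first hit that does not fit
    selected.push(hit);
    used += cost;
  }
  return selected;
}

const rankedHits: Hit[] = [
  { title: "a", excerpt: "x".repeat(40) },  // 10 tokens
  { title: "b", excerpt: "x".repeat(400) }, // 100 tokens
  { title: "c", excerpt: "x".repeat(8) },   // 2 tokens
];
const picked = selectWithinBudget(rankedHits, 50).map((h) => h.title);
console.log(picked);
```

With a budget of 50, only "a" is selected: the loop breaks at "b", so the smaller "c" is never considered even though it would fit. Using `continue` instead of `break` would pack the budget more tightly at the cost of skipping over higher-ranked hits.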

Step 3: Symbol Loading

For each selected file, related symbols are loaded (up to 20 total):

const symbols = loadSymbolsForPaths(kernel.db, paths).slice(0, 20);

The symbols query reads from the symbols table, filtering by file path:

SELECT ref, name, kind, file_path, line, signature, exported 
FROM symbols 
WHERE file_path IN (...)

Sources: src/search.ts:search.ts

Step 4: Graph Path Loading

Graph paths connect files through import/dependency relationships:

const graphPaths = loadGraphPaths(kernel.db, paths, 20);

Sources: src/search.ts:search.ts

Step 5: Memory Hit Extraction

Memory hits are filtered from selected hits by kind:

const memoryHits = selected.filter((hit) => hit.kind === "memory");

Step 6: Confidence Calculation

Confidence is calculated using a clamped formula:

function confidenceFor(hits: SearchHit[], graphPaths: GraphPath[], memoryHits: SearchHit[]): number {
  return clamp(
    0.25 + hits.length * 0.05 + graphPaths.length * 0.02 + memoryHits.length * 0.05,
    0.1,
    0.92
  );
}

  • Base: 0.25
  • Each hit: +0.05
  • Each graph path: +0.02
  • Each memory hit: +0.05
  • Clamped to [0.1, 0.92]

Sources: src/search.ts:search.ts

Query Classification

The classifyQuery function determines the search intent based on keywords:

function classifyQuery(q: string): SearchIntent {
  const lower = q.toLowerCase();
  if (/[`"'#.:/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
  if (/\b(test|spec|mock|fixture)\b/.test(q)) return "test";
  if (/\b(memory|lesson|learned|session|sessions)\b/.test(q)) return "memory";
  if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(q)) return "impact";
  if (/\b(why|changed|commit|history|regression|introduced)\b/.test(q)) return "historical";
  if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(q)) return "architectural";
  if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(q)) return "docs";
  return "vague";
}

Sources: src/search.ts:search.ts
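
Because the checks run in a fixed order, the first matching rule wins. The sketch below copies the patterns from the excerpt into a standalone function (named `classifyQuerySketch` here to mark it as an illustration, with the unused lowercase copy dropped) and classifies a few sample queries. One consequence worth noting: any capitalized word of three or more characters, including the first word of a sentence, triggers the "exact" rule before the keyword rules are consulted.

```typescript
type SearchIntent =
  | "exact" | "test" | "memory" | "impact"
  | "historical" | "architectural" | "docs" | "vague";

function classifyQuerySketch(q: string): SearchIntent {
  if (/[`"'#.:\/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
  if (/\b(test|spec|mock|fixture)\b/.test(q)) return "test";
  if (/\b(memory|lesson|learned|session|sessions)\b/.test(q)) return "memory";
  if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(q)) return "impact";
  if (/\b(why|changed|commit|history|regression|introduced)\b/.test(q)) return "historical";
  if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(q)) return "architectural";
  if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(q)) return "docs";
  return "vague";
}

const sampleQueries = [
  "src/auth.ts",                    // path characters
  "add a unit test for login",      // keyword "test"
  "blast radius of the jwt helper", // two-word keyword
  "where is user auth handled",     // no rule matches
  "Where is user auth handled",     // capitalized first word
];
for (const query of sampleQueries) {
  console.log(query, "->", classifyQuerySketch(query));
}
```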

Term Expansion

The expandedTerms function adds related terms to improve recall for specific domains:

function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  const additions: string[] = [];
  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  if (/\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\bimpact|depends|dependents|uses\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }
  // `terms` holds the base query terms; its derivation is elided from this excerpt.
  return [...terms, ...additions];
}

Sources: src/search.ts:search.ts

Scoring Algorithm

The scoreFromRank function calculates relevance scores:

function scoreFromRank(rank: number, q: string): number {
  let bonus = 0;
  const lower = q.toLowerCase();
  
  if (/\bmemory|memories|remember|remembers|lesson|lessons|sessions\b/.test(q)) {
    if (lower.includes("memory ledger")) bonus += 7;
    if (lower.includes("src/memory.ts")) bonus += 5;
    if (lower.includes("readme.md")) bonus += 4;
    if (lower.includes("src/search.ts")) bonus -= 16;
  }
  if (/\b(where|how)\b/.test(q) && lower.includes("config-key")) bonus -= 2;
  
  return 10 / (1 + Math.abs(rank)) + bonus;
}

Sources: src/search.ts:search.ts

CLI Usage

The query command creates Context Packs via CLI:

cxf query "<query>" --workspace <path> --budget 2000 --json

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --workspace | path | cwd | Workspace path |
| --budget | number | 2000 | Approximate token budget |
| --json | flag | false | Output as JSON instead of Markdown |

Example Output

# Context Pack ctx_abc123

Query: where is user auth handled
Intent: vague
Confidence: 59%
Token estimate: 1850/2000

Found 5 evidence items for a vague query, with 2 graph connections and 1 memory hit.

## Citations
- file:src/auth.ts:1-50 (auth module)
  Handles user authentication via JWT tokens...
- file:src/middleware/auth.ts:1-30 (auth middleware)
  Express middleware for auth validation...

## Graph Paths
- src/auth.ts --IMPORTS--> src/utils/jwt.ts (src/auth.ts:5)
- src/middleware/auth.ts --IMPORTS--> src/auth.ts (src/middleware/auth.ts:3)

## Memory Hits
- memory:lesson:1: JWT tokens should be validated on every protected route.

Sources: src/cli.ts:cli.ts

Rendering

Context Packs are rendered to Markdown by renderEvidencePackMarkdown:

export function renderEvidencePackMarkdown(pack: EvidencePack): string {
  const lines = [
    `# Context Pack ${pack.id}`,
    "",
    `Query: ${pack.query}`,
    `Intent: ${pack.intent}`,
    `Confidence: ${Math.round(pack.confidence * 100)}%`,
    `Token estimate: ${pack.tokenEstimate}/${pack.budget}`,
    "",
    pack.summary,
    "",
    "## Citations"
  ];
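  // ... citations, graph paths, and memory hits are appended here (elided)
  return lines.join("\n"); // assumed: the excerpt omits the return statement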
}

Sources: src/report.ts:report.ts

Chunk Extraction

Contextual chunks are extracted during indexing for searchability:

graph LR
    A[Source File] --> B[Language Detection]
    B --> C[extractSymbols]
    B --> D[extractEdges]
    B --> E[extractChunks]
    C --> F[Symbol Table]
    D --> G[Edge Table]
    E --> H[Chunk Table]

Supported Languages

| Language | Symbol Patterns |
| --- | --- |
| TypeScript/JavaScript | function, class, interface, type, const arrow |
| Python | def, class |
| Go | func, type struct/interface |
| Rust | fn, struct, enum, trait, impl |
| Markdown | headings (H1-H6) |
| JSON | top-level keys |

Sources: src/extract.ts:extract.ts

Chunking Strategy

  • Code files: Split into blocks of up to 60 lines that advance 50 lines at a time, so consecutive blocks overlap by 10 lines
  • Markdown files: Split by headings, with the heading as the chunk title
  • Token estimation: Used for both selection and budget accounting

function codeChunks(relativePath: string, content: string): ChunkRecord[] {
  const lines = content.split(/\r?\n/);
  const chunks: ChunkRecord[] = [];
  // Split into ~60-line blocks with overlap
  for (let start = 1; start <= lines.length; start += 50) {
    const end = Math.min(start + 60 - 1, lines.length);
    const text = lineRange(content, start, end);
    chunks.push({
      ref: fileRef(relativePath, start, end),
      filePath: relativePath,
      startLine: start,
      endLine: end,
      kind: "file",
      title: `${relativePath}:${start}-${end}`,
      text,
      tokenEstimate: estimateTokens(text)
    });
  }
  return chunks;
}

Sources: src/extract.ts:extract.ts
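
The window arithmetic is easier to check with the boundaries isolated. The sketch below computes only the 1-based, inclusive line ranges that the loop above would produce; `chunkRanges` is a hypothetical helper, and the window size of 60 and stride of 50 are taken from the excerpt.

```typescript
// Compute 1-based inclusive chunk ranges: 60-line windows advancing by 50 lines.
function chunkRanges(totalLines: number, windowSize = 60, stride = 50): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalLines; start += stride) {
    ranges.push([start, Math.min(start + windowSize - 1, totalLines)]);
  }
  return ranges;
}

console.log(JSON.stringify(chunkRanges(130))); // three overlapping windows
console.log(JSON.stringify(chunkRanges(105))); // last window lies inside the previous one
```

For a 105-line file the final range, 101 to 105, is fully contained in the preceding range, 51 to 105, so a short tail chunk can duplicate text that is already indexed.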

Summary Generation

The summarizePack function generates human-readable summaries:

function summarizePack(
  query: string,
  intent: SearchIntent,
  hits: SearchHit[],
  graphPaths: GraphPath[],
  memoryHits: SearchHit[]
): string {
  if (hits.length === 0) {
    return `No indexed evidence matched "${query}". Re-index or broaden the query.`;
  }
  return `Found ${hits.length} evidence item${hits.length === 1 ? "" : "s"} ` +
    `for a ${intent} query, with ${graphPaths.length} graph connection${graphPaths.length === 1 ? "" : "s"} ` +
    `and ${memoryHits.length} memory hit${memoryHits.length === 1 ? "" : "s"}.`;
}

Sources: src/search.ts:search.ts

Persistence

Evidence packs are saved to the kernel database for audit and retrieval:

saveEvidencePack(kernel.db, { 
  id: pack.id, 
  query: pack.query, 
  tokenEstimate, 
  json: JSON.stringify(pack) 
});

Sources: src/search.ts:search.ts

Design Principles

  1. Token budget awareness: Never exceed the requested budget; select the most relevant items first
  2. Cited evidence: Every piece of information is traceable to a specific file and line range
  3. Intent-driven: Query classification shapes what gets searched and how results are interpreted
  4. Graph connectivity: Beyond matching files, show how they connect through imports and dependencies
  5. Memory integration: Blend indexed content with evidence-backed lessons from prior sessions

Sources: src/search.ts:search.ts

Memory Ledger

Related topics: Context Packs, Search Engine

The Memory Ledger is Contextful's evidence-backed persistent memory system that enables AI agents to retain and recall learned lessons across sessions. Unlike ephemeral context that disappears when a session ends, the Memory Ledger stores structured knowledge annotated with source evidence, allowing agents to build cumulative understanding of a codebase over time.

Overview

The Memory Ledger solves a fundamental problem in AI-assisted development: knowledge gained during one session is lost in the next. When an agent discovers how authentication works, identifies a fragile dependency, or learns a non-obvious architectural pattern, that knowledge typically vanishes when the session ends.

Contextful's approach requires every stored memory to be anchored to concrete evidence—file references, code symbols, or prior context packs. This design prevents hallucinated or unsubstantiated memories from polluting the knowledge base and ensures that recalled lessons can be traced back to their source.

The system operates entirely locally with no external API calls, embedding services, or cloud dependencies. All memory data remains within the workspace's SQLite database.

Architecture

graph TD
    A[Agent Session] -->|write_lesson| B[Memory Ledger]
    A -->|recall_memory| C[Memory Search]
    B -->|evidence refs| D[Evidence Pack]
    C -->|cited memories| A
    D -->|citations| E[Source Files]
    F[Workspace DB] -->|stores| B
    F -->|stores| C

Core Components

| Component | Role | Source |
| --- | --- | --- |
| Memory Storage | SQLite-backed persistent storage for lessons | src/db.ts |
| Memory Search | FTS-enabled retrieval of memories by query | src/search.ts |
| Evidence Validation | Ensures evidence refs are valid before storage | src/mcp-server.ts |
| Confidence Scoring | Assigns credibility scores to stored memories | src/cli.ts:85 |

Data Model

Memory Record Structure

Each memory in the ledger contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier (prefixed with memory:) |
| claim | string | The substantive lesson or observation |
| scope | string | Granularity level: repo, file, symbol, or session |
| evidenceRefs | string[] | Validated references to source evidence |
| confidence | number | Credibility score from 0.0 to 1.0 |
| status | string | Current state: active, superseded, or stale |
| supersedes | string? | ID of the memory this replaces (if any) |

Evidence Reference Formats

Valid evidence references that can be attached to memories:

| Format | Example | Purpose |
| --- | --- | --- |
| File range | file:src/auth.ts:10-40 | Reference specific lines in a file |
| Symbol | symbol:src/auth.ts#AuthService:12 | Point to a specific code symbol |
| Context pack | pack:ctx_abc123 | Reference a prior evidence pack |

Sources: README.md:54-56

Evidence references must come from search results or context packs—arbitrary references are rejected. This prevents storing claims without verifiable backing.
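
The three reference formats are regular enough to parse with simple patterns. The following is a minimal sketch, not the project's validator: `parseEvidenceRef` and the `EvidenceRef` type are hypothetical, and the patterns are inferred from the three examples in the table. It checks syntax only; it does not verify that a reference actually came from a search result or context pack.

```typescript
type EvidenceRef =
  | { kind: "file"; path: string; startLine: number; endLine: number }
  | { kind: "symbol"; path: string; symbol: string; line: number }
  | { kind: "pack"; packId: string };

// Returns a structured reference, or null when the string matches none of the formats.
function parseEvidenceRef(ref: string): EvidenceRef | null {
  const file = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (file) {
    return { kind: "file", path: file[1], startLine: Number(file[2]), endLine: Number(file[3]) };
  }
  const sym = /^symbol:(.+)#([^#:]+):(\d+)$/.exec(ref);
  if (sym) {
    return { kind: "symbol", path: sym[1], symbol: sym[2], line: Number(sym[3]) };
  }
  const pack = /^pack:(ctx_[A-Za-z0-9]+)$/.exec(ref);
  if (pack) {
    return { kind: "pack", packId: pack[1] };
  }
  return null;
}

console.log(parseEvidenceRef("file:src/auth.ts:10-40"));
console.log(parseEvidenceRef("symbol:src/auth.ts#AuthService:12"));
console.log(parseEvidenceRef("pack:ctx_abc123"));
```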

Memory Scopes

The scope field determines the durability and applicability of a memory:

| Scope | Description | Persistence |
| --- | --- | --- |
| repo | Project-wide lessons applicable across sessions | Permanent |
| file | File-specific knowledge | Permanent |
| symbol | Symbol-level lessons | Permanent |
| session | Ephemeral session-scoped learnings | Lost on session end |

The default scope is repo, reflecting the assumption that most valuable memories have project-wide relevance.

Sources: src/cli.ts:73

Writing Memories

CLI Usage

cxf memory add \
  --claim "AuthService.validateToken() throws on expired tokens without catching" \
  --evidence "file:src/auth.ts:45-67" \
  --evidence "file:src/api/middleware.ts:12-20" \
  --confidence 0.85 \
  --scope repo

MCP Tool Usage

await server.callTool("write_lesson", {
  claim: "The payment module requires initialization before use",
  evidence_refs: ["file:src/payment/core.ts:10-30", "symbol:src/payment/core.ts#initialize:15"],
  scope: "repo",
  confidence: 0.9
});

Sources: src/mcp-server.ts:79-94

Validation Rules

Memories are subject to strict validation:

  1. Evidence required: At least one valid evidence reference must be provided
  2. Evidence must be fresh: References must originate from search results or context packs
  3. Claim must be substantive: Empty or trivial claims are rejected
  4. Confidence in valid range: Must be between 0.0 and 1.0
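
The rules above can be sketched as a single validation pass. Everything here is illustrative: `validateLesson`, `LessonInput`, and the `knownRefs` set are hypothetical, and modelling "fresh" evidence as membership in a set of previously returned references is an assumption about how that rule might be enforced. The source does not define what counts as a trivial claim, so the sketch only rejects empty ones.

```typescript
interface LessonInput {
  claim: string;
  evidenceRefs: string[];
  confidence: number;
}

// Returns a list of problems; an empty list means the input passed these checks.
function validateLesson(input: LessonInput, knownRefs: Set<string>): string[] {
  const problems: string[] = [];
  if (input.claim.trim().length === 0) problems.push("claim is empty");
  if (input.evidenceRefs.length === 0) problems.push("at least one evidence ref is required");
  for (const ref of input.evidenceRefs) {
    // Assumption: refs count as valid only if an earlier search or pack returned them.
    if (!knownRefs.has(ref)) problems.push(`unknown evidence ref: ${ref}`);
  }
  // Written as a negated range check so that NaN is rejected too.
  if (!(input.confidence >= 0 && input.confidence <= 1)) problems.push("confidence must be in [0, 1]");
  return problems;
}

const seenRefs = new Set<string>(["file:src/auth.ts:45-67"]);
console.log(validateLesson({ claim: "validateToken throws on expired tokens", evidenceRefs: ["file:src/auth.ts:45-67"], confidence: 0.85 }, seenRefs));
console.log(validateLesson({ claim: " ", evidenceRefs: ["file:made/up.ts:1-2"], confidence: 1.5 }, seenRefs));
```

Collecting every problem, instead of failing on the first, lets a caller report all issues with a lesson in one response.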

Searching Memories

Intent Classification

Contextful automatically classifies queries to determine when to search memories. The query classifier recognizes memory-related intents through keyword detection:

const memoryPattern = /\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/;

When matched, the classifier returns intent: "memory" and the search system automatically queries the memories FTS index.

Sources: src/search.ts:14-17

Query Expansion

Memory searches benefit from automatic term expansion. When a query mentions relevant concepts, additional search terms are added:

if (/\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/.test(lower)) {
  additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
}

This helps queries like "what lessons were learned about auth" retrieve memory results even if those exact words don't appear in the stored claims.

Sources: src/search.ts:28-30

Search Results

Memory hits in search results include:

| Field | Description |
| --- | --- |
| ref | Memory reference in format memory:<id> |
| kind | Always "memory" for memory hits |
| title | Display title including scope |
| excerpt | Redacted claim text (secrets removed) |
| evidence | Original evidence references |
| status | Current memory status |
| score | Relevance score |

Memory Lifecycle

stateDiagram-v2
    [*] --> Active: write_lesson
    Active --> Superseded: write_lesson with supersedes
    Active --> Stale: Evidence becomes invalid
    Superseded --> [*]
    Stale --> [*]
    Active --> [*]: Deleted

Status Transitions

Active → Default state for newly written memories. Active memories are returned in search results and can supersede other memories.

Superseded → When a newer, more accurate memory replaces an older one, the superseded memory retains its ID and evidence but is excluded from search results. The supersedes field links to the replaced memory.

Stale → Memories become stale when their evidence references point to files or symbols that have changed significantly since the memory was written. The reporting system tracks stale memories for review.

Sources: src/report.ts:54-58

Integration with Context Packs

The Memory Ledger integrates with Contextful's evidence pack system:

  1. Before writing: Search context or create a context pack to get evidence references
  2. Writing lessons: Use those evidence refs to anchor the memory claim
  3. Recalling: Later sessions query the ledger, retrieving cited memories

// During a session: create pack, identify lessons
const pack = await createContextPack({ query: "how is auth handled", budget: 2000 });

// Later session: recall what was learned
const result = await recallMemory({ query: "auth patterns", scope: "repo" });

This bidirectional relationship means memories enhance future context packs, and context packs provide evidence for future memories.

Reporting

The report command includes memory statistics:

cxf report --workspace . --format markdown

Output includes a "Stale Memories" section listing memories whose evidence references may no longer be valid:

## Stale Memories
- memory_abc123: AuthService.validateToken() behavior changed in v2
- memory_def456: payment module initialization order is now reversed

Sources: src/report.ts:54-58

Configuration Options

| Option | CLI Flag | Default | Description |
| --- | --- | --- | --- |
| Workspace | --workspace | process.cwd() | Path to workspace with memory database |
| Claim | --claim | required | The memory content |
| Evidence | --evidence | required | One or more evidence refs |
| Scope | --scope | repo | Memory scope level |
| Confidence | --confidence | 0.7 | Credibility score |

Privacy Considerations

The Memory Ledger is designed with privacy as a core principle:

  • Local only: No data leaves the workspace
  • No cloud sync: Memories remain on the local machine
  • Evidence-linked: Claims cannot be stored without verifiable source
  • Content redaction: Secrets are automatically redacted from stored claims using pattern matching for emails, API keys, and tokens

Sources: src/util.ts:12-18
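
Pattern-based redaction of this kind can be sketched in a few lines. The patterns below are illustrative assumptions, not the project's actual rules: the email pattern is deliberately loose, the "sk-" key shape is just one common convention, and `redactSecrets` and the `[REDACTED]` placeholder are hypothetical.

```typescript
// Illustrative patterns only; the project's actual patterns are not shown in this section.
const secretPatterns: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email addresses
  /\bsk-[A-Za-z0-9]{16,}\b/g,                         // "sk-"-prefixed API keys (assumed shape)
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g,                // bearer tokens
];

// Apply each pattern in turn, replacing every match with a fixed placeholder.
function redactSecrets(text: string): string {
  return secretPatterns.reduce((out, pattern) => out.replace(pattern, "[REDACTED]"), text);
}

console.log(redactSecrets("mail ops@example.com now"));
console.log(redactSecrets("Authorization: Bearer abc.def-123"));
```

Pattern matching catches only secrets with a recognizable shape, so a sketch like this reduces accidental leakage but does not guarantee that a stored claim is free of sensitive text.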

| Tool | Purpose |
| --- | --- |
| recall_memory | Search the memory ledger |
| write_lesson | Store a new evidence-backed memory |
| context_pack | Generate evidence packs that can feed into memories |

Sources: README.md:35-40

Sources: README.md:54-56

Graph Traversal and Analysis

Related topics: Search Engine, SQLite Database Schema

Graph Traversal and Analysis is a core feature of Contextful that builds and queries a dependency graph from source code. This system tracks relationships between files, symbols, modules, and configuration nodes, enabling sophisticated impact analysis, change tracing, and dependency exploration.

Overview

Contextful extracts code relationships during indexing and stores them in a SQLite database as a traversable graph. This enables agents to answer questions like:

  • "What depends on this module?"
  • "What tests cover this file?"
  • "How does this symbol connect to other parts of the codebase?"

Sources: src/extract.ts:68-95

Architecture

graph TD
    A[Source Files] --> B[extractEdges]
    B --> C[GraphEdge Records]
    C --> D[SQLite Kernel DB]
    E[CLI/MCP Query] --> F[searchContext]
    F --> G[traceGraph]
    G --> H[GraphPath Results]
    F --> I[impactAnalysis]
    I --> J[Impact Results]
    F --> K[whyChanged]
    K --> L[Git History + Evidence]

Data Flow

  1. Extraction Phase: During workspace indexing, extractEdges() parses source files to identify relationships Sources: src/extract.ts:52-95
  2. Storage Phase: Edge data is stored in the edges table within the kernel SQLite database Sources: src/search.ts:1-30
  3. Query Phase: CLI commands and MCP tools query the graph using traversal algorithms Sources: src/search.ts:180-220

Graph Data Model

Core Types

interface GraphEdge {
  sourceType: "file" | "symbol";
  sourceName: string;
  targetType: "file" | "symbol" | "module" | "config";
  targetName: string;
  edgeType: EdgeType;
  filePath: string;
  line: number;
}

interface GraphPath {
  edges: Array<{
    sourceName: string;
    sourceType: string;
    edgeType: string;
    targetName: string;
    targetType: string;
  }>;
  totalHops: number;
}

interface GraphNode {
  name: string;
  type: "file" | "symbol" | "module" | "config";
  path?: string;
  kind?: string;
}

Sources: src/types.ts:45-70

Edge Types

| Edge Type | Description | Source Detection |
| --- | --- | --- |
| DEFINES | File defines a symbol | Function/class declarations |
| IMPORTS | File imports a module | import, require, from statements |
| CONFIGURES | File/config references a key | JSON keys, package.json fields |
| TESTS | Test file tests imports | Auto-generated for test files |

Sources: src/extract.ts:75-100

Language-Specific Detection

The extraction layer supports multiple languages:

| Language | Import Patterns | Symbol Patterns |
| --- | --- | --- |
| TypeScript/JavaScript | from "module", require("module") | export function/class/interface |
| Python | from module import | def, class |
| Go | "package" | func, type struct/interface |
| Rust | use module;, mod name; | fn, struct, enum, trait |

Sources: src/extract.ts:70-95

Graph Traversal API

traceGraph

Performs graph traversal starting from a source node, optionally filtering by edge types and limiting results.

export async function traceGraph(options: {
  workspace?: string;
  from: string;
  to?: string;
  edgeTypes?: string[];
  limit?: number;
}): Promise<GraphPath[]>

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| workspace | string | No | Workspace path (defaults to CWD) |
| from | string | Yes | Starting node name |
| to | string | No | Target node for path finding |
| edgeTypes | string[] | No | Filter by specific edge types |
| limit | number | No | Maximum paths to return (default: 10) |

Sources: src/search.ts:180-190
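
This section gives the signature of traceGraph but not its traversal algorithm. As one plausible reading, the sketch below runs a breadth-first search over an in-memory edge list. It is an assumption, not the project's implementation: `tracePaths`, `Edge`, and `PathResult` are hypothetical names, and the real function reads edges from the kernel database and not from an array.

```typescript
interface Edge { sourceName: string; edgeType: string; targetName: string }
interface PathResult { edges: Edge[]; totalHops: number }

// Breadth-first search over an in-memory edge list (assumed algorithm, not confirmed by the source).
function tracePaths(edges: Edge[], from: string, to?: string, edgeTypes?: string[], limit = 10): PathResult[] {
  const allowed = edgeTypes ? new Set(edgeTypes) : null;
  const results: PathResult[] = [];
  const visited = new Set<string>([from]);
  let frontier: Array<{ node: string; trail: Edge[] }> = [{ node: from, trail: [] }];
  while (frontier.length > 0 && results.length < limit) {
    const next: Array<{ node: string; trail: Edge[] }> = [];
    for (const { node, trail } of frontier) {
      for (const edge of edges) {
        if (edge.sourceName !== node) continue;
        if (allowed && !allowed.has(edge.edgeType)) continue;
        if (visited.has(edge.targetName)) continue; // visit each node once, which also avoids cycles
        visited.add(edge.targetName);
        const path = [...trail, edge];
        // Without a target, every reachable node yields a path; with one, only matches do.
        if ((to === undefined || edge.targetName === to) && results.length < limit) {
          results.push({ edges: path, totalHops: path.length });
        }
        next.push({ node: edge.targetName, trail: path });
      }
    }
    frontier = next;
  }
  return results;
}

const demoEdges: Edge[] = [
  { sourceName: "a.ts", edgeType: "IMPORTS", targetName: "b.ts" },
  { sourceName: "b.ts", edgeType: "IMPORTS", targetName: "c.ts" },
  { sourceName: "a.ts", edgeType: "DEFINES", targetName: "helper" },
];
console.log(tracePaths(demoEdges, "a.ts", "c.ts").map((p) => p.totalHops));
```

Breadth-first order returns the shortest path to each node first, which fits a result limit: when the limit cuts the list, the paths that remain are the closest ones.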

loadGraphPaths

Loads graph paths from the database for a set of file paths.

function loadGraphPaths(
  db: Database,
  paths: string[],
  limit: number
): GraphPath[]

Sources: src/search.ts:60-80

Impact Analysis

Impact analysis identifies reverse dependencies—what depends on a given file or symbol—and finds relevant test coverage.

graph LR
    A[Target File/Symbol] --> B[Find All Edges Pointing TO Target]
    B --> C[Group by Source File]
    C --> D[Identify Test Files]
    D --> E[Return Impact Set]

impactAnalysis Function

export async function impactAnalysis(options: {
  workspace?: string;
  target: string;
  limit?: number;
}): Promise<ImpactResult>

Impact Result Structure

| Field | Type | Description |
| --- | --- | --- |
| target | string | The analyzed symbol or file |
| dependents | DependentInfo[] | Files/symbols that depend on target |
| tests | SearchHit[] | Related test files |

interface DependentInfo {
  path: string;
  type: string;
  imports: string[];
}

interface ImpactResult {
  target: string;
  dependents: DependentInfo[];
  tests: SearchHit[];
}

Sources: src/search.ts:130-175
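
The reverse lookup in the diagram above, finding edges that point to the target and grouping them by source file, can be sketched over an in-memory edge list. The names `reverseDependents`, `EdgeRow`, and `DependentSketch` are hypothetical, and the result shape is simplified: it records the edge types linking each file to the target and does not reproduce the DependentInfo fields.

```typescript
interface EdgeRow { sourceName: string; targetName: string; edgeType: string; filePath: string }
interface DependentSketch { path: string; edgeTypes: string[] }

function reverseDependents(edges: EdgeRow[], target: string, limit = 20): DependentSketch[] {
  const byFile = new Map<string, string[]>();
  for (const edge of edges) {
    if (edge.targetName !== target) continue; // keep only edges pointing TO the target
    const types = byFile.get(edge.filePath) || [];
    if (types.indexOf(edge.edgeType) === -1) types.push(edge.edgeType);
    byFile.set(edge.filePath, types);
  }
  const dependents: DependentSketch[] = [];
  byFile.forEach((edgeTypes, path) => dependents.push({ path, edgeTypes }));
  return dependents.slice(0, limit);
}

const impactEdges: EdgeRow[] = [
  { sourceName: "src/api.ts", targetName: "src/auth.ts", edgeType: "IMPORTS", filePath: "src/api.ts" },
  { sourceName: "src/auth.test.ts", targetName: "src/auth.ts", edgeType: "IMPORTS", filePath: "src/auth.test.ts" },
  { sourceName: "src/auth.test.ts", targetName: "src/auth.ts", edgeType: "TESTS", filePath: "src/auth.test.ts" },
  { sourceName: "src/auth.ts", targetName: "src/jwt.ts", edgeType: "IMPORTS", filePath: "src/auth.ts" },
];
console.log(JSON.stringify(reverseDependents(impactEdges, "src/auth.ts")));
```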

Test Detection Logic

Test files are identified by path patterns and edges with TESTS type:

const testPaths = paths.filter(
  (path) => path.edgeType === "TESTS" || 
            /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./.test(path.filePath)
);

Sources: src/search.ts:165-170
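
The path pattern can be exercised on its own to see which file names it flags. The sketch below applies the same regular expression to a handful of made-up paths; `looksLikeTestPath` is a hypothetical wrapper. The pattern assumes forward-slash separators, so Windows-style paths would match only through the file-name alternative.

```typescript
// Same pattern as above: a tests/ or __tests__/ directory, or a .test./.spec./-test./-spec. file name.
const testPathPattern = /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./;

function looksLikeTestPath(filePath: string): boolean {
  return testPathPattern.test(filePath);
}

const samplePaths = [
  "src/__tests__/auth.ts",
  "tests/login.py",
  "src/auth.spec.ts",
  "src/auth-test.js",
  "src/latest/config.ts", // "latest" contains "test" but is not a test directory
  "src/contest.ts",
];
const flaggedPaths = samplePaths.filter(looksLikeTestPath);
console.log(flaggedPaths);
```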

Change Analysis

whyChanged

Combines current code evidence with git history to explain why a file or symbol may have changed.

export async function whyChanged(options: {
  workspace?: string;
  target: string;
  limit?: number
}): Promise<{
  target: string;
  currentEvidence: SearchHit[];
  commits: Array<{
    hash: string;
    subject: string;
    date?: string;
    files: string[];
  }>;
}>

Workflow

graph TD
    A[whyChanged] --> B[searchContext for target]
    B --> C[Extract file paths from hits]
    C --> D[readGitHistory with file paths]
    D --> E[Combine evidence + commits]
    E --> F[Return structured result]

Sources: src/search.ts:200-230

Git History Integration

The system reads git history for affected files:

function readGitHistory(
  workspace: string,
  filePaths: string[],
  limit: number
): Array<{
  hash: string;
  subject: string;
  date?: string;
  files: string[];
}>

Sources: src/search.ts:85-100
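The source shows only the signature of readGitHistory. As a rough sketch of how such output could be turned into the commit shape above, the parser below assumes the history is produced by a `git log` call with a custom format string; the command, the separator choice, and the name `parseGitLog` are assumptions, not Contextful's implementation.

```typescript
interface CommitInfo {
  hash: string;
  subject: string;
  date?: string;
  files: string[];
}

// Assumed producer of the input (not taken from the source):
//   git log --name-only --pretty=format:%x1e%H%x1f%s%x1f%aI -n <limit> -- <files>
// %x1e starts each commit record; %x1f separates hash, subject, and ISO date.
const RECORD_SEP = "\x1e";
const FIELD_SEP = "\x1f";

function parseGitLog(output: string): CommitInfo[] {
  const commits: CommitInfo[] = [];
  for (const record of output.split(RECORD_SEP)) {
    if (record.trim() === "") continue;
    const lines = record.split("\n");
    const [hash, subject, date] = lines[0].split(FIELD_SEP);
    if (!hash) continue;
    // Every non-empty line after the header is a changed file path.
    const files = lines
      .slice(1)
      .map((l) => l.trim())
      .filter((l) => l !== "");
    commits.push({ hash, subject: subject ?? "", date: date || undefined, files });
  }
  return commits;
}

// Hand-written sample in the assumed format.
const sampleLog =
  "\x1eabc123\x1fFix parser\x1f2024-01-02T03:04:05+00:00\nsrc/parser.ts\nsrc/util.ts\n\n" +
  "\x1edef456\x1fAdd tests\x1f\n\ntests/parser.test.ts\n";
const parsedCommits = parseGitLog(sampleLog);
console.log(parsedCommits.length);
```

Control characters are used as separators so that commit subjects containing spaces, colons, or pipes do not break the parse.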

CLI Commands

trace Command

cxf trace --from <symbol_or_file> [--to <target>] [--edge-types <types>] [--limit <count>]

#### Options

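| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --from | string | Required | Starting node |
| --to | string | - | Target node |
| --edge-types | string | all | Comma-separated edge types |
| --limit | number | 10 | Maximum paths |
| --workspace | string | CWD | Workspace path |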

Sources: src/cli.ts:45-60

report Command

Generates a comprehensive context report including graph statistics:

cxf report --workspace <path> --format markdown|json|html

#### Report Includes

  • Index status with graph node/edge counts
  • Top queries by intent type
  • Stale memory detection
  • Recent evidence packs

Sources: src/cli.ts:70-85

MCP Server Tools

Contextful exposes graph traversal as MCP tools for integration with AI coding assistants.

trace_path

{
  "name": "trace_path",
  "description": "Trace graph relationships between files, symbols, modules, and config nodes.",
  "inputSchema": {
    "from": "string",
    "to": "string (optional)",
    "edge_types": "string[] (optional)",
    "limit": "number (optional)"
  }
}

Sources: src/mcp-server.ts:45-55

impact_analysis

{
  "name": "impact_analysis",
  "description": "Find likely dependents and tests for a file, symbol, or module.",
  "inputSchema": {
    "symbol_or_file": "string",
    "limit": "number (optional)"
  }
}

Sources: src/mcp-server.ts:56-65

why_changed

{
  "name": "why_changed",
  "description": "Explain why a file or symbol may have changed by combining current evidence with git history.",
  "inputSchema": {
    "symbol_or_file": "string",
    "limit": "number (optional)"
  }
}

Sources: src/mcp-server.ts:66-75

Usage Examples

Direct CLI Usage

# Trace dependencies of auth module
cxf trace --from src/auth.ts --edge-types IMPORTS

# Find what tests cover a file
cxf impact --target src/parser.ts

# Get change history for a symbol
cxf why --target AuthService

MCP Integration

{
  "mcpServers": {
    "contextful": {
      "command": "npx",
      "args": ["-y", "@inferensys/contextful", "server"]
    }
  }
}

// In an MCP client
const result = await client.callTool("trace_path", {
  from: "src/auth.ts",
  to: "src/database.ts",
  edge_types: ["IMPORTS", "DEFINES"]
});

Query Intent Classification

Graph queries are automatically classified to route to appropriate traversal strategies:

| Intent | Keywords | Graph Relevance |
| --- | --- | --- |
| architectural | architecture, flow, path, connects, calls | High priority |
| impact | impact, affected, depends, blast radius | Direct edge query |
| historical | why, changed, history, regression | Graph + git history |
| exact | Symbol names, file paths | Symbol-level traversal |

Sources: src/search.ts:115-130
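A keyword table like the one above maps naturally onto a small lookup. The sketch below is illustrative only: the function name `classifyGraphQuery`, the precedence order, the path heuristic for `exact`, and the `vague` fallback are assumptions rather than the logic in src/search.ts.

```typescript
type GraphIntent = "architectural" | "impact" | "historical" | "exact" | "vague";

// Keywords copied from the table above; the order of precedence is an assumption.
const INTENT_KEYWORDS: Array<[GraphIntent, string[]]> = [
  ["impact", ["impact", "affected", "depends", "blast radius"]],
  ["historical", ["why", "changed", "history", "regression"]],
  ["architectural", ["architecture", "flow", "path", "connects", "calls"]],
];

// Rough stand-in for "file paths": a slash or a short file extension.
// Bare symbol names would need a lookup against the index, which is omitted here.
const EXACT_PATTERN = /\/|\.[a-z]{1,4}\b/i;

function classifyGraphQuery(query: string): GraphIntent {
  const text = query.toLowerCase();
  for (const [intent, words] of INTENT_KEYWORDS) {
    for (const word of words) {
      // Whole-word match so "path" does not fire on "pathological".
      if (new RegExp("\\b" + word + "\\b").test(text)) return intent;
    }
  }
  return EXACT_PATTERN.test(query) ? "exact" : "vague";
}

console.log(classifyGraphQuery("what is the blast radius of the parser"));
console.log(classifyGraphQuery("src/auth.ts"));
```

Checking keywords before the path heuristic means a query such as "why did src/auth.ts change" is treated as historical rather than exact; a real classifier may weigh these differently.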

Limitations and Design Decisions

Privacy Guarantees

  • All processing is local-only
  • No external embedding APIs used
  • No source code upload
  • No file editing capabilities

Sources: README.md:45-50

v1 Scope Boundaries

  • Broken JSON during indexing produces warnings but continues processing
  • Syntax diagnostics are intentionally out of scope
  • Git history is read-only

Sources: src/extract.ts:120-125

Summary

The Graph Traversal and Analysis system in Contextful provides:

  1. Automatic Relationship Extraction - Builds a dependency graph during indexing
  2. Multiple Query Entry Points - CLI commands and MCP tools
  3. Path Finding - Trace connections between any two nodes
  4. Impact Analysis - Identify dependents and test coverage
  5. Change Attribution - Combine current state with git history

This enables AI coding assistants to answer sophisticated questions about code relationships without requiring manual documentation or extensive file reading.

Sources: src/extract.ts:68-95

SQLite Database Schema

Related topics: Workspace Indexing System, Search Engine


Overview

Contextful uses SQLite as its primary storage engine for indexing codebase artifacts. The database schema is designed to support full-text search, symbol indexing, dependency graph traversal, and evidence pack generation for AI-assisted queries. All operations are managed through better-sqlite3 for synchronous, high-performance access.

Sources: src/db.ts:1-50

Schema Tables

Primary Storage Tables

#### chunks

Stores indexed code and documentation segments extracted from source files. Each chunk represents a logical unit of content bounded by language-specific rules (functions, classes, headings, etc.).

| Column | Type | Description |
| --- | --- | --- |
| ref | TEXT | Unique reference identifier (format: file:path:start-end) |
| file_path | TEXT | Relative path to the source file |
| start_line | INTEGER | Starting line number (1-indexed) |
| end_line | INTEGER | Ending line number |
| kind | TEXT | Chunk classification: code, doc, file |
| title | TEXT | Display title for the chunk |
| text | TEXT | Full content of the chunk |
| token_estimate | INTEGER | Estimated token count (character-based heuristic, roughly 4 characters per token) |

Sources: src/db.ts:23-36

#### symbols

Captures programming constructs (functions, classes, interfaces, types) extracted from source files.

| Column | Type | Description |
| --- | --- | --- |
| ref | TEXT | Unique symbol reference |
| name | TEXT | Symbol name |
| kind | TEXT | Symbol type: function, class, interface, type, struct, enum, trait, impl |
| file_path | TEXT | Source file path |
| line | INTEGER | Line number where symbol is defined |
| signature | TEXT | First 160 characters of symbol declaration |
| exported | INTEGER | Boolean flag (1 = exported, 0 = local) |

Sources: src/db.ts:47-60

#### edges

Represents relationships between code entities, including imports, module dependencies, and configuration references.

| Column | Type | Description |
| --- | --- | --- |
| source_name | TEXT | Name of the importing/configuring entity |
| target_name | TEXT | Name or path of the imported/dependency target |
| edge_type | TEXT | Relationship type: IMPORTS, CONFIGURES |
| file_path | TEXT | File where the relationship is defined |
| line | INTEGER | Line number of the relationship definition |

Sources: src/db.ts:38-45

Full-Text Search Index

#### chunks_fts

Virtual FTS5 table providing fast full-text search across all indexed content. Mirrors core chunk data for BM25-ranked retrieval.

| Column | Type | Description |
| --- | --- | --- |
| ref | TEXT | Chunk reference |
| path | TEXT | File path for filtering |
| title | TEXT | Searchable title field |
| text | TEXT | Full searchable content |

Sources: src/db.ts:37-42

The FTS table is queried using BM25 ranking in search operations:

SELECT ref, path, title, text, bm25(chunks_fts) AS rank 
FROM chunks_fts WHERE chunks_fts MATCH ?

Sources: src/search.ts:45-47
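Raw user text is not always a valid FTS5 MATCH expression, since characters such as `-`, `:`, or an unbalanced quote are parsed as query syntax. The sketch below shows one common way to prepare the bound parameter: quote every term and join with OR. The helper name `toFtsMatch` and the added `ORDER BY rank LIMIT ?` clause are assumptions for illustration, not code from the project.

```typescript
// Hypothetical helper: turns free text into an FTS5 MATCH expression.
// Each term becomes a quoted string; a double quote inside a term is escaped by doubling it.
function toFtsMatch(query: string): string {
  const terms = query.split(/\s+/).filter((t) => t.length > 0);
  return terms.map((t) => '"' + t.replace(/"/g, '""') + '"').join(" OR ");
}

// bm25() assigns numerically lower scores to better matches,
// so ascending order returns the most relevant rows first.
const SEARCH_SQL =
  "SELECT ref, path, title, text, bm25(chunks_fts) AS rank " +
  "FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rank LIMIT ?";

const matchExpression = toFtsMatch('auth-service "login" flow');
console.log(SEARCH_SQL);
console.log(matchExpression);
```

An empty or whitespace-only query produces an empty string, which is not a valid MATCH argument, so a caller would skip the full-text step in that case.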

Graph and Metadata Tables

#### nodes

Represents graph vertices for dependency analysis and traversal operations.

| Column | Type | Description |
| --- | --- | --- |
| id | INTEGER | Auto-incrementing primary key |
| ref | TEXT | Node reference |
| kind | TEXT | Node classification: file, symbol, chunk, module, config |
| name | TEXT | Display name |
| file_path | TEXT | Associated file path (nullable) |

Sources: src/db.ts:12-22

#### files

Stores metadata about indexed source files.

| Column | Type | Description |
| --- | --- | --- |
| absolute_path | TEXT | Full absolute file path |
| language | TEXT | Detected programming language |
| hash | TEXT | SHA-based content hash for change detection |
| size | TEXT | File size in bytes |

Sources: src/db.ts:13-17

#### fingerprints

Stores content fingerprints for deduplication and incremental indexing.

| Column | Type | Description |
| --- | --- | --- |
| ref | TEXT | Reference to the content chunk |
| kind | TEXT | Content type |
| fingerprint | TEXT | Hash of the content |

#### evidence_packs

Persists generated evidence packs for audit and replay.

| Column | Type | Description |
| --- | --- | --- |
| id | TEXT | Unique pack identifier |
| query | TEXT | Original search query |
| token_estimate | INTEGER | Total token count |
| json | TEXT | Serialized pack data |

#### query_log

Records search history for analysis and debugging.

| Column | Type | Description |
| --- | --- | --- |
| query | TEXT | Search query text |
| intent | TEXT | Classified search intent |
| timestamp | TEXT | ISO timestamp |

Sources: src/db.ts:1-10

Data Flow Architecture

graph TD
    A[Source Files] --> B[extractSymbols]
    A --> C[extractEdges]
    A --> D[extractChunks]
    
    B --> E[symbols table]
    C --> F[edges table]
    D --> G[chunks table]
    D --> H[chunks_fts index]
    
    G --> I[Full-Text Search]
    E --> J[Symbol Lookup]
    F --> K[Graph Traversal]
    
    I --> L[searchContext]
    J --> L
    K --> L
    
    L --> M[Evidence Pack]
    M --> N[evidence_packs]

Sources: src/extract.ts:1-150

Supported Symbol Kinds

The indexer extracts and classifies symbols based on language-specific patterns:

| Language | Supported Kinds |
| --- | --- |
| TypeScript/JavaScript | function, class, interface, type |
| Python | function, class |
| Go | function, struct, interface |
| Rust | function, struct, enum, trait, impl |

Sources: src/extract.ts:30-60

Supported Edge Types

| Edge Type | Description | Example |
| --- | --- | --- |
| IMPORTS | Module/dependency import | import { foo } from './bar' |
| CONFIGURES | Configuration key reference | "dependencies": { ... } in package.json |

The CONFIGURES edge type is specifically generated for package.json dependency sections and JSON configuration keys.

Sources: src/extract.ts:70-120

Query Classification and Intent

The search system classifies queries into intent categories that influence result ranking:

| Intent | Trigger Keywords | Purpose |
| --- | --- | --- |
| symbol | Class/function names, exact identifiers | Find symbol definitions |
| code | Code-related terms | Locate implementation |
| memory | memory, lessons, session | Search evidence-backed memory |
| impact | depends, affected, blast radius | Reverse dependency analysis |
| historical | why, changed, history, commit | Git history queries |
| architectural | architecture, flow, imports | Dependency tracing |
| docs | docs, documentation, readme | Documentation lookup |
| exact | File paths, line refs, symbols | Precise file/line access |
| vague | Default fallback | Broad search |

Sources: src/search.ts:15-30

Token Estimation

Token counts are estimated using a heuristic approximation:

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

This provides a rough approximation where 1 token ≈ 4 characters, suitable for budget management in evidence pack generation.

Sources: src/util.ts:1-10

Key Database Operations

Chunk Insertion

db.prepare(`
  INSERT INTO chunks (ref, file_path, start_line, end_line, kind, title, text, token_estimate)
  VALUES (?, ?, ?, ?, ?, ?, ?, ?)
`).run(chunk.ref, chunk.filePath, chunk.startLine, chunk.endLine, chunk.kind, chunk.title, chunk.text, chunk.tokenEstimate);

Each insert writes to both the chunks table and the chunks_fts FTS index so the two stay in sync.

Symbol Loading

db.prepare(`SELECT ref, name, kind, file_path, line, signature, exported 
FROM symbols WHERE file_path IN (${paths.map(() => "?").join(",")})`)
  .all(...paths)

Sources: src/db.ts:23-42

Sources: src/search.ts:180-195
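The symbol query builds one `?` placeholder per path, so very long path lists can run into SQLite's bound-parameter limit (999 by default in builds before 3.32.0). The sketch below shows one way to split the list into batches; the helper name `buildSymbolQueries` and the batch size are assumptions, and the source does not indicate that Contextful batches its queries.

```typescript
// Assumed batch size, chosen to stay under the 999-parameter default of older SQLite builds.
const MAX_PARAMS = 900;

interface PreparedQuery {
  sql: string;
  params: string[];
}

function buildSymbolQueries(paths: string[], batchSize: number = MAX_PARAMS): PreparedQuery[] {
  const queries: PreparedQuery[] = [];
  for (let i = 0; i < paths.length; i += batchSize) {
    const params = paths.slice(i, i + batchSize);
    // One "?" per path in this batch, matching the pattern used in the query above.
    const placeholders = params.map(() => "?").join(",");
    queries.push({
      sql:
        "SELECT ref, name, kind, file_path, line, signature, exported " +
        "FROM symbols WHERE file_path IN (" + placeholders + ")",
      params,
    });
  }
  return queries;
}

const batches = buildSymbolQueries(["src/a.ts", "src/b.ts", "src/c.ts"], 2);
console.log(batches.length);
```

Each returned entry can be passed to a prepared statement as `db.prepare(q.sql).all(...q.params)`, with the rows from all batches concatenated afterwards.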

Schema Version and Metadata

The database stores schema version and workspace metadata:

| Key | Description |
| --- | --- |
| schema_version | Current schema version number |
| workspace | Workspace root path |
| indexed_at | Last indexing timestamp |
| parser_backend | Parser backend description |
| warnings | Last 50 indexing warnings |

Sources: src/indexer.ts:80-90

Conclusion

The SQLite schema in Contextful provides a normalized, queryable representation of source code structure and content. The dual-table approach for chunks (storage + FTS index) enables both efficient storage and fast full-text retrieval. The edges and symbols tables together support graph traversal for dependency analysis, while the evidence pack system enables persistent, ranked context generation for AI queries.

Sources: src/db.ts:1-50

Workspace Indexing System

Related topics: SQLite Database Schema, Search Engine


Overview

The Workspace Indexing System is the core indexing engine of Contextful. It scans, parses, and stores representations of source code files from a workspace into a local SQLite database, enabling semantic search, dependency graph traversal, and evidence-backed context retrieval.

Primary responsibilities:

| Responsibility | Description |
| --- | --- |
| File Discovery | Recursively traverse workspace directories, filtering by language and ignore rules |
| Symbol Extraction | Parse and catalog functions, classes, interfaces, types, enums, traits |
| Edge Extraction | Track import/export relationships between modules and dependencies |
| Content Chunking | Split large files into manageable, line-numbered chunks for retrieval |
| Watch Mode | Monitor file system changes and incrementally re-index on modifications |

Sources: src/cli.ts:1-20

Architecture

graph TD
    A[Workspace Directory] --> B[File Discovery]
    B --> C[Language Detection]
    C --> D[Content Extraction]
    D --> E[Symbol Extraction]
    D --> F[Edge Extraction]
    D --> G[Chunk Generation]
    E --> H[SQLite DB]
    F --> H
    G --> H
    I[Search/Query] --> H
    J[Watch Mode] --> B

The system is built around a SQLite database that stores three core entities: symbols, edges, and chunks. The indexer processes files in a single pass, extracting all three data types simultaneously to minimize I/O overhead.

Sources: src/extract.ts:1-50

Supported Languages

The indexer natively supports symbol and edge extraction for the following languages:

| Language | Symbol Patterns | Import Patterns |
| --- | --- | --- |
| TypeScript / JavaScript | function, class, interface, type, const arrow/function | import from, require() |
| Python | def, class | from ... import, import |
| Go | func, type struct/interface | "..." (quoted imports) |
| Rust | fn, struct, enum, trait, impl | use, mod |
| Markdown | Headings (#{1,6}) | N/A |
| JSON | Config keys ("key":) | N/A |

Sources: src/extract.ts:15-45

Indexing Process

Phase 1: File Discovery

The indexer recursively scans the workspace directory, applying language-specific filtering and gitignore-style ignore rules. Binary files are detected and skipped using a simple null-byte heuristic: a file is treated as binary if its first 4096 bytes contain a zero byte.

export function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}

Sources: src/util.ts:20-22

Phase 2: Symbol Extraction

Symbols are extracted using language-specific regular expression patterns. Each symbol record includes:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Symbol identifier |
| kind | string | Category: function, class, interface, type, struct, enum, trait, impl |
| line | number | Declaration line number |
| signature | string | First 160 characters of the declaration line |
| exported | boolean | Whether the symbol is exported |

const push = (name: string, kind: string, exported = false) =>
  symbols.push({ name, kind, line: lineNumber, signature: excerpt(line, 160), exported });

Sources: src/extract.ts:5-7

For TypeScript and JavaScript, the extractor captures export modifiers:

matchPush(line, /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/, push, "function");
matchPush(line, /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, push, "class");

Sources: src/extract.ts:12-15
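To see the two patterns at work, the sketch below runs them over a small source string. The wrapper `extractTsSymbols` and the stand-in `excerpt` are hypothetical, since the source shows only the matchPush call sites and not the helpers themselves.

```typescript
interface ExtractedSymbol {
  name: string;
  kind: string;
  line: number;
  signature: string;
  exported: boolean;
}

// Hypothetical stand-in for the excerpt() helper: trims and truncates a line.
function excerpt(text: string, max: number): string {
  const trimmed = text.trim();
  return trimmed.length > max ? trimmed.slice(0, max) : trimmed;
}

function extractTsSymbols(source: string): ExtractedSymbol[] {
  const symbols: ExtractedSymbol[] = [];
  // The two regular expressions are the ones quoted above.
  const patterns: Array<[RegExp, string]> = [
    [/^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/, "function"],
    [/^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, "class"],
  ];
  source.split("\n").forEach((line, index) => {
    for (const [pattern, kind] of patterns) {
      const match = pattern.exec(line);
      if (!match) continue;
      // Group 1 is the optional "export " prefix, group 2 is the identifier.
      symbols.push({
        name: match[2],
        kind,
        line: index + 1,
        signature: excerpt(line, 160),
        exported: Boolean(match[1]),
      });
    }
  });
  return symbols;
}

const demoSymbols = extractTsSymbols(
  "export async function load() {}\nclass Cache {}\nconst x = 1;"
);
console.log(demoSymbols);
```

Because matching is line-based, these two patterns miss declarations such as `export default class Foo`, where another keyword sits between `export` and `class`.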

Phase 3: Edge Extraction

Edges represent dependency relationships between modules. The extractor identifies:

  • IMPORTS: Direct import statements for each language
  • CONFIGURES: Dependencies declared in configuration files (package.json, Cargo.toml, etc.)

if (language === "typescript" || language === "javascript") {
  for (const match of line.matchAll(/(?:from\s+|import\s*)["']([^"']+)["']/g))
    addImport(match[1]);
  for (const match of line.matchAll(/require\(["']([^"']+)["']\)/g))
    addImport(match[1]);
}

Sources: src/extract.ts:67-72
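The import patterns can be checked against individual lines. The sketch below is a self-contained variant that uses the same two regular expressions; the function name `extractJsImports` is hypothetical, and it returns the specifiers instead of calling addImport.

```typescript
function extractJsImports(line: string): string[] {
  const found: string[] = [];
  // Patterns are created per call so the global-flag lastIndex always starts at 0.
  const patterns = [
    /(?:from\s+|import\s*)["']([^"']+)["']/g,
    /require\(["']([^"']+)["']\)/g,
  ];
  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      found.push(match[1]);
    }
  }
  return found;
}

console.log(extractJsImports('import { a } from "./b";'));
console.log(extractJsImports('import "./side-effect";'));
console.log(extractJsImports("const fs = require('fs');"));
```

Dynamic imports such as `import("./lazy")` are not captured by these patterns, because the parenthesis sits between `import` and the opening quote.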

For package.json, dependencies and scripts are indexed as CONFIGURES edges:

for (const section of ["dependencies", "devDependencies", "peerDependencies", "scripts"]) {
  const values = parsed[section];
  if (!values || typeof values !== "object") continue;
  for (const key of Object.keys(values)) {
    edges.push({ targetName: `${section}:${key}`, targetType: "config", edgeType: "CONFIGURES", line: 1 });
  }
}

Sources: src/extract.ts:105-114
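The loop above assumes `parsed` already holds the decoded package.json. A minimal sketch of the surrounding step follows, assuming a hypothetical wrapper `extractPackageEdges` that records a warning when the JSON is broken and returns no edges, in line with the warn-and-continue behavior this manual describes for broken JSON.

```typescript
interface ConfigEdge {
  targetName: string;
  targetType: "config";
  edgeType: "CONFIGURES";
  line: number;
}

function extractPackageEdges(text: string): { edges: ConfigEdge[]; warnings: string[] } {
  const edges: ConfigEdge[] = [];
  const warnings: string[] = [];
  let parsed: unknown = undefined;
  try {
    parsed = JSON.parse(text);
  } catch (error) {
    // Assumed behavior: record a warning and let indexing continue with other files.
    warnings.push("package.json could not be parsed: " + String(error));
    return { edges, warnings };
  }
  if (typeof parsed !== "object" || parsed === null) return { edges, warnings };
  const record = parsed as Record<string, unknown>;
  for (const section of ["dependencies", "devDependencies", "peerDependencies", "scripts"]) {
    const values = record[section];
    if (!values || typeof values !== "object") continue;
    for (const key of Object.keys(values)) {
      edges.push({ targetName: section + ":" + key, targetType: "config", edgeType: "CONFIGURES", line: 1 });
    }
  }
  return { edges, warnings };
}

const okResult = extractPackageEdges('{"dependencies":{"left-pad":"1.0.0"},"scripts":{"build":"tsc"}}');
const brokenResult = extractPackageEdges("{");
console.log(okResult.edges.map((e) => e.targetName));
console.log(brokenResult.warnings.length);
```

Every edge is reported at line 1, as in the original loop, because JSON.parse does not expose source positions.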

Phase 4: Chunk Generation

Large files are split into overlapping chunks to enable granular retrieval. The system uses a sliding window approach with overlap between consecutive chunks:

graph LR
    A[File Lines 1-200] --> B[Chunk 1: 1-80]
    A --> C[Chunk 2: 60-140]
    A --> D[Chunk 3: 120-200]
    B --> E[Token Estimate]
    C --> E
    D --> E

Each chunk includes:

| Field | Description |
| --- | --- |
| ref | Unique reference string (file:path:start-end) |
| filePath | Relative path to source file |
| startLine | Starting line number |
| endLine | Ending line number |
| kind | Chunk type: code, doc, file |
| title | Human-readable title |
| tokenEstimate | Estimated token count |

Sources: src/extract.ts:145-160
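The sliding-window idea can be sketched as a function that yields line ranges. The window size of 80 lines and overlap of 20 lines are illustrative values inferred loosely from the diagram, not confirmed settings, and the name `slidingWindows` is hypothetical; boundaries may differ by a line from the diagram.

```typescript
interface LineWindow {
  startLine: number;
  endLine: number;
}

// windowSize and overlap are illustrative defaults, not Contextful's actual configuration.
function slidingWindows(totalLines: number, windowSize: number = 80, overlap: number = 20): LineWindow[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new Error("windowSize must be positive and larger than overlap");
  }
  const windows: LineWindow[] = [];
  const step = windowSize - overlap; // how far each window start advances
  for (let start = 1; start <= totalLines; start += step) {
    const end = Math.min(start + windowSize - 1, totalLines);
    windows.push({ startLine: start, endLine: end });
    if (end === totalLines) break; // the last window reached the end of the file
  }
  return windows;
}

const demoWindows = slidingWindows(200);
console.log(demoWindows);
```

The overlap means a declaration that straddles a window boundary still appears whole in at least one chunk, at the cost of some duplicated lines in the index.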

Phase 5: Markdown Document Chunking

Markdown files receive special treatment. Instead of fixed-size chunks, the indexer uses headings as natural section boundaries:

lines.forEach((line, index) => {
  const match = line.match(/^(#{1,6})\s+(.+)$/);
  if (match) headings.push({ title: match[2].trim(), line: index + 1 });
});
return headings.map((heading, index) => {
  const next = headings[index + 1];
  const endLine = next ? next.line - 1 : lines.length;
  // ... create chunk for section
});

Sources: src/extract.ts:174-185
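The excerpt above elides the step that builds each section. A complete version might look like the sketch below; the `DocSection` shape and the name `chunkMarkdown` are assumptions, while the heading regular expression and the end-line calculation follow the excerpt.

```typescript
interface DocSection {
  title: string;
  startLine: number;
  endLine: number;
  text: string;
}

function chunkMarkdown(source: string): DocSection[] {
  const lines = source.split("\n");
  const headings: Array<{ title: string; line: number }> = [];
  lines.forEach((line, index) => {
    const match = line.match(/^(#{1,6})\s+(.+)$/);
    if (match) headings.push({ title: match[2].trim(), line: index + 1 });
  });
  return headings.map((heading, index) => {
    const next = headings[index + 1];
    // A section runs until the line before the next heading, or to the end of the file.
    const endLine = next ? next.line - 1 : lines.length;
    return {
      title: heading.title,
      startLine: heading.line,
      endLine,
      text: lines.slice(heading.line - 1, endLine).join("\n"),
    };
  });
}

const demoSections = chunkMarkdown("# Intro\nwelcome\n## Setup\nstep one\nstep two");
console.log(demoSections.map((s) => s.title + " " + s.startLine + "-" + s.endLine));
```

In this sketch any text before the first heading is not assigned to a section, and lines that start with `#` inside code samples are treated as headings as well.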

Watch Mode

The indexer supports continuous monitoring via file system watchers:

export async function watchWorkspace(workspace: string, onIndex: (result: IndexResult) => void): Promise<void> {
  const resolved = path.resolve(workspace);
  onIndex(await indexWorkspace({ workspace: resolved }));
  let timer: NodeJS.Timeout | undefined;
  fs.watch(resolved, { recursive: true }, () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(async () => {
      onIndex(await indexWorkspace({ workspace: resolved }));
    }, 500);
  });
}

Sources: src/indexer.ts:80-91

Key characteristics:

  • Debounces file change events with a 500ms delay to batch rapid successive changes
  • Re-runs full indexing on each trigger
  • Outputs JSON results to stdout for consumption by other processes

CLI Commands

The indexing system exposes three primary CLI commands:

| Command | Description |
| --- | --- |
| `cxf index --workspace <path> [--watch]` | Initial or incremental indexing of a workspace |
| `cxf daemon --workspace <path>` | Run as a long-lived daemon that outputs index results on file changes |
| `cxf report --workspace <path> --format markdown\|json\|html` | Generate an index status report |

# Index a workspace
npx @inferensys/contextful index --workspace .

# Watch for changes and print results
npx @inferensys/contextful daemon --workspace .

Sources: src/cli.ts:22-35

Search Integration

The indexing system powers Contextful's search capabilities. After indexing, users can query the database using natural language:

export async function searchContext(options: SearchOptions): Promise<{ intent: SearchIntent; hits: SearchHit[] }> {
  const workspace = resolveWorkspace(options.workspace);
  await ensureIndexed(workspace);
  const intent = classifyQuery(options.query);
  // ... perform FTS and semantic search
}

Sources: src/search.ts:45-55

Query intents are automatically classified to optimize search behavior:

| Intent | Trigger Keywords | Description |
| --- | --- | --- |
| code | function names, variable names | Code and implementation search |
| exact | Backticks, quotes, #, file paths | Literal symbol/identifier lookup |
| impact | impact, affected, depends, blast radius | Dependency and change analysis |
| historical | why, changed, commit, history | Git history and regression tracking |
| architectural | architecture, flow, trace, connects | Dependency graph traversal |
| docs | resource, documentation, guide, how to | Documentation and README search |
| memory | remember, session, lesson, learned | Agent memory recall |

Sources: src/search.ts:5-18

Token Estimation

Every chunk and evidence pack includes a token estimate for budget management:

export function packTokenCount(text: string): number {
  return estimateTokens(text);
}

The system uses this estimate to enforce budget limits when building context packs for LLM consumption, ensuring responses stay within token budgets.

Sources: src/report.ts:50-52
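Budget enforcement with such an estimate can be as simple as a greedy pass over ranked chunks. The sketch below is an assumption about how a pack could be filled, not the project's actual selection logic; `packWithinBudget` and `RankedChunk` are hypothetical names, and `estimateTokens` repeats the character-based heuristic quoted elsewhere in this manual.

```typescript
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface RankedChunk {
  ref: string;
  text: string;
}

// Greedy selection: walk chunks in rank order and keep each one that still fits the budget.
function packWithinBudget(
  chunks: RankedChunk[],
  budget: number
): { selected: RankedChunk[]; tokens: number } {
  const selected: RankedChunk[] = [];
  let tokens = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk.text);
    if (tokens + cost > budget) continue; // skip chunks that would overflow the budget
    selected.push(chunk);
    tokens += cost;
  }
  return { selected, tokens };
}

// Texts of 40, 100, and 8 characters cost 10, 25, and 2 estimated tokens.
const rankedChunks: RankedChunk[] = [
  { ref: "a", text: new Array(41).join("x") },
  { ref: "b", text: new Array(101).join("x") },
  { ref: "c", text: new Array(9).join("x") },
];
const pack = packWithinBudget(rankedChunks, 15);
console.log(pack.selected.map((c) => c.ref), pack.tokens);
```

Skipping an oversized chunk rather than stopping lets smaller, lower-ranked chunks still use the remaining budget; stopping at the first overflow would be the stricter alternative.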

Data Models

Symbol Record

interface SymbolRecord {
  ref: string;
  name: string;
  kind: "function" | "class" | "interface" | "type" | "struct" | "enum" | "trait" | "impl";
  filePath: string;
  line: number;
  signature: string;
  exported: boolean;
}

Edge Record

interface RawEdge {
  targetName: string;
  targetType: "module" | "config" | "symbol";
  edgeType: "IMPORTS" | "CONFIGURES" | "DEFINES";
  line: number;
}

Chunk Record

interface ChunkRecord {
  ref: string;
  filePath: string;
  startLine: number;
  endLine: number;
  kind: "code" | "doc" | "file";
  title: string;
  text: string;
  tokenEstimate: number;
}

Extension Points

Adding New Language Support

To add support for a new language:

  1. Add language detection in the file scanner
  2. Implement symbol extraction patterns in extractSymbols()
  3. Implement edge extraction patterns in extractEdges()
  4. Update the chunking logic if special handling is needed

Example pattern structure:

} else if (language === "newlang") {
  matchPush(line, /^\s*(pub\s+)?fn\s+([A-Za-z_][\w]*)/, push, "function");
  const use = line.match(/^\s*use\s+([^;]+);/);
  if (use) addImport(use[1].trim());
}

Sources: src/extract.ts:35-44

Sources: src/cli.ts:1-20

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Doramagic extracted 7 source-linked risk signals. Review them before installing or handing real data to the project.

1. Configuration risk: Configuration risk needs validation

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: Configuration risk needs validation. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.host_targets | github_repo:1240001007 | https://github.com/Inferensys/contextful | host_targets=claude, claude_code

2. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:1240001007 | https://github.com/Inferensys/contextful | README/documentation is current enough for a first validation pass.

3. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | last_activity_observed missing

4. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium

5. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium

6. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | issue_or_pr_quality=unknown

7. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.


Doramagic exposes project-level community discussion separately from official documentation. Review these links before using contextful with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence