contextful Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

contextful


Project Introduction

Related topics: High-Level Architecture, Quick Start Guide


Contextful is an intelligent code context management system designed to provide AI agents with compact, evidence-backed information for codebase navigation and understanding. The project serves as a bridge between large codebases and AI-powered development tools by indexing source code, extracting symbols, tracking dependencies, and generating token-budgeted evidence packs for queries.

Purpose and Scope

Contextful solves the fundamental problem that AI coding assistants face when working with large repositories: excessive context requirements that lead to token waste and degraded performance. Instead of forcing agents to read dozens of random files, Contextful enables targeted, cited, and ranked context retrieval that maximizes the value of each token spent.

The system operates in three primary modes:

Indexing Mode - Scans and indexes source code, extracting symbols, dependencies, and semantic chunks
Query Mode - Creates evidence packs for natural language queries with token budgets
Search Mode - Provides lightweight search across code, docs, symbols, and memory without full evidence compilation

Sources: README.md:1-15

Architecture Overview

The Contextful system consists of several interconnected components that work together to provide context management capabilities.

graph TD
    A[Source Code] --> B[Indexing Engine]
    B --> C[SQLite Kernel DB]
    C --> D[Search Module]
    C --> E[Graph Analysis]
    C --> F[Memory Ledger]
    
    G[CLI / MCP Server] --> D
    G --> E
    G --> F
    
    D --> H[Evidence Pack]
    E --> H
    F --> H
    
    H --> I[AI Agent / User]

Component Responsibilities

Component	File	Responsibility
Indexing Engine	`src/extract.ts`	Parse source files, extract symbols and dependencies
Search Module	`src/search.ts`	Full-text search, intent classification, ranking
Graph Analysis	`src/search.ts`	Trace dependencies and code paths
Memory Ledger	`src/memory.ts`	Store evidence-backed lessons across sessions
CLI Interface	`src/cli.ts`	Command-line interface for all operations
MCP Server	`src/mcp-server.ts`	Model Context Protocol stdio server

Sources: src/extract.ts:1-50, src/search.ts:1-30, src/cli.ts:1-40

Supported Languages and File Types

Contextful supports multiple programming languages through pattern-based extraction. The indexing engine recognizes language-specific syntax for symbols and dependencies.

Language Support Matrix

Language	Functions	Classes	Types	Imports
TypeScript/JavaScript	✓	✓	✓	✓
Python	✓	✓	-	✓
Go	✓	✓	✓	✓
Rust	✓	✓	✓	✓
Markdown	-	-	Headings	-
JSON	-	-	Config keys	-

Sources: src/extract.ts:15-80

Core MCP Tools

Contextful exposes its capabilities through the Model Context Protocol (MCP), giving AI agents a standardized tool interface. The primary tools are deliberately few, which keeps the surface an agent has to learn small.

graph LR
    A[Agent] -->|context_pack| B[Evidence Pack Generator]
    A -->|search_code| C[Code Search]
    A -->|trace_path| D[Graph Traversal]
    A -->|impact_analysis| E[Dependency Analyzer]
    A -->|why_changed| F[Git History]
    A -->|recall_memory| G[Memory Search]
    A -->|write_lesson| H[Lesson Writer]

Tool Descriptions

Tool	Purpose	Key Parameters
`context_pack`	Returns ranked, cited, token-budgeted context bundles	`query`, `budget`, `scope`
`search_code`	Search across code, docs, symbols, and memory	`query`, `mode`, `filters`
`trace_path`	Graph traversal across files, symbols, modules, and config	`from`, `to`, `edge_types`
`impact_analysis`	Reverse dependencies and likely tests	`symbol_or_file`
`why_changed`	Current evidence plus git history	`symbol_or_file`
`recall_memory`	Search session learnings and durable lessons	`query`, `scope`
`write_lesson`	Store evidence-backed lessons	`claim`, `evidence_refs`, `confidence`

Sources: README.md:25-45, src/mcp-server.ts:1-80

CLI Interface

Contextful provides a command-line interface through the cxf binary (with contextful as a readable alias). The CLI supports both one-shot operations and daemon mode for continuous indexing.

Command Reference

Command	Description	Key Options
`index`	Index a workspace	`--workspace`, `--watch`
`daemon`	Run local indexing daemon	`--workspace`
`query`	Create evidence pack for query	`--workspace`, `--budget`, `--json`
`search`	Search without full evidence pack	`--workspace`, `--limit`, `--kind`
`report`	Generate context report	`--workspace`, `--format`
`memory add`	Store evidence-backed lesson	`--claim`, `--evidence`, `--scope`, `--confidence`
`server`	Run MCP stdio server	-

Sources: src/cli.ts:40-120, README.md:15-35

Example Usage

# Index a workspace
npx @inferensys/contextful index --workspace .

# Query with token budget
npx @inferensys/contextful query "where is user auth handled" --workspace . --budget 2000

# Run as MCP server
npx @inferensys/contextful server

Sources: README.md:8-15

Data Models

Evidence Pack Structure

The EvidencePack is the core data structure returned by query operations. It contains all necessary context for an agent to answer a query.

interface EvidencePack {
  id: string;                    // Unique pack identifier
  query: string;                 // Original query
  scope: string;                 // Scope of the context
  intent: SearchIntent;          // Classified query intent
  summary: string;               // Human-readable summary
  citations: SearchHit[];        // Ranked evidence items
  files: FileContext[];          // Grouped file references
  symbols: SymbolRecord[];       // Relevant symbols
  graphPaths: GraphPath[];       // Dependency paths
  memoryHits: SearchHit[];       // Memory matches
  confidence: number;            // Confidence score (0.1-0.92)
  tokenEstimate: number;         // Estimated token count
  budget: number;                // Token budget
  createdAt: string;             // ISO timestamp
}

Sources: src/search.ts:200-250

Search Hit Structure

Each search result is represented as a SearchHit with relevance ranking and excerpt information.

Field	Type	Description
`ref`	string	Reference identifier (e.g., `file:src/auth.ts:1-20`)
`path`	string	File path
`title`	string	Display title
`excerpt`	string	Relevant text snippet
`kind`	string	Type: `code`, `doc`, `symbol`, `memory`
`rank`	number	BM25 relevance score

Sources: src/search.ts:50-80
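Evidence references such as `file:src/auth.ts:1-20` combine a kind, a path, and an optional line range. The sketch below shows one way to split such a string into its parts. The `ParsedRef` shape and `parseRef` name are hypothetical, and the format is inferred only from the examples in this manual.

```typescript
// Hypothetical shape for a parsed evidence reference (not Contextful's API).
interface ParsedRef {
  kind: string;        // e.g. "file"
  path: string;        // e.g. "src/auth.ts"
  startLine?: number;  // first cited line, if present
  endLine?: number;    // last cited line, if present
}

// Parses refs of the form "<kind>:<path>[:<start>[-<end>]]".
function parseRef(ref: string): ParsedRef | null {
  const match = /^([a-z]+):(.+?)(?::(\d+)(?:-(\d+))?)?$/.exec(ref);
  if (!match) return null;
  const [, kind, path, start, end] = match;
  const startLine = start !== undefined ? Number(start) : undefined;
  // A single line number ("file:x.ts:7") is treated as a one-line range.
  const endLine = end !== undefined ? Number(end) : startLine;
  return { kind, path, startLine, endLine };
}

console.log(parseRef("file:src/auth.ts:1-20"));
```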

Dependencies and Technology Stack

Contextful depends on the following packages for MCP support, storage, CLI parsing, file discovery, syntax parsing, and schema validation:

Dependency	Version	Purpose
`@modelcontextprotocol/sdk`	^1.29.0	MCP protocol implementation
`better-sqlite3`	^12.10.0	SQLite database for indexing
`commander`	^14.0.3	CLI argument parsing
`fast-glob`	^3.3.3	File pattern matching
`tree-sitter-wasms`	^0.1.13	Syntax parsing
`web-tree-sitter`	^0.20.8	Tree-sitter bindings
`zod`	^4.4.3	Schema validation

Sources: package.json:20-40

System Requirements

Node.js: >= 20
License: MIT
Repository: inferensys/contextful

Sources: package.json:45-55

Supported IDE Integration

Contextful is designed to integrate with a wide range of AI-powered development tools:

IDE/Extension	Status
GitHub Copilot	Supported
VS Code	Supported
Cursor	Supported
Windsurf	Supported
Cline	Supported
Roo Code	Supported
Continue	Supported
Zed	Supported

Sources: package.json:10-20

Workflow: From Indexing to Query

The sequence below traces a typical session: indexing a workspace, querying it for an evidence pack, and storing a lesson in the memory ledger.

sequenceDiagram
    participant U as User/Agent
    participant CLI as CLI/MCP Server
    participant IDX as Indexer
    participant DB as SQLite Kernel
    participant SRCH as Search Engine
    participant MEM as Memory Ledger

    U->>CLI: index --workspace ./project
    CLI->>IDX: Extract symbols & dependencies
    IDX->>DB: Store in chunks_fts, symbols, edges
    DB-->>CLI: Index complete

    U->>CLI: query "how is auth handled"
    CLI->>SRCH: classifyQuery() intent=exact
    SRCH->>DB: FTS + BM25 search
    DB-->>SRCH: Ranked hits
    SRCH->>MEM: Check memory ledger
    MEM-->>SRCH: Related lessons
    CLI-->>U: EvidencePack (token-budgeted)

    U->>CLI: memory add --claim "Auth pattern" --evidence file:...
    CLI->>MEM: Store lesson with confidence
    MEM-->>CLI: Lesson saved

Sources: src/search.ts:100-150, src/report.ts:80-120

Next Steps

To continue exploring Contextful:

Installation Guide - Set up Contextful in your development environment
CLI Reference - Detailed documentation of all CLI commands
MCP Tools API - Complete reference for MCP tool interfaces
Configuration - Workspace configuration and tuning options
Memory System - Using the evidence-backed lesson system

Sources: README.md:1-15

Quick Start Guide

Overview

Contextful is a contextual indexing and search system designed to help AI agents efficiently retrieve relevant code evidence. Instead of forcing agents to perform dozens of random file reads, Contextful returns compact, ranked, and cited evidence packs that fit within a token budget.

Sources: README.md:1-10

Installation

Install Contextful using npm. The package provides both the cxf binary and the full contextful alias.

npm install -g @inferensys/contextful

Alternatively, run commands directly via npx:

npx @inferensys/contextful index --workspace .

Sources: README.md:11-14

CLI Commands

Contextful provides a command-line interface with the following primary commands:

Command	Description
`cxf index`	Index a workspace for search
`cxf daemon`	Run a local indexing daemon
`cxf query`	Create an evidence pack for a query
`cxf search`	Search indexed context
`cxf report`	Generate a context report
`cxf memory add`	Store an evidence-backed lesson
`cxf server`	Run the MCP stdio server

Sources: README.md:23-32

Basic Workflow

Step 1: Index Your Workspace

Before searching, you must index your codebase. This creates the searchable database:

cxf index --workspace .

For continuous indexing as files change, use the daemon mode:

cxf daemon --workspace .

Sources: src/cli.ts:1-20

Step 2: Query for Context

Once indexed, ask questions about your codebase:

cxf query "where is user auth handled" --workspace . --budget 2000

The query command returns a ranked evidence pack with citations and file references.

#### Query Options

Option	Description	Default
`--workspace <path>`	Workspace path	Current directory
`--budget <tokens>`	Approximate token budget	2000
`--json`	Output as JSON instead of Markdown	false

Sources: src/cli.ts:22-30

Step 3: Search Without Building Evidence Packs

For quick lookups without compiling full evidence packs, use search:

cxf search "authentication middleware" --workspace . --limit 10 --kind code

#### Search Options

Option	Description	Default
`--workspace <path>`	Workspace path	Current directory
`--limit <count>`	Maximum hits	10
`--kind`	Filter: `all`, `code`, `docs`, `symbols`, `memory`	`all`

Sources: src/cli.ts:32-42

Step 4: Generate Reports

Generate a context report in Markdown, JSON, or HTML:

cxf report --workspace . --format markdown
cxf report --workspace . --format json
cxf report --workspace . --format html

Sources: src/cli.ts:44-48

MCP Server Integration

Contextful can run as a Model Context Protocol (MCP) server, providing tools directly to AI agents.

cxf server

Available MCP Tools

Tool	Purpose
`context_pack`	Returns ranked, cited, token-budgeted evidence bundles
`search_code`	Code, docs, symbol, and memory search
`trace_path`	Graph traversal across files, symbols, modules, and config
`impact_analysis`	Reverse dependencies and likely tests
`why_changed`	Current evidence plus git history
`recall_memory`	Search session learnings and durable project lessons
`write_lesson`	Store evidence-backed lessons for future sessions

Sources: README.md:40-48

MCP Tool Parameters

#### context_pack

Parameter	Type	Required	Description
`query`	string	Yes	Query to answer from indexed context
`budget`	number	No	Token budget for the response
`scope`	string	No	Search scope

Sources: src/mcp-server.ts:1-25
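For orientation, an MCP client invokes `context_pack` with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. The sketch below builds such a request. The `buildToolCall` helper is hypothetical, and a real client must first complete the MCP `initialize` handshake with `cxf server` before sending tool calls.

```typescript
// JSON-RPC 2.0 request shape used by the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "context_pack", {
  query: "where is user auth handled",
  budget: 2000,
});

// Over stdio, each message is one line of JSON (no embedded newlines).
console.log(JSON.stringify(request));
```

The stdio transport frames messages as newline-delimited JSON, which is why the request is serialized without pretty-printing.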

#### search_code

Parameter	Type	Required	Description
`query`	string	Yes	Search query
`mode`	string	No	Search mode: `all`, `code`, `docs`, `symbols`, or `memory`
`filters`	object	No	Search filters
`workspace`	string	No	Workspace path
`limit`	number	No	Maximum results

Sources: src/mcp-server.ts:26-40

#### write_lesson

Parameter	Type	Required	Description
`claim`	string	Yes	Lesson claim
`evidence_refs`	array	Yes	Evidence references (e.g., `file:src/auth.ts:1-20`)
`scope`	string	No	Memory scope
`confidence`	number	No	Confidence from 0 to 1
`supersedes`	string	No	Previous lesson ID to supersede

Sources: src/mcp-server.ts:65-80

Memory System

Contextful includes an evidence-backed memory system for storing lessons across sessions.

Adding a Lesson

cxf memory add \
  --claim "Always validate tokens in middleware" \
  --evidence "file:src/auth.ts:1-20" \
  --workspace . \
  --confidence 0.8

#### Memory Command Options

Option	Required	Description
`--claim <text>`	Yes	The lesson or claim
`--evidence <ref...>`	Yes	Evidence references
`--workspace <path>`	No	Workspace path
`--scope <scope>`	No	Memory scope (default: `repo`)
`--confidence <number>`	No	Confidence from 0 to 1 (default: 0.7)

Sources: src/cli.ts:50-75
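The sketch below applies the defaults from the table above (`scope` falls back to `repo`, `confidence` to 0.7) and enforces the documented requirements before a lesson is stored. The `LessonInput`, `Lesson`, and `normalizeLesson` names are hypothetical; Contextful's actual validation may do more, for example checking that each evidence reference still resolves.

```typescript
// Hypothetical input shape mirroring the `cxf memory add` options.
interface LessonInput {
  claim: string;
  evidence: string[];
  scope?: string;
  confidence?: number;
}

interface Lesson {
  claim: string;
  evidence: string[];
  scope: string;
  confidence: number;
}

// Applies the documented defaults and rejects input that breaks the documented rules.
function normalizeLesson(input: LessonInput): Lesson {
  const claim = input.claim.trim();
  if (claim.length === 0) throw new Error("claim is required");
  if (input.evidence.length === 0) throw new Error("at least one evidence ref is required");
  const confidence = input.confidence ?? 0.7;
  // The negated range check also rejects NaN.
  if (!(confidence >= 0 && confidence <= 1)) throw new Error("confidence must be between 0 and 1");
  return { claim, evidence: input.evidence, scope: input.scope ?? "repo", confidence };
}
```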

Output Formats

Markdown Output (Default)

cxf query "where is auth handled" --workspace .

Returns a formatted Markdown document with citations and graph paths.

JSON Output

cxf query "where is auth handled" --workspace . --json

Returns structured JSON data suitable for programmatic processing.

Sources: src/cli.ts:22-30

Report Formats

Format	Description
`markdown`	Human-readable Markdown report
`json`	Structured JSON data
`html`	Standalone HTML page

Sources: src/cli.ts:44-48

Architecture Overview

graph TD
    A[CLI / MCP Server] --> B[Workspace Indexer]
    B --> C[SQLite Kernel DB]
    C --> D[Full-Text Search]
    C --> E[Symbol Index]
    C --> F[Graph Edges]
    G[Query Request] --> H[Search Context]
    H --> I[Evidence Pack Builder]
    I --> D
    I --> E
    I --> F
    I --> J[Memory Ledger]
    I --> K[Evidence Pack Output]

Common Usage Patterns

Pattern 1: Initial Setup

# Index the workspace
cxf index --workspace /path/to/project --watch

# Generate initial report
cxf report --workspace /path/to/project --format html > report.html

Pattern 2: Interactive Exploration

# Run as MCP server
cxf server

# Or use CLI directly
cxf query "how does the cache work" --workspace . --budget 3000

Pattern 3: Agent Memory Persistence

# Store learned lessons
cxf memory add --claim "Config validation happens in validate.ts" --evidence "file:src/config/validate.ts:1-50"

# Recall past lessons
# Via MCP: recall_memory(query="config validation")

Next Steps

Explore Architecture Documentation for deep dive into indexing and search internals
Learn about Memory System for evidence-backed knowledge persistence
Review API Reference for programmatic integration

Sources: README.md:1-10

High-Level Architecture

Related topics: Runtime Components, Search Engine, SQLite Database Schema


Contextful is a local-only indexing and context management tool designed to help AI coding assistants retrieve compact, evidence-backed context from workspace codebases. The system operates without external embedding APIs, instead relying on SQLite FTS5 full-text search, graph-based dependency tracking, and intent-classified query routing. Sources: README.md

System Overview

Contextful can run as a local daemon (`cxf daemon`) that continuously indexes workspace files, extracts code symbols and import relationships, and serves context packs to agents. The architecture follows a three-layer design:

Indexing Layer - File parsing, symbol extraction, edge detection
Storage Layer - SQLite kernel with FTS5 search and graph tables
Query Layer - Intent classification, ranked search, evidence pack assembly

Sources: src/indexer.ts

Component Architecture

graph TD
    A[Workspace Files] --> B[Indexer]
    B --> C[Symbol Extraction]
    B --> D[Edge Detection]
    B --> E[Chunk Generation]
    C --> F[SQLite Kernel DB]
    D --> F
    E --> F
    G[CLI / MCP Server] --> H[Search Module]
    H --> F
    H --> I[Context Pack Assembly]
    I --> J[Evidence Pack Output]

Core Components

Component	File	Responsibility
Indexer	`src/indexer.ts`	Recursively walks workspace, triggers file processing
Extractor	`src/extract.ts`	Parses symbols, edges, and code chunks per file
Search	`src/search.ts`	FTS5 queries, intent classification, ranking
CLI	`src/cli.ts`	Command-line interface and MCP server entry point
Report	`src/report.ts`	Generates workspace context reports

Sources: src/indexer.ts, src/extract.ts, src/search.ts

Indexing Pipeline

The indexing pipeline processes workspace files through multiple extraction stages. Each source file is read, classified by language, and passed through specialized extractors that produce structured records.

graph LR
    A[File Content] --> B[Language Detection]
    B --> C[Symbol Extraction]
    B --> D[Edge Extraction]
    B --> E[Chunk Extraction]
    C --> F[symbols table]
    D --> G[edges table]
    E --> H[chunks_fts table]

Symbol Extraction

The extractSymbols function identifies named code entities based on language-specific patterns:

Language	Supported Symbols
TypeScript/JavaScript	functions, classes, interfaces, types, const arrow functions
Python	functions, classes
Go	functions, structs, interfaces
Rust	functions, structs, enums, traits, impl blocks
Markdown	headings
JSON	config keys

Sources: src/extract.ts:1-80
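The sketch below illustrates pattern-based symbol extraction for the TypeScript/JavaScript row of the table, matching at most one declaration per line. The `extractTsSymbols` function and `ExtractedSymbol` shape are hypothetical; the real `extractSymbols` may use different patterns, and line-based regexes like these miss declarations split across lines.

```typescript
// Hypothetical record shape; the real symbols table also stores ref and signature.
interface ExtractedSymbol {
  name: string;
  kind: "function" | "class" | "interface" | "type" | "const";
  line: number;      // 1-based line number
  exported: boolean; // true when the declaration starts with `export`
}

// One pattern per symbol kind; group 1 captures `export`, group 2 the name.
const PATTERNS: Array<[ExtractedSymbol["kind"], RegExp]> = [
  ["function", /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/],
  ["class", /^\s*(export\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/],
  ["interface", /^\s*(export\s+)?interface\s+([A-Za-z_$][\w$]*)/],
  ["type", /^\s*(export\s+)?type\s+([A-Za-z_$][\w$]*)\s*[<=]/],
  // const arrow functions, with an optional `async` and return type annotation
  ["const", /^\s*(export\s+)?const\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\([^)]*\)\s*(?::[^=]+)?=>/],
];

function extractTsSymbols(source: string): ExtractedSymbol[] {
  const symbols: ExtractedSymbol[] = [];
  source.split("\n").forEach((text, index) => {
    for (const [kind, pattern] of PATTERNS) {
      const match = pattern.exec(text);
      if (match) {
        symbols.push({ name: match[2], kind, line: index + 1, exported: Boolean(match[1]) });
        break; // first matching kind wins
      }
    }
  });
  return symbols;
}
```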

Edge Detection

Import relationships are tracked as directed edges between modules. The extractEdges function processes different import syntaxes per language:

TypeScript/JavaScript: ES6 import and require() statements
Python: from ... import and import statements
Go: Import strings within double quotes
Rust: use and mod declarations
JSON: Top-level keys in configuration files

Sources: src/extract.ts:100-160
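A comparable line-based sketch for the TypeScript/JavaScript case collects module specifiers from ES `import` statements and `require()` calls. `extractTsEdges` and `ImportEdge` are hypothetical names, and multi-line `import { ... } from` statements are not handled.

```typescript
interface ImportEdge {
  target: string; // module specifier as written in the source
  line: number;   // 1-based line of the import
}

// `import ... from "x"`, `import type ... from "x"`, and side-effect `import "x"`.
const IMPORT_FROM = /^\s*import\s+(?:[\s\S]*?\s+from\s+)?["']([^"']+)["']/;
// Every `require("x")` on a line (global flag for matchAll).
const REQUIRE_CALL = /\brequire\(\s*["']([^"']+)["']\s*\)/g;

function extractTsEdges(source: string): ImportEdge[] {
  const edges: ImportEdge[] = [];
  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    const imported = IMPORT_FROM.exec(text);
    if (imported) edges.push({ target: imported[1], line });
    for (const match of text.matchAll(REQUIRE_CALL)) {
      edges.push({ target: match[1], line });
    }
  });
  return edges;
}
```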

Chunk Generation

Code files are split into semantic chunks for full-text search. The codeChunks function segments content into logical blocks based on:

Empty line boundaries
Token budget (target: ~300 tokens per chunk)
Language-specific token estimation via estimateTokens

Sources: src/extract.ts:180-220
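The chunking rule can be sketched as a greedy pass over blank-line-separated blocks: blocks accumulate until the next one would push the chunk past the target. `chunkContent` and `roughTokens` are hypothetical stand-ins for `codeChunks` and `estimateTokens`, and the four-characters-per-token estimate is an assumption.

```typescript
// Assumed heuristic: roughly four characters per token.
function roughTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Groups blank-line-separated blocks into chunks of about `targetTokens` each.
function chunkContent(content: string, targetTokens = 300): string[] {
  const blocks = content.split(/\n\s*\n/).filter((block) => block.trim().length > 0);
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const block of blocks) {
    const tokens = roughTokens(block);
    // Close the current chunk if this block would push it past the target.
    if (current.length > 0 && currentTokens + tokens > targetTokens) {
      chunks.push(current.join("\n\n"));
      current = [];
      currentTokens = 0;
    }
    current.push(block);
    currentTokens += tokens;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

A single block larger than the target becomes its own oversized chunk rather than being split mid-block.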

Storage Layer

SQLite Kernel Schema

The kernel database uses SQLite with several specialized tables:

Table	Purpose	Key Columns
`files`	Tracked workspace files	`path`, `language`, `hash`, `indexed_at`
`symbols`	Extracted code symbols	`ref`, `name`, `kind`, `file_path`, `line`, `signature`, `exported`
`edges`	Import/dependency graph	`source_file`, `target_name`, `target_type`, `edge_type`, `line`
`chunks_fts`	FTS5 virtual table for full-text search	`ref`, `path`, `title`, `text`, `kind`
`memory`	Evidence-backed lessons	`id`, `claim`, `scope`, `confidence`, `created_at`

Sources: src/search.ts, src/indexer.ts

Query and Search System

Intent Classification

Queries are classified into intents to optimize search strategy:

Intent	Trigger Keywords	Search Focus
`code`	`function`, `class`, `implementation`	Symbol and code chunks
`memory`	`memory`, `lesson`, `session`	Memory ledger
`impact`	`impact`, `depends on`, `blast radius`	Dependency graph
`historical`	`why`, `changed`, `commit`	Git history
`architectural`	`architecture`, `flow`, `path`, `trace`	Graph traversal
`docs`	`documentation`, `readme`, `guide`	Markdown chunks
`exact`	symbols, paths, line references	Precise symbol matching
`vague`	Default fallback	Broad FTS search

Sources: src/search.ts:1-50

Context Pack Assembly

The createContextPack function orchestrates the evidence gathering:

Classify query intent
Execute FTS5 search across chunks
Apply query expansion with domain-specific term additions
Score and rank hits using BM25 with intent-based bonuses
Select hits within token budget
Load related symbols and graph paths
Assemble and return EvidencePack

Sources: src/search.ts:200-280
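Step 5, selecting hits within the token budget, can be sketched as a greedy pass over hits sorted by score. The `Hit` shape, `selectWithinBudget`, and the characters-per-token estimate are assumptions for illustration, not Contextful's implementation.

```typescript
// Minimal hit shape for this sketch; the real SearchHit carries more fields.
interface Hit {
  ref: string;
  excerpt: string;
  score: number;
}

function roughTokens(text: string): number {
  return Math.ceil(text.length / 4); // assumed ~4 characters per token
}

// Keeps the highest-scoring hits whose excerpts fit within the budget.
function selectWithinBudget(hits: Hit[], budget: number): { selected: Hit[]; tokenEstimate: number } {
  const ranked = [...hits].sort((a, b) => b.score - a.score);
  const selected: Hit[] = [];
  let tokenEstimate = 0;
  for (const hit of ranked) {
    const cost = roughTokens(hit.excerpt);
    if (tokenEstimate + cost > budget) continue; // skip hits that would overflow
    selected.push(hit);
    tokenEstimate += cost;
  }
  return { selected, tokenEstimate };
}
```

Using `continue` rather than `break` lets a short, lower-ranked hit use budget left over after a long one is skipped; stopping at the first overflow would instead preserve strict rank order.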

CLI and MCP Integration

Command Structure

Command	Purpose	Key Options
`index`	Initial workspace indexing	`--workspace`, `--watch`
`daemon`	Continuous indexing with file watching	`--workspace`
`query`	Generate evidence pack	`--workspace`, `--budget`, `--json`
`search`	Direct search without packing	`--workspace`, `--limit`, `--kind`
`report`	Generate context report	`--workspace`, `--format`
`memory add`	Store evidence-backed lessons	`--claim`, `--evidence`, `--scope`
`server`	Start MCP stdio server	(none)

Sources: src/cli.ts:20-100

MCP Server Tools

The MCP server exposes standardized tools for agent integration:

context_pack(query, budget, scope) - Primary tool; returns ranked, cited, token-budgeted evidence
search_code(query, mode, filters) - Code, docs, symbol, and memory search
trace_path(from, to, edge_types) - Graph traversal across the codebase
impact_analysis(symbol_or_file) - Reverse dependency analysis
why_changed(symbol_or_file) - Git history with current evidence
recall_memory(query, scope) - Search persistent lessons
write_lesson(claim, evidence_refs, scope) - Store new memories

Sources: README.md

Report Generation

The report system aggregates workspace statistics and warnings:

graph TD
    A[generateReport] --> B[Index Status Check]
    B --> C[File Statistics]
    B --> D[Symbol Statistics]
    B --> E[Edge Statistics]
    B --> F[Warning Collection]
    C --> G[renderMarkdown / renderHtml]
    D --> G
    E --> G
    F --> G

Reports support three output formats:

markdown - Plain text with markdown headings
json - Structured JSON with all report fields
html - Self-contained HTML document with styling

Sources: src/report.ts:1-80
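As an illustration of the Markdown path, the sketch below renders a few of the statistics the diagram collects. `ReportSummary` and `renderMarkdownSketch` are hypothetical; the real `renderMarkdown` output and report fields are not documented here.

```typescript
// Hypothetical subset of report fields, based on the statistics in the diagram.
interface ReportSummary {
  workspace: string;
  files: number;
  symbols: number;
  edges: number;
  warnings: string[];
}

function renderMarkdownSketch(report: ReportSummary): string {
  const lines = [
    `# Context report: ${report.workspace}`,
    "",
    `- Files: ${report.files}`,
    `- Symbols: ${report.symbols}`,
    `- Edges: ${report.edges}`,
  ];
  // Only emit a warnings section when there is something to report.
  if (report.warnings.length > 0) {
    lines.push("", "## Warnings", ...report.warnings.map((warning) => `- ${warning}`));
  }
  return lines.join("\n");
}
```

The `json` format would correspond to serializing the same object, for example with `JSON.stringify(report, null, 2)`.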

Privacy and Security

Contextful operates entirely locally with no external API calls:

No embedding API calls for vector search
No source code uploads
No file editing or auto-fixes
No dependency installation in target workspace

Evidence references are validated and stale references are rejected to maintain integrity of the memory system.

Sources: README.md
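One simple way to detect a stale evidence reference is to check that the cited file still exists and still contains the cited lines. The sketch below does that for `file:<path>:<start>-<end>` refs. `isStaleRef` and its injected `readFile` callback are hypothetical, and Contextful may instead compare content hashes (the `files` table stores a `hash` column).

```typescript
// Returns true when `ref` no longer points at lines that exist in the current file.
// `readFile` is injected so the check can run without touching disk; it returns
// undefined for files that no longer exist.
function isStaleRef(ref: string, readFile: (path: string) => string | undefined): boolean {
  const match = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (!match) return true; // unparseable refs are treated as stale
  const [, path, start, end] = match;
  const content = readFile(path);
  if (content === undefined) return true; // file was deleted or moved
  const lineCount = content.split("\n").length;
  const first = Number(start);
  const last = Number(end);
  return first < 1 || last < first || last > lineCount;
}
```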

Data Flow Summary

sequenceDiagram
    participant User
    participant CLI as CLI/MCP Server
    participant Indexer
    participant Extractor
    participant Search
    participant Kernel as SQLite Kernel
    
    User->>CLI: index --workspace .
    CLI->>Indexer: indexWorkspace()
    Indexer->>Extractor: extractFile()
    Extractor->>Kernel: Insert symbols, edges, chunks
    Kernel-->>Indexer: Confirmation
    
    User->>CLI: query "where is auth handled"
    CLI->>Search: searchContext()
    Search->>Kernel: FTS5 query
    Search->>Kernel: Graph traversal
    Search->>Kernel: Memory search
    Kernel-->>Search: Ranked hits
    Search-->>CLI: EvidencePack
    CLI-->>User: Compact context output

Key Design Decisions

Decision	Rationale
SQLite FTS5 over vector embeddings	Local-only operation, no external API dependencies
Intent-based query routing	Optimizes search strategy based on query semantics
BM25 scoring with bonuses	Balances relevance with domain-specific priorities
Token-budgeted evidence packs	Prevents context overflow in LLM contexts
Evidence refs as first-class citizens	Enables verifiable, traceable AI responses

Sources: src/search.ts:50-150, src/util.ts


Runtime Components

Overview

The Runtime Components in Contextful encompass the services, daemons, and server processes that enable real-time code indexing, search, and context-aware information retrieval. These components operate as the execution layer of the application, providing persistent indexing, live workspace monitoring, and MCP (Model Context Protocol) server capabilities for AI agent integration.

The runtime layer bridges the gap between static code analysis and dynamic query resolution, allowing users and AI agents to query indexed repositories with token-budgeted evidence packs.

Source: https://github.com/Inferensys/contextful / Human Manual

Search Engine

Related topics: Context Packs, SQLite Database Schema


Overview

The Search Engine is the core retrieval system in Contextful, designed to provide intelligent, evidence-backed context for agent queries. It combines full-text search (FTS), symbol indexing, dependency graph traversal, and memory recall to deliver ranked, cited results within a configurable token budget.

The system serves as the foundation for multiple interfaces: CLI commands (query, search), MCP server tools (search_code, context_pack), and report generation.

Sources: src/search.ts:1-50

Architecture

graph TD
    A[Query Input] --> B[Query Classification]
    B --> C{Intent Type}
    C -->|code/docs| D[Full-Text Search]
    C -->|symbols| E[Symbol Lookup]
    C -->|memory| F[Memory Ledger Search]
    C -->|impact| G[Graph Traversal]
    C -->|historical| H[Git History + Search]
    D --> I[BM25 Ranking]
    E --> J[Symbol Index]
    F --> K[Memory DB]
    G --> L[Edge Database]
    H --> M[Git Operations]
    I --> N[Result Scoring]
    J --> N
    K --> N
    L --> N
    N --> O[Context Pack]

Core Components

Component	File	Responsibility
Search Kernel	`src/search.ts`	Core search logic and ranking
Query Classifier	`src/search.ts`	Intent detection
FTS Engine	`src/search.ts`	Full-text search using SQLite FTS5
Graph Tracer	`src/search.ts`	Dependency graph traversal
Memory Store	`src/memory.ts`	Evidence-backed memory recall

Sources: src/search.ts:50-120

Query Classification

The search engine classifies each query into one of the nine intent types below and chooses a retrieval strategy accordingly.

SearchIntent Types

Intent	Trigger Keywords	Search Strategy
`code`	`code`, `function`, `class`, `impl`	FTS + symbol lookup
`docs`	`resource`, `docs`, `readme`, `how to`	FTS on markdown/json
`symbols`	`define`, `interface`, `type`, `symbol`	Direct symbol index
`memory`	`remember`, `lesson`, `learned`, `session`	Memory ledger query
`impact`	`impact`, `affected`, `depends`, `blast radius`	Reverse dependency graph
`historical`	`why`, `changed`, `commit`, `history`	Git history + current search
`architectural`	`architecture`, `flow`, `trace`, `connects`	Graph path analysis
`exact`	Code patterns, paths, line refs	Direct file/symbol lookup
`vague`	Default	Broad FTS + graph

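type SearchIntent =
  | "code" | "docs" | "symbols" | "memory" | "impact"
  | "historical" | "architectural" | "exact" | "vague";

function classifyQuery(query: string): SearchIntent {
  const q = query.toLowerCase();
  if (/\b(code|function|class|implement|module)\b/.test(q)) return "code";
  if (/\b(define|interface|type|symbol)\b/.test(q)) return "symbols";
  if (/\b(memory|remember|lesson|learned|sessions?)\b/.test(q)) return "memory";
  // ... additional classifications
  return "vague"; // default fallback when no trigger matches
}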

Sources: src/search.ts:1-30

Search Flow

Main Search Pipeline

sequenceDiagram
    participant CLI as CLI/MCP
    participant Search as searchContext()
    participant Kernel as Kernel DB
    participant FTS as FTS5 Engine
    participant Graph as Graph DB
    participant Memory as Memory Store

    CLI->>Search: query, workspace, limit
    Search->>Kernel: ensureIndexed()
    Search->>Kernel: addQuery()
    Search->>FTS: ftsQuery(expandedTerms)
    FTS-->>Search: ranked rows (BM25)
    Search->>Search: scoreFromRank()
    Search->>Graph: loadGraphPaths()
    Search-->>CLI: {intent, hits}

Full-Text Search Query Builder

The ftsQuery function transforms user queries into FTS5-compatible search strings:

function ftsQuery(query: string): string {
  const terms = expandedTerms(query);
  return Array.from(new Set(terms.map((term) => term.toLowerCase())))
    .filter((term) => !STOPWORDS.has(term))
    .slice(0, 14)
    .map((term) => `${term}*`)
    .join(" OR ");
}

Key behaviors:

Expands terms based on query context (e.g., "tool" → "server", "tool", "callTool")
Filters stopwords: where, what, which, when, how, are, the, for, with, and, or, to
Limits to 14 terms maximum
Appends wildcard * for prefix matching

Sources: src/search.ts:200-280
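To make the query builder concrete, the block below reproduces `ftsQuery` with the stopword list above and a reduced `expandedTerms` that keeps only the tool rule. The tokenizer (splitting on non-word characters) is an assumption.

```typescript
// Stopwords as listed in the key behaviors above.
const STOPWORDS = new Set(["where", "what", "which", "when", "how", "are", "the", "for", "with", "and", "or", "to"]);

// Reduced expansion: only the "tool" rule; tokenizing on non-word characters is assumed.
function expandedTerms(query: string): string[] {
  const terms = query.split(/[^A-Za-z0-9_]+/).filter(Boolean);
  const additions: string[] = [];
  if (/\b(tool|tools|registered|register)\b/.test(query.toLowerCase())) {
    additions.push("server", "tool", "tools", "callTool");
  }
  return [...terms, ...additions];
}

// Same body as the ftsQuery excerpt: dedupe, drop stopwords, cap at 14, prefix-match.
function ftsQuery(query: string): string {
  const terms = expandedTerms(query);
  return Array.from(new Set(terms.map((term) => term.toLowerCase())))
    .filter((term) => !STOPWORDS.has(term))
    .slice(0, 14)
    .map((term) => `${term}*`)
    .join(" OR ");
}

console.log(ftsQuery("where are tools registered"));
// → tools* OR registered* OR server* OR tool* OR calltool*
```

Note that deduplication happens after lowercasing, so `callTool` is emitted as `calltool*`.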

Scoring System

Rank-to-Score Transformation

The scoreFromRank function converts BM25 ranks into relevance scores between 0.1 and 10, adding domain-specific bonuses:

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function scoreFromRank(rank: number, query: string, corpus: string): number {
  const q = query.toLowerCase();
  const base = 10 / (1 + Math.abs(rank));
  let bonus = 0;

  // Domain-specific bonuses (excerpt; the matrix below lists further bonuses and penalties)
  if (/\b(tool|tools|registered|register)\b/.test(q) && corpus.includes("server.tool(")) {
    bonus += 9;
  }
  if (/\bmcp\b/.test(q) && corpus.includes("mcp-server")) {
    bonus += 4;
  }

  return clamp(base + bonus, 0.1, 10);
}

Scoring Bonuses Matrix

Query Pattern	Content Match	Bonus
`tool/tools/register`	`server.tool(`	+9
`mcp`	`mcp-server`	+4
`where registered`	`function runMcpServer`	+4
`tool` query	`src/search.ts`	-8
`memory` query	`src/memory.ts`	+5
`memory` query	`src/search.ts`	-16

The negative bonuses keep `src/search.ts`, which itself contains many of these query keywords, from outranking the files that a tool or memory query is actually about.

Sources: src/search.ts:240-320

Term Expansion

The expandedTerms function intelligently expands query terms based on semantic context:

function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  // Base terms from the query itself (splitting on non-word characters is an
  // assumption; the tokenizer is not shown in this excerpt)
  const terms = query.split(/[^A-Za-z0-9_]+/).filter(Boolean);
  const additions: string[] = [];

  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  // Alternatives are grouped so \b applies to every word, not just the first and last
  if (/\b(memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\b(impact|depends|dependents|uses)\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }

  return [...terms, ...additions];
}

Sources: src/search.ts:320-380

CLI Commands

Query Command

cxf query "<query>" --workspace <path> --budget <tokens> --json

Option	Type	Default	Description
`query`	string	required	Query to answer from indexed context
`--workspace`	path	`cwd()`	Workspace path
`--budget`	number	2000	Approximate token budget
`--json`	flag	false	Output JSON instead of Markdown

Search Command

cxf search "<query>" --workspace <path> --limit <count> --kind <kind>

Option	Type	Default	Description
`query`	string	required	Search query
`--workspace`	path	`cwd()`	Workspace path
`--limit`	number	10	Maximum hits
`--kind`	enum	`all`	Search category: `all`, `code`, `docs`, `symbols`, or `memory`

Sources: src/cli.ts:40-80

MCP Server Tools

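The search engine exposes the following MCP tools. Each excerpt shows only the tool name, description, and Zod parameter shape passed to `server.tool()`; the handler callback that the MCP SDK expects as the final argument is omitted.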

search_code

server.tool("search_code", "Search indexed code, docs, symbols, and stored context", {
  query: z.string(),
  mode: z.enum(["all", "code", "docs", "symbols", "memory"]).optional(),
  limit: z.number().optional(),
  filters: z.record(z.string(), z.unknown()).optional()
});

trace_path

server.tool("trace_path", "Trace graph relationships between files, symbols, modules", {
  from: z.string(),
  to: z.string().optional(),
  edge_types: z.array(z.string()).optional(),
  limit: z.number().optional()
});

impact_analysis

server.tool("impact_analysis", "Find likely dependents and tests", {
  symbol_or_file: z.string(),
  limit: z.number().optional()
});

why_changed

server.tool("why_changed", "Explain why a file/symbol may have changed", {
  symbol_or_file: z.string(),
  limit: z.number().optional()
});

Sources: src/mcp-server.ts:1-80

Context Pack

The createContextPack function assembles comprehensive evidence bundles:

export async function createContextPack(options: {
  workspace?: string;
  query: string;
  budget?: number;
  scope?: string;
}): Promise<EvidencePack>

EvidencePack Structure

Field	Type	Description
`id`	string	Unique pack identifier (`ctx_<hash>`)
`query`	string	Original query
`scope`	string	Search scope (default: `repo`)
`intent`	SearchIntent	Classified intent
`summary`	string	Human-readable summary
`citations`	SearchHit[]	Ranked search results
`files`	FileContext[]	Grouped file references
`symbols`	SymbolRecord[]	Relevant symbols (≤20)
`graphPaths`	GraphPath[]	Dependency connections (≤20)
`memoryHits`	SearchHit[]	Memory matches
`confidence`	number	Confidence score (0.1-0.92)
`tokenEstimate`	number	Estimated token count
`budget`	number	Requested token budget
`createdAt`	string	ISO timestamp

Confidence Calculation

function confidenceFor(hits: SearchHit[], graphPaths: GraphPath[], memoryHits: SearchHit[]): number {
  return clamp(
    0.25 + 
    hits.length * 0.05 + 
    graphPaths.length * 0.02 + 
    memoryHits.length * 0.05,
    0.1,
    0.92
  );
}

Sources: src/search.ts:400-480

Graph Traversal

The traceGraph function performs dependency graph analysis:

export async function traceGraph(options: {
  workspace?: string;
  from: string;
  to?: string;
  edgeTypes?: string[];
  limit?: number;
}): Promise<GraphPath[]>

Edge Types

Edge Type	Direction	Description
`IMPORTS`	File → Module	Import/require statements
`DEFINES`	File → Symbol	Symbol definitions
`CONFIGURES`	File → Config	Configuration keys
`TESTS`	Test → Source	Test file relationships

Impact Analysis

export async function impactAnalysis(options: {
  workspace?: string;
  target: string;
  limit?: number;
}): Promise<{
  target: string;
  forward: string[];
  reverse: string[];
  tests: string[];
}>

Returns forward dependencies, reverse dependents, and likely test files for a given symbol or file.

Sources: src/search.ts:480-550

Utility Functions

lineRange

Extracts a specific line range from text:

export function lineRange(text: string, startLine: number, endLine: number): string {
  const lines = text.split(/\r?\n/);
  return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}

clamp

Constrains values within bounds:

export function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

unique

Deduplicates arrays:

export function unique<T>(items: T[]): T[] {
  return Array.from(new Set(items));
}

isLikelyBinary

Detects binary files by checking for null bytes:

export function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}

Sources: src/util.ts:1-50

Data Models

SearchHit

interface SearchHit {
  ref: string;        // Format: "file:path:start-end"
  path: string;       // File path
  kind: string;       // "chunk", "symbol", "memory", "doc"
  title: string;      // Display title
  text: string;       // Content snippet
  score: number;      // Relevance score
  line?: number;      // Starting line number
}

SymbolRecord

interface SymbolRecord {
  ref: string;
  name: string;
  kind: string;       // "function", "class", "interface", "type", etc.
  filePath: string;
  line: number;
  signature?: string;
  exported?: boolean;
}

Sources: src/search.ts:100-150

Index Status

The getIndexStatus function returns workspace indexing metadata:

export async function getIndexStatus(options: { workspace?: string }): Promise<IndexStatus>

IndexStatus Structure

Field	Type	Description
`workspace`	string	Workspace path
`languageCounts`	Record<string, number>	File count per language
`warnings`	string[]	Index warnings
`lastIndexed`	string	ISO timestamp of last index
`totalChunks`	number	Total indexed chunks

Sources: src/search.ts:550-600

Summary

The Search Engine provides Contextful's intelligent retrieval capabilities through:

Intent Classification - Automatically routes queries to optimal search strategies
Full-Text Search - SQLite FTS5 with BM25 ranking and domain-specific scoring
Symbol Index - Fast lookup of code definitions across languages
Graph Traversal - Dependency analysis and impact tracking
Memory Integration - Recall of past lessons and evidence-backed claims
Token Budgeting - Constrains output to specified budget limits
Confidence Scoring - Quantifies result reliability

All search operations flow through a unified kernel database that combines FTS chunks, symbol records, and edge relationships for comprehensive context retrieval.

Sources: src/search.ts:1-50

Context Packs

Related topics: Search Engine, Memory Ledger


Context Packs are the core output format of Contextful, providing AI agents with compact, ranked, and cited evidence bundles that fit within a specified token budget. Instead of forcing agents to read dozens of arbitrary files, Context Packs deliver precisely the evidence needed to answer a specific query.

Overview

A Context Pack is a structured evidence package generated by the context_pack() MCP tool or the cxf query CLI command. It contains:

Ranked code and documentation citations matching the query
Related symbols (functions, classes, interfaces) from matching files
Graph paths connecting related components
Memory hits from evidence-backed lessons
A confidence score and token budget accounting

The pack is designed to be consumed directly by an LLM agent, providing traceable citations and a clear summary of what evidence was found.

Data Model

EvidencePack Structure

Field	Type	Description
`id`	`string`	Unique identifier (format: `ctx_<hash>`)
`query`	`string`	The original search query
`scope`	`string`	Search scope (e.g., "repo")
`intent`	`SearchIntent`	Classified query intent
`summary`	`string`	Human-readable summary of findings
`citations`	`SearchHit[]`	Ranked evidence items
`files`	`FileContext[]`	Grouped file references with reasons
`symbols`	`SymbolRecord[]`	Relevant symbols from matched files
`graphPaths`	`GraphPath[]`	Graph traversals between components
`memoryHits`	`SearchHit[]`	Memory/lesson hits
`confidence`	`number`	Estimated confidence (0.1-0.92)
`tokenEstimate`	`number`	Estimated token count of pack
`budget`	`number`	Requested token budget
`createdAt`	`string`	ISO timestamp of creation

Sources: src/search.ts

SearchHit Structure

Field	Type	Description
`ref`	`string`	Reference identifier (e.g., `file:src/auth.ts:1-20`)
`path`	`string`	File path
`title`	`string`	Display title
`kind`	`string`	Hit kind: code, doc, symbol, memory
`excerpt`	`string`	Relevant text excerpt
`score`	`number`	Relevance score
`rank`	`number`	BM25 rank

SearchIntent Enum

Intent	Trigger Keywords
`exact`	Backticks, quotes, `#`, `.`, `:`, `/`, or a capitalized identifier of 3+ characters
`symbol`	Not returned by the `classifyQuery` rules in Query Classification; capitalized identifiers are classified as `exact`
`test`	test, spec, mock, fixture
`memory`	memory, lesson, learned, session, sessions
`impact`	impact, affected, depends, dependents, blast radius
`historical`	why, changed, commit, history, regression, introduced
`architectural`	architecture, flow, path, trace, connects, calls, imports
`docs`	resource, docs, documentation, guide, readme, how to, setup
`vague`	Default when no other rule matches

Sources: src/search.ts

Creation Flow

The createContextPack function orchestrates the entire pack creation process:

graph TD
    A[createContextPack] --> B[searchContext]
    B --> C[classifyQuery]
    C --> D[ftsQuery + expandedTerms]
    D --> E[FTS Search on chunks_fts]
    E --> F[scoreFromRank]
    F --> G[Select Hits within Budget]
    G --> H[loadSymbolsForPaths]
    G --> I[loadGraphPaths]
    G --> J[Filter memoryHits]
    H --> K[Build EvidencePack]
    I --> K
    J --> K
    K --> L[saveEvidencePack]
    L --> M[Return EvidencePack]

Step 1: Search Context

The process begins by classifying the query intent and executing full-text search:

const search = await searchContext({ workspace, query, limit: budget * 2 });
const selected = selectWithinBudget(search.hits, budget);

Sources: src/search.ts

Step 2: Budget-Aware Selection

Hits are selected greedily in rank order. Selection stops at the first hit whose estimate would bring the running total to or above the budget:

function selectWithinBudget(hits: SearchHit[], budget: number): SearchHit[] {
  const selected: SearchHit[] = [];
  let tokenEstimate = 0;
  for (const hit of hits) {
    const est = estimateTokens(hit.excerpt || hit.title);
    if (tokenEstimate + est >= budget) break;
    selected.push(hit);
    tokenEstimate += est;
  }
  return selected;
}

Sources: src/search.ts
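The sketch below reruns selectWithinBudget on three hits of equal size to show where the greedy cutoff falls. It assumes that estimateTokens counts roughly 4 characters per token; Contextful's real estimator is not shown here. The trimmed-down `BudgetHit` type is hypothetical.

```typescript
// Minimal hit shape for this sketch; the real SearchHit has more fields.
interface BudgetHit {
  ref: string;
  title: string;
  excerpt?: string;
}

// Assumption: ~4 characters per token. Contextful's estimateTokens may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectWithinBudget(hits: BudgetHit[], budget: number): BudgetHit[] {
  const selected: BudgetHit[] = [];
  let tokenEstimate = 0;
  for (const hit of hits) {
    const est = estimateTokens(hit.excerpt || hit.title);
    // Stop at the first hit that would reach or exceed the budget.
    if (tokenEstimate + est >= budget) break;
    selected.push(hit);
    tokenEstimate += est;
  }
  return selected;
}

// Three 400-character excerpts, i.e. ~100 tokens each under the assumption above.
const hits400: BudgetHit[] = ["a", "b", "c"].map((id) => ({
  ref: `file:${id}.ts:1-10`,
  title: id,
  excerpt: "x".repeat(400),
}));
console.log(selectWithinBudget(hits400, 250).length); // 2
console.log(selectWithinBudget(hits400, 300).length); // 2: the third hit would reach exactly 300
console.log(selectWithinBudget(hits400, 301).length); // 3
```

The loop breaks instead of skipping ahead, so a shorter hit further down the ranking is never used to fill leftover budget. The selection is always a prefix of the ranked list.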

Step 3: Symbol Loading

For each selected file, related symbols are loaded (up to 20 total):

const symbols = loadSymbolsForPaths(kernel.db, paths).slice(0, 20);

Symbols are read from the symbols table and filtered to the selected file paths:

SELECT ref, name, kind, file_path, line, signature, exported 
FROM symbols 
WHERE file_path IN (...)

Sources: src/search.ts
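The `IN (...)` list is elided above. One safe way to fill such a list is to generate one `?` placeholder per path and pass the paths as bound parameters, so file names never become part of the SQL text. The hypothetical `symbolsQueryFor` helper below sketches that approach.

```typescript
// Hypothetical helper: one bound placeholder per path, no string splicing of file names.
function symbolsQueryFor(paths: string[]): { sql: string; params: string[] } | null {
  if (paths.length === 0) return null; // nothing to look up
  const placeholders = paths.map(() => "?").join(", ");
  return {
    sql:
      "SELECT ref, name, kind, file_path, line, signature, exported " +
      `FROM symbols WHERE file_path IN (${placeholders})`,
    params: [...paths],
  };
}

const symbolsSql = symbolsQueryFor(["src/auth.ts", "src/middleware/auth.ts"]);
console.log(symbolsSql?.sql);
// SELECT ref, name, kind, file_path, line, signature, exported FROM symbols WHERE file_path IN (?, ?)
```

With a driver such as better-sqlite3, the result would run as `db.prepare(sql).all(...params)`. Very long path lists can exceed SQLite's limit on host parameters; those would need to be queried in batches.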

Step 4: Graph Path Loading

Graph paths connect files through import/dependency relationships:

const graphPaths = loadGraphPaths(kernel.db, paths, 20);

Sources: src/search.ts

Step 5: Memory Hit Extraction

Memory hits are filtered from selected hits by kind:

const memoryHits = selected.filter((hit) => hit.kind === "memory");

Step 6: Confidence Calculation

Confidence is calculated using a clamped formula:

function confidenceFor(hits, graphPaths, memoryHits): number {
  return clamp(
    0.25 + hits.length * 0.05 + graphPaths.length * 0.02 + memoryHits.length * 0.05,
    0.1,
    0.92
  );
}

Base: 0.25
Each hit: +0.05
Each graph path: +0.02
Each memory hit: +0.05
Clamped to [0.1, 0.92]

Sources: src/search.ts
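The worked example below applies the same weights to plain counts; `confidenceFromCounts` is a hypothetical wrapper name. The real function receives memoryHits as a filtered subset of the selected hits, so each memory hit is also counted as a hit. In effect, a memory hit adds 0.05 + 0.05 = 0.10.

```typescript
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

// Same weights as confidenceFor, taking counts instead of arrays.
function confidenceFromCounts(hits: number, graphPaths: number, memoryHits: number): number {
  return clamp(0.25 + hits * 0.05 + graphPaths * 0.02 + memoryHits * 0.05, 0.1, 0.92);
}

// 5 hits, 2 graph paths, 1 memory hit: 0.25 + 0.25 + 0.04 + 0.05 = 0.59
console.log(Math.round(confidenceFromCounts(5, 2, 1) * 100)); // 59
console.log(confidenceFromCounts(0, 0, 0)); // 0.25
console.log(confidenceFromCounts(20, 0, 0)); // 0.92 (1.25 is clamped)
```

Counts are never negative and the base is 0.25, so the 0.1 floor is never reached with these weights. Only the 0.92 ceiling has an effect.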

Query Classification

The classifyQuery function determines the search intent based on keywords:

function classifyQuery(q: string): SearchIntent {
  const lower = q.toLowerCase();
  // Path/code punctuation or a capitalized identifier means an exact lookup.
  if (/[`"'#.:/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
  // Remaining rules match whole keywords in the lower-cased query; the first match wins.
  if (/\b(test|spec|mock|fixture)\b/.test(lower)) return "test";
  if (/\b(memory|lesson|learned|session|sessions)\b/.test(lower)) return "memory";
  if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(lower)) return "impact";
  if (/\b(why|changed|commit|history|regression|introduced)\b/.test(lower)) return "historical";
  if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(lower)) return "architectural";
  if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(lower)) return "docs";
  return "vague";
}

Sources: src/search.ts
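The rules are ordered and match whole words only, so small wording changes can shift a query's intent. The standalone copy below runs the classifier on a few sample queries. Note that "what tests cover login" falls through to `vague`, because the rule lists `test` but not `tests`.

```typescript
type SearchIntent =
  | "exact" | "symbol" | "test" | "memory" | "impact"
  | "historical" | "architectural" | "docs" | "vague";

// Mirrors the classifier above; the first matching rule wins.
function classifyQuery(q: string): SearchIntent {
  const lower = q.toLowerCase();
  if (/[`"'#.:\/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
  if (/\b(test|spec|mock|fixture)\b/.test(lower)) return "test";
  if (/\b(memory|lesson|learned|session|sessions)\b/.test(lower)) return "memory";
  if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(lower)) return "impact";
  if (/\b(why|changed|commit|history|regression|introduced)\b/.test(lower)) return "historical";
  if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(lower)) return "architectural";
  if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(lower)) return "docs";
  return "vague";
}

const intentSamples = [
  "src/auth.ts",                // exact (contains "/" and ".")
  "AuthService",                // exact (capitalized identifier)
  "mock the database",          // test
  "what tests cover login",     // vague ("tests" is not in the word list)
  "why did login break",        // historical
  "trace the user auth flow",   // architectural
  "blast radius of auth",       // impact
  "how to set up the cli",      // docs
  "where is user auth handled", // vague
];
for (const sample of intentSamples) {
  console.log(`${sample} -> ${classifyQuery(sample)}`);
}
```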

Term Expansion

The expandedTerms function adds related terms to improve recall for specific domains:

function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  // Assumed: base terms are the lower-cased words of the query (the tokenizer is not shown).
  const terms = lower.split(/\W+/).filter(Boolean);
  const additions: string[] = [];
  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  if (/\b(memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\b(impact|depends|dependents|uses)\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }
  return [...terms, ...additions];
}

Sources: src/search.ts

Scoring Algorithm

The scoreFromRank function converts a search rank into a base relevance score, 10 / (1 + |rank|), and then adds query-specific bonuses or penalties:

function scoreFromRank(rank: number, q: string): number {
  let bonus = 0;
  const lower = q.toLowerCase();
  
  if (/\b(memory|memories|remember|remembers|lesson|lessons|sessions)\b/.test(lower)) {
    if (lower.includes("memory ledger")) bonus += 7;
    if (lower.includes("src/memory.ts")) bonus += 5;
    if (lower.includes("readme.md")) bonus += 4;
    if (lower.includes("src/search.ts")) bonus -= 16;
  }
  if (/\b(where|how)\b/.test(q) && lower.includes("config-key")) bonus -= 2;
  
  return 10 / (1 + Math.abs(rank)) + bonus;
}

Sources: src/search.ts

CLI Usage

The query command creates Context Packs via CLI:

cxf query "<query>" --workspace <path> --budget 2000 --json

Options

Option	Type	Default	Description
`--workspace`	`path`	`cwd`	Workspace path
`--budget`	`number`	`2000`	Approximate token budget
`--json`	`flag`	`false`	Output as JSON instead of Markdown

Example Output

# Context Pack ctx_abc123

Query: trace the user auth flow
Intent: architectural
Confidence: 59%
Token estimate: 1850/2000

Found 5 evidence items for a architectural query, with 2 graph connections and 1 memory hit.

## Citations
- file:src/auth.ts:1-50 (auth module)
  Handles user authentication via JWT tokens...
- file:src/middleware/auth.ts:1-30 (auth middleware)
  Express middleware for auth validation...

## Graph Paths
- src/auth.ts --IMPORTS--> src/utils/jwt.ts (src/auth.ts:5)
- src/middleware/auth.ts --IMPORTS--> src/auth.ts (src/middleware/auth.ts:3)

## Memory Hits
- memory:lesson:1: JWT tokens should be validated on every protected route.

Sources: src/cli.ts

Rendering

Context Packs are rendered as Markdown by renderEvidencePackMarkdown; the `--json` flag outputs the pack as JSON instead:

export function renderEvidencePackMarkdown(pack: EvidencePack): string {
  const lines = [
    `# Context Pack ${pack.id}`,
    "",
    `Query: ${pack.query}`,
    `Intent: ${pack.intent}`,
    `Confidence: ${Math.round(pack.confidence * 100)}%`,
    `Token estimate: ${pack.tokenEstimate}/${pack.budget}`,
    "",
    pack.summary,
    "",
    "## Citations"
  ];
  // ... citations, graph paths, memory hits
}

Sources: src/report.ts
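The citation entries are elided from renderEvidencePackMarkdown above. The hypothetical `renderCitationLines` helper below is based only on the layout of the example output earlier in this section: `- <ref> (<title>)`, followed by an indented excerpt. Keeping only the first line of each excerpt is an assumption.

```typescript
interface CitationHit {
  ref: string;
  title: string;
  excerpt?: string;
}

// Hypothetical: one "- ref (title)" line per citation, plus the excerpt's first line indented.
function renderCitationLines(citations: CitationHit[]): string[] {
  const out: string[] = [];
  for (const c of citations) {
    out.push(`- ${c.ref} (${c.title})`);
    const firstLine = ((c.excerpt ?? "").split(/\r?\n/)[0] ?? "").trim();
    if (firstLine) out.push(`  ${firstLine}`);
  }
  return out;
}

console.log(
  renderCitationLines([
    { ref: "file:src/auth.ts:1-50", title: "auth module", excerpt: "Handles user authentication via JWT tokens..." },
  ]).join("\n"),
);
```

Inside the renderer, these lines would be appended after the `## Citations` heading, e.g. `lines.push(...renderCitationLines(pack.citations))`.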

Chunk Extraction

Contextual chunks are extracted during indexing for searchability:

graph LR
    A[Source File] --> B[Language Detection]
    B --> C[extractSymbols]
    B --> D[extractEdges]
    B --> E[extractChunks]
    C --> F[Symbol Table]
    D --> G[Edge Table]
    E --> H[Chunk Table]

Supported Languages

Language	Symbol Patterns
TypeScript/JavaScript	function, class, interface, type, const arrow
Python	def, class
Go	func, type struct/interface
Rust	fn, struct, enum, trait, impl
Markdown	headings (H1-H6)
JSON	top-level keys

Sources: src/extract.ts

Chunking Strategy

Code files: Divided into 60-line windows that start every 50 lines, so consecutive chunks overlap by 10 lines
Markdown files: Split by headings, with the heading as the chunk title
Token estimation: Used for both selection and budget accounting

function codeChunks(relativePath: string, content: string): ChunkRecord[] {
  const lines = content.split(/\r?\n/);
  const chunks: ChunkRecord[] = [];
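  // 60-line windows starting every 50 lines, so neighboring chunks share 10 lines
  for (let start = 1; start <= lines.length; start += 50) {
    const end = Math.min(start + 60 - 1, lines.length);
    // Same result as lineRange(content, start, end), without re-splitting the file each time
    const text = lines.slice(start - 1, end).join("\n");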
    chunks.push({
      ref: fileRef(relativePath, start, end),
      filePath: relativePath,
      startLine: start,
      endLine: end,
      kind: "file",
      title: `${relativePath}:${start}-${end}`,
      text,
      tokenEstimate: estimateTokens(text)
    });
  }
  return chunks;
}

Sources: src/extract.ts
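The loop bounds of codeChunks are easy to misread. The hypothetical `chunkWindows` helper below computes only the `[startLine, endLine]` pairs the loop produces, with the same window size of 60 and step of 50. Lines are 1-indexed and inclusive.

```typescript
// Same window arithmetic as codeChunks: size-line windows starting every `step` lines.
function chunkWindows(totalLines: number, size = 60, step = 50): Array<[number, number]> {
  const windows: Array<[number, number]> = [];
  for (let start = 1; start <= totalLines; start += step) {
    windows.push([start, Math.min(start + size - 1, totalLines)]);
  }
  return windows;
}

console.log(JSON.stringify(chunkWindows(130))); // [[1,60],[51,110],[101,130]]
console.log(JSON.stringify(chunkWindows(55)));  // [[1,55],[51,55]]
```

The second result shows a side effect of the loop as written. A file between 51 and 60 lines long gets a short tail chunk that lies entirely inside the first chunk.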

Summary Generation

The summarizePack function generates human-readable summaries:

function summarizePack(
  query: string,
  intent: SearchIntent,
  hits: SearchHit[],
  graphPaths: GraphPath[],
  memoryHits: SearchHit[]
): string {
  if (hits.length === 0) {
    return `No indexed evidence matched "${query}". Re-index or broaden the query.`;
  }
  return `Found ${hits.length} evidence item${hits.length === 1 ? "" : "s"} ` +
    `for a ${intent} query, with ${graphPaths.length} graph connection${graphPaths.length === 1 ? "" : "s"} ` +
    `and ${memoryHits.length} memory hit${memoryHits.length === 1 ? "" : "s"}.`;
}

Sources: src/search.ts

Persistence

Evidence packs are saved to the kernel database for audit and retrieval:

saveEvidencePack(kernel.db, { 
  id: pack.id, 
  query: pack.query, 
  tokenEstimate, 
  json: JSON.stringify(pack) 
});

Sources: src/search.ts

Design Principles

Token budget awareness: Never exceed the requested budget; select the most relevant items first
Cited evidence: Every piece of information is traceable to a specific file and line range
Intent-driven: Query classification shapes what gets searched and how results are interpreted
Graph connectivity: Beyond matching files, show how they connect through imports and dependencies
Memory integration: Blend indexed content with evidence-backed lessons from prior sessions

Sources: src/search.ts

Memory Ledger

Related topics: Context Packs, Search Engine


The Memory Ledger is Contextful's evidence-backed persistent memory system that enables AI agents to retain and recall learned lessons across sessions. Unlike ephemeral context that disappears when a session ends, the Memory Ledger stores structured knowledge annotated with source evidence, allowing agents to build cumulative understanding of a codebase over time.

Overview

The Memory Ledger solves a fundamental problem in AI-assisted development: knowledge gained during one session is lost in the next. When an agent discovers how authentication works, identifies a fragile dependency, or learns a non-obvious architectural pattern, that knowledge typically vanishes when the session ends.

Contextful's approach requires every stored memory to be anchored to concrete evidence—file references, code symbols, or prior context packs. This design prevents hallucinated or unsubstantiated memories from polluting the knowledge base and ensures that recalled lessons can be traced back to their source.

The system operates entirely locally with no external API calls, embedding services, or cloud dependencies. All memory data remains within the workspace's SQLite database.

Architecture

graph TD
    A[Agent Session] -->|write_lesson| B[Memory Ledger]
    A -->|recall_memory| C[Memory Search]
    B -->|evidence refs| D[Evidence Pack]
    C -->|cited memories| A
    D -->|citations| E[Source Files]
    F[Workspace DB] -->|stores| B
    F -->|stores| C

Core Components

Component	Role	Source
Memory Storage	SQLite-backed persistent storage for lessons	`src/db.ts`
Memory Search	FTS-enabled retrieval of memories by query	`src/search.ts`
Evidence Validation	Ensures evidence refs are valid before storage	`src/mcp-server.ts`
Confidence Scoring	Assigns credibility scores to stored memories	`src/cli.ts:85`

Data Model

Memory Record Structure

Each memory in the ledger contains the following fields:

Field	Type	Description
`id`	string	Unique identifier (prefixed with `memory:`)
`claim`	string	The substantive lesson or observation
`scope`	string	Granularity level: `repo`, `file`, `symbol`, or `session`
`evidenceRefs`	string[]	Validated references to source evidence
`confidence`	number	Credibility score from 0.0 to 1.0
`status`	string	Current state: `active`, `superseded`, or `stale`
`supersedes`	string?	ID of the memory this replaces (if any)

Evidence Reference Formats

Valid evidence references that can be attached to memories:

Format	Example	Purpose
File range	`file:src/auth.ts:10-40`	Reference specific lines in a file
Symbol	`symbol:src/auth.ts#AuthService:12`	Point to a specific code symbol
Context pack	`pack:ctx_abc123`	Reference a prior evidence pack

Sources: README.md:54-56

Evidence references must come from search results or context packs—arbitrary references are rejected. This prevents storing claims without verifiable backing.
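Syntax alone cannot show whether a reference came from a search result, but it can reject malformed references early. The hypothetical `parseEvidenceRef` below recognizes the three formats in the table. It assumes that pack IDs are `ctx_` followed by alphanumeric characters.

```typescript
type EvidenceRef =
  | { kind: "file"; path: string; startLine: number; endLine: number }
  | { kind: "symbol"; path: string; symbol: string; line: number }
  | { kind: "pack"; id: string };

// Hypothetical parser for file:, symbol:, and pack: references; null means "not a valid ref".
function parseEvidenceRef(ref: string): EvidenceRef | null {
  let m = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (m) {
    const startLine = Number(m[2]);
    const endLine = Number(m[3]);
    // Reject empty or inverted ranges such as 40-10.
    return startLine >= 1 && endLine >= startLine
      ? { kind: "file", path: m[1], startLine, endLine }
      : null;
  }
  m = /^symbol:(.+)#([^#:]+):(\d+)$/.exec(ref);
  if (m) return { kind: "symbol", path: m[1], symbol: m[2], line: Number(m[3]) };
  m = /^pack:(ctx_[A-Za-z0-9]+)$/.exec(ref);
  if (m) return { kind: "pack", id: m[1] };
  return null;
}

console.log(parseEvidenceRef("symbol:src/auth.ts#AuthService:12"));
console.log(parseEvidenceRef("src/auth.ts")); // null
```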

Memory Scopes

The scope field determines the durability and applicability of a memory:

Scope	Description	Persistence
`repo`	Project-wide lessons applicable across sessions	Permanent
`file`	File-specific knowledge	Permanent
`symbol`	Symbol-level lessons	Permanent
`session`	Ephemeral session-scoped learnings	Lost on session end

The default scope is repo, reflecting the assumption that most valuable memories have project-wide relevance.

Sources: src/cli.ts:73

Writing Memories

CLI Usage

cxf memory add \
  --claim "AuthService.validateToken() throws on expired tokens without catching" \
  --evidence "file:src/auth.ts:45-67" \
  --evidence "file:src/api/middleware.ts:12-20" \
  --confidence 0.85 \
  --scope repo

MCP Tool Usage

// `client` is a connected MCP client; tools are invoked from the client side.
await client.callTool({
  name: "write_lesson",
  arguments: {
    claim: "The payment module requires initialization before use",
    evidence_refs: ["file:src/payment/core.ts:10-30", "symbol:src/payment/core.ts#initialize:15"],
    scope: "repo",
    confidence: 0.9
  }
});

Sources: src/mcp-server.ts:79-94

Validation Rules

Memories are subject to strict validation:

Evidence required: At least one valid evidence reference must be provided
Evidence must be fresh: References must originate from search results or context packs
Claim must be substantive: Empty or trivial claims are rejected
Confidence in valid range: Must be between 0.0 and 1.0
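The hypothetical validator below is one way to apply these four rules. "Fresh" is modeled as membership in a set of refs returned by recent searches or context packs (`knownRefs`). The three-word minimum for a "substantive" claim is an invented threshold; the real rule is not documented here.

```typescript
interface LessonInput {
  claim: string;
  evidenceRefs: string[];
  confidence: number;
}

// Hypothetical validator; returns a list of problems (empty means the lesson passes).
function validateLesson(input: LessonInput, knownRefs: Set<string>): string[] {
  const errors: string[] = [];
  if (input.evidenceRefs.length === 0) errors.push("at least one evidence ref is required");
  for (const ref of input.evidenceRefs) {
    if (!knownRefs.has(ref)) errors.push(`evidence ref not from a search result or pack: ${ref}`);
  }
  // Assumed threshold: fewer than three words counts as empty or trivial.
  if (input.claim.trim().split(/\s+/).filter(Boolean).length < 3) errors.push("claim is empty or too short");
  // Written as a negated range check so NaN is rejected too.
  if (!(input.confidence >= 0 && input.confidence <= 1)) errors.push("confidence must be between 0.0 and 1.0");
  return errors;
}

const recentRefs = new Set(["file:src/payment/core.ts:10-30"]);
console.log(validateLesson({ claim: "Payment module needs init", evidenceRefs: ["file:src/payment/core.ts:10-30"], confidence: 0.9 }, recentRefs)); // []
```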

Searching Memories

Intent Classification

Contextful automatically classifies queries to determine when to search memories. The query classifier recognizes memory-related intents through keyword detection:

const memoryPattern = /\b(memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/;

When matched, the classifier returns intent: "memory" and the search system automatically queries the memories FTS index.

Sources: src/search.ts:14-17

Query Expansion

Memory searches benefit from automatic term expansion. When a query mentions relevant concepts, additional search terms are added:

if (/\b(memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/.test(lower)) {
  additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
}

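Take a query like "what lessons did we record about auth". The word "lessons" triggers the expansion, so stored memories can match on added terms such as "memory", "claim", or "evidence" even when their claim text uses different wording. A query such as "what did we learn about auth" is not expanded, because "learn" is not in the word list.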

Sources: src/search.ts:28-30

Search Results

Memory hits in search results include:

Field	Description
`ref`	Memory reference in format `memory:<id>`
`kind`	Always `"memory"` for memory hits
`title`	Display title including scope
`excerpt`	Redacted claim text (secrets removed)
`evidence`	Original evidence references
`status`	Current memory status
`score`	Relevance score

Memory Lifecycle

stateDiagram-v2
    [*] --> Active: write_lesson
    Active --> Superseded: write_lesson with supersedes
    Active --> Stale: Evidence becomes invalid
    Superseded --> [*]
    Stale --> [*]
    Active --> [*]: Deleted

Status Transitions

Active → Default state for newly written memories. Active memories are returned in search results and can supersede other memories.

Superseded → When a newer, more accurate memory replaces an older one, the superseded memory retains its ID and evidence but is excluded from search results. The supersedes field links to the replaced memory.

Stale → Memories become stale when their evidence references point to files or symbols that have changed significantly since the memory was written. The reporting system tracks stale memories for review.

Sources: src/report.ts:54-58

Integration with Context Packs

The Memory Ledger integrates with Contextful's evidence pack system:

Before writing: Search context or create a context pack to get evidence references
Writing lessons: Use those evidence refs to anchor the memory claim
Recalling: Later sessions query the ledger, retrieving cited memories

// During a session: create pack, identify lessons
const pack = await createContextPack({ query: "how is auth handled", budget: 2000 });

// Later session: recall what was learned
const result = await recallMemory({ query: "auth patterns", scope: "repo" });

This bidirectional relationship means memories enhance future context packs, and context packs provide evidence for future memories.

Reporting

The report command includes memory statistics:

cxf report --workspace . --format markdown

Output includes a "Stale Memories" section listing memories whose evidence references may no longer be valid:

## Stale Memories
- memory_abc123: AuthService.validateToken() behavior changed in v2
- memory_def456: payment module initialization order is now reversed

Sources: src/report.ts:54-58

Configuration Options

Option	CLI Flag	Default	Description
Workspace	`--workspace`	`process.cwd()`	Path to workspace with memory database
Claim	`--claim`	required	The memory content
Evidence	`--evidence`	required	One or more evidence refs
Scope	`--scope`	`repo`	Memory scope level
Confidence	`--confidence`	`0.7`	Credibility score

Privacy Considerations

The Memory Ledger is designed with privacy as a core principle:

Local only: No data leaves the workspace
No cloud sync: Memories remain on the local machine
Evidence-linked: Claims cannot be stored without verifiable source
Content redaction: Secrets are automatically redacted from stored claims using pattern matching for emails, API keys, and tokens

Sources: src/util.ts:12-18
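The actual patterns in src/util.ts are not reproduced in this manual. The sketch below shows pattern-based redaction with three invented patterns: email addresses, prefixed API keys such as `sk-…`, and bearer tokens. Secrets that match none of the patterns pass through unchanged, which is the main limitation of this approach.

```typescript
// Hypothetical redaction patterns; not Contextful's actual list.
const REDACTION_PATTERNS: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,    // email addresses
  /\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9_-]{16,}\b/gi, // prefixed API keys and tokens
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g,                   // HTTP bearer tokens
];

function redactSecrets(text: string): string {
  // Apply each pattern in turn; replace() with a /g regex replaces every match.
  return REDACTION_PATTERNS.reduce((acc, pattern) => acc.replace(pattern, "[REDACTED]"), text);
}

console.log(redactSecrets("Contact ops@example.com with key sk-abcdefghijklmnop1234"));
// Contact [REDACTED] with key [REDACTED]
```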

MCP Tools

Tool	Purpose
`recall_memory`	Search the memory ledger
`write_lesson`	Store a new evidence-backed memory
`context_pack`	Generate evidence packs that can feed into memories

Sources: README.md:35-40

Sources: README.md:54-56

Graph Traversal and Analysis

Related topics: Search Engine, SQLite Database Schema


Graph Traversal and Analysis is a core feature of Contextful that builds and queries a dependency graph from source code. This system tracks relationships between files, symbols, modules, and configuration nodes, enabling sophisticated impact analysis, change tracing, and dependency exploration.

Overview

Contextful extracts code relationships during indexing and stores them in a SQLite database as a traversable graph. This enables agents to answer questions like:

"What depends on this module?"
"What tests cover this file?"
"How does this symbol connect to other parts of the codebase?"

Sources: src/extract.ts:68-95

Architecture

graph TD
    A[Source Files] --> B[extractEdges]
    B --> C[GraphEdge Records]
    C --> D[SQLite Kernel DB]
    E[CLI/MCP Query] --> F[searchContext]
    F --> G[traceGraph]
    G --> H[GraphPath Results]
    F --> I[impactAnalysis]
    I --> J[Impact Results]
    F --> K[whyChanged]
    K --> L[Git History + Evidence]

Data Flow

Extraction Phase: During workspace indexing, extractEdges() parses source files to identify relationships. (Sources: src/extract.ts:52-95)
Storage Phase: Edge data is stored in the edges table within the kernel SQLite database. (Sources: src/search.ts:1-30)
Query Phase: CLI commands and MCP tools query the graph using traversal algorithms. (Sources: src/search.ts:180-220)

Graph Data Model

Core Types

interface GraphEdge {
  sourceType: "file" | "symbol";
  sourceName: string;
  targetType: "file" | "symbol" | "module" | "config";
  targetName: string;
  edgeType: EdgeType;
  filePath: string;
  line: number;
}

interface GraphPath {
  edges: Array<{
    sourceName: string;
    sourceType: string;
    edgeType: string;
    targetName: string;
    targetType: string;
  }>;
  totalHops: number;
}

interface GraphNode {
  name: string;
  type: "file" | "symbol" | "module" | "config";
  path?: string;
  kind?: string;
}

Sources: src/types.ts:45-70

Edge Types

Edge Type	Description	Source Detection
`DEFINES`	File defines a symbol	Function/class declarations
`IMPORTS`	File imports a module	`import`, `require`, `from` statements
`CONFIGURES`	File/config references a key	JSON keys, package.json fields
`TESTS`	Test file covers the modules it imports	Auto-generated for test files

Sources: src/extract.ts:75-100

Language-Specific Detection

The extraction layer supports multiple languages:

Language	Import Patterns	Symbol Patterns
TypeScript/JavaScript	`from "module"`, `require("module")`	`export function/class/interface`
Python	`from module import`	`def`, `class`
Go	`"package"`	`func`, `type struct/interface`
Rust	`use module;`, `mod name;`	`fn`, `struct`, `enum`, `trait`

Sources: src/extract.ts:70-95
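For TypeScript/JavaScript, the import patterns in the table can be matched line by line. The hypothetical `extractImportEdges` below uses two regexes for `from "module"` and `require("module")`. It is not a full parser: it ignores comments and strings and misses side-effect-only `import "x"` statements.

```typescript
interface ImportEdge {
  sourceName: string; // importing file
  targetName: string; // module specifier
  edgeType: "IMPORTS";
  line: number;       // 1-indexed line of the import
}

const FROM_RE = /\bfrom\s+["']([^"']+)["']/g;
const REQUIRE_RE = /\brequire\(\s*["']([^"']+)["']\s*\)/g;

// Hypothetical regex-based extractor; one IMPORTS edge per match.
function extractImportEdges(filePath: string, content: string): ImportEdge[] {
  const edges: ImportEdge[] = [];
  const lines = content.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    for (const re of [FROM_RE, REQUIRE_RE]) {
      re.lastIndex = 0; // global regexes keep state between exec() calls
      let m: RegExpExecArray | null;
      while ((m = re.exec(lines[i])) !== null) {
        edges.push({ sourceName: filePath, targetName: m[1], edgeType: "IMPORTS", line: i + 1 });
      }
    }
  }
  return edges;
}

console.log(extractImportEdges("src/auth.ts", 'import { sign } from "./utils/jwt";\nconst fs = require("fs");'));
```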

Graph Traversal API

traceGraph

Performs graph traversal starting from a source node, optionally filtering by edge types and limiting results.

export async function traceGraph(options: {
  workspace?: string;
  from: string;
  to?: string;
  edgeTypes?: string[];
  limit?: number;
}): Promise<GraphPath[]>

#### Parameters

Parameter	Type	Required	Description
`workspace`	`string`	No	Workspace path (defaults to CWD)
`from`	`string`	Yes	Starting node name
`to`	`string`	No	Target node for path finding
`edgeTypes`	`string[]`	No	Filter by specific edge types
`limit`	`number`	No	Maximum paths to return (default: 10)

Sources: src/search.ts:180-190
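This section does not show the traversal strategy behind traceGraph. The sketch below assumes breadth-first search over an in-memory edge list. With `to`, it returns the path with the fewest hops, or nothing if the target is unreachable. Without `to`, it returns the direct outgoing edges as one-hop paths. `tracePaths` and `GraphEdgeLite` are hypothetical names.

```typescript
interface GraphEdgeLite {
  sourceName: string;
  targetName: string;
  edgeType: string;
}

// Hypothetical BFS tracer: edges are directed source -> target.
function tracePaths(
  edges: GraphEdgeLite[],
  from: string,
  to?: string,
  edgeTypes?: string[],
  limit = 10,
): GraphEdgeLite[][] {
  const allowed = edgeTypes ? edges.filter((e) => edgeTypes.includes(e.edgeType)) : edges;
  if (!to) return allowed.filter((e) => e.sourceName === from).slice(0, limit).map((e) => [e]);

  const queue: Array<{ node: string; path: GraphEdgeLite[] }> = [{ node: from, path: [] }];
  const seen = new Set<string>([from]);
  while (queue.length > 0) {
    const { node, path } = queue.shift()!;
    if (node === to) return [path]; // BFS reaches the target first via a fewest-hop path
    for (const e of allowed) {
      if (e.sourceName === node && !seen.has(e.targetName)) {
        seen.add(e.targetName);
        queue.push({ node: e.targetName, path: [...path, e] });
      }
    }
  }
  return []; // unreachable
}

const demoEdges: GraphEdgeLite[] = [
  { sourceName: "src/middleware/auth.ts", targetName: "src/auth.ts", edgeType: "IMPORTS" },
  { sourceName: "src/auth.ts", targetName: "src/utils/jwt.ts", edgeType: "IMPORTS" },
  { sourceName: "src/auth.ts", targetName: "AuthService", edgeType: "DEFINES" },
];
console.log(tracePaths(demoEdges, "src/middleware/auth.ts", "src/utils/jwt.ts")[0].length); // 2
```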

loadGraphPaths

Loads graph paths from the database for a set of file paths.

function loadGraphPaths(
  db: Database,
  paths: string[],
  limit: number
): GraphPath[]

Sources: src/search.ts:60-80

Impact Analysis

Impact analysis identifies reverse dependencies—what depends on a given file or symbol—and finds relevant test coverage.

graph LR
    A[Target File/Symbol] --> B[Find All Edges Pointing TO Target]
    B --> C[Group by Source File]
    C --> D[Identify Test Files]
    D --> E[Return Impact Set]

impactAnalysis Function

export async function impactAnalysis(options: {
  workspace?: string;
  target: string;
  limit?: number;
}): Promise<ImpactResult>

#### Impact Result Structure

Field	Type	Description
`target`	`string`	The analyzed symbol or file
`dependents`	`DependentInfo[]`	Files/symbols that depend on target
`tests`	`SearchHit[]`	Related test files

interface DependentInfo {
  path: string;
  type: string;
  imports: string[];
}

interface ImpactResult {
  target: string;
  dependents: DependentInfo[];
  tests: SearchHit[];
}

Sources: src/search.ts:130-175

Test Detection Logic

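Test files are identified either by a `TESTS` edge or by their path: a `test/`, `tests/`, or `__tests__/` directory, or a `.test.`, `.spec.`, `-test.`, or `-spec.` infix in the file name: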

const testPaths = paths.filter(
  (path) => path.edgeType === "TESTS" || 
            /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./.test(path.filePath)
);

Sources: src/search.ts:165-170
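The hypothetical `impactOf` below is a one-hop version of the impact logic. It collects every file with an edge pointing at the target, then splits those files using the test-detection rule above. It returns plain file lists, not the DependentInfo records described earlier. Transitive impact would repeat the lookup for each dependent.

```typescript
interface DepEdge {
  filePath: string;   // file that contains the edge
  targetName: string; // file or symbol the edge points at
  edgeType: string;
}

// Same path rule as the test-detection snippet above.
const TEST_PATH_RE = /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./;

// Hypothetical reverse-dependency lookup split into tests and other dependents.
function impactOf(edges: DepEdge[], target: string): { dependents: string[]; tests: string[] } {
  const dependents = new Set<string>();
  const tests = new Set<string>();
  for (const e of edges) {
    if (e.targetName !== target) continue;
    if (e.edgeType === "TESTS" || TEST_PATH_RE.test(e.filePath)) tests.add(e.filePath);
    else dependents.add(e.filePath);
  }
  return { dependents: Array.from(dependents).sort(), tests: Array.from(tests).sort() };
}

console.log(impactOf([
  { filePath: "src/middleware/auth.ts", targetName: "src/auth.ts", edgeType: "IMPORTS" },
  { filePath: "src/auth.test.ts", targetName: "src/auth.ts", edgeType: "IMPORTS" },
], "src/auth.ts"));
```

Note that `src/contest.ts` would not count as a test: the `.test.` infix rule needs a `.` or `-` directly before "test".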

Change Analysis

whyChanged

Combines current code evidence with git history to explain why a file or symbol may have changed.

export async function whyChanged(options: {
  workspace?: string;
  target: string;
  limit?: number
}): Promise<{
  target: string;
  currentEvidence: SearchHit[];
  commits: Array<{
    hash: string;
    subject: string;
    date?: string;
    files: string[];
  }>;
}>

#### Workflow

graph TD
    A[whyChanged] --> B[searchContext for target]
    B --> C[Extract file paths from hits]
    C --> D[readGitHistory with file paths]
    D --> E[Combine evidence + commits]
    E --> F[Return structured result]

Sources: src/search.ts:200-230

Git History Integration

The system reads git history for affected files:

function readGitHistory(
  workspace: string,
  filePaths: string[],
  limit: number
): Array<{
  hash: string;
  subject: string;
  date?: string;
  files: string[];
}>

Sources: src/search.ts:85-100
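One way to satisfy this signature is to shell out to `git log` and parse a delimited output format. The sketch assumes the `git` binary is on `PATH`. `readGitHistorySketch`, `parseGitLog`, and the separator-based format are illustrative choices, not taken from Contextful's source. The function only reads history.

```typescript
import { execFileSync } from "node:child_process";

interface Commit {
  hash: string;
  subject: string;
  date?: string;
  files: string[];
}

// 0x1e (record separator) starts each commit; 0x1f (unit separator) splits fields.
const FORMAT = "--pretty=format:%x1e%H%x1f%s%x1f%aI";

// Parse `git log --name-only` output produced with FORMAT.
function parseGitLog(output: string): Commit[] {
  return output
    .split("\x1e")
    .filter((record) => record.trim().length > 0)
    .map((record) => {
      const [header, ...fileLines] = record.split("\n");
      const [hash, subject, date] = header.split("\x1f");
      return {
        hash,
        subject,
        date: date || undefined,
        files: fileLines.map((l) => l.trim()).filter((l) => l.length > 0),
      };
    });
}

// Read up to `limit` commits touching any of `filePaths` (hypothetical helper).
function readGitHistorySketch(workspace: string, filePaths: string[], limit: number): Commit[] {
  if (filePaths.length === 0) return [];
  const output = execFileSync(
    "git",
    ["log", `-n${limit}`, "--name-only", FORMAT, "--", ...filePaths],
    { cwd: workspace, encoding: "utf8" }
  );
  return parseGitLog(output);
}
```

Control characters are used as separators because commit subjects can contain ordinary punctuation such as `|` or `,`.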

CLI Commands

trace Command

cxf trace --from <symbol_or_file> [--to <target>] [--edge-types <types>] [--limit <count>]

#### Options

Option	Type	Default	Description
`--from`	`string`	Required	Starting node
`--to`	`string`	-	Target node
`--edge-types`	`string`	all	Comma-separated edge types
`--limit`	`number`	10	Maximum paths
`--workspace`	`string`	CWD	Workspace path

Sources: src/cli.ts:45-60

report Command

Generates a comprehensive context report including graph statistics:

cxf report --workspace <path> --format markdown|json|html

#### Report Includes

Index status with graph node/edge counts
Top queries by intent type
Stale memory detection
Recent evidence packs

Sources: src/cli.ts:70-85

MCP Server Tools

Contextful exposes graph traversal as MCP tools for integration with AI coding assistants.

trace_path

{
  "name": "trace_path",
  "description": "Trace graph relationships between files, symbols, modules, and config nodes.",
  "inputSchema": {
    "from": "string",
    "to": "string (optional)",
    "edge_types": "string[] (optional)",
    "limit": "number (optional)"
  }
}

Sources: src/mcp-server.ts:45-55

impact_analysis

{
  "name": "impact_analysis",
  "description": "Find likely dependents and tests for a file, symbol, or module.",
  "inputSchema": {
    "symbol_or_file": "string",
    "limit": "number (optional)"
  }
}

Sources: src/mcp-server.ts:56-65

why_changed

{
  "name": "why_changed",
  "description": "Explain why a file or symbol may have changed by combining current evidence with git history.",
  "inputSchema": {
    "symbol_or_file": "string",
    "limit": "number (optional)"
  }
}

Sources: src/mcp-server.ts:66-75

Usage Examples

Direct CLI Usage

# Trace dependencies of auth module
cxf trace --from src/auth.ts --edge-types IMPORTS

# Find what tests cover a file
cxf impact --target src/parser.ts

# Get change history for a symbol
cxf why --target AuthService

MCP Integration

{
  "mcpServers": {
    "contextful": {
      "command": "npx",
      "args": ["-y", "@inferensys/contextful", "server"]
    }
  }
}

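// In an MCP client built on the MCP TypeScript SDK; argument names follow the tool's inputSchema
const result = await client.callTool({
  name: "trace_path",
  arguments: {
    from: "src/auth.ts",
    to: "src/database.ts",
    edge_types: ["IMPORTS", "DEFINES"]
  }
});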

Query Intent Classification

Graph queries are automatically classified to route to appropriate traversal strategies:

Intent	Keywords	Graph Relevance
`architectural`	architecture, flow, path, connects, calls	High priority
`impact`	impact, affected, depends, blast radius	Direct edge query
`historical`	why, changed, history, regression	Graph + git history
`exact`	Symbol names, file paths	Symbol-level traversal

Sources: src/search.ts:115-130

Limitations and Design Decisions

Privacy Guarantees

All processing is local-only
No external embedding APIs used
No source code upload
No file editing capabilities

Sources: README.md:45-50

v1 Scope Boundaries

Malformed JSON encountered during indexing produces a warning, and indexing continues
Syntax diagnostics are intentionally out of scope
Git history is read-only

Sources: src/extract.ts:120-125

Summary

The Graph Traversal and Analysis system in Contextful provides:

Automatic Relationship Extraction - Builds a dependency graph during indexing
Multiple Query Entry Points - CLI commands and MCP tools
Path Finding - Trace connections between any two nodes
Impact Analysis - Identify dependents and test coverage
Change Attribution - Combine current state with git history

This enables AI coding assistants to answer sophisticated questions about code relationships without requiring manual documentation or extensive file reading.

Sources: src/extract.ts:68-95

SQLite Database Schema

Related topics: Workspace Indexing System, Search Engine

Overview

Contextful uses SQLite as its primary storage engine for indexing codebase artifacts. The database schema is designed to support full-text search, symbol indexing, dependency graph traversal, and evidence pack generation for AI-assisted queries. All operations are managed through better-sqlite3 for synchronous, high-performance access.

Sources: src/db.ts:1-50

Schema Tables

Primary Storage Tables

#### chunks

Stores indexed code and documentation segments extracted from source files. Each chunk represents a logical unit of content bounded by language-specific rules (functions, classes, headings, etc.).

Column	Type	Description
`ref`	TEXT	Unique reference identifier (format: `file:path:start-end`)
`file_path`	TEXT	Relative path to the source file
`start_line`	INTEGER	Starting line number (1-indexed)
`end_line`	INTEGER	Ending line number
`kind`	TEXT	Chunk classification: `code`, `doc`, `file`
`title`	TEXT	Display title for the chunk
`text`	TEXT	Full content of the chunk
`token_estimate`	INTEGER	Estimated token count (character-length heuristic, about 4 characters per token)

Sources: src/db.ts:23-36
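This sketch assumes `file:` is a literal prefix in the `ref` format above. Under that assumption, references can be built and parsed with two small helpers. `formatChunkRef` and `parseChunkRef` are hypothetical names. Parsing from the last `:` keeps paths that contain a colon themselves (such as Windows drive letters) intact.

```typescript
interface ChunkLocation {
  filePath: string;
  startLine: number; // 1-indexed
  endLine: number;   // inclusive
}

// Build a reference such as "file:src/db.ts:23-36".
function formatChunkRef(loc: ChunkLocation): string {
  return `file:${loc.filePath}:${loc.startLine}-${loc.endLine}`;
}

// Parse a reference; returns undefined for anything malformed.
function parseChunkRef(ref: string): ChunkLocation | undefined {
  if (!ref.startsWith("file:")) return undefined;
  const rest = ref.slice("file:".length);
  const sep = rest.lastIndexOf(":");
  if (sep <= 0) return undefined;
  const range = /^(\d+)-(\d+)$/.exec(rest.slice(sep + 1));
  if (!range) return undefined;
  const startLine = Number(range[1]);
  const endLine = Number(range[2]);
  if (startLine < 1 || endLine < startLine) return undefined;
  return { filePath: rest.slice(0, sep), startLine, endLine };
}

console.log(parseChunkRef("file:src/db.ts:23-36"));
```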

#### symbols

Captures programming constructs (functions, classes, interfaces, types) extracted from source files.

Column	Type	Description
`ref`	TEXT	Unique symbol reference
`name`	TEXT	Symbol name
`kind`	TEXT	Symbol type: `function`, `class`, `interface`, `type`, `struct`, `enum`, `trait`, `impl`
`file_path`	TEXT	Source file path
`line`	INTEGER	Line number where symbol is defined
`signature`	TEXT	First 160 characters of symbol declaration
`exported`	INTEGER	Boolean flag (1 = exported, 0 = local)

Sources: src/db.ts:47-60

#### edges

Represents relationships between code entities, including imports, module dependencies, and configuration references.

Column	Type	Description
`source_name`	TEXT	Name of the importing/configuring entity
`target_name`	TEXT	Name or path of the imported/dependency target
`edge_type`	TEXT	Relationship type: `IMPORTS`, `CONFIGURES`
`file_path`	TEXT	File where the relationship is defined
`line`	INTEGER	Line number of the relationship definition

Sources: src/db.ts:38-45

Full-Text Search Index

#### chunks_fts

Virtual FTS5 table providing fast full-text search across all indexed content. Mirrors core chunk data for BM25-ranked retrieval.

Column	Type	Description
`ref`	TEXT	Chunk reference
`path`	TEXT	File path for filtering
`title`	TEXT	Searchable title field
`text`	TEXT	Full searchable content

Sources: src/db.ts:37-42

The FTS table is queried using BM25 ranking in search operations:

SELECT ref, path, title, text, bm25(chunks_fts) AS rank 
FROM chunks_fts WHERE chunks_fts MATCH ?

Sources: src/search.ts:45-47
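Free-text queries cannot always be bound to `MATCH ?` verbatim. Characters such as `-`, `:`, `(`, or an unbalanced `"` are part of the FTS5 query syntax and can make the statement fail. A common approach is to quote each term as an FTS5 string (doubling any embedded quotes) and join the terms with `OR`. This is sketched here as an assumption, not as Contextful's actual code. Note that `bm25()` assigns numerically lower values to better matches, so results are normally sorted by `rank` ascending.

```typescript
// Turn free text into a safe FTS5 MATCH expression: each whitespace-separated
// term becomes a quoted string ("" escapes an embedded quote), joined by OR.
// An empty result means there is nothing to search; callers should skip the query.
function toFtsQuery(input: string, maxTerms = 16): string {
  const terms = input
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .slice(0, maxTerms);
  return terms.map((t) => `"${t.replace(/"/g, '""')}"`).join(" OR ");
}

// Statement shape with ranking applied (lower bm25 = better match).
const SEARCH_SQL = `
  SELECT ref, path, title, text, bm25(chunks_fts) AS rank
  FROM chunks_fts WHERE chunks_fts MATCH ?
  ORDER BY rank LIMIT ?`;

console.log(toFtsQuery('auth-service "login"'));
```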

Graph and Metadata Tables

#### nodes

Represents graph vertices for dependency analysis and traversal operations.

Column	Type	Description
`id`	INTEGER	Auto-incrementing primary key
`ref`	TEXT	Node reference
`kind`	TEXT	Node classification: `file`, `symbol`, `chunk`, `module`, `config`
`name`	TEXT	Display name
`file_path`	TEXT	Associated file path (nullable)

Sources: src/db.ts:12-22

#### files

Stores metadata about indexed source files.

Column	Type	Description
`absolute_path`	TEXT	Full absolute file path
`language`	TEXT	Detected programming language
`hash`	TEXT	SHA-based content hash for change detection
`size`	TEXT	File size in bytes

Sources: src/db.ts:13-17
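The `hash` column supports incremental indexing: a file needs re-extraction only when its content hash differs from the stored one. The sketch below uses Node's built-in `crypto` module and rests on two assumptions. First, it uses SHA-256, although the source says only "SHA-based". Second, a plain `Map` stands in for the `files` table. `contentHash` and `changedFiles` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest of file content.
function contentHash(content: string | Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compare fresh hashes against stored ones to find files to re-index or drop.
function changedFiles(
  stored: Map<string, string>,          // absolute_path -> hash from the last run
  current: Map<string, string | Buffer> // absolute_path -> current content
): { changed: string[]; removed: string[] } {
  const changed: string[] = [];
  for (const [filePath, content] of current) {
    if (stored.get(filePath) !== contentHash(content)) changed.push(filePath);
  }
  const removed = [...stored.keys()].filter((p) => !current.has(p));
  return { changed, removed };
}
```

New files count as changed because they have no stored hash. Files missing from the current scan are reported separately so their rows can be deleted.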

#### fingerprints

Stores content fingerprints for deduplication and incremental indexing.

Column	Type	Description
`ref`	TEXT	Reference to the content chunk
`kind`	TEXT	Content type
`fingerprint`	TEXT	Hash of the content

#### evidence_packs

Persists generated evidence packs for audit and replay.

Column	Type	Description
`id`	TEXT	Unique pack identifier
`query`	TEXT	Original search query
`token_estimate`	INTEGER	Total token count
`json`	TEXT	Serialized pack data

#### query_log

Records search history for analysis and debugging.

Column	Type	Description
`query`	TEXT	Search query text
`intent`	TEXT	Classified search intent
`timestamp`	TEXT	ISO timestamp

Sources: src/db.ts:1-10

Data Flow Architecture

graph TD
    A[Source Files] --> B[extractSymbols]
    A --> C[extractEdges]
    A --> D[extractChunks]
    
    B --> E[symbols table]
    C --> F[edges table]
    D --> G[chunks table]
    D --> H[chunks_fts index]
    
    H --> I[Full-Text Search]
    E --> J[Symbol Lookup]
    F --> K[Graph Traversal]
    
    I --> L[searchContext]
    J --> L
    K --> L
    
    L --> M[Evidence Pack]
    M --> N[evidence_packs]

Sources: src/extract.ts:1-150

Supported Symbol Kinds

The indexer extracts and classifies symbols based on language-specific patterns:

Language	Supported Kinds
TypeScript/JavaScript	`function`, `class`, `interface`, `type`
Python	`function`, `class`
Go	`function`, `struct`, `interface`
Rust	`function`, `struct`, `enum`, `trait`, `impl`

Sources: src/extract.ts:30-60

Supported Edge Types

Edge Type	Description	Example
`IMPORTS`	Module/dependency import	`import { foo } from './bar'`
`CONFIGURES`	Configuration key reference	`"dependencies": { ... }` in package.json

The CONFIGURES edge type is generated for the package.json dependency and script sections and for JSON configuration keys.

Sources: src/extract.ts:70-120

Query Classification and Intent

The search system classifies queries into intent categories that influence result ranking:

Intent	Trigger Keywords	Purpose
`symbol`	Class/function names, exact identifiers	Find symbol definitions
`code`	Code-related terms	Locate implementation
`memory`	memory, lessons, session	Search evidence-backed memory
`impact`	depends, affected, blast radius	Reverse dependency analysis
`historical`	why, changed, history, commit	Git history queries
`architectural`	architecture, flow, imports	Dependency tracing
`docs`	docs, documentation, readme	Documentation lookup
`exact`	File paths, line refs, symbols	Precise file/line access
`vague`	Default fallback	Broad search

Sources: src/search.ts:15-30
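A keyword classifier along these lines can be sketched as an ordered rule list in which the first match wins. The keywords come from the table above. Several details are assumptions:

- the rule order;
- the path/line regex used for `exact`;
- leaving out `symbol` and `code`, which would need a lookup against the symbols table rather than keywords.

`classifyQuerySketch` is a hypothetical name.

```typescript
type Intent = "exact" | "impact" | "historical" | "architectural" | "memory" | "docs" | "vague";

// Ordered rules: the first matching rule wins.
const RULES: Array<[Intent, RegExp]> = [
  // File paths (src/a.ts) or path:line references such as search.ts:42.
  ["exact", /[\w.-]+\/[\w./-]+|\w+\.\w+:\d+/],
  ["impact", /\b(impact|affected|depends|blast radius)\b/i],
  ["historical", /\b(why|changed|history|commit)\b/i],
  ["architectural", /\b(architecture|flow|imports)\b/i],
  ["memory", /\b(memory|lessons?|session)\b/i],
  ["docs", /\b(docs|documentation|readme)\b/i],
];

function classifyQuerySketch(query: string): Intent {
  for (const [intent, pattern] of RULES) {
    if (pattern.test(query)) return intent;
  }
  return "vague"; // default fallback
}

console.log(classifyQuerySketch("what is affected if I change the parser"));
```

Ordering matters. "why did the login flow change" contains both a historical keyword and an architectural one, and the earlier `historical` rule wins.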

Token Estimation

Token counts are estimated using a heuristic approximation:

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

This provides a rough approximation where 1 token ≈ 4 characters, suitable for budget management in evidence pack generation.

Sources: src/util.ts:1-10

Key Database Operations

Chunk Insertion

db.prepare(`
  INSERT INTO chunks (ref, file_path, start_line, end_line, kind, title, text, token_estimate)
  VALUES (?, ?, ?, ?, ?, ?, ?, ?)
`).run(chunk.ref, chunk.filePath, chunk.startLine, chunk.endLine, chunk.kind, chunk.title, chunk.text, chunk.tokenEstimate);

The chunks table and the chunks_fts FTS index are written together, synchronously.

Symbol Loading

db.prepare(`SELECT ref, name, kind, file_path, line, signature, exported 
FROM symbols WHERE file_path IN (${paths.map(() => "?").join(",")})`)
  .all(...paths)

Sources: src/db.ts:23-42, src/search.ts:180-195

Schema Version and Metadata

The database stores schema version and workspace metadata:

Key	Description
`schema_version`	Current schema version number
`workspace`	Workspace root path
`indexed_at`	Last indexing timestamp
`parser_backend`	Parser backend description
`warnings`	Last 50 indexing warnings

Sources: src/indexer.ts:80-90

Conclusion

The SQLite schema in Contextful provides a normalized, queryable representation of source code structure and content. The dual-table approach for chunks (storage + FTS index) enables both efficient storage and fast full-text retrieval. The edges and symbols tables together support graph traversal for dependency analysis, while the evidence pack system enables persistent, ranked context generation for AI queries.

Sources: src/db.ts:1-50

Workspace Indexing System

Related topics: SQLite Database Schema, Search Engine

Overview

The Workspace Indexing System is the core indexing engine of Contextful. It scans, parses, and stores representations of source code files from a workspace into a local SQLite database, enabling semantic search, dependency graph traversal, and evidence-backed context retrieval.

Primary responsibilities:

Responsibility	Description
File Discovery	Recursively traverse workspace directories, filtering by language and ignore rules
Symbol Extraction	Parse and catalog functions, classes, interfaces, types, enums, traits
Edge Extraction	Track import/export relationships between modules and dependencies
Content Chunking	Split large files into manageable, line-numbered chunks for retrieval
Watch Mode	Monitor file system changes and incrementally re-index on modifications

Sources: src/cli.ts:1-20

Architecture

graph TD
    A[Workspace Directory] --> B[File Discovery]
    B --> C[Language Detection]
    C --> D[Content Extraction]
    D --> E[Symbol Extraction]
    D --> F[Edge Extraction]
    D --> G[Chunk Generation]
    E --> H[SQLite DB]
    F --> H
    G --> H
    I[Search/Query] --> H
    J[Watch Mode] --> B

The system is built around a SQLite database that stores three core entities: symbols, edges, and chunks. The indexer processes files in a single pass, extracting all three data types simultaneously to minimize I/O overhead.

Sources: src/extract.ts:1-50

Supported Languages

The indexer natively supports symbol and edge extraction for the following languages:

Language	Symbol Patterns	Import Patterns
TypeScript / JavaScript	`function`, `class`, `interface`, `type`, `const` arrow/function	`import from`, `require()`
Python	`def`, `class`	`from ... import`, `import`
Go	`func`, `type struct/interface`	`"..."` (quoted imports)
Rust	`fn`, `struct`, `enum`, `trait`, `impl`	`use`, `mod`
Markdown	Headings (`#{1,6}`)	N/A
JSON	Config keys (`"key":`)	N/A

Sources: src/extract.ts:15-45

Indexing Process

Phase 1: File Discovery

The indexer recursively scans the workspace directory, filtering files by language and applying gitignore-style ignore rules. Binary files are skipped using a simple heuristic: a file is treated as binary if its first 4,096 bytes contain a NUL byte.

export function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}

Sources: src/util.ts:20-22
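The discovery phase can be sketched as a recursive walk with three filters. It skips ignored directories, maps file extensions to languages, and drops binary files using the `isLikelyBinary` check above. `IGNORED_DIRS`, `EXTENSION_LANGUAGE`, and `discoverFiles` are hypothetical stand-ins. Contextful's real ignore handling follows gitignore-style rules, which this sketch does not implement.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical defaults; the real indexer applies gitignore-style rules.
const IGNORED_DIRS = new Set([".git", "node_modules", "dist", "target"]);
const EXTENSION_LANGUAGE: Record<string, string> = {
  ".ts": "typescript", ".tsx": "typescript",
  ".js": "javascript", ".mjs": "javascript",
  ".py": "python", ".go": "go", ".rs": "rust",
  ".md": "markdown", ".json": "json",
};

// Same NUL-byte heuristic as isLikelyBinary above.
function isLikelyBinary(buffer: Buffer): boolean {
  const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
  return sample.includes(0);
}

interface DiscoveredFile {
  relativePath: string; // POSIX-style, relative to the workspace root
  language: string;
}

// Recursively collect indexable files under `root`.
function discoverFiles(root: string, dir = root, out: DiscoveredFile[] = []): DiscoveredFile[] {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!IGNORED_DIRS.has(entry.name)) discoverFiles(root, full, out);
      continue;
    }
    if (!entry.isFile()) continue;
    const language = EXTENSION_LANGUAGE[path.extname(entry.name).toLowerCase()];
    if (!language) continue; // unsupported extension
    if (isLikelyBinary(fs.readFileSync(full))) continue;
    out.push({ relativePath: path.relative(root, full).split(path.sep).join("/"), language });
  }
  return out;
}
```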

Phase 2: Symbol Extraction

Symbols are extracted using language-specific regular expression patterns. Each symbol record includes:

Field	Type	Description
`name`	string	Symbol identifier
`kind`	string	Category: function, class, interface, type, struct, enum, trait, impl
`line`	number	Declaration line number
`signature`	string	First 160 characters of the declaration line
`exported`	boolean	Whether the symbol is exported

const push = (name: string, kind: string, exported = false) =>
  symbols.push({ name, kind, line: lineNumber, signature: excerpt(line, 160), exported });

Sources: src/extract.ts:5-7

For TypeScript and JavaScript, the extractor captures export modifiers:

matchPush(line, /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/, push, "function");
matchPush(line, /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, push, "class");

Sources: src/extract.ts:12-15
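The helpers `matchPush` and `excerpt` are referenced above but not shown in the source. Below is a plausible minimal version with stated assumptions. Capture group 1 holds the optional `export` modifier and group 2 the symbol name, as in the two patterns above. `excerpt` trims the line and truncates it. Both helper bodies and the `extractTsSymbols` wrapper are hypothetical.

```typescript
interface RawSymbol {
  name: string;
  kind: string;
  line: number;
  signature: string;
  exported: boolean;
}

type Push = (name: string, kind: string, exported?: boolean) => void;

// Trim a line and cap it at `max` characters (hypothetical implementation).
function excerpt(text: string, max: number): string {
  const trimmed = text.trim();
  return trimmed.length <= max ? trimmed : trimmed.slice(0, max);
}

// Run `pattern` on `line`; group 1 = optional export keyword, group 2 = name.
function matchPush(line: string, pattern: RegExp, push: Push, kind: string): void {
  const match = line.match(pattern);
  if (match && match[2]) push(match[2], kind, Boolean(match[1]));
}

function extractTsSymbols(text: string): RawSymbol[] {
  const symbols: RawSymbol[] = [];
  text.split("\n").forEach((line, index) => {
    const lineNumber = index + 1;
    const push: Push = (name, kind, exported = false) =>
      symbols.push({ name, kind, line: lineNumber, signature: excerpt(line, 160), exported });
    matchPush(line, /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/, push, "function");
    matchPush(line, /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, push, "class");
  });
  return symbols;
}

console.log(extractTsSymbols("export async function indexWorkspace() {}\nclass Cache {}"));
```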

Phase 3: Edge Extraction

Edges represent dependency relationships between modules. The extractor identifies:

IMPORTS: Direct import statements for each language
CONFIGURES: Dependencies declared in configuration files (package.json, Cargo.toml, etc.)

if (language === "typescript" || language === "javascript") {
  for (const match of line.matchAll(/(?:from\s+|import\s*)["']([^"']+)["']/g))
    addImport(match[1]);
  for (const match of line.matchAll(/require\(["']([^"']+)["']\)/g))
    addImport(match[1]);
}

Sources: src/extract.ts:67-72
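Applied line by line to a whole file, the two TypeScript/JavaScript patterns above yield the raw module specifiers for `IMPORTS` edges. The regular expressions are the ones shown above. The `extractTsImports` wrapper and the `ImportEdge` shape are assumptions.

```typescript
interface ImportEdge {
  targetName: string;
  edgeType: "IMPORTS";
  line: number; // 1-indexed line of the import statement
}

function extractTsImports(text: string): ImportEdge[] {
  const edges: ImportEdge[] = [];
  text.split("\n").forEach((line, index) => {
    const addImport = (specifier: string) =>
      edges.push({ targetName: specifier, edgeType: "IMPORTS", line: index + 1 });
    // `import ... from "x"`, `export ... from "x"`, and bare `import "x"`.
    for (const match of line.matchAll(/(?:from\s+|import\s*)["']([^"']+)["']/g)) addImport(match[1]);
    // CommonJS `require("x")`.
    for (const match of line.matchAll(/require\(["']([^"']+)["']\)/g)) addImport(match[1]);
  });
  return edges;
}

const sampleSource = [
  "import { foo } from './bar';",
  'import "./polyfill";',
  'const fs = require("node:fs");',
  'export * from "../lib/index.js";',
  'const lazy = await import("./lazy");',
].join("\n");
console.log(extractTsImports(sampleSource).map((e) => e.targetName));
```

Re-exports are captured through the `from` alternative. Dynamic `import("./lazy")` calls are not, because `import` is followed by `(` rather than a quote.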

For package.json, dependencies and scripts are indexed as CONFIGURES edges:

for (const section of ["dependencies", "devDependencies", "peerDependencies", "scripts"]) {
  const values = parsed[section];
  if (!values || typeof values !== "object") continue;
  for (const key of Object.keys(values)) {
    edges.push({ targetName: `${section}:${key}`, targetType: "config", edgeType: "CONFIGURES", line: 1 });
  }
}

Sources: src/extract.ts:105-114
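Wrapped into a function, the loop above turns `package.json` text into `CONFIGURES` edges. The function name is an assumption. So is the malformed-JSON handling: it records a warning and returns no edges, matching the v1 note that broken JSON produces warnings.

```typescript
interface ConfigEdge {
  targetName: string;
  targetType: "config";
  edgeType: "CONFIGURES";
  line: number;
}

const CONFIG_SECTIONS = ["dependencies", "devDependencies", "peerDependencies", "scripts"];

// Convert package.json text into CONFIGURES edges (hypothetical wrapper).
function packageJsonEdges(text: string, warnings: string[] = []): ConfigEdge[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (error) {
    warnings.push(`package.json: ${(error as Error).message}`);
    return [];
  }
  if (!parsed || typeof parsed !== "object") return [];
  const edges: ConfigEdge[] = [];
  for (const section of CONFIG_SECTIONS) {
    const values = (parsed as Record<string, unknown>)[section];
    if (!values || typeof values !== "object") continue;
    for (const key of Object.keys(values)) {
      // Every edge points at line 1, as in the loop above.
      edges.push({ targetName: `${section}:${key}`, targetType: "config", edgeType: "CONFIGURES", line: 1 });
    }
  }
  return edges;
}

console.log(packageJsonEdges('{"dependencies":{"better-sqlite3":"^11"},"scripts":{"build":"tsc"}}'));
```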

Phase 4: Chunk Generation

Large files are split into overlapping chunks to enable granular retrieval. The system uses a sliding window, so consecutive chunks share a band of lines:

graph LR
    A[File Lines 1-200] --> B[Chunk 1: 1-80]
    A --> C[Chunk 2: 60-140]
    A --> D[Chunk 3: 120-200]
    B --> E[Token Estimate]
    C --> E
    D --> E

Each chunk includes:

Field	Description
`ref`	Unique reference string (`file:path:start-end`)
`filePath`	Relative path to source file
`startLine`	Starting line number
`endLine`	Ending line number
`kind`	Chunk type: `code`, `doc`, `file`
`title`	Human-readable title
`tokenEstimate`	Estimated token count

Sources: src/extract.ts:145-160
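The sliding window can be sketched as a loop that advances by `size - overlap` lines. The defaults (an 80-line window with a 20-line overlap) approximate the diagram above. Contextful's actual values are not shown in the source, so treat them, along with the `slidingWindows` name, as assumptions.

```typescript
interface LineWindow {
  startLine: number; // 1-indexed, inclusive
  endLine: number;   // inclusive
}

// Split `totalLines` into windows of `size` lines, each sharing `overlap`
// lines with the previous window; the last window ends at the final line.
function slidingWindows(totalLines: number, size = 80, overlap = 20): LineWindow[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const windows: LineWindow[] = [];
  const step = size - overlap;
  for (let start = 1; start <= totalLines; start += step) {
    const endLine = Math.min(start + size - 1, totalLines);
    windows.push({ startLine: start, endLine });
    if (endLine === totalLines) break; // avoid a trailing window fully inside the previous one
  }
  return windows;
}

console.log(slidingWindows(200)); // windows 1-80, 61-140, 121-200
```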

Phase 5: Markdown Document Chunking

Markdown files receive special treatment. Instead of fixed-size chunks, the indexer uses headings as natural section boundaries:

lines.forEach((line, index) => {
  const match = line.match(/^(#{1,6})\s+(.+)$/);
  if (match) headings.push({ title: match[2].trim(), line: index + 1 });
});
return headings.map((heading, index) => {
  const next = headings[index + 1];
  const endLine = next ? next.line - 1 : lines.length;
  // ... create chunk for section
});

Sources: src/extract.ts:174-185
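A complete, runnable version of the heading-based split is sketched below. It uses the same regex and end-line rule as the snippet above and returns only line ranges. The function name `headingSections` and the `Section` shape are hypothetical.

```typescript
interface Section {
  title: string;
  startLine: number; // line of the heading (1-indexed)
  endLine: number;   // line before the next heading, or the last line
}

// Split Markdown into sections at ATX headings (# to ######). Text before the
// first heading is not covered, mirroring the snippet above.
function headingSections(markdown: string): Section[] {
  const lines = markdown.split("\n");
  const headings: Array<{ title: string; line: number }> = [];
  lines.forEach((line, index) => {
    const match = line.match(/^(#{1,6})\s+(.+)$/);
    if (match) headings.push({ title: match[2].trim(), line: index + 1 });
  });
  return headings.map((heading, index) => {
    const next = headings[index + 1];
    return {
      title: heading.title,
      startLine: heading.line,
      endLine: next ? next.line - 1 : lines.length,
    };
  });
}

console.log(headingSections("# Intro\ntext\n## Install\nrun it\n"));
```

The pattern is purely line-based. A `# comment` line inside a fenced shell block would therefore also be treated as a heading.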

Watch Mode

The indexer supports continuous monitoring via file system watchers:

export async function watchWorkspace(workspace: string, onIndex: (result: IndexResult) => void): Promise<void> {
  const resolved = path.resolve(workspace);
  onIndex(await indexWorkspace({ workspace: resolved }));
  let timer: NodeJS.Timeout | undefined;
  fs.watch(resolved, { recursive: true }, () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(async () => {
      onIndex(await indexWorkspace({ workspace: resolved }));
    }, 500);
  });
}

Sources: src/indexer.ts:80-91

Key characteristics:

Debounces file change events with a 500ms delay to batch rapid successive changes
Re-runs full indexing on each trigger
Outputs JSON results to stdout for consumption by other processes

CLI Commands

The indexing system exposes three primary CLI commands:

Command	Description
`cxf index --workspace <path> [--watch]`	Initial or incremental indexing of a workspace
`cxf daemon --workspace <path>`	Run as a long-lived daemon that outputs index results on file changes
`cxf report --workspace <path> --format markdown|json|html`	Generate an index status report

# Index a workspace
npx @inferensys/contextful index --workspace .

# Watch for changes and print results
npx @inferensys/contextful daemon --workspace .

Sources: src/cli.ts:22-35

Search Integration

The indexing system powers Contextful's search capabilities. After indexing, users can query the database using natural language:

export async function searchContext(options: SearchOptions): Promise<{ intent: SearchIntent; hits: SearchHit[] }> {
  const workspace = resolveWorkspace(options.workspace);
  await ensureIndexed(workspace);
  const intent = classifyQuery(options.query);
  // ... perform FTS and semantic search
}

Sources: src/search.ts:45-55

Query intents are automatically classified to optimize search behavior:

Intent	Trigger Keywords	Description
`code`	function names, variable names	Code and implementation search
`exact`	Backticks, quotes, `#`, file paths	Literal symbol/identifier lookup
`impact`	impact, affected, depends, blast radius	Dependency and change analysis
`historical`	why, changed, commit, history	Git history and regression tracking
`architectural`	architecture, flow, trace, connects	Dependency graph traversal
`docs`	resource, documentation, guide, how to	Documentation and README search
`memory`	remember, session, lesson, learned	Agent memory recall

Sources: src/search.ts:5-18

Token Estimation

Every chunk and evidence pack includes a token estimate for budget management:

export function packTokenCount(text: string): number {
  return estimateTokens(text);
}

The system uses this estimate to enforce budget limits when building context packs for LLM consumption, keeping each pack within its requested token budget.

Sources: src/report.ts:50-52
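The exact budget-enforcement policy is not shown here. One simple policy, sketched as an assumption, walks the hits in rank order and keeps each hit whose estimate still fits. `estimateTokens` repeats the length/4 heuristic from the schema section. `Hit` and `packWithinBudget` are hypothetical names.

```typescript
interface Hit {
  ref: string;
  text: string;
}

// Same heuristic as estimateTokens: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep ranked hits while the running total stays within `budget`.
// A hit that does not fit is skipped, so a smaller later hit can still be used.
function packWithinBudget(rankedHits: Hit[], budget: number): { hits: Hit[]; tokens: number } {
  const hits: Hit[] = [];
  let tokens = 0;
  for (const hit of rankedHits) {
    const cost = estimateTokens(hit.text);
    if (tokens + cost > budget) continue;
    hits.push(hit);
    tokens += cost;
  }
  return { hits, tokens };
}
```

Skipping, rather than stopping at the first hit that does not fit, trades strict rank order for better use of the budget. A stricter policy would `break` instead of `continue`.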

Data Models

Symbol Record

interface SymbolRecord {
  ref: string;
  name: string;
  kind: "function" | "class" | "interface" | "type" | "struct" | "enum" | "trait" | "impl";
  filePath: string;
  line: number;
  signature: string;
  exported: boolean;
}

Edge Record

interface RawEdge {
  targetName: string;
  targetType: "module" | "config" | "symbol";
  edgeType: "IMPORTS" | "CONFIGURES" | "DEFINES";
  line: number;
}

Chunk Record

interface ChunkRecord {
  ref: string;
  filePath: string;
  startLine: number;
  endLine: number;
  kind: "code" | "doc" | "file";
  title: string;
  text: string;
  tokenEstimate: number;
}

Extension Points

Adding New Language Support

To add support for a new language:

Add language detection in the file scanner
Implement symbol extraction patterns in extractSymbols()
Implement edge extraction patterns in extractEdges()
Update the chunking logic if special handling is needed

Example pattern structure:

} else if (language === "newlang") {
  matchPush(line, /^\s*(pub\s+)?fn\s+([A-Za-z_][\w]*)/, push, "function");
  const use = line.match(/^\s*use\s+([^;]+);/);
  if (use) addImport(use[1].trim());
}

Sources: src/extract.ts:35-44

Sources: src/cli.ts:1-20

Doramagic Pitfall Log

Doramagic extracted 7 source-linked risk signals. Review them before installing or handing real data to the project.

1. Configuration risk: Configuration risk needs validation

Severity: medium
Finding: Configuration risk is backed by a source signal: Configuration risk needs validation. Treat it as a review item until the current version is checked.
User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.host_targets | github_repo:1240001007 | https://github.com/Inferensys/contextful | host_targets=claude, claude_code

2. Capability assumption: README/documentation is current enough for a first validation pass.

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.assumptions | github_repo:1240001007 | https://github.com/Inferensys/contextful | README/documentation is current enough for a first validation pass.

3. Maintenance risk: Maintainer activity is unknown

Severity: medium
Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | last_activity_observed missing

4. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: downstream_validation.risk_items | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium

5. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: risks.scoring_risks | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium

6. Maintenance risk: issue_or_pr_quality=unknown

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | issue_or_pr_quality=unknown

7. Maintenance risk: release_recency=unknown

Severity: low
Finding: release_recency=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using contextful with real data or production workflows.

Configuration risk needs validation - GitHub / issue

Source: Project Pack community evidence and pitfall evidence