Doramagic Project Pack · Human Manual
contextful
The Contextful system consists of several interconnected components that work together to provide context management capabilities.
Project Introduction
Related topics: High-Level Architecture, Quick Start Guide
Contextful is an intelligent code context management system designed to provide AI agents with compact, evidence-backed information for codebase navigation and understanding. The project serves as a bridge between large codebases and AI-powered development tools by indexing source code, extracting symbols, tracking dependencies, and generating token-budgeted evidence packs for queries.
Purpose and Scope
Contextful solves the fundamental problem that AI coding assistants face when working with large repositories: excessive context requirements that lead to token waste and degraded performance. Instead of forcing agents to read dozens of random files, Contextful enables targeted, cited, and ranked context retrieval that maximizes the value of each token spent.
The system operates in three primary modes:
- Indexing Mode - Scans and indexes source code, extracting symbols, dependencies, and semantic chunks
- Query Mode - Creates evidence packs for natural language queries with token budgets
- Search Mode - Provides lightweight search across code, docs, symbols, and memory without full evidence compilation
Sources: README.md:1-15
Architecture Overview
The Contextful system consists of several interconnected components that work together to provide context management capabilities.
graph TD
A[Source Code] --> B[Indexing Engine]
B --> C[SQLite Kernel DB]
C --> D[Search Module]
C --> E[Graph Analysis]
C --> F[Memory Ledger]
G[CLI / MCP Server] --> D
G --> E
G --> F
D --> H[Evidence Pack]
E --> H
F --> H
H --> I[AI Agent / User]
Component Responsibilities
| Component | File | Responsibility |
|---|---|---|
| Indexing Engine | src/extract.ts | Parse source files, extract symbols and dependencies |
| Search Module | src/search.ts | Full-text search, intent classification, ranking |
| Graph Analysis | src/search.ts | Trace dependencies and code paths |
| Memory Ledger | src/memory.ts | Store evidence-backed lessons across sessions |
| CLI Interface | src/cli.ts | Command-line interface for all operations |
| MCP Server | src/mcp-server.ts | Model Context Protocol stdio server |
Sources: src/extract.ts:1-50, src/search.ts:1-30, src/cli.ts:1-40
Supported Languages and File Types
Contextful supports multiple programming languages through pattern-based extraction. The indexing engine recognizes language-specific syntax for symbols and dependencies.
Language Support Matrix
| Language | Functions | Classes | Types | Imports |
|---|---|---|---|---|
| TypeScript/JavaScript | ✓ | ✓ | ✓ | ✓ |
| Python | ✓ | ✓ | - | ✓ |
| Go | ✓ | ✓ | ✓ | ✓ |
| Rust | ✓ | ✓ | ✓ | ✓ |
| Markdown | - | - | Headings | - |
| JSON | - | - | Config keys | - |
Sources: src/extract.ts:15-80
Core MCP Tools
Contextful exposes its capabilities through the Model Context Protocol (MCP), providing AI agents with a standardized tool interface. The primary tools are designed to keep the agent surface small while providing maximum utility.
graph LR
A[Agent] -->|context_pack| B[Evidence Pack Generator]
A -->|search_code| C[Code Search]
A -->|trace_path| D[Graph Traversal]
A -->|impact_analysis| E[Dependency Analyzer]
A -->|why_changed| F[Git History]
A -->|recall_memory| G[Memory Search]
A -->|write_lesson| H[Lesson Writer]
Tool Descriptions
| Tool | Purpose | Key Parameters |
|---|---|---|
| `context_pack` | Returns ranked, cited, token-budgeted context bundles | query, budget, scope |
| `search_code` | Powerful search across code, docs, symbols, and memory | query, mode, filters |
| `trace_path` | Graph traversal across files, symbols, modules, and config | from, to, edge_types |
| `impact_analysis` | Reverse dependencies and likely tests | symbol_or_file |
| `why_changed` | Current evidence plus git history | symbol_or_file |
| `recall_memory` | Search session learnings and durable lessons | query, scope |
| `write_lesson` | Store evidence-backed lessons | claim, evidence_refs, confidence |
Sources: README.md:25-45, src/mcp-server.ts:1-80
CLI Interface
Contextful provides a command-line interface through the cxf binary (with contextful as a readable alias). The CLI supports both one-shot operations and daemon mode for continuous indexing.
Command Reference
| Command | Description | Key Options |
|---|---|---|
| `index` | Index a workspace | --workspace, --watch |
| `daemon` | Run local indexing daemon | --workspace |
| `query` | Create evidence pack for query | --workspace, --budget, --json |
| `search` | Search without full evidence pack | --workspace, --limit, --kind |
| `report` | Generate context report | --workspace, --format |
| `memory add` | Store evidence-backed lesson | --claim, --evidence, --scope, --confidence |
| `server` | Run MCP stdio server | - |
Sources: src/cli.ts:40-120, README.md:15-35
Example Usage
# Index a workspace
npx @inferensys/contextful index --workspace .
# Query with token budget
npx @inferensys/contextful query "where is user auth handled" --workspace . --budget 2000
# Run as MCP server
npx @inferensys/contextful server
Sources: README.md:8-15
Data Models
Evidence Pack Structure
The EvidencePack is the core data structure returned by query operations. It contains all necessary context for an agent to answer a query.
interface EvidencePack {
id: string; // Unique pack identifier
query: string; // Original query
scope: string; // Scope of the context
intent: SearchIntent; // Classified query intent
summary: string; // Human-readable summary
citations: SearchHit[]; // Ranked evidence items
files: FileContext[]; // Grouped file references
symbols: SymbolRecord[]; // Relevant symbols
graphPaths: GraphPath[]; // Dependency paths
memoryHits: SearchHit[]; // Memory matches
confidence: number; // Confidence score (0.1-0.92)
tokenEstimate: number; // Estimated token count
budget: number; // Token budget
createdAt: string; // ISO timestamp
}
Sources: src/search.ts:200-250
Search Hit Structure
Each search result is represented as a SearchHit with relevance ranking and excerpt information.
| Field | Type | Description |
|---|---|---|
| `ref` | string | Reference identifier (e.g., file:src/auth.ts:1-20) |
| `path` | string | File path |
| `title` | string | Display title |
| `excerpt` | string | Relevant text snippet |
| `kind` | string | Type: code, doc, symbol, memory |
| `rank` | number | BM25 relevance score |
Sources: src/search.ts:50-80
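The `ref` value shown in the table (for example `file:src/auth.ts:1-20`) can be split into its parts with a small parser. The following is a minimal sketch that assumes the `kind:path:start-end` layout visible in that single example; `parseRef` and `ParsedRef` are hypothetical names for illustration and are not part of Contextful's API.

```typescript
// Hypothetical helper: splits a ref such as "file:src/auth.ts:1-20".
// Assumes the layout <kind>:<path>:<startLine>-<endLine> seen in the example.
interface ParsedRef {
  kind: string;
  path: string;
  startLine: number;
  endLine: number;
}

function parseRef(ref: string): ParsedRef | null {
  const match = /^([a-z]+):(.+):(\d+)-(\d+)$/.exec(ref);
  if (!match) return null;
  // Defaults keep the destructuring type-safe under strict index checks.
  const [, kind = "", path = "", start = "", end = ""] = match;
  const startLine = Number(start);
  const endLine = Number(end);
  if (startLine > endLine) return null; // reject inverted line ranges
  return { kind, path, startLine, endLine };
}
```

A caller could use the parsed range to open only the cited lines of a file instead of reading the whole file.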
Dependencies and Technology Stack
Contextful is built on a carefully selected set of dependencies that enable efficient code indexing and search.
| Dependency | Version | Purpose |
|---|---|---|
| `@modelcontextprotocol/sdk` | ^1.29.0 | MCP protocol implementation |
| `better-sqlite3` | ^12.10.0 | SQLite database for indexing |
| `commander` | ^14.0.3 | CLI argument parsing |
| `fast-glob` | ^3.3.3 | File pattern matching |
| `tree-sitter-wasms` | ^0.1.13 | Syntax parsing |
| `web-tree-sitter` | ^0.20.8 | Tree-sitter bindings |
| `zod` | ^4.4.3 | Schema validation |
Sources: package.json:20-40
System Requirements
- Node.js: >= 20
- License: MIT
- Repository: inferensys/contextful
Sources: package.json:45-55
Supported IDE Integration
Contextful is designed to integrate with a wide range of AI-powered development tools:
| IDE/Extension | Status |
|---|---|
| GitHub Copilot | Supported |
| VS Code | Supported |
| Cursor | Supported |
| Windsurf | Supported |
| Cline | Supported |
| Roo Code | Supported |
| Continue | Supported |
| Zed | Supported |
Sources: package.json:10-20
Workflow: From Indexing to Query
The complete workflow demonstrates how Contextful transforms raw source code into actionable intelligence for AI agents.
sequenceDiagram
participant U as User/Agent
participant CLI as CLI/MCP Server
participant IDX as Indexer
participant DB as SQLite Kernel
participant SRCH as Search Engine
participant MEM as Memory Ledger
U->>CLI: index --workspace ./project
CLI->>IDX: Extract symbols & dependencies
IDX->>DB: Store in chunks_fts, symbols, edges
DB-->>CLI: Index complete
U->>CLI: query "how is auth handled"
CLI->>SRCH: classifyQuery() intent=exact
SRCH->>DB: FTS + BM25 search
DB-->>SRCH: Ranked hits
SRCH->>MEM: Check memory ledger
MEM-->>SRCH: Related lessons
CLI-->>U: EvidencePack (token-budgeted)
U->>CLI: write_lesson --claim "Auth pattern" --evidence file:...
CLI->>MEM: Store lesson with confidence
MEM-->>CLI: Lesson saved
Sources: src/search.ts:100-150, src/report.ts:80-120
Next Steps
To continue exploring Contextful:
- Installation Guide - Set up Contextful in your development environment
- CLI Reference - Detailed documentation of all CLI commands
- MCP Tools API - Complete reference for MCP tool interfaces
- Configuration - Workspace configuration and tuning options
- Memory System - Using the evidence-backed lesson system
Sources: README.md:1-15
Quick Start Guide
Related topics: Project Introduction
Overview
Contextful is a contextual indexing and search system designed to help AI agents efficiently retrieve relevant code evidence. Instead of forcing agents to perform dozens of random file reads, Contextful returns compact, ranked, and cited evidence packs that fit within a token budget.
Sources: README.md:1-10
Installation
Install Contextful using npm. The package provides both the cxf binary and the full contextful alias.
npm install -g @inferensys/contextful
Alternatively, run commands directly via npx:
npx @inferensys/contextful index --workspace .
Sources: README.md:11-14
CLI Commands
Contextful provides a command-line interface with the following primary commands:
| Command | Description |
|---|---|
| `cxf index` | Index a workspace for search |
| `cxf daemon` | Run a local indexing daemon |
| `cxf query` | Create an evidence pack for a query |
| `cxf search` | Search indexed context |
| `cxf report` | Generate a context report |
| `cxf memory add` | Store an evidence-backed lesson |
| `cxf server` | Run the MCP stdio server |
Sources: README.md:23-32
Basic Workflow
Step 1: Index Your Workspace
Before searching, you must index your codebase. This creates the searchable database:
cxf index --workspace .
For continuous indexing as files change, use the daemon mode:
cxf daemon --workspace .
Sources: src/cli.ts:1-20
Step 2: Query for Context
Once indexed, ask questions about your codebase:
cxf query "where is user auth handled" --workspace . --budget 2000
The query command returns a ranked evidence pack with citations and file references.
#### Query Options
| Option | Description | Default |
|---|---|---|
| `--workspace <path>` | Workspace path | Current directory |
| `--budget <tokens>` | Approximate token budget | 2000 |
| `--json` | Output as JSON instead of Markdown | false |
Sources: src/cli.ts:22-30
Step 3: Search Without Building Evidence Packs
For quick lookups without compiling full evidence packs, use search:
cxf search "authentication middleware" --workspace . --limit 10 --kind code
#### Search Options
| Option | Description | Default |
|---|---|---|
| `--workspace <path>` | Workspace path | Current directory |
| `--limit <count>` | Maximum hits | 10 |
| `--kind` | Filter: all, code, docs, symbols, memory | all |
Sources: src/cli.ts:32-42
Step 4: Generate Reports
Generate comprehensive context reports in various formats:
cxf report --workspace . --format markdown
cxf report --workspace . --format json
cxf report --workspace . --format html
Sources: src/cli.ts:44-48
MCP Server Integration
Contextful can run as a Model Context Protocol (MCP) server, providing tools directly to AI agents.
cxf server
Available MCP Tools
| Tool | Purpose |
|---|---|
| `context_pack` | Returns ranked, cited, token-budgeted evidence bundles |
| `search_code` | Code, docs, symbol, and memory search |
| `trace_path` | Graph traversal across files, symbols, modules, and config |
| `impact_analysis` | Reverse dependencies and likely tests |
| `why_changed` | Current evidence plus git history |
| `recall_memory` | Search session learnings and durable project lessons |
| `write_lesson` | Store evidence-backed lessons for future sessions |
Sources: README.md:40-48
MCP Tool Parameters
#### context_pack
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Query to answer from indexed context |
| `budget` | number | No | Token budget for the response |
| `scope` | string | No | Search scope |
Sources: src/mcp-server.ts:1-25
#### search_code
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Search query |
| `mode` | string | No | Search mode |
| `filters` | object | No | Search filters |
| `workspace` | string | No | Workspace path |
| `limit` | number | No | Maximum results |
Sources: src/mcp-server.ts:26-40
#### write_lesson
| Parameter | Type | Required | Description |
|---|---|---|---|
| `claim` | string | Yes | Lesson claim |
| `evidence_refs` | array | Yes | Evidence references (e.g., file:src/auth.ts:1-20) |
| `scope` | string | No | Memory scope |
| `confidence` | number | No | Confidence from 0 to 1 |
| `supersedes` | string | No | Previous lesson ID to supersede |
Sources: src/mcp-server.ts:65-80
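To illustrate how the `write_lesson` parameters travel over MCP, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send over stdio. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification; `buildWriteLessonCall` is a hypothetical helper, and the rule that at least one evidence reference must be present is an assumption based on the parameter being required.

```typescript
// JSON-RPC 2.0 envelope used by MCP "tools/call" requests.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: validates inputs and builds a write_lesson call.
function buildWriteLessonCall(
  id: number,
  claim: string,
  evidenceRefs: string[],
  confidence?: number,
): ToolCallRequest {
  // Assumption: a required evidence_refs array should not be empty.
  if (evidenceRefs.length === 0) {
    throw new Error("write_lesson needs at least one evidence reference");
  }
  // The parameter table documents confidence as a value from 0 to 1.
  if (confidence !== undefined && (confidence < 0 || confidence > 1)) {
    throw new Error("confidence must be between 0 and 1");
  }
  const args: Record<string, unknown> = { claim, evidence_refs: evidenceRefs };
  if (confidence !== undefined) args["confidence"] = confidence;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "write_lesson", arguments: args },
  };
}
```

In practice an MCP client library usually constructs this envelope; the sketch only shows which fields carry the tool name and its arguments.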
Memory System
Contextful includes an evidence-backed memory system for storing lessons across sessions.
Adding a Lesson
cxf memory add \
--claim "Always validate tokens in middleware" \
--evidence "file:src/auth.ts:1-20" \
--workspace . \
--confidence 0.8
#### Memory Command Options
| Option | Required | Description |
|---|---|---|
| `--claim <text>` | Yes | The lesson or claim |
| `--evidence <ref...>` | Yes | Evidence references |
| `--workspace <path>` | No | Workspace path |
| `--scope <scope>` | No | Memory scope (default: repo) |
| `--confidence <number>` | No | Confidence from 0 to 1 (default: 0.7) |
Sources: src/cli.ts:50-75
Output Formats
Markdown Output (Default)
cxf query "where is auth handled" --workspace .
Returns a formatted Markdown document with citations and graph paths.
JSON Output
cxf query "where is auth handled" --workspace . --json
Returns structured JSON data suitable for programmatic processing.
Sources: src/cli.ts:22-30
Report Formats
| Format | Description |
|---|---|
| `markdown` | Human-readable Markdown report |
| `json` | Structured JSON data |
| `html` | Standalone HTML page |
Sources: src/cli.ts:44-48
Architecture Overview
graph TD
A[CLI / MCP Server] --> B[Workspace Indexer]
B --> C[SQLite Kernel DB]
C --> D[Full-Text Search]
C --> E[Symbol Index]
C --> F[Graph Edges]
G[Query Request] --> H[Search Context]
H --> I[Evidence Pack Builder]
I --> D
I --> E
I --> F
I --> J[Memory Ledger]
I --> K[Evidence Pack Output]
J --> J
Common Usage Patterns
Pattern 1: Initial Setup
# Index the workspace
cxf index --workspace /path/to/project --watch
# Generate initial report
cxf report --workspace /path/to/project --format html > report.html
Pattern 2: Interactive Exploration
# Run as MCP server
cxf server
# Or use CLI directly
cxf query "how does the cache work" --workspace . --budget 3000
Pattern 3: Agent Memory Persistence
# Store learned lessons
cxf memory add --claim "Config validation happens in validate.ts" --evidence "file:src/config/validate.ts:1-50"
# Recall past lessons
# Via MCP: recall_memory(query="config validation")
Next Steps
- Explore Architecture Documentation for deep dive into indexing and search internals
- Learn about Memory System for evidence-backed knowledge persistence
- Review API Reference for programmatic integration
Sources: README.md:1-10
High-Level Architecture
Related topics: Runtime Components, Search Engine, SQLite Database Schema
Contextful is a local-only indexing and context management tool designed to help AI coding assistants retrieve compact, evidence-backed context from workspace codebases. The system operates without external embedding APIs, instead relying on SQLite FTS5 full-text search, graph-based dependency tracking, and intent-classified query routing. Sources: README.md
System Overview
Contextful functions as a local daemon that continuously indexes workspace files, extracts code symbols and import relationships, and provides a structured context pack API to agents. The architecture follows a three-layer design:
- Indexing Layer - File parsing, symbol extraction, edge detection
- Storage Layer - SQLite kernel with FTS5 search and graph tables
- Query Layer - Intent classification, ranked search, evidence pack assembly
Sources: src/indexer.ts
Component Architecture
graph TD
A[Workspace Files] --> B[Indexer]
B --> C[Symbol Extraction]
B --> D[Edge Detection]
B --> E[Chunk Generation]
C --> F[SQLite Kernel DB]
D --> F
E --> F
G[CLI / MCP Server] --> H[Search Module]
H --> F
H --> I[Context Pack Assembly]
I --> J[Evidence Pack Output]
Core Components
| Component | File | Responsibility |
|---|---|---|
| Indexer | src/indexer.ts | Recursively walks workspace, triggers file processing |
| Extractor | src/extract.ts | Parses symbols, edges, and code chunks per file |
| Search | src/search.ts | FTS5 queries, intent classification, ranking |
| CLI | src/cli.ts | Command-line interface and MCP server entry point |
| Report | src/report.ts | Generates workspace context reports |
Sources: src/indexer.ts, src/extract.ts, src/search.ts
Indexing Pipeline
The indexing pipeline processes workspace files through multiple extraction stages. Each source file is read, classified by language, and passed through specialized extractors that produce structured records.
graph LR
A[File Content] --> B[Language Detection]
B --> C[Symbol Extraction]
B --> D[Edge Extraction]
B --> E[Chunk Extraction]
C --> F[symbols table]
D --> G[edges table]
E --> H[chunks_fts table]
Symbol Extraction
The extractSymbols function identifies named code entities based on language-specific patterns:
| Language | Supported Symbols |
|---|---|
| TypeScript/JavaScript | functions, classes, interfaces, types, const arrow functions |
| Python | functions, classes |
| Go | functions, structs, interfaces |
| Rust | functions, structs, enums, traits, impl blocks |
| Markdown | headings |
| JSON | config keys |
Sources: src/extract.ts:1-80
Edge Detection
Import relationships are tracked as directed edges between modules. The extractEdges function processes different import syntaxes per language:
- TypeScript/JavaScript: ES6 `import` and `require()` statements
- Python: `from ... import` and `import` statements
- Go: Import strings within double quotes
- Rust: `use` and `mod` declarations
- JSON: Top-level keys in configuration files
Sources: src/extract.ts:100-160
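The pattern-based approach described above can be sketched for the TypeScript/JavaScript case. This is an illustrative, line-by-line regex matcher, not the actual `extractEdges` implementation; `extractImportsSketch` and `ImportEdge` are hypothetical names, and multi-line import statements are deliberately out of scope.

```typescript
// Hypothetical edge record: the imported module and the 1-based source line.
interface ImportEdge {
  target: string;
  line: number;
}

// Line-based sketch: finds ES import and require() targets with regexes.
function extractImportsSketch(source: string): ImportEdge[] {
  const patterns: RegExp[] = [
    // import x from "y", import { a } from "y", and bare import "y"
    /\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/,
    // require("y")
    /\brequire\(\s*['"]([^'"]+)['"]\s*\)/,
  ];
  const edges: ImportEdge[] = [];
  source.split("\n").forEach((text, index) => {
    for (const pattern of patterns) {
      const match = pattern.exec(text);
      if (match && match[1] !== undefined) {
        edges.push({ target: match[1], line: index + 1 });
        break; // record at most one edge per line in this sketch
      }
    }
  });
  return edges;
}
```

Regex matching of this kind is fast and dependency-free, but it can be fooled by import-like text inside strings or comments, which is a known trade-off of pattern-based extraction.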
Chunk Generation
Code files are split into semantic chunks for full-text search. The codeChunks function segments content into logical blocks based on:
- Empty line boundaries
- Token budget (target: ~300 tokens per chunk)
- Language-specific token estimation via `estimateTokens`
Sources: src/extract.ts:180-220
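The chunking criteria listed above can be sketched as follows. This is a simplified illustration, not the actual `codeChunks` function: `chunkByBlankLines` is a hypothetical name, and `estimateTokensSketch` uses a rough four-characters-per-token heuristic as a stand-in for the real `estimateTokens`, whose logic is not shown in the source.

```typescript
// Stand-in estimator (assumption): roughly 4 characters per token.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

// Groups blank-line-separated blocks into chunks of about targetTokens each.
function chunkByBlankLines(content: string, targetTokens = 300): string[] {
  const blocks = content
    .split(/\n\s*\n/)
    .filter((block) => block.trim().length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const block of blocks) {
    const candidate = current ? `${current}\n\n${block}` : block;
    if (current && estimateTokensSketch(candidate) > targetTokens) {
      // Adding this block would overshoot the target: close the chunk.
      chunks.push(current);
      current = block;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Note that in this sketch a single block larger than the target is kept whole rather than split mid-block, so the budget is a target rather than a hard limit.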
Storage Layer
SQLite Kernel Schema
The kernel database uses SQLite with several specialized tables:
| Table | Purpose | Key Columns |
|---|---|---|
| `files` | Tracked workspace files | path, language, hash, indexed_at |
| `symbols` | Extracted code symbols | ref, name, kind, file_path, line, signature, exported |
| `edges` | Import/dependency graph | source_file, target_name, target_type, edge_type, line |
| `chunks_fts` | FTS5 virtual table for full-text search | ref, path, title, text, kind |
| `memory` | Evidence-backed lessons | id, claim, scope, confidence, created_at |
Sources: src/search.ts, src/indexer.ts
Query and Search System
Intent Classification
Queries are classified into intents to optimize search strategy:
| Intent | Trigger Keywords | Search Focus |
|---|---|---|
| `code` | function, class, implementation | Symbol and code chunks |
| `memory` | memory, lesson, session | Memory ledger |
| `impact` | impact, depends on, blast radius | Dependency graph |
| `historical` | why, changed, commit | Git history |
| `architectural` | architecture, flow, path, trace | Graph traversal |
| `docs` | documentation, readme, guide | Markdown chunks |
| `exact` | symbols, paths, line references | Precise symbol matching |
| `vague` | Default fallback | Broad FTS search |
Sources: src/search.ts:1-50
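The keyword table above can be read as an ordered list of rules with a default. The sketch below shows one way to express that; it is not the project's `classifyQuery`. `classifySketch` and `INTENT_RULES` are hypothetical names, the rule order simply follows the table (the real precedence is not documented here), and the `exact` intent is omitted because its detection logic for symbols, paths, and line references is not specified.

```typescript
type IntentSketch =
  | "code" | "memory" | "impact" | "historical"
  | "architectural" | "docs" | "vague";

// Keyword rules taken from the table; the order is an assumption.
const INTENT_RULES: Array<[IntentSketch, RegExp]> = [
  ["code", /\b(function|class|implementation)\b/],
  ["memory", /\b(memory|lesson|session)\b/],
  ["impact", /\b(impact|depends on|blast radius)\b/],
  ["historical", /\b(why|changed|commit)\b/],
  ["architectural", /\b(architecture|flow|path|trace)\b/],
  ["docs", /\b(documentation|readme|guide)\b/],
];

// Returns the first matching intent, or "vague" when nothing matches.
function classifySketch(query: string): IntentSketch {
  const q = query.toLowerCase();
  for (const [intent, pattern] of INTENT_RULES) {
    if (pattern.test(q)) return intent;
  }
  return "vague";
}
```

Because the first match wins, a query that contains keywords from several rows is assigned to whichever rule comes first, so rule order matters as much as the keywords themselves.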
Context Pack Assembly
The createContextPack function orchestrates the evidence gathering:
- Classify query intent
- Execute FTS5 search across chunks
- Apply query expansion with domain-specific term additions
- Score and rank hits using BM25 with intent-based bonuses
- Select hits within token budget
- Load related symbols and graph paths
- Assemble and return `EvidencePack`
Sources: src/search.ts:200-280
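The "select hits within token budget" step can be sketched as a greedy pass over scored hits. The source does not state which selection strategy `createContextPack` uses, so the greedy approach here is an assumption; `selectWithinBudget` and `RankedHit` are hypothetical names.

```typescript
// Hypothetical hit shape: a reference, a relevance score, and a token cost.
interface RankedHit {
  ref: string;
  score: number;
  tokens: number;
}

// Greedy selection: take the highest-scoring hits that still fit the budget.
function selectWithinBudget(hits: RankedHit[], budget: number): RankedHit[] {
  const sorted = [...hits].sort((a, b) => b.score - a.score);
  const selected: RankedHit[] = [];
  let used = 0;
  for (const hit of sorted) {
    // Skip hits that would overshoot; smaller ones later may still fit.
    if (used + hit.tokens > budget) continue;
    selected.push(hit);
    used += hit.tokens;
  }
  return selected;
}
```

With a budget of 2000 tokens and hits costing 1200, 1000, and 600 tokens in descending score order, this sketch keeps the first and third hits (1800 tokens) and skips the second.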
CLI and MCP Integration
Command Structure
| Command | Purpose | Key Options |
|---|---|---|
| `index` | Initial workspace indexing | --workspace, --watch |
| `daemon` | Continuous indexing with file watching | --workspace |
| `query` | Generate evidence pack | --workspace, --budget, --json |
| `search` | Direct search without packing | --workspace, --limit, --kind |
| `report` | Generate context report | --workspace, --format |
| `memory add` | Store evidence-backed lessons | --claim, --evidence, --scope |
| `server` | Start MCP stdio server | (none) |
Sources: src/cli.ts:20-100
MCP Server Tools
The MCP server exposes standardized tools for agent integration:
- `context_pack(query, budget, scope)` - Primary killer tool returning ranked, cited evidence
- `search_code(query, mode, filters)` - Code, docs, symbol, and memory search
- `trace_path(from, to, edge_types)` - Graph traversal across the codebase
- `impact_analysis(symbol_or_file)` - Reverse dependency analysis
- `why_changed(symbol_or_file)` - Git history with current evidence
- `recall_memory(query, scope)` - Search persistent lessons
- `write_lesson(claim, evidence_refs, scope)` - Store new memories
Sources: README.md
Report Generation
The report system aggregates workspace statistics and warnings:
graph TD
A[generateReport] --> B[Index Status Check]
B --> C[File Statistics]
B --> D[Symbol Statistics]
B --> E[Edge Statistics]
B --> F[Warning Collection]
C --> G[renderMarkdown / renderHtml]
D --> G
E --> G
F --> G
Reports support three output formats:
- markdown - Plain text with markdown headings
- json - Structured JSON with all report fields
- html - Self-contained HTML document with styling
Sources: src/report.ts:1-80
Privacy and Security
Contextful operates entirely locally with no external API calls:
- No embedding API calls for vector search
- No source code uploads
- No file editing or auto-fixes
- No dependency installation in target workspace
Evidence references are validated and stale references are rejected to maintain integrity of the memory system.
Sources: README.md
Data Flow Summary
sequenceDiagram
participant User
participant CLI as CLI/MCP Server
participant Indexer
participant Extractor
participant Search
participant Kernel as SQLite Kernel
User->>CLI: index --workspace .
CLI->>Indexer: indexWorkspace()
Indexer->>Extractor: extractFile()
Extractor->>Kernel: Insert symbols, edges, chunks
Kernel-->>Indexer: Confirmation
User->>CLI: query "where is auth handled"
CLI->>Search: searchContext()
Search->>Kernel: FTS5 query
Search->>Kernel: Graph traversal
Search->>Kernel: Memory search
Kernel-->>Search: Ranked hits
Search-->>CLI: EvidencePack
CLI-->>User: Compact context output
Key Design Decisions
| Decision | Rationale |
|---|---|
| SQLite FTS5 over vector embeddings | Local-only operation, no external API dependencies |
| Intent-based query routing | Optimizes search strategy based on query semantics |
| BM25 scoring with bonuses | Balances relevance with domain-specific priorities |
| Token-budgeted evidence packs | Prevents context overflow in LLM contexts |
| Evidence refs as first-class citizens | Enables verifiable, traceable AI responses |
Sources: src/search.ts:50-150, src/util.ts
Sources: src/indexer.ts
Runtime Components
Related topics: High-Level Architecture
Overview
The Runtime Components in Contextful encompass the services, daemons, and server processes that enable real-time code indexing, search, and context-aware information retrieval. These components operate as the execution layer of the application, providing persistent indexing, live workspace monitoring, and MCP (Model Context Protocol) server capabilities for AI agent integration.
The runtime layer bridges the gap between static code analysis and dynamic query resolution, allowing users and AI agents to query indexed repositories with token-budgeted evidence packs.
Source: https://github.com/Inferensys/contextful / Human Manual
Search Engine
Related topics: Context Packs, SQLite Database Schema
Overview
The Search Engine is the core retrieval system in Contextful, designed to provide intelligent, evidence-backed context for agent queries. It combines full-text search (FTS), symbol indexing, dependency graph traversal, and memory recall to deliver ranked, cited results within a configurable token budget.
The system serves as the foundation for multiple interfaces: CLI commands (query, search), MCP server tools (search_code, context_pack), and report generation.
Sources: src/search.ts:1-50
Architecture
graph TD
A[Query Input] --> B[Query Classification]
B --> C{Intent Type}
C -->|code/docs| D[Full-Text Search]
C -->|symbols| E[Symbol Lookup]
C -->|memory| F[Memory Ledger Search]
C -->|impact| G[Graph Traversal]
C -->|historical| H[Git History + Search]
D --> I[BM25 Ranking]
E --> J[Symbol Index]
F --> K[Memory DB]
G --> L[Edge Database]
H --> M[Git Operations]
I --> N[Result Scoring]
J --> N
K --> N
L --> N
N --> O[Context Pack]
Core Components
| Component | File | Responsibility |
|---|---|---|
| Search Kernel | src/search.ts | Core search logic and ranking |
| Query Classifier | src/search.ts | Intent detection |
| FTS Engine | src/search.ts | Full-text search using SQLite FTS5 |
| Graph Tracer | src/search.ts | Dependency graph traversal |
| Memory Store | src/memory.ts | Evidence-backed memory recall |
Sources: src/search.ts:50-120
Query Classification
The search engine classifies each query into one of the nine intent types listed below to optimize retrieval strategy.
SearchIntent Types
| Intent | Trigger Keywords | Search Strategy |
|---|---|---|
| `code` | code, function, class, impl | FTS + symbol lookup |
| `docs` | resource, docs, readme, how to | FTS on markdown/json |
| `symbols` | define, interface, type, symbol | Direct symbol index |
| `memory` | remember, lesson, learned, session | Memory ledger query |
| `impact` | impact, affected, depends, blast radius | Reverse dependency graph |
| `historical` | why, changed, commit, history | Git history + current search |
| `architectural` | architecture, flow, trace, connects | Graph path analysis |
| `exact` | Code patterns, paths, line refs | Direct file/symbol lookup |
| `vague` | Default | Broad FTS + graph |
function classifyQuery(query: string): SearchIntent {
  const q = query.toLowerCase();
  if (/\b(code|function|class|implement|module)\b/.test(q)) return "code";
  if (/\b(define|interface|type|symbol)\b/.test(q)) return "symbols";
  if (/\b(memory|remember|lesson|learned|sessions?)\b/.test(q)) return "memory";
  // ... additional classifications
  return "vague"; // default fallback, as listed in the table above
}
Sources: src/search.ts:1-30
Search Flow
Main Search Pipeline
sequenceDiagram
participant CLI as CLI/MCP
participant Search as searchContext()
participant Kernel as Kernel DB
participant FTS as FTS5 Engine
participant Graph as Graph DB
participant Memory as Memory Store
CLI->>Search: query, workspace, limit
Search->>Kernel: ensureIndexed()
Search->>Kernel: addQuery()
Search->>FTS: ftsQuery(expandedTerms)
FTS-->>Search: ranked rows (BM25)
Search->>Search: scoreFromRank()
Search->>Graph: loadGraphPaths()
Search-->>CLI: {intent, hits}
Full-Text Search Query Builder
The ftsQuery function transforms user queries into FTS5-compatible search strings:
function ftsQuery(query: string): string {
const terms = expandedTerms(query);
return Array.from(new Set(terms.map((term) => term.toLowerCase())))
.filter((term) => !STOPWORDS.has(term))
.slice(0, 14)
.map((term) => `${term}*`)
.join(" OR ");
}
Key behaviors:
- Expands terms based on query context (e.g., "tool" → "server", "tool", "callTool")
- Filters stopwords: `where`, `what`, `which`, `when`, `how`, `are`, `the`, `for`, `with`, `and`, `or`, `to`
- Limits to 14 terms maximum
- Appends wildcard `*` for prefix matching
Sources: src/search.ts:200-280
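To see what kind of string the query builder produces, the following self-contained sketch reproduces the behaviors listed above on a sample query. It is an approximation under stated assumptions: `ftsQuerySketch` and `STOPWORDS_SKETCH` are hypothetical names, and plain word tokenization stands in for `expandedTerms`, so no domain-specific terms are added.

```typescript
// Stopword list copied from the behaviors described above.
const STOPWORDS_SKETCH = new Set([
  "where", "what", "which", "when", "how", "are",
  "the", "for", "with", "and", "or", "to",
]);

// Builds an FTS5-style prefix query: unique terms joined with OR.
function ftsQuerySketch(query: string): string {
  // Assumption: simple word tokenization replaces expandedTerms().
  const terms = query.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  return Array.from(new Set(terms))
    .filter((term) => !STOPWORDS_SKETCH.has(term))
    .slice(0, 14)          // cap the number of terms
    .map((term) => `${term}*`) // trailing * requests prefix matching
    .join(" OR ");
}

console.log(ftsQuerySketch("where are the auth tokens validated"));
// prints: auth* OR tokens* OR validated*
```

Joining with `OR` favors recall over precision: any chunk matching at least one term becomes a candidate, and ranking is left to the scoring stage.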
Scoring System
Rank-to-Score Transformation
The scoreFromRank function converts BM25 ranks into relevance scores, adds domain-specific bonuses, and clamps the result to the range 0.1-10:
function scoreFromRank(rank: number, query: string, corpus: string): number {
  // Assumed: the excerpt uses q without showing its definition.
  const q = query.toLowerCase();
  const base = 10 / (1 + Math.abs(rank));
  let bonus = 0;
  // Domain-specific bonuses
  if (/\b(tool|tools|registered|register)\b/.test(q) && corpus.includes("server.tool(")) {
    bonus += 9;
  }
  if (/\bmcp\b/.test(q) && corpus.includes("mcp-server")) {
    bonus += 4;
  }
  // clamp is a helper that is not shown in this excerpt.
  return clamp(base + bonus, 0.1, 10);
}
Scoring Bonuses Matrix
| Query Pattern | Content Match | Bonus |
|---|---|---|
| `tool/tools/register` | server.tool( | +9 |
| `mcp` | mcp-server | +4 |
| `where registered` | function runMcpServer | +4 |
| `tool query` | src/search.ts | -8 |
| `memory query` | src/memory.ts | +5 |
| `memory query` | src/search.ts | -16 |
The negative bonuses act as an anti-gaming mechanism: they penalize hits from the search implementation itself (src/search.ts) when that file matches a query without being relevant to it.
Sources: src/search.ts:240-320
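A few worked values make the formula concrete. The sketch below isolates the arithmetic (base score, bonus, clamp) from the keyword matching; `scoreSketch` and `clampSketch` are hypothetical names, and the clamp helper is re-implemented here as an assumption because the original is not shown.

```typescript
// Assumed behavior of the clamp helper: restrict value to [min, max].
function clampSketch(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// base = 10 / (1 + |rank|), plus a bonus, clamped to the range 0.1-10.
function scoreSketch(rank: number, bonus: number): number {
  const base = 10 / (1 + Math.abs(rank));
  return clampSketch(base + bonus, 0.1, 10);
}

console.log(scoreSketch(-4, 0));   // base 2, no bonus: 2
console.log(scoreSketch(-4, 9));   // 2 + 9 = 11, clamped to 10
console.log(scoreSketch(-1, -16)); // 5 - 16 = -11, clamped to 0.1
```

The examples show that a +9 bonus is enough to push almost any hit to the maximum score, while a -16 penalty drives a hit to the floor regardless of its base score.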
Term Expansion
The expandedTerms function intelligently expands query terms based on semantic context:
function expandedTerms(query: string): string[] {
  const lower = query.toLowerCase();
  // The excerpt omits how terms is derived; simple word splitting is assumed here.
  const terms = lower.split(/[^a-z0-9_]+/).filter(Boolean);
  const additions: string[] = [];
  if (/\b(tool|tools|registered|register)\b/.test(lower)) {
    additions.push("server", "tool", "tools", "callTool");
  }
  if (/\bmcp\b/.test(lower)) {
    additions.push("mcp", "server", "stdio");
  }
  // Alternatives are grouped so that \b applies to every keyword, not only the first and last.
  if (/\b(memory|memories|remember|remembers|lesson|lessons|learned|session|sessions)\b/.test(lower)) {
    additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
  }
  if (/\b(impact|depends|dependents|uses)\b/.test(lower)) {
    additions.push("imports", "tests", "edges");
  }
  return [...terms, ...additions];
}
Sources: src/search.ts:320-380
CLI Commands
Query Command
cxf query "<query>" --workspace <path> --budget <tokens> --json
| Option | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Query to answer from indexed context |
| `--workspace` | path | cwd() | Workspace path |
| `--budget` | number | 2000 | Approximate token budget |
| `--json` | flag | false | Output JSON instead of Markdown |
Search Command
cxf search "<query>" --workspace <path> --limit <count> --kind <kind>
| Option | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Search query |
| `--workspace` | path | cwd() | Workspace path |
| `--limit` | number | 10 | Maximum hits |
| `--kind` | enum | all | Search category: all, code, docs, symbols, memory |
Sources: src/cli.ts:40-80
MCP Server Tools
The search engine exposes the following MCP tools:
search_code
server.tool("search_code", "Search indexed code, docs, symbols, and stored context", {
query: z.string(),
mode: z.enum(["all", "code", "docs", "symbols", "memory"]).optional(),
limit: z.number().optional(),
filters: z.record(z.string(), z.unknown()).optional()
});
trace_path
server.tool("trace_path", "Trace graph relationships between files, symbols, modules", {
from: z.string(),
to: z.string().optional(),
edge_types: z.array(z.string()).optional(),
limit: z.number().optional()
});
impact_analysis
server.tool("impact_analysis", "Find likely dependents and tests", {
symbol_or_file: z.string(),
limit: z.number().optional()
});
why_changed
server.tool("why_changed", "Explain why a file/symbol may have changed", {
symbol_or_file: z.string(),
limit: z.number().optional()
});
Sources: src/mcp-server.ts:1-80
Context Pack
The createContextPack function assembles comprehensive evidence bundles:
export async function createContextPack(options: {
workspace?: string;
query: string;
budget?: number;
scope?: string;
}): Promise<EvidencePack>
EvidencePack Structure
| Field | Type | Description |
|---|---|---|
id | string | Unique pack identifier (ctx_<hash>) |
query | string | Original query |
scope | string | Search scope (default: repo) |
intent | SearchIntent | Classified intent |
summary | string | Human-readable summary |
citations | SearchHit[] | Ranked search results |
files | FileContext[] | Grouped file references |
symbols | SymbolRecord[] | Relevant symbols (≤20) |
graphPaths | GraphPath[] | Dependency connections (≤20) |
memoryHits | SearchHit[] | Memory matches |
confidence | number | Confidence score (0.1-0.92) |
tokenEstimate | number | Estimated token count |
budget | number | Token budget used |
createdAt | string | ISO timestamp |
Confidence Calculation
function confidenceFor(hits: SearchHit[], graphPaths: GraphPath[], memoryHits: SearchHit[]): number {
return clamp(
0.25 +
hits.length * 0.05 +
graphPaths.length * 0.02 +
memoryHits.length * 0.05,
0.1,
0.92
);
}
Sources: src/search.ts:400-480
Graph Traversal
The traceGraph function performs dependency graph analysis:
export async function traceGraph(options: {
workspace?: string;
from: string;
to?: string;
edgeTypes?: string[];
limit?: number;
}): Promise<GraphPath[]>
Edge Types
| Edge Type | Direction | Description |
|---|---|---|
IMPORTS | File → Module | Import/require statements |
DEFINES | File → Symbol | Symbol definitions |
CONFIGURES | File → Config | Configuration keys |
TESTS | Test → Source | Test file relationships |
Impact Analysis
export async function impactAnalysis(options: {
workspace?: string;
target: string;
limit?: number;
}): Promise<{
target: string;
forward: string[];
reverse: string[];
tests: string[];
}>
Returns forward dependencies, reverse dependents, and likely test files for a given symbol or file.
Sources: src/search.ts:480-550
Utility Functions
lineRange
Extracts a specific line range from text:
export function lineRange(text: string, startLine: number, endLine: number): string {
const lines = text.split(/\r?\n/);
return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}
clamp
Constrains values within bounds:
export function clamp(value: number, min: number, max: number): number {
return Math.max(min, Math.min(max, value));
}
unique
Deduplicates arrays:
export function unique<T>(items: T[]): T[] {
return Array.from(new Set(items));
}
isLikelyBinary
Detects binary files by checking for null bytes:
export function isLikelyBinary(buffer: Buffer): boolean {
const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
return sample.includes(0);
}
Sources: src/util.ts:1-50
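A short usage sketch may help pin down the edge cases of these helpers. The three definitions are repeated from the excerpts above so the example runs on its own; the sample inputs are made up for illustration.

```typescript
// Definitions repeated from the excerpts above so the example is self-contained.
function lineRange(text: string, startLine: number, endLine: number): string {
  const lines = text.split(/\r?\n/);
  return lines.slice(Math.max(0, startLine - 1), Math.min(lines.length, endLine)).join("\n");
}
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}
function unique<T>(items: T[]): T[] {
  return Array.from(new Set(items));
}

const sampleText = "alpha\nbeta\ngamma\ndelta";
const middleLines = lineRange(sampleText, 2, 3);  // 1-based and inclusive: "beta\ngamma"
const pastTheEnd = lineRange(sampleText, 3, 99);  // end is capped at the last line: "gamma\ndelta"
const clampedHigh = clamp(12.5, 0.1, 10);         // 10
const dedupedPaths = unique(["src/a.ts", "src/b.ts", "src/a.ts"]); // first-seen order is kept
console.log(middleLines, pastTheEnd, clampedHigh, dedupedPaths);
```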
Data Models
SearchHit
interface SearchHit {
ref: string; // Format: "file:path:start-end"
path: string; // File path
kind: string; // "chunk", "symbol", "memory", "doc"
title: string; // Display title
text: string; // Content snippet
score: number; // Relevance score
line?: number; // Starting line number
}
SymbolRecord
interface SymbolRecord {
ref: string;
name: string;
kind: string; // "function", "class", "interface", "type", etc.
filePath: string;
line: number;
signature?: string;
exported?: boolean;
}
Sources: src/search.ts:100-150
Index Status
The getIndexStatus function returns workspace indexing metadata:
export async function getIndexStatus(options: { workspace?: string }): Promise<IndexStatus>
IndexStatus Structure
| Field | Type | Description |
|---|---|---|
workspace | string | Workspace path |
languageCounts | Record<string, number> | File count per language |
warnings | string[] | Index warnings |
lastIndexed | string | ISO timestamp of last index |
totalChunks | number | Total indexed chunks |
Sources: src/search.ts:550-600
Summary
The Search Engine provides Contextful's intelligent retrieval capabilities through:
- Intent Classification - Automatically routes queries to optimal search strategies
- Full-Text Search - SQLite FTS5 with BM25 ranking and domain-specific scoring
- Symbol Index - Fast lookup of code definitions across languages
- Graph Traversal - Dependency analysis and impact tracking
- Memory Integration - Recall of past lessons and evidence-backed claims
- Token Budgeting - Constrains output to specified budget limits
- Confidence Scoring - Quantifies result reliability
All search operations flow through a unified kernel database that combines FTS chunks, symbol records, and edge relationships for comprehensive context retrieval.
Sources: src/search.ts:1-50
Context Packs
Related topics: Search Engine, Memory Ledger
Context Packs are the core output format of Contextful, providing AI agents with compact, ranked, and cited evidence bundles that fit within a specified token budget. Instead of forcing agents to read dozens of arbitrary files, Context Packs deliver precisely the evidence needed to answer a specific query.
Overview
A Context Pack is a structured evidence package generated by the context_pack() MCP tool or the cxf query CLI command. It contains:
- Ranked code and documentation citations matching the query
- Related symbols (functions, classes, interfaces) from matching files
- Graph paths connecting related components
- Memory hits from evidence-backed lessons
- A confidence score and token budget accounting
The pack is designed to be consumed directly by an LLM agent, providing traceable citations and a clear summary of what evidence was found.
Data Model
EvidencePack Structure
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier (format: ctx_<hash>) |
query | string | The original search query |
scope | string | Search scope (e.g., "repo") |
intent | SearchIntent | Classified query intent |
summary | string | Human-readable summary of findings |
citations | SearchHit[] | Ranked evidence items |
files | FileContext[] | Grouped file references with reasons |
symbols | SymbolRecord[] | Relevant symbols from matched files |
graphPaths | GraphPath[] | Graph traversals between components |
memoryHits | SearchHit[] | Memory/lesson hits |
confidence | number | Estimated confidence (0.1-0.92) |
tokenEstimate | number | Estimated token count of pack |
budget | number | Requested token budget |
createdAt | string | ISO timestamp of creation |
Sources: src/search.ts:search.ts
SearchHit Structure
| Field | Type | Description |
|---|---|---|
ref | string | Reference identifier (e.g., file:src/auth.ts:1-20) |
path | string | File path |
title | string | Display title |
kind | string | Hit kind: code, doc, symbol, memory |
excerpt | string | Relevant text excerpt |
score | number | Relevance score |
rank | number | BM25 rank |
SearchIntent Enum
| Intent | Trigger Keywords |
|---|---|
exact | Code patterns, paths, symbol names with special chars |
symbol | Function names, class names, method calls |
test | test, spec, mock, fixture, unit |
memory | memory, lesson, learned, session |
impact | impact, affected, depends, blast radius |
historical | why, changed, commit, history, regression |
architectural | architecture, flow, trace, connects, imports |
docs | resource, docs, documentation, guide, readme |
vague | Default for generic queries |
Sources: src/search.ts:search.ts
Creation Flow
The createContextPack function orchestrates the entire pack creation process:
graph TD
A[createContextPack] --> B[searchContext]
B --> C[classifyQuery]
C --> D[ftsQuery + expandedTerms]
D --> E[FTS Search on chunks_fts]
E --> F[scoreFromRank]
F --> G[Select Hits within Budget]
G --> H[loadSymbolsForPaths]
G --> I[loadGraphPaths]
G --> J[Filter memoryHits]
H --> K[Build EvidencePack]
I --> K
J --> K
K --> L[saveEvidencePack]
L --> M[Return EvidencePack]
Step 1: Search Context
The process begins by classifying the query intent and executing full-text search:
const search = await searchContext({ workspace, query, limit: budget * 2 });
const selected = selectWithinBudget(search.hits, budget);
Sources: src/search.ts:search.ts
Step 2: Budget-Aware Selection
Hits are added greedily in ranked order; selection stops at the first hit whose estimate would bring the running total to or above the budget:
function selectWithinBudget(hits: SearchHit[], budget: number): SearchHit[] {
const selected: SearchHit[] = [];
let tokenEstimate = 0;
for (const hit of hits) {
const est = estimateTokens(hit.excerpt || hit.title);
if (tokenEstimate + est >= budget) break;
selected.push(hit);
tokenEstimate += est;
}
return selected;
}
Sources: src/search.ts:search.ts
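To make the stopping behaviour concrete, here is a minimal sketch of the same greedy loop on made-up data. The source does not show `estimateTokens`, so `estimateTokensSketch` assumes roughly four characters per token; the `BudgetHit` type and the sample hits are likewise hypothetical.

```typescript
interface BudgetHit {
  title: string;
  excerpt: string;
}

// Assumed heuristic (about 4 characters per token); the real estimateTokens is not shown.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

// Same greedy loop as the excerpt, using the assumed estimator.
function selectWithinBudgetSketch(hits: BudgetHit[], budget: number): BudgetHit[] {
  const selected: BudgetHit[] = [];
  let tokenEstimate = 0;
  for (const hit of hits) {
    const est = estimateTokensSketch(hit.excerpt || hit.title);
    if (tokenEstimate + est >= budget) break; // stop at the first hit that does not fit
    selected.push(hit);
    tokenEstimate += est;
  }
  return selected;
}

const rankedHits: BudgetHit[] = [
  { title: "auth.ts", excerpt: "x".repeat(400) }, // about 100 tokens
  { title: "jwt.ts", excerpt: "x".repeat(800) },  // about 200 tokens
  { title: "tiny.ts", excerpt: "x".repeat(40) },  // about 10 tokens, never reached
];
const kept = selectWithinBudgetSketch(rankedHits, 250);
console.log(kept.map((h) => h.title)); // only "auth.ts": 100 + 200 >= 250 ends the loop
```

Because the loop uses `break` rather than `continue`, a small hit ranked below an oversized one is never considered. That keeps the output strictly in rank order at the cost of sometimes leaving budget unused.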
Step 3: Symbol Loading
For each selected file, related symbols are loaded (up to 20 total):
const symbols = loadSymbolsForPaths(kernel.db, paths).slice(0, 20);
The symbols query joins against the symbols table:
SELECT ref, name, kind, file_path, line, signature, exported
FROM symbols
WHERE file_path IN (...)
Sources: src/search.ts:search.ts
Step 4: Graph Path Loading
Graph paths connect files through import/dependency relationships:
const graphPaths = loadGraphPaths(kernel.db, paths, 20);
Sources: src/search.ts:search.ts
Step 5: Memory Hit Extraction
Memory hits are filtered from selected hits by kind:
const memoryHits = selected.filter((hit) => hit.kind === "memory");
Step 6: Confidence Calculation
Confidence is calculated using a clamped formula:
function confidenceFor(hits, graphPaths, memoryHits): number {
return clamp(
0.25 + hits.length * 0.05 + graphPaths.length * 0.02 + memoryHits.length * 0.05,
0.1,
0.92
);
}
- Base: 0.25
- Each hit: +0.05
- Each graph path: +0.02
- Each memory hit: +0.05
- Clamped to [0.1, 0.92]
Sources: src/search.ts:search.ts
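A few worked values show how the formula behaves at its extremes. The sketch below applies the same coefficients to counts instead of arrays; `confidenceFromCounts` and `clampConfidence` are illustrative names, not functions from the source.

```typescript
function clampConfidence(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

// Same formula as above, taking counts instead of arrays to keep the example small.
function confidenceFromCounts(hits: number, graphPaths: number, memoryHits: number): number {
  return clampConfidence(0.25 + hits * 0.05 + graphPaths * 0.02 + memoryHits * 0.05, 0.1, 0.92);
}

const emptyPack = confidenceFromCounts(0, 0, 0);   // base only: 0.25
const typicalPack = confidenceFromCounts(5, 2, 1); // 0.25 + 0.25 + 0.04 + 0.05, about 0.59
const largePack = confidenceFromCounts(20, 20, 3); // raw sum 1.8, capped at 0.92
console.log(emptyPack, typicalPack.toFixed(2), largePack);
```

Since the base is 0.25 and every count is non-negative, the lower bound of 0.1 is never reached with this formula; only the 0.92 cap has an effect in practice.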
Query Classification
The classifyQuery function determines the search intent based on keywords:
function classifyQuery(q: string): SearchIntent {
const lower = q.toLowerCase();
if (/[`"'#.:/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
if (/\b(test|spec|mock|fixture)\b/.test(q)) return "test";
if (/\b(memory|lesson|learned|session|sessions)\b/.test(q)) return "memory";
if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(q)) return "impact";
if (/\b(why|changed|commit|history|regression|introduced)\b/.test(q)) return "historical";
if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(q)) return "architectural";
if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(q)) return "docs";
return "vague";
}
Sources: src/search.ts:search.ts
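Because the rules are checked in order, the first match wins, which has some non-obvious consequences. The sketch below copies the excerpt (dropping its unused lowercase variable) and classifies a few made-up queries; the `SearchIntent` union lists only the values this function can return.

```typescript
type SearchIntent =
  | "exact" | "test" | "memory" | "impact" | "historical" | "architectural" | "docs" | "vague";

// Copied from the excerpt above; rules are evaluated top to bottom.
function classifyQuery(q: string): SearchIntent {
  if (/[`"'#.:\/]/.test(q) || /\b[A-Z][A-Za-z0-9_]{2,}\b/.test(q)) return "exact";
  if (/\b(test|spec|mock|fixture)\b/.test(q)) return "test";
  if (/\b(memory|lesson|learned|session|sessions)\b/.test(q)) return "memory";
  if (/\b(impact|affected|depends|dependents|blast radius)\b/.test(q)) return "impact";
  if (/\b(why|changed|commit|history|regression|introduced)\b/.test(q)) return "historical";
  if (/\b(architecture|flow|path|trace|connects|calls|imports)\b/.test(q)) return "architectural";
  if (/\b(resource|docs|documentation|guide|readme|how to|setup)\b/.test(q)) return "docs";
  return "vague";
}

const sampleQueries = [
  "src/auth.ts",                        // path characters trigger "exact"
  "what depends on the payment module", // "depends" triggers "impact"
  "why did the parser regress",         // "why" triggers "historical"
  "Where is auth handled",              // capitalized "Where" triggers "exact"
  "where is auth handled",              // no rule matches: "vague"
];
const classified = sampleQueries.map((text) => `${text} -> ${classifyQuery(text)}`);
console.log(classified.join("\n"));
```

Note the fourth query: any capitalized word of three or more characters satisfies the identifier pattern, so a sentence-cased question is classified as exact before the keyword rules are consulted.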
Term Expansion
The expandedTerms function adds related terms to improve recall for specific domains:
function expandedTerms(query: string): string[] {
const lower = query.toLowerCase();
const additions: string[] = [];
if (/\b(tool|tools|registered|register)\b/.test(lower)) {
additions.push("server", "tool", "tools", "callTool");
}
if (/\bmcp\b/.test(lower)) {
additions.push("mcp", "server", "stdio");
}
if (/\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/.test(lower)) {
additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
}
if (/\bimpact|depends|dependents|uses\b/.test(lower)) {
additions.push("imports", "tests", "edges");
}
// terms holds the base query terms; its derivation is not shown in this excerpt
return [...terms, ...additions];
}
Sources: src/search.ts:search.ts
Scoring Algorithm
The scoreFromRank function calculates relevance scores:
function scoreFromRank(rank: number, q: string): number {
let bonus = 0;
const lower = q.toLowerCase();
if (/\bmemory|memories|remember|remembers|lesson|lessons|sessions\b/.test(q)) {
if (lower.includes("memory ledger")) bonus += 7;
if (lower.includes("src/memory.ts")) bonus += 5;
if (lower.includes("readme.md")) bonus += 4;
if (lower.includes("src/search.ts")) bonus -= 16;
}
if (/\b(where|how)\b/.test(q) && lower.includes("config-key")) bonus -= 2;
return 10 / (1 + Math.abs(rank)) + bonus;
}
Sources: src/search.ts:search.ts
CLI Usage
The query command creates Context Packs via CLI:
cxf query "<query>" --workspace <path> --budget 2000 --json
Options
| Option | Type | Default | Description |
|---|---|---|---|
--workspace | path | cwd | Workspace path |
--budget | number | 2000 | Approximate token budget |
--json | flag | false | Output as JSON instead of Markdown |
Example Output
# Context Pack ctx_abc123
Query: where is user auth handled
Intent: vague
Confidence: 59%
Token estimate: 1850/2000
Found 5 evidence items for a vague query, with 2 graph connections and 1 memory hit.
## Citations
- file:src/auth.ts:1-50 (auth module)
Handles user authentication via JWT tokens...
- file:src/middleware/auth.ts:1-30 (auth middleware)
Express middleware for auth validation...
## Graph Paths
- src/auth.ts --IMPORTS--> src/utils/jwt.ts (src/auth.ts:5)
- src/middleware/auth.ts --IMPORTS--> src/auth.ts (src/middleware/auth.ts:3)
## Memory Hits
- memory:lesson:1: JWT tokens should be validated on every protected route.
Sources: src/cli.ts:cli.ts
Rendering
Context Packs are rendered to Markdown by renderEvidencePackMarkdown; passing --json to cxf query outputs the pack as JSON instead:
export function renderEvidencePackMarkdown(pack: EvidencePack): string {
const lines = [
`# Context Pack ${pack.id}`,
"",
`Query: ${pack.query}`,
`Intent: ${pack.intent}`,
`Confidence: ${Math.round(pack.confidence * 100)}%`,
`Token estimate: ${pack.tokenEstimate}/${pack.budget}`,
"",
pack.summary,
"",
"## Citations"
];
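// ... citations, graph paths, and memory hits are appended to lines here (elided)
return lines.join("\n"); // assumed final step; the excerpt omits the return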
}
Sources: src/report.ts:report.ts
Chunk Extraction
Contextual chunks are extracted during indexing for searchability:
graph LR
A[Source File] --> B[Language Detection]
B --> C[extractSymbols]
B --> D[extractEdges]
B --> E[extractChunks]
C --> F[Symbol Table]
D --> G[Edge Table]
E --> H[Chunk Table]
Supported Languages
| Language | Symbol Patterns |
|---|---|
| TypeScript/JavaScript | function, class, interface, type, const arrow |
| Python | def, class |
| Go | func, type struct/interface |
| Rust | fn, struct, enum, trait, impl |
| Markdown | headings (H1-H6) |
| JSON | top-level keys |
Sources: src/extract.ts:extract.ts
Chunking Strategy
- Code files: Divided into blocks of up to 60 lines; each block starts 50 lines after the previous one, so consecutive blocks overlap by 10 lines
- Markdown files: Split by headings, with the heading as the chunk title
- Token estimation: Used for both selection and budget accounting
function codeChunks(relativePath: string, content: string): ChunkRecord[] {
const lines = content.split(/\r?\n/);
const chunks: ChunkRecord[] = [];
// Split into ~60-line blocks with overlap
for (let start = 1; start <= lines.length; start += 50) {
const end = Math.min(start + 60 - 1, lines.length);
const text = lineRange(content, start, end);
chunks.push({
ref: fileRef(relativePath, start, end),
filePath: relativePath,
startLine: start,
endLine: end,
kind: "file",
title: `${relativePath}:${start}-${end}`,
text,
tokenEstimate: estimateTokens(text)
});
}
return chunks;
}
Sources: src/extract.ts:extract.ts
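The overlap is easiest to see from the line ranges alone. The sketch below isolates the loop arithmetic from the excerpt; `chunkRanges` is an illustrative helper, not a function from the source, and its defaults mirror the constants 50 and 60 used above.

```typescript
// Line ranges only; step and size mirror the constants in the excerpt (50 and 60).
function chunkRanges(totalLines: number, step = 50, size = 60): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalLines; start += step) {
    ranges.push([start, Math.min(start + size - 1, totalLines)]);
  }
  return ranges;
}

const rangesFor120 = chunkRanges(120);
// [[1,60],[51,110],[101,120]]: lines 51-60 and 101-110 appear in two chunks each
console.log(JSON.stringify(rangesFor120));
```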
Summary Generation
The summarizePack function generates human-readable summaries:
function summarizePack(
query: string,
intent: SearchIntent,
hits: SearchHit[],
graphPaths: GraphPath[],
memoryHits: SearchHit[]
): string {
if (hits.length === 0) {
return `No indexed evidence matched "${query}". Re-index or broaden the query.`;
}
return `Found ${hits.length} evidence item${hits.length === 1 ? "" : "s"} ` +
`for a ${intent} query, with ${graphPaths.length} graph connection${graphPaths.length === 1 ? "" : "s"} ` +
`and ${memoryHits.length} memory hit${memoryHits.length === 1 ? "" : "s"}.`;
}
Sources: src/search.ts:search.ts
Persistence
Evidence packs are saved to the kernel database for audit and retrieval:
saveEvidencePack(kernel.db, {
id: pack.id,
query: pack.query,
tokenEstimate,
json: JSON.stringify(pack)
});
Sources: src/search.ts:search.ts
Design Principles
- Token budget awareness: Never exceed the requested budget; select the most relevant items first
- Cited evidence: Every piece of information is traceable to a specific file and line range
- Intent-driven: Query classification shapes what gets searched and how results are interpreted
- Graph connectivity: Beyond matching files, show how they connect through imports and dependencies
- Memory integration: Blend indexed content with evidence-backed lessons from prior sessions
Sources: src/search.ts:search.ts
Memory Ledger
Related topics: Context Packs, Search Engine
The Memory Ledger is Contextful's evidence-backed persistent memory system that enables AI agents to retain and recall learned lessons across sessions. Unlike ephemeral context that disappears when a session ends, the Memory Ledger stores structured knowledge annotated with source evidence, allowing agents to build cumulative understanding of a codebase over time.
Overview
The Memory Ledger solves a fundamental problem in AI-assisted development: knowledge gained during one session is lost in the next. When an agent discovers how authentication works, identifies a fragile dependency, or learns a non-obvious architectural pattern, that knowledge typically vanishes when the session ends.
Contextful's approach requires every stored memory to be anchored to concrete evidence—file references, code symbols, or prior context packs. This design prevents hallucinated or unsubstantiated memories from polluting the knowledge base and ensures that recalled lessons can be traced back to their source.
The system operates entirely locally with no external API calls, embedding services, or cloud dependencies. All memory data remains within the workspace's SQLite database.
Architecture
graph TD
A[Agent Session] -->|write_lesson| B[Memory Ledger]
A -->|recall_memory| C[Memory Search]
B -->|evidence refs| D[Evidence Pack]
C -->|cited memories| A
D -->|citations| E[Source Files]
F[Workspace DB] -->|stores| B
F -->|stores| C
Core Components
| Component | Role | Source |
|---|---|---|
| Memory Storage | SQLite-backed persistent storage for lessons | src/db.ts |
| Memory Search | FTS-enabled retrieval of memories by query | src/search.ts |
| Evidence Validation | Ensures evidence refs are valid before storage | src/mcp-server.ts |
| Confidence Scoring | Assigns credibility scores to stored memories | src/cli.ts:85 |
Data Model
Memory Record Structure
Each memory in the ledger contains the following fields:
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier (prefixed with memory:) |
claim | string | The substantive lesson or observation |
scope | string | Granularity level: repo, file, symbol, or session |
evidenceRefs | string[] | Validated references to source evidence |
confidence | number | Credibility score from 0.0 to 1.0 |
status | string | Current state: active, superseded, or stale |
supersedes | string? | ID of the memory this replaces (if any) |
Evidence Reference Formats
Valid evidence references that can be attached to memories:
| Format | Example | Purpose |
|---|---|---|
| File range | file:src/auth.ts:10-40 | Reference specific lines in a file |
| Symbol | symbol:src/auth.ts#AuthService:12 | Point to a specific code symbol |
| Context pack | pack:ctx_abc123 | Reference a prior evidence pack |
Sources: README.md:54-56
Evidence references must come from search results or context packs—arbitrary references are rejected. This prevents storing claims without verifiable backing.
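The three reference formats are regular enough to parse with simple patterns. The following is a minimal sketch of such a parser, assuming only the formats in the table; `parseEvidenceRef` and the `EvidenceRef` type are hypothetical and do not describe how Contextful validates references internally.

```typescript
type EvidenceRef =
  | { kind: "file"; path: string; startLine: number; endLine: number }
  | { kind: "symbol"; path: string; symbol: string; line: number }
  | { kind: "pack"; packId: string };

// Hypothetical parser for the three documented formats; returns null for anything else.
function parseEvidenceRef(ref: string): EvidenceRef | null {
  const fileMatch = /^file:(.+):(\d+)-(\d+)$/.exec(ref);
  if (fileMatch) {
    const startLine = Number(fileMatch[2]);
    const endLine = Number(fileMatch[3]);
    // Reject inverted ranges such as 40-10.
    return startLine <= endLine ? { kind: "file", path: fileMatch[1], startLine, endLine } : null;
  }
  const symbolMatch = /^symbol:(.+)#([^#:]+):(\d+)$/.exec(ref);
  if (symbolMatch) {
    return { kind: "symbol", path: symbolMatch[1], symbol: symbolMatch[2], line: Number(symbolMatch[3]) };
  }
  const packMatch = /^pack:(ctx_\w+)$/.exec(ref);
  return packMatch ? { kind: "pack", packId: packMatch[1] } : null;
}

const parsedRefs = [
  "file:src/auth.ts:10-40",
  "symbol:src/auth.ts#AuthService:12",
  "pack:ctx_abc123",
  "src/auth.ts", // no recognized prefix, so this parses to null
].map(parseEvidenceRef);
console.log(JSON.stringify(parsedRefs, null, 2));
```

A syntactic check like this only confirms the shape of a reference; confirming that it actually came from a search result or context pack requires a lookup against stored evidence.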
Memory Scopes
The scope field determines the durability and applicability of a memory:
| Scope | Description | Persistence |
|---|---|---|
repo | Project-wide lessons applicable across sessions | Permanent |
file | File-specific knowledge | Permanent |
symbol | Symbol-level lessons | Permanent |
session | Ephemeral session-scoped learnings | Lost on session end |
The default scope is repo, reflecting the assumption that most valuable memories have project-wide relevance.
Sources: src/cli.ts:73
Writing Memories
CLI Usage
cxf memory add \
--claim "AuthService.validateToken() throws on expired tokens without catching" \
--evidence "file:src/auth.ts:45-67" \
--evidence "file:src/api/middleware.ts:12-20" \
--confidence 0.85 \
--scope repo
MCP Tool Usage
// Assumes `client` is an MCP client that is already connected to the Contextful server.
await client.callTool({
name: "write_lesson",
arguments: {
claim: "The payment module requires initialization before use",
evidence_refs: ["file:src/payment/core.ts:10-30", "symbol:src/payment/core.ts#initialize:15"],
scope: "repo",
confidence: 0.9
}
});
Sources: src/mcp-server.ts:79-94
Validation Rules
Memories are subject to strict validation:
- Evidence required: At least one valid evidence reference must be provided
- Evidence must be fresh: References must originate from search results or context packs
- Claim must be substantive: Empty or trivial claims are rejected
- Confidence in valid range: Must be between 0.0 and 1.0
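The four rules above can be sketched as a single validation pass. This is an illustration under stated assumptions, not the project's validator: `validateLessonSketch` is a hypothetical name, `knownRefs` stands in for references seen in earlier search results or context packs, and the 10-character minimum for a "substantive" claim is an arbitrary threshold chosen for the example.

```typescript
interface LessonInput {
  claim: string;
  evidenceRefs: string[];
  confidence: number;
}

// Returns a list of problems; an empty list means the lesson passes these checks.
// knownRefs stands in for refs seen in earlier search results or context packs.
function validateLessonSketch(input: LessonInput, knownRefs: Set<string>): string[] {
  const problems: string[] = [];
  // Assumed threshold: the source does not define what counts as a trivial claim.
  if (input.claim.trim().length < 10) problems.push("claim is empty or trivial");
  if (input.evidenceRefs.length === 0) problems.push("at least one evidence ref is required");
  for (const ref of input.evidenceRefs) {
    if (!knownRefs.has(ref)) problems.push(`unknown evidence ref: ${ref}`);
  }
  // Written as a negated range check so that NaN is rejected too.
  if (!(input.confidence >= 0 && input.confidence <= 1)) {
    problems.push("confidence must be between 0.0 and 1.0");
  }
  return problems;
}

const seenRefs = new Set(["file:src/auth.ts:45-67"]);
const okLesson = validateLessonSketch(
  { claim: "validateToken() throws on expired tokens", evidenceRefs: ["file:src/auth.ts:45-67"], confidence: 0.85 },
  seenRefs
);
const badLesson = validateLessonSketch({ claim: "", evidenceRefs: [], confidence: 1.5 }, seenRefs);
console.log(okLesson, badLesson);
```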
Searching Memories
Intent Classification
Contextful automatically classifies queries to determine when to search memories. The query classifier recognizes memory-related intents through keyword detection:
const memoryPattern = /\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/;
When matched, the classifier returns intent: "memory" and the search system automatically queries the memories FTS index.
Sources: src/search.ts:14-17
Query Expansion
Memory searches benefit from automatic term expansion. When a query mentions relevant concepts, additional search terms are added:
if (/\bmemory|memories|remember|remembers|lesson|lessons|learned|session|sessions\b/.test(lower)) {
additions.push("memory", "memories", "lesson", "lessons", "claim", "ledger", "evidence");
}
This ensures that a query such as "what lessons came out of the auth work" retrieves memory results even when the stored claims never use the word "lesson".
Sources: src/search.ts:28-30
Search Results
Memory hits in search results include:
| Field | Description |
|---|---|
ref | Memory reference in format memory:<id> |
kind | Always "memory" for memory hits |
title | Display title including scope |
excerpt | Redacted claim text (secrets removed) |
evidence | Original evidence references |
status | Current memory status |
score | Relevance score |
Memory Lifecycle
stateDiagram-v2
[*] --> Active: write_lesson
Active --> Superseded: write_lesson with supersedes
Active --> Stale: Evidence becomes invalid
Superseded --> [*]
Stale --> [*]
Active --> [*]: Deleted
Status Transitions
Active → Default state for newly written memories. Active memories are returned in search results and can supersede other memories.
Superseded → When a newer, more accurate memory replaces an older one, the superseded memory retains its ID and evidence but is excluded from search results. The supersedes field links to the replaced memory.
Stale → Memories become stale when their evidence references point to files or symbols that have changed significantly since the memory was written. The reporting system tracks stale memories for review.
Sources: src/report.ts:54-58
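The status transitions can be summarized as a small state function. The sketch below models only the two status changes described above (deletion is not modeled); the `LedgerEvent` names and `nextStatus` are hypothetical and chosen for illustration.

```typescript
type MemoryStatus = "active" | "superseded" | "stale";
// Hypothetical event names for the two transitions shown in the diagram.
type LedgerEvent = "superseded_by_new_lesson" | "evidence_invalidated";

// Only active memories change state; superseded and stale are terminal here.
function nextStatus(current: MemoryStatus, event: LedgerEvent): MemoryStatus {
  if (current !== "active") return current;
  return event === "superseded_by_new_lesson" ? "superseded" : "stale";
}

const afterReplacement = nextStatus("active", "superseded_by_new_lesson"); // "superseded"
const afterFileChange = nextStatus("active", "evidence_invalidated");      // "stale"
const alreadyStale = nextStatus("stale", "superseded_by_new_lesson");      // unchanged: "stale"
console.log(afterReplacement, afterFileChange, alreadyStale);
```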
Integration with Context Packs
The Memory Ledger integrates with Contextful's evidence pack system:
- Before writing: Search context or create a context pack to get evidence references
- Writing lessons: Use those evidence refs to anchor the memory claim
- Recalling: Later sessions query the ledger, retrieving cited memories
// During a session: create pack, identify lessons
const pack = await createContextPack({ query: "how is auth handled", budget: 2000 });
// Later session: recall what was learned
const result = await recallMemory({ query: "auth patterns", scope: "repo" });
This bidirectional relationship means memories enhance future context packs, and context packs provide evidence for future memories.
Reporting
The report command includes memory statistics:
cxf report --workspace . --format markdown
Output includes a "Stale Memories" section listing memories whose evidence references may no longer be valid:
## Stale Memories
- memory_abc123: AuthService.validateToken() behavior changed in v2
- memory_def456: payment module initialization order is now reversed
Sources: src/report.ts:54-58
Configuration Options
| Option | CLI Flag | Default | Description |
|---|---|---|---|
| Workspace | --workspace | process.cwd() | Path to workspace with memory database |
| Claim | --claim | required | The memory content |
| Evidence | --evidence | required | One or more evidence refs |
| Scope | --scope | repo | Memory scope level |
| Confidence | --confidence | 0.7 | Credibility score |
Privacy Considerations
The Memory Ledger is designed with privacy as a core principle:
- Local only: No data leaves the workspace
- No cloud sync: Memories remain on the local machine
- Evidence-linked: Claims cannot be stored without verifiable source
- Content redaction: Secrets are automatically redacted from stored claims using pattern matching for emails, API keys, and tokens
Sources: src/util.ts:12-18
Related MCP Tools
| Tool | Purpose |
|---|---|
recall_memory | Search the memory ledger |
write_lesson | Store a new evidence-backed memory |
context_pack | Generate evidence packs that can feed into memories |
Sources: README.md:35-40
Sources: README.md:54-56
Graph Traversal and Analysis
Related topics: Search Engine, SQLite Database Schema
Graph Traversal and Analysis is a core feature of Contextful that builds and queries a dependency graph from source code. This system tracks relationships between files, symbols, modules, and configuration nodes, enabling sophisticated impact analysis, change tracing, and dependency exploration.
Overview
Contextful extracts code relationships during indexing and stores them in a SQLite database as a traversable graph. This enables agents to answer questions like:
- "What depends on this module?"
- "What tests cover this file?"
- "How does this symbol connect to other parts of the codebase?"
Sources: src/extract.ts:68-95
Architecture
graph TD
A[Source Files] --> B[extractEdges]
B --> C[GraphEdge Records]
C --> D[SQLite Kernel DB]
E[CLI/MCP Query] --> F[searchContext]
F --> G[traceGraph]
G --> H[GraphPath Results]
F --> I[impactAnalysis]
I --> J[Impact Results]
F --> K[whyChanged]
K --> L[Git History + Evidence]
Data Flow
- Extraction Phase: During workspace indexing, extractEdges() parses source files to identify relationships. Sources: src/extract.ts:52-95
- Storage Phase: Edge data is stored in the edges table within the kernel SQLite database. Sources: src/search.ts:1-30
- Query Phase: CLI commands and MCP tools query the graph using traversal algorithms. Sources: src/search.ts:180-220
Graph Data Model
Core Types
interface GraphEdge {
sourceType: "file" | "symbol";
sourceName: string;
targetType: "file" | "symbol" | "module" | "config";
targetName: string;
edgeType: EdgeType;
filePath: string;
line: number;
}
interface GraphPath {
edges: Array<{
sourceName: string;
sourceType: string;
edgeType: string;
targetName: string;
targetType: string;
}>;
totalHops: number;
}
interface GraphNode {
name: string;
type: "file" | "symbol" | "module" | "config";
path?: string;
kind?: string;
}
Sources: src/types.ts:45-70
Edge Types
| Edge Type | Description | Source Detection |
|---|---|---|
DEFINES | File defines a symbol | Function/class declarations |
IMPORTS | File imports a module | import, require, from statements |
CONFIGURES | File/config references a key | JSON keys, package.json fields |
TESTS | Test file tests imports | Auto-generated for test files |
Sources: src/extract.ts:75-100
Language-Specific Detection
The extraction layer supports multiple languages:
| Language | Import Patterns | Symbol Patterns |
|---|---|---|
| TypeScript/JavaScript | from "module", require("module") | export function/class/interface |
| Python | from module import | def, class |
| Go | "package" | func, type struct/interface |
| Rust | use module;, mod name; | fn, struct, enum, trait |
Sources: src/extract.ts:70-95
Graph Traversal API
traceGraph
Performs graph traversal starting from a source node, optionally filtering by edge types and limiting results.
export async function traceGraph(options: {
workspace?: string;
from: string;
to?: string;
edgeTypes?: string[];
limit?: number;
}): Promise<GraphPath[]>
#### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
workspace | string | No | Workspace path (defaults to CWD) |
from | string | Yes | Starting node name |
to | string | No | Target node for path finding |
edgeTypes | string[] | No | Filter by specific edge types |
limit | number | No | Maximum paths to return (default: 10) |
Sources: src/search.ts:180-190
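To show what a traversal with these parameters might do, here is a minimal breadth-first sketch over an in-memory edge list. It is an assumption-laden illustration: the real traceGraph queries the kernel database and its algorithm is not shown in the source, and `tracePathsSketch`, `EdgeSketch`, and the sample edges are hypothetical.

```typescript
interface EdgeSketch {
  sourceName: string;
  targetName: string;
  edgeType: string;
}

// Breadth-first search over an in-memory edge list; returns each path as a list of edges.
function tracePathsSketch(
  edges: EdgeSketch[],
  from: string,
  options: { to?: string; edgeTypes?: string[]; limit?: number } = {}
): EdgeSketch[][] {
  const limit = options.limit ?? 10; // default of 10 taken from the parameter table
  const allowed = options.edgeTypes ? new Set(options.edgeTypes) : null;
  const usable = edges.filter((e) => allowed === null || allowed.has(e.edgeType));
  const results: EdgeSketch[][] = [];
  const queue: Array<{ node: string; path: EdgeSketch[] }> = [{ node: from, path: [] }];
  const visited = new Set<string>([from]);
  while (queue.length > 0 && results.length < limit) {
    const current = queue.shift()!;
    for (const edge of usable.filter((e) => e.sourceName === current.node)) {
      if (visited.has(edge.targetName)) continue; // each node is expanded once
      visited.add(edge.targetName);
      const path = [...current.path, edge];
      // Without a target every reachable node yields a path; with one, only matches do.
      if (options.to === undefined || edge.targetName === options.to) results.push(path);
      queue.push({ node: edge.targetName, path });
    }
  }
  return results.slice(0, limit);
}

const sampleEdges: EdgeSketch[] = [
  { sourceName: "src/api.ts", targetName: "src/auth.ts", edgeType: "IMPORTS" },
  { sourceName: "src/auth.ts", targetName: "src/utils/jwt.ts", edgeType: "IMPORTS" },
  { sourceName: "src/api.ts", targetName: "handleRequest", edgeType: "DEFINES" },
];
const toJwt = tracePathsSketch(sampleEdges, "src/api.ts", { to: "src/utils/jwt.ts" });
console.log(toJwt.map((p) => p.map((e) => `${e.sourceName} --${e.edgeType}--> ${e.targetName}`)));
```

Breadth-first order means the shortest path to a target is found first, which suits a small `limit`.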
loadGraphPaths
Loads graph paths from the database for a set of file paths.
function loadGraphPaths(
db: Database,
paths: string[],
limit: number
): GraphPath[]
Sources: src/search.ts:60-80
Impact Analysis
Impact analysis identifies reverse dependencies—what depends on a given file or symbol—and finds relevant test coverage.
graph LR
A[Target File/Symbol] --> B[Find All Edges Pointing TO Target]
B --> C[Group by Source File]
C --> D[Identify Test Files]
D --> E[Return Impact Set]
impactAnalysis Function
export async function impactAnalysis(options: {
workspace?: string;
target: string;
limit?: number;
}): Promise<ImpactResult>
#### Impact Result Structure
| Field | Type | Description |
|---|---|---|
target | string | The analyzed symbol or file |
dependents | DependentInfo[] | Files/symbols that depend on target |
tests | SearchHit[] | Related test files |
interface DependentInfo {
path: string;
type: string;
imports: string[];
}
interface ImpactResult {
target: string;
dependents: DependentInfo[];
tests: SearchHit[];
}
Sources: src/search.ts:130-175
Test Detection Logic
Test files are identified by path patterns and edges with TESTS type:
const testPaths = paths.filter(
(path) => path.edgeType === "TESTS" ||
/(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./.test(path.filePath)
);
Sources: src/search.ts:165-170
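The path pattern is compact enough that a few sample paths clarify what it does and does not match. The sketch below applies the regular expression from the excerpt to made-up paths; it covers only the path-based half of the check, not the TESTS edge type.

```typescript
// Path pattern copied from the excerpt above.
const TEST_PATH_PATTERN = /(^|\/)(tests?|__tests__)\/|(\.|-)(test|spec)\./;

const candidatePaths = [
  "src/auth.test.ts",     // ".test." suffix
  "tests/auth.ts",        // top-level tests/ directory
  "src/__tests__/jwt.ts", // __tests__ directory
  "src/auth-spec.ts",     // "-spec." suffix
  "src/contest.ts",       // no match: "test" is not preceded by "." or "-"
  "src/latest/index.ts",  // no match: the directory is "latest", not "test" or "tests"
];
const detectedTests = candidatePaths.filter((p) => TEST_PATH_PATTERN.test(p));
console.log(detectedTests);
```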
Change Analysis
whyChanged
Combines current code evidence with git history to explain why a file or symbol may have changed.
export async function whyChanged(options: {
workspace?: string;
target: string;
limit?: number
}): Promise<{
target: string;
currentEvidence: SearchHit[];
commits: Array<{
hash: string;
subject: string;
date?: string;
files: string[];
}>;
}>
#### Workflow
graph TD
A[whyChanged] --> B[searchContext for target]
B --> C[Extract file paths from hits]
C --> D[readGitHistory with file paths]
D --> E[Combine evidence + commits]
E --> F[Return structured result]
Sources: src/search.ts:200-230
Git History Integration
The system reads git history for affected files:
function readGitHistory(
workspace: string,
filePaths: string[],
limit: number
): Array<{
hash: string;
subject: string;
date?: string;
files: string[];
}>
Sources: src/search.ts:85-100
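How the commit list is produced is not shown in the source. One plausible approach, sketched here as an assumption, is to run `git log -n <limit> --name-only --pretty=format:@@%H%x09%aI%x09%s -- <files>` and parse its output. The `@@` marker, the format string, and the `parseGitLogSketch` helper are hypothetical choices for this example; the git options themselves are standard.

```typescript
// Same shape as the documented return type of readGitHistory.
interface CommitSketch {
  hash: string;
  subject: string;
  date?: string;
  files: string[];
}

// Parses output where each commit header is "@@<hash>\t<ISO date>\t<subject>"
// followed by one changed file path per line (assumed format, see lead-in).
function parseGitLogSketch(output: string): CommitSketch[] {
  const commits: CommitSketch[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines separate commits
    if (line.startsWith("@@")) {
      const [hash, date, ...subjectParts] = line.slice(2).split("\t");
      commits.push({ hash, date: date || undefined, subject: subjectParts.join("\t"), files: [] });
    } else if (commits.length > 0) {
      commits[commits.length - 1].files.push(line); // file belongs to the latest header
    }
  }
  return commits;
}
```

Putting the subject last keeps the parser correct even if a commit subject contains a tab.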
CLI Commands
trace Command
cxf trace --from <symbol_or_file> [--to <target>] [--edge-types <types>] [--limit <count>]
#### Options
| Option | Type | Default | Description |
|---|---|---|---|
| --from | string | Required | Starting node |
| --to | string | - | Target node |
| --edge-types | string | all | Comma-separated edge types |
| --limit | number | 10 | Maximum paths |
| --workspace | string | CWD | Workspace path |
Sources: src/cli.ts:45-60
report Command
Generates a comprehensive context report including graph statistics:
cxf report --workspace <path> --format markdown|json|html
#### Report Includes
- Index status with graph node/edge counts
- Top queries by intent type
- Stale memory detection
- Recent evidence packs
Sources: src/cli.ts:70-85
MCP Server Tools
Contextful exposes graph traversal as MCP tools for integration with AI coding assistants.
trace_path
{
"name": "trace_path",
"description": "Trace graph relationships between files, symbols, modules, and config nodes.",
"inputSchema": {
"from": "string",
"to": "string (optional)",
"edge_types": "string[] (optional)",
"limit": "number (optional)"
}
}
Sources: src/mcp-server.ts:45-55
impact_analysis
{
"name": "impact_analysis",
"description": "Find likely dependents and tests for a file, symbol, or module.",
"inputSchema": {
"symbol_or_file": "string",
"limit": "number (optional)"
}
}
Sources: src/mcp-server.ts:56-65
why_changed
{
"name": "why_changed",
"description": "Explain why a file or symbol may have changed by combining current evidence with git history.",
"inputSchema": {
"symbol_or_file": "string",
"limit": "number (optional)"
}
}
Sources: src/mcp-server.ts:66-75
Usage Examples
Direct CLI Usage
# Trace dependencies of auth module
cxf trace --from src/auth.ts --edge-types IMPORTS
# Find what tests cover a file
cxf impact --target src/parser.ts
# Get change history for a symbol
cxf why --target AuthService
MCP Integration
{
"mcpServers": {
"contextful": {
"command": "npx",
"args": ["-y", "@inferensys/contextful", "server"]
}
}
}
// In an MCP client (argument keys follow the trace_path input schema)
const result = await client.callTool("trace_path", {
from: "src/auth.ts",
to: "src/database.ts",
edge_types: ["IMPORTS", "DEFINES"]
});
Query Intent Classification
Graph queries are automatically classified to route to appropriate traversal strategies:
| Intent | Keywords | Graph Relevance |
|---|---|---|
| architectural | architecture, flow, path, connects, calls | High priority |
| impact | impact, affected, depends, blast radius | Direct edge query |
| historical | why, changed, history, regression | Graph + git history |
| exact | Symbol names, file paths | Symbol-level traversal |
Sources: src/search.ts:115-130
Limitations and Design Decisions
Privacy Guarantees
- All processing is local-only
- No external embedding APIs used
- No source code upload
- No file editing capabilities
Sources: README.md:45-50
v1 Scope Boundaries
- Broken JSON during indexing produces warnings but continues processing
- Syntax diagnostics are intentionally out of scope
- Git history is read-only
Sources: src/extract.ts:120-125
Summary
The Graph Traversal and Analysis system in Contextful provides:
- Automatic Relationship Extraction - Builds a dependency graph during indexing
- Multiple Query Entry Points - CLI commands and MCP tools
- Path Finding - Trace connections between any two nodes
- Impact Analysis - Identify dependents and test coverage
- Change Attribution - Combine current state with git history
This enables AI coding assistants to answer sophisticated questions about code relationships without requiring manual documentation or extensive file reading.
Sources: src/extract.ts:68-95
SQLite Database Schema
Related topics: Workspace Indexing System, Search Engine
Overview
Contextful uses SQLite as its primary storage engine for indexing codebase artifacts. The database schema is designed to support full-text search, symbol indexing, dependency graph traversal, and evidence pack generation for AI-assisted queries. All operations are managed through better-sqlite3 for synchronous, high-performance access.
Sources: src/db.ts:1-50
Schema Tables
Primary Storage Tables
#### chunks
Stores indexed code and documentation segments extracted from source files. Each chunk represents a logical unit of content bounded by language-specific rules (functions, classes, headings, etc.).
| Column | Type | Description |
|---|---|---|
| ref | TEXT | Unique reference identifier (format: file:path:start-end) |
| file_path | TEXT | Relative path to the source file |
| start_line | INTEGER | Starting line number (1-indexed) |
| end_line | INTEGER | Ending line number |
| kind | TEXT | Chunk classification: code, doc, file |
| title | TEXT | Display title for the chunk |
| text | TEXT | Full content of the chunk |
| token_estimate | INTEGER | Estimated token count (heuristic of roughly 4 characters per token; see Token Estimation) |
Sources: src/db.ts:23-36
#### symbols
Captures programming constructs (functions, classes, interfaces, types) extracted from source files.
| Column | Type | Description |
|---|---|---|
| ref | TEXT | Unique symbol reference |
| name | TEXT | Symbol name |
| kind | TEXT | Symbol type: function, class, interface, type, struct, enum, trait, impl |
| file_path | TEXT | Source file path |
| line | INTEGER | Line number where symbol is defined |
| signature | TEXT | First 160 characters of symbol declaration |
| exported | INTEGER | Boolean flag (1 = exported, 0 = local) |
Sources: src/db.ts:47-60
#### edges
Represents relationships between code entities, including imports, module dependencies, and configuration references.
| Column | Type | Description |
|---|---|---|
| source_name | TEXT | Name of the importing/configuring entity |
| target_name | TEXT | Name or path of the imported/dependency target |
| edge_type | TEXT | Relationship type: IMPORTS, CONFIGURES |
| file_path | TEXT | File where the relationship is defined |
| line | INTEGER | Line number of the relationship definition |
Sources: src/db.ts:38-45
Full-Text Search Index
#### chunks_fts
Virtual FTS5 table providing fast full-text search across all indexed content. Mirrors core chunk data for BM25-ranked retrieval.
| Column | Type | Description |
|---|---|---|
| ref | TEXT | Chunk reference |
| path | TEXT | File path for filtering |
| title | TEXT | Searchable title field |
| text | TEXT | Full searchable content |
Sources: src/db.ts:37-42
The FTS table is queried using BM25 ranking in search operations:
SELECT ref, path, title, text, bm25(chunks_fts) AS rank
FROM chunks_fts WHERE chunks_fts MATCH ?
Sources: src/search.ts:45-47
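The `?` placeholder in the query above must be a valid FTS5 expression, and raw user input containing punctuation can cause FTS5 syntax errors. The sketch below shows one way to turn free text into a safe `MATCH` string by quoting each token. `toFtsMatchSketch` is a hypothetical helper and may differ from how Contextful prepares its queries.

```typescript
// Hypothetical helper: turns free text into an FTS5 MATCH expression.
// Each token becomes a quoted string, and tokens are joined with OR.
function toFtsMatchSketch(query: string): string {
  const tokens = query.match(/[A-Za-z0-9_]+/g) ?? [];
  return tokens.map((token) => `"${token}"`).join(" OR ");
}

// Ordering by rank ascending is an addition for this example: FTS5's bm25()
// returns smaller (more negative) values for better matches.
const RANKED_QUERY_SKETCH = `
  SELECT ref, path, title, text, bm25(chunks_fts) AS rank
  FROM chunks_fts WHERE chunks_fts MATCH ?
  ORDER BY rank
`;

const matchExpression = toFtsMatchSketch("where is AuthService.login()?");
```

An empty result from the helper should be treated as "no query" by the caller, since an empty `MATCH` string is not a valid FTS5 expression.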
Graph and Metadata Tables
#### nodes
Represents graph vertices for dependency analysis and traversal operations.
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Auto-incrementing primary key |
| ref | TEXT | Node reference |
| kind | TEXT | Node classification: file, symbol, chunk, module, config |
| name | TEXT | Display name |
| file_path | TEXT | Associated file path (nullable) |
Sources: src/db.ts:12-22
#### files
Stores metadata about indexed source files.
| Column | Type | Description |
|---|---|---|
| absolute_path | TEXT | Full absolute file path |
| language | TEXT | Detected programming language |
| hash | TEXT | SHA-based content hash for change detection |
| size | TEXT | File size in bytes |
Sources: src/db.ts:13-17
#### fingerprints
Stores content fingerprints for deduplication and incremental indexing.
| Column | Type | Description |
|---|---|---|
| ref | TEXT | Reference to the content chunk |
| kind | TEXT | Content type |
| fingerprint | TEXT | Hash of the content |
#### evidence_packs
Persists generated evidence packs for audit and replay.
| Column | Type | Description |
|---|---|---|
| id | TEXT | Unique pack identifier |
| query | TEXT | Original search query |
| token_estimate | INTEGER | Total token count |
| json | TEXT | Serialized pack data |
#### query_log
Records search history for analysis and debugging.
| Column | Type | Description |
|---|---|---|
| query | TEXT | Search query text |
| intent | TEXT | Classified search intent |
| timestamp | TEXT | ISO timestamp |
Sources: src/db.ts:1-10
Data Flow Architecture
graph TD
A[Source Files] --> B[extractSymbols]
A --> C[extractEdges]
A --> D[extractChunks]
B --> E[symbols table]
C --> F[edges table]
D --> G[chunks table]
D --> H[chunks_fts index]
G --> I[Full-Text Search]
E --> J[Symbol Lookup]
F --> K[Graph Traversal]
I --> L[searchContext]
J --> L
K --> L
L --> M[Evidence Pack]
M --> N[evidence_packs]
Sources: src/extract.ts:1-150
Supported Symbol Kinds
The indexer extracts and classifies symbols based on language-specific patterns:
| Language | Supported Kinds |
|---|---|
| TypeScript/JavaScript | function, class, interface, type |
| Python | function, class |
| Go | function, struct, interface |
| Rust | function, struct, enum, trait, impl |
Sources: src/extract.ts:30-60
Supported Edge Types
| Edge Type | Description | Example |
|---|---|---|
| IMPORTS | Module/dependency import | import { foo } from './bar' |
| CONFIGURES | Configuration key reference | "dependencies": { ... } in package.json |
The CONFIGURES edge type is specifically generated for package.json dependency sections and JSON configuration keys.
Sources: src/extract.ts:70-120
Query Classification and Intent
The search system classifies queries into intent categories that influence result ranking:
| Intent | Trigger Keywords | Purpose |
|---|---|---|
| symbol | Class/function names, exact identifiers | Find symbol definitions |
| code | Code-related terms | Locate implementation |
| memory | memory, lessons, session | Search evidence-backed memory |
| impact | depends, affected, blast radius | Reverse dependency analysis |
| historical | why, changed, history, commit | Git history queries |
| architectural | architecture, flow, imports | Dependency tracing |
| docs | docs, documentation, readme | Documentation lookup |
| exact | File paths, line refs, symbols | Precise file/line access |
| vague | Default fallback | Broad search |
Sources: src/search.ts:15-30
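A keyword-driven classifier along the lines of the table above could look like the sketch below. It is illustrative only: the rule order, the file-path pattern for `exact`, and the name `classifyQuerySketch` are assumptions, and the `symbol` and `code` intents are omitted because their triggers are not keyword lists.

```typescript
type IntentSketch = "impact" | "historical" | "architectural" | "memory" | "docs" | "exact" | "vague";

// Keyword lists taken from the table above; the checking order is an assumption.
const KEYWORD_RULES: Array<[IntentSketch, string[]]> = [
  ["impact", ["depends", "affected", "blast radius"]],
  ["historical", ["why", "changed", "history", "commit"]],
  ["architectural", ["architecture", "flow", "imports"]],
  ["memory", ["memory", "lessons", "session"]],
  ["docs", ["docs", "documentation", "readme"]],
];

// Hypothetical pattern for a file path with an optional line reference, e.g. src/a.ts:42.
const FILE_REF = /(^|\s)[\w./-]+\.[A-Za-z]+(:\d+(-\d+)?)?(\s|$)/;

function classifyQuerySketch(query: string): IntentSketch {
  const lowered = query.toLowerCase();
  for (const [intent, keywords] of KEYWORD_RULES) {
    // Whole-word match so that "flow" does not fire on "overflow".
    if (keywords.some((kw) => new RegExp(`\\b${kw}\\b`).test(lowered))) return intent;
  }
  if (FILE_REF.test(query)) return "exact";
  return "vague"; // default fallback
}
```

Checking keywords before the file-path pattern means a query such as "blast radius of src/parser.ts" is treated as an impact question rather than an exact lookup.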
Token Estimation
Token counts are estimated using a heuristic approximation:
export function estimateTokens(text: string): number {
return Math.ceil(text.length / 4);
}
This provides a rough approximation where 1 token ≈ 4 characters, suitable for budget management in evidence pack generation.
Sources: src/util.ts:1-10
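To illustrate how such an estimate can drive budget management, the sketch below greedily selects ranked chunks until a token budget is reached. `estimateTokens` is repeated from above so the example is self-contained; `RankedChunk` and `packWithinBudget` are hypothetical and do not describe Contextful's actual packing strategy.

```typescript
// Repeated from the snippet above so this example runs on its own.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical input shape: a chunk with a relevance score.
interface RankedChunk {
  ref: string;
  text: string;
  score: number;
}

// Greedy selection: highest score first, skipping chunks that would exceed the budget.
function packWithinBudget(chunks: RankedChunk[], budget: number): { refs: string[]; tokenEstimate: number } {
  const ordered = [...chunks].sort((a, b) => b.score - a.score);
  const refs: string[] = [];
  let used = 0;
  for (const chunk of ordered) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > budget) continue; // does not fit, try the next (possibly smaller) chunk
    refs.push(chunk.ref);
    used += cost;
  }
  return { refs, tokenEstimate: used };
}
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks still use the remaining budget.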
Key Database Operations
Chunk Insertion
db.prepare(`
INSERT INTO chunks (ref, file_path, start_line, end_line, kind, title, text, token_estimate)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
`).run(chunk.ref, chunk.filePath, chunk.startLine, chunk.endLine, chunk.kind, chunk.title, chunk.text, chunk.tokenEstimate);
Each insert is written to both the chunks table and the chunks_fts FTS index so the two stay in sync.
Symbol Loading
db.prepare(`SELECT ref, name, kind, file_path, line, signature, exported
FROM symbols WHERE file_path IN (${paths.map(() => "?").join(",")})`)
.all(...paths)
Sources: src/db.ts:23-42
Sources: src/search.ts:180-195
Schema Version and Metadata
The database stores schema version and workspace metadata:
| Key | Description |
|---|---|
| schema_version | Current schema version number |
| workspace | Workspace root path |
| indexed_at | Last indexing timestamp |
| parser_backend | Parser backend description |
| warnings | Last 50 indexing warnings |
Sources: src/indexer.ts:80-90
Conclusion
The SQLite schema in Contextful provides a normalized, queryable representation of source code structure and content. The dual-table approach for chunks (storage + FTS index) enables both efficient storage and fast full-text retrieval. The edges and symbols tables together support graph traversal for dependency analysis, while the evidence pack system enables persistent, ranked context generation for AI queries.
Sources: src/db.ts:1-50
Workspace Indexing System
Related topics: SQLite Database Schema, Search Engine
Overview
The Workspace Indexing System is the core indexing engine of Contextful. It scans, parses, and stores representations of source code files from a workspace into a local SQLite database, enabling semantic search, dependency graph traversal, and evidence-backed context retrieval.
Primary responsibilities:
| Responsibility | Description |
|---|---|
| File Discovery | Recursively traverse workspace directories, filtering by language and ignore rules |
| Symbol Extraction | Parse and catalog functions, classes, interfaces, types, enums, traits |
| Edge Extraction | Track import/export relationships between modules and dependencies |
| Content Chunking | Split large files into manageable, line-numbered chunks for retrieval |
| Watch Mode | Monitor file system changes and re-run indexing after a short debounce delay |
Sources: src/cli.ts:1-20
Architecture
graph TD
A[Workspace Directory] --> B[File Discovery]
B --> C[Language Detection]
C --> D[Content Extraction]
D --> E[Symbol Extraction]
D --> F[Edge Extraction]
D --> G[Chunk Generation]
E --> H[SQLite DB]
F --> H
G --> H
I[Search/Query] --> H
J[Watch Mode] --> B
The system is built around a SQLite database that stores three core entities: symbols, edges, and chunks. The indexer processes files in a single pass, extracting all three data types simultaneously to minimize I/O overhead.
Sources: src/extract.ts:1-50
Supported Languages
The indexer natively supports symbol and edge extraction for the following languages:
| Language | Symbol Patterns | Import Patterns |
|---|---|---|
| TypeScript / JavaScript | function, class, interface, type, const arrow/function | import from, require() |
| Python | def, class | from ... import, import |
| Go | func, type struct/interface | "..." (quoted imports) |
| Rust | fn, struct, enum, trait, impl | use, mod |
| Markdown | Headings (#{1,6}) | N/A |
| JSON | Config keys ("key":) | N/A |
Sources: src/extract.ts:15-45
Indexing Process
Phase 1: File Discovery
The indexer recursively scans the workspace directory, applying language-specific filtering and .gitignore-style ignore rules. Binary files are detected and skipped using a simple null-byte heuristic: a file is treated as binary if its first 4096 bytes contain a zero byte.
export function isLikelyBinary(buffer: Buffer): boolean {
const sample = buffer.subarray(0, Math.min(buffer.length, 4096));
return sample.includes(0);
}
Sources: src/util.ts:20-22
Phase 2: Symbol Extraction
Symbols are extracted using language-specific regular expression patterns. Each symbol record includes:
| Field | Type | Description |
|---|---|---|
| name | string | Symbol identifier |
| kind | string | Category: function, class, interface, type, struct, enum, trait, impl |
| line | number | Declaration line number |
| signature | string | First 160 characters of the declaration line |
| exported | boolean | Whether the symbol is exported |
const push = (name: string, kind: string, exported = false) =>
symbols.push({ name, kind, line: lineNumber, signature: excerpt(line, 160), exported });
Sources: src/extract.ts:5-7
For TypeScript and JavaScript, the extractor captures export modifiers:
matchPush(line, /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/, push, "function");
matchPush(line, /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/, push, "class");
Sources: src/extract.ts:12-15
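The two patterns above can be exercised with a small self-contained sketch. `SketchSymbol` and `extractTsSymbolsSketch` are hypothetical stand-ins for the project's `matchPush`/`push` helpers, and the signature is approximated as the trimmed line cut to 160 characters, since the behavior of `excerpt` is not shown in the source.

```typescript
// Hypothetical record shape mirroring the documented symbol fields.
interface SketchSymbol {
  name: string;
  kind: string;
  line: number;
  signature: string;
  exported: boolean;
}

// Patterns copied from the two matchPush calls quoted above.
const TS_PATTERNS: Array<[string, RegExp]> = [
  ["function", /^\s*(export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/],
  ["class", /^\s*(export\s+)?class\s+([A-Za-z_$][\w$]*)/],
];

function extractTsSymbolsSketch(source: string): SketchSymbol[] {
  const symbols: SketchSymbol[] = [];
  source.split(/\r?\n/).forEach((line, index) => {
    for (const [kind, pattern] of TS_PATTERNS) {
      const match = line.match(pattern);
      if (!match) continue;
      symbols.push({
        name: match[2],
        kind,
        line: index + 1, // 1-indexed line number
        signature: line.trim().slice(0, 160), // assumed stand-in for excerpt(line, 160)
        exported: Boolean(match[1]), // group 1 captures the optional "export " prefix
      });
    }
  });
  return symbols;
}
```

Because matching is line-based, declarations nested inside other constructs are still found, while declarations split across lines before the name would be missed.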
Phase 3: Edge Extraction
Edges represent dependency relationships between modules. The extractor identifies:
- IMPORTS: Direct import statements for each language
- CONFIGURES: Dependencies declared in configuration files (package.json, Cargo.toml, etc.)
if (language === "typescript" || language === "javascript") {
for (const match of line.matchAll(/(?:from\s+|import\s*)["']([^"']+)["']/g))
addImport(match[1]);
for (const match of line.matchAll(/require\(["']([^"']+)["']\)/g))
addImport(match[1]);
}
Sources: src/extract.ts:67-72
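To show what these two patterns capture, here is a small sketch that applies them to individual lines. The regular expressions are copied from the snippet above; `extractImportTargetsSketch` is a hypothetical wrapper that returns the captured module specifiers instead of calling `addImport`.

```typescript
// Applies the documented import/require patterns to one line of TS/JS source.
function extractImportTargetsSketch(line: string): string[] {
  const targets: string[] = [];
  for (const match of line.matchAll(/(?:from\s+|import\s*)["']([^"']+)["']/g)) {
    targets.push(match[1]);
  }
  for (const match of line.matchAll(/require\(["']([^"']+)["']\)/g)) {
    targets.push(match[1]);
  }
  return targets;
}

const namedImport = extractImportTargetsSketch('import { foo } from "./bar";');
const sideEffectImport = extractImportTargetsSketch("import './polyfill';");
const requireCall = extractImportTargetsSketch("const fs = require('fs');");
const dynamicImport = extractImportTargetsSketch('const lazy = await import("./lazy");');
```

As written, the patterns cover static imports, re-exports with `from`, and `require()` calls, but not dynamic `import("...")` expressions, because the opening parenthesis sits between `import` and the quote.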
For package.json, dependencies and scripts are indexed as CONFIGURES edges:
for (const section of ["dependencies", "devDependencies", "peerDependencies", "scripts"]) {
const values = parsed[section];
if (!values || typeof values !== "object") continue;
for (const key of Object.keys(values)) {
edges.push({ targetName: `${section}:${key}`, targetType: "config", edgeType: "CONFIGURES", line: 1 });
}
}
Sources: src/extract.ts:105-114
Phase 4: Chunk Generation
Large files are split into overlapping chunks to enable granular retrieval. The system uses a sliding window approach with overlap between consecutive chunks:
graph LR
A[File Lines 1-200] --> B[Chunk 1: 1-80]
A --> C[Chunk 2: 60-140]
A --> D[Chunk 3: 120-200]
B --> E[Token Estimate]
C --> E
D --> E
Each chunk includes:
| Field | Description |
|---|---|
| ref | Unique reference string (file:path:start-end) |
| filePath | Relative path to source file |
| startLine | Starting line number |
| endLine | Ending line number |
| kind | Chunk type: code, doc, file |
| title | Human-readable title |
| tokenEstimate | Estimated token count |
Sources: src/extract.ts:145-160
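A sliding-window chunker of the kind described above can be sketched as follows. The window size of 80 lines and overlap of 20 lines are assumptions chosen to resemble the diagram, not confirmed values, and `LineChunk` and `chunkLinesSketch` are hypothetical names.

```typescript
// Hypothetical chunk shape with 1-indexed, inclusive line bounds.
interface LineChunk {
  startLine: number;
  endLine: number;
  text: string;
}

// Splits lines into windows of `windowSize` lines that share `overlap` lines
// with the previous window (both values are assumptions for this sketch).
function chunkLinesSketch(lines: string[], windowSize = 80, overlap = 20): LineChunk[] {
  if (overlap >= windowSize) throw new Error("overlap must be smaller than windowSize");
  const step = windowSize - overlap;
  const chunks: LineChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + windowSize, lines.length);
    chunks.push({ startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```

With these settings, a 200-line file yields windows covering lines 1-80, 61-140, and 121-200, so each boundary region appears in two chunks.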
Phase 5: Markdown Document Chunking
Markdown files receive special treatment. Instead of fixed-size chunks, the indexer uses headings as natural section boundaries:
lines.forEach((line, index) => {
const match = line.match(/^(#{1,6})\s+(.+)$/);
if (match) headings.push({ title: match[2].trim(), line: index + 1 });
});
return headings.map((heading, index) => {
const next = headings[index + 1];
const endLine = next ? next.line - 1 : lines.length;
// ... create chunk for section
});
Sources: src/extract.ts:174-185
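The excerpt above elides how each section becomes a chunk. A complete, runnable version might look like the sketch below, where `MarkdownSection` and `chunkMarkdownByHeading` are hypothetical names and the section text is assumed to run from the heading line through the line before the next heading.

```typescript
// Hypothetical section shape with 1-indexed, inclusive line bounds.
interface MarkdownSection {
  title: string;
  startLine: number;
  endLine: number;
  text: string;
}

function chunkMarkdownByHeading(markdown: string): MarkdownSection[] {
  const lines = markdown.split(/\r?\n/);
  const headings: Array<{ title: string; line: number }> = [];
  lines.forEach((line, index) => {
    const match = line.match(/^(#{1,6})\s+(.+)$/); // same heading pattern as above
    if (match) headings.push({ title: match[2].trim(), line: index + 1 });
  });
  return headings.map((heading, index) => {
    const next = headings[index + 1];
    const endLine = next ? next.line - 1 : lines.length;
    return {
      title: heading.title,
      startLine: heading.line,
      endLine,
      text: lines.slice(heading.line - 1, endLine).join("\n"),
    };
  });
}
```

In this sketch, any text before the first heading is not assigned to a section, and a line starting with `#` inside a fenced code block would be treated as a heading.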
Watch Mode
The indexer supports continuous monitoring via file system watchers:
export async function watchWorkspace(workspace: string, onIndex: (result: IndexResult) => void): Promise<void> {
const resolved = path.resolve(workspace);
onIndex(await indexWorkspace({ workspace: resolved }));
let timer: NodeJS.Timeout | undefined;
fs.watch(resolved, { recursive: true }, () => {
if (timer) clearTimeout(timer);
timer = setTimeout(async () => {
onIndex(await indexWorkspace({ workspace: resolved }));
}, 500);
});
}
Sources: src/indexer.ts:80-91
Key characteristics:
- Debounces file change events with a 500ms delay to batch rapid successive changes
- Re-runs full indexing on each trigger
- Outputs JSON results to stdout for consumption by other processes
CLI Commands
The indexing system exposes three primary CLI commands:
| Command | Description |
|---|---|
| `cxf index --workspace <path> [--watch]` | Initial or incremental indexing of a workspace |
| `cxf daemon --workspace <path>` | Run as a long-lived daemon that outputs index results on file changes |
| `cxf report --workspace <path> --format markdown\|json\|html` | Generate an index status report |
# Index a workspace
npx @inferensys/contextful index --workspace .
# Watch for changes and print results
npx @inferensys/contextful daemon --workspace .
Sources: src/cli.ts:22-35
Search Integration
The indexing system powers Contextful's search capabilities. After indexing, users can query the database using natural language:
export async function searchContext(options: SearchOptions): Promise<{ intent: SearchIntent; hits: SearchHit[] }> {
const workspace = resolveWorkspace(options.workspace);
await ensureIndexed(workspace);
const intent = classifyQuery(options.query);
// ... perform FTS and semantic search
}
Sources: src/search.ts:45-55
Query intents are automatically classified to optimize search behavior:
| Intent | Trigger Keywords | Description |
|---|---|---|
code | function names, variable names | Code and implementation search |
exact | Backticks, quotes, #, file paths | Literal symbol/identifier lookup |
impact | impact, affected, depends, blast radius | Dependency and change analysis |
historical | why, changed, commit, history | Git history and regression tracking |
architectural | architecture, flow, trace, connects | Dependency graph traversal |
docs | resource, documentation, guide, how to | Documentation and README search |
memory | remember, session, lesson, learned | Agent memory recall |
Sources: src/search.ts:5-18
Token Estimation
Every chunk and evidence pack includes a token estimate for budget management:
export function packTokenCount(text: string): number {
return estimateTokens(text);
}
The system uses this estimate to enforce budget limits when building context packs for LLM consumption, ensuring responses stay within token budgets.
Sources: src/report.ts:50-52
Data Models
Symbol Record
interface SymbolRecord {
ref: string;
name: string;
kind: "function" | "class" | "interface" | "type" | "struct" | "enum" | "trait" | "impl";
filePath: string;
line: number;
signature: string;
exported: boolean;
}
Edge Record
interface RawEdge {
targetName: string;
targetType: "module" | "config" | "symbol";
edgeType: "IMPORTS" | "CONFIGURES" | "DEFINES";
line: number;
}
Chunk Record
interface ChunkRecord {
ref: string;
filePath: string;
startLine: number;
endLine: number;
kind: "code" | "doc" | "file";
title: string;
text: string;
tokenEstimate: number;
}
Extension Points
Adding New Language Support
To add support for a new language:
- Add language detection in the file scanner
- Implement symbol extraction patterns in extractSymbols()
- Implement edge extraction patterns in extractEdges()
- Update the chunking logic if special handling is needed
Example pattern structure:
} else if (language === "newlang") {
matchPush(line, /^\s*(pub\s+)?fn\s+([A-Za-z_][\w]*)/, push, "function");
const use = line.match(/^\s*use\s+([^;]+);/);
if (use) addImport(use[1].trim());
}
Sources: src/extract.ts:35-44
Sources: src/cli.ts:1-20
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 7 source-linked risk signals. Review them before installing or handing real data to the project.
1. Configuration risk: Configuration risk needs validation
- Severity: medium
- Finding: Configuration risk is backed by a source signal: Configuration risk needs validation. Treat it as a review item until the current version is checked.
- User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.host_targets | github_repo:1240001007 | https://github.com/Inferensys/contextful | host_targets=claude, claude_code
2. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:1240001007 | https://github.com/Inferensys/contextful | README/documentation is current enough for a first validation pass.
3. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | last_activity_observed missing
4. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium
5. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:1240001007 | https://github.com/Inferensys/contextful | no_demo; severity=medium
6. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | issue_or_pr_quality=unknown
7. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1240001007 | https://github.com/Inferensys/contextful | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using contextful with real data or production workflows.
- Configuration risk needs validation - GitHub / issue
Source: Project Pack community evidence and pitfall evidence