Doramagic Project Pack · Human Manual
thought-mcp
Introduction to THOUGHT
Related topics: Quickstart Guide, Installation and Setup, System Architecture
THOUGHT is a local AI memory tool designed to help developers, researchers, writers, and investigators maintain persistent, queryable knowledge graphs of their work. It combines graph database technology with natural language processing to create a bi-temporal knowledge base that tracks information across time—answering questions like "what was true on date X" and "what did the system know on date X." Sources: README.md
What is THOUGHT?
THOUGHT operates as a self-hosted memory layer that runs entirely on your local machine. Unlike cloud-based AI memory solutions, THOUGHT stores everything in a local SQLite database, giving you full control over your data while still providing powerful querying capabilities through natural language or Cypher graph queries. Sources: src/thought/cli.py:1-50
The core philosophy is to treat memory as a first-class citizen in the development workflow—something that persists across sessions, understands context, and can be queried like a real database rather than a simple key-value store.
Core Architecture
THOUGHT's architecture consists of several interconnected layers that work together to provide a complete memory solution.
graph TD
A[CLI / MCP Server] --> B[Query Layer]
B --> C[Graph Layer]
B --> D[Code Layer]
C --> E[Storage Backend]
D --> E
E --> F[SQLite Database]
B --> G[LLM Providers]
G --> H[Ollama / LM Studio / OpenAI]
Storage Layer
The storage layer uses SQLite with a carefully designed schema that supports bi-temporal modeling. Every entity and edge in the knowledge graph has timestamps tracking when facts became valid and when they were learned. Sources: src/thought/storage/sqlite/backend.py:1-100
| Component | Purpose |
|---|---|
| `SQLiteBackend` | Core database operations with upsert, query, and embedding storage |
| WAL Mode | Write-Ahead Logging for crash recovery and concurrent reads |
| Migration System | Tracks applied migrations in applied_migrations table |
| Bi-temporal Columns | valid_from, valid_until, learned_at, unlearned_at |
Query Layer
The query layer provides multiple interfaces for accessing your memory:
- Natural Language: Ask questions in plain English, translated to Cypher
- Code Queries: Find callers, callees, and impact sets
- Recall: Semantic search using embeddings
- Cypher Direct: Execute graph queries directly Sources: src/thought/query/ask.py:1-50
Graph Layer
The graph layer provides the core graph operations that power all THOUGHT functionality. It handles entity and edge management with support for scopes (shared/private) and owner-based access control. Sources: src/thought/layers/graph.py
Entity Model
THOUGHT uses a flexible entity model that can represent code elements, prose content, legal documents, and research claims.
classDiagram
class Entity {
+str id
+str type
+str name
+str canonical_name
+ScopeName scope
+Tier tier
+float importance
+datetime valid_from
+datetime valid_until
+datetime learned_at
+dict~str, object~ attrs
}
class Edge {
+str id
+str source_id
+str target_id
+str relation_type
}
Entity "1" --> "*" Edge : source
Entity "1" --> "*" Edge : target
Sources: src/thought/models.py:50-100
Entity Attributes
| Field | Type | Description |
|---|---|---|
| `id` | str | Unique identifier |
| `type` | str | Entity type (function, class, module, claim, etc.) |
| `name` | str | Human-readable name |
| `canonical_name` | str | Fully qualified name for disambiguation |
| `scope` | ScopeName | "shared" or "private" |
| `owner_id` | str | Owner for private entities |
| `tier` | Tier | "hot", "warm", or "cold" |
| `valid_from` | datetime | When this fact became true |
| `valid_until` | datetime | When this fact stopped being true (null = current) |
| `learned_at` | datetime | When THOUGHT learned this fact |
| `attrs` | dict | Additional type-specific metadata |
Edge Relations
Edges represent relationships between entities with the following relation types:
| Relation Type | Description |
|---|---|
| `CALLS` | Function/method invocation |
| `INHERITS_FROM` | Class inheritance |
| `DEFINES` | Container defines member |
| `IMPORTS` | Module import statement |
| `CONTRADICTS` | Logical contradiction between facts |
| `CITES` | Source citation relationship |
Audience Verticals
THOUGHT is designed to serve multiple audiences, each with specialized commands and entity taxonomies optimized for their use case. Sources: src/thought/demo.py:1-80
graph LR
A[THOUGHT] --> B[Code Developers]
A --> C[Writers]
A --> D[Legal Investigators]
A --> E[Researchers]
B --> B1[thought scan]
B --> B2[thought impact]
B --> B3[thought callers]
C --> C1[thought ingest-prose]
C --> C2[thought timeline]
C --> C3[contradiction-check]
D --> D1[thought ingest-legal]
D --> D2[unique_predicates]
D --> D3[contradiction-graph]
E --> E1[thought ingest-claim]
E --> E2[citation-analysis]
E --> E3[reliability-filter]
Code Developers
The code vertical provides tools for understanding, navigating, and analyzing source code:
- `thought scan`: Incremental code scanning with change detection
- `thought impact <name>`: Transitive impact set (what is affected if I change this?)
- `thought callers <name>`: Direct callers ranked by Personalized PageRank
- `thought recall`: Semantic search across code by intent

Sources: src/thought/layers/code.py:1-50
Writers
The writing vertical supports fiction and academic prose:
- Ingest chapter/section facts about characters
- Detect contradictions via the bi-temporal model
- Query chronological mentions across documents
- Time-travel `as_of` recall for historical consistency
Legal Investigators
The legal vertical is designed for investigation workflows:
- `thought ingest-legal`: Ingest witness statements with unique predicates
- `thought contradiction-graph`: Trigger CONTRADICTS edges between testimonies
- Query the contradiction graph for investigation leads
Researchers
The research vertical supports academic workflows:
- `thought ingest-claim`: Ingest claim/source pairs
- Cypher queries to find uncited claims
- Most-cited source identification
- Citation reliability filtering
CLI Commands Overview
| Command | Description |
|---|---|
| `thought init` | Create database file + config + CLAUDE.md |
| `thought recall <query>` | Semantic recall with embeddings |
| `thought ask <question>` | Natural language query → Cypher → results |
| `thought scan <repo>` | Incremental code scan with change detection |
| `thought callers <name>` | Find direct callers ranked by PageRank |
| `thought impact <name>` | Transitive impact set |
| `thought db size` | Disk usage + entity/edge counts |
| `thought db flush` | Wipe the knowledge base |
| `thought db backup <file>` | SQLite online-backup snapshot |
| `thought db load <file>` | Load backup file |
| `thought hook install` | Install Claude Code hooks |
| `thought diff --from <sha1> --to <sha2>` | Entity diff between commits |
Sources: src/thought/cli.py:50-150
Database Lifecycle Management
THOUGHT provides comprehensive database management commands under thought db:
Backup and Restore
graph LR
A[Production DB] -->|thought db backup| B[backup.db]
B -->|thought db load| C[Production DB]
B -->|thought db inspect| D[Inspection Report]
The backup system uses SQLite's online backup API, ensuring consistent snapshots even during active writes. Date filters can produce clean, self-contained subset files. Sources: src/thought/storage/sqlite/backend.py:100-200
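To make the online-backup idea concrete, here is a minimal sketch using the `Connection.backup` method from Python's standard `sqlite3` module. It is not taken from THOUGHT's source: the `backup_database` helper and the `entities` table are illustrative assumptions.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_database(src_path: Path, dest_path: Path) -> None:
    """Copy a SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup produces a consistent copy even if src is in use.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


# Demo on a throwaway database (hypothetical table, not THOUGHT's schema).
workdir = Path(tempfile.mkdtemp())
live = workdir / "thought.db"
conn = sqlite3.connect(live)
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO entities VALUES ('e1', 'authenticate_user')")
conn.commit()
conn.close()

snapshot = workdir / "backup.db"
backup_database(live, snapshot)

check = sqlite3.connect(snapshot)
copied_rows = check.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
check.close()
```

The snapshot is an ordinary SQLite file, so it can be opened and inspected without touching the live database.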
Flush Operations
Flush commands support date-bounded deletion:
- `--before X`: Delete facts valid before date X
- `--since X`: Delete facts learned since date X
- `--time-axis valid|learned|created`: Choose which time axis to filter
All destructive operations automatically back up to <db>.bak.<timestamp> before proceeding.
Git History Integration
THOUGHT can ingest git repositories with two modes:
| Mode | Behavior | Use Case |
|---|---|---|
| `snapshot` (default) | Ingest HEAD only, stamp with HEAD SHA | Fast code analysis |
| `full` | Walk every commit, stamp with commit SHA | Bi-temporal historical queries |
The GitWalker class shells out to git commands rather than using native libraries, avoiding C extension dependencies while maintaining cross-platform compatibility. Sources: src/thought/ingest/code/git_walker.py:1-50
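The shell-out approach can be sketched with `subprocess` and the standard `git rev-list` command. The function names below are hypothetical and are not the actual `GitWalker` API; they only illustrate walking commits without a native git library.

```python
import subprocess
from pathlib import Path


def parse_rev_list(output: str) -> list[str]:
    """Turn `git rev-list` output into a list of SHAs, skipping blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_commits(repo: Path) -> list[str]:
    """Return commit SHAs oldest-first by invoking the git executable on PATH."""
    result = subprocess.run(
        ["git", "-C", str(repo), "rev-list", "--reverse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return parse_rev_list(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the parsing testable without a repository on disk.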
graph TD
A[thought ingest-git] --> B{Snapshot Mode?}
B -->|Yes| C[Ingest HEAD only]
B -->|No| D[Walk all commits]
C --> E[Stamp with HEAD SHA]
D --> F[Stamp each entity with commit SHA]
E --> G[Enable as_of queries]
F --> G
Bi-temporal Model
THOUGHT's bi-temporal model tracks two independent timelines for every fact:
| Time Axis | Description | Question Answered |
|---|---|---|
| Valid Time | When a fact was true in reality | "What was true on date X?" |
| Learned Time | When THOUGHT learned the fact | "What did the system know on date X?" |
This distinction enables sophisticated queries like:
MATCH (e:Entity)
WHERE e.valid_from <= date('2024-01-01')
AND (e.valid_until IS NULL OR e.valid_until > date('2024-01-01'))
RETURN e
Contradictions surface as CONTRADICTS edges—they're treated as data rather than warnings, allowing you to query them directly. Sources: src/thought/cli.py:1-50
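The two time axes can be illustrated with plain SQL over an in-memory SQLite table. This is a sketch under assumptions: the `entities` table here is a simplified stand-in that borrows only the four bi-temporal column names, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (name TEXT, valid_from TEXT, valid_until TEXT,"
    " learned_at TEXT, unlearned_at TEXT)"
)
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?, ?)",
    [
        ("auth_v1", "2023-01-01", "2024-03-01", "2023-01-05", None),
        ("auth_v2", "2024-03-01", None, "2024-03-02", None),
    ],
)


def valid_as_of(db: sqlite3.Connection, day: str) -> list[str]:
    """Valid time: which facts were true in reality on `day`?"""
    rows = db.execute(
        "SELECT name FROM entities WHERE valid_from <= ?"
        " AND (valid_until IS NULL OR valid_until > ?) ORDER BY name",
        (day, day),
    )
    return [r[0] for r in rows]


def known_as_of(db: sqlite3.Connection, day: str) -> list[str]:
    """Learned time: which facts had the system recorded by `day`?"""
    rows = db.execute(
        "SELECT name FROM entities WHERE learned_at <= ?"
        " AND (unlearned_at IS NULL OR unlearned_at > ?) ORDER BY name",
        (day, day),
    )
    return [r[0] for r in rows]
```

ISO-8601 date strings sort lexicographically, which is why plain string comparison works in this sketch.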
LLM Provider Integration
THOUGHT supports multiple LLM providers for natural language processing:
| Provider | Features |
|---|---|
| Ollama | Native /api/embed (batched), OpenAI-compatible fallback |
| LM Studio | OpenAI-compatible API |
| Any OpenAI-compatible server | Standard embedding endpoints |
The embedder selection defaults to auto, which probes for sentence_transformers and falls back to a deterministic embedder when the optional dependency is unavailable. Sources: src/thought/storage/sqlite/backend.py:200-300
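A probe-and-fall-back selection like the one described can be sketched as follows. Both function names are hypothetical, and the hash-based vector is only a stand-in for whatever deterministic embedder THOUGHT ships.

```python
import hashlib
import importlib.util


def deterministic_embed(text: str, dim: int = 8) -> list[float]:
    """Hash-based stand-in: the same text always yields the same vector."""
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 digest size)")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Scale the first `dim` digest bytes into the range [0, 1].
    return [digest[i] / 255.0 for i in range(dim)]


def resolve_embedder(requested: str = "auto") -> str:
    """Resolve 'auto' to a concrete embedder name; pass other names through."""
    if requested != "auto":
        return requested
    # find_spec returns None when the optional dependency is not installed.
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "sentence-transformers"
    return "deterministic"
```

A hash-based vector carries no semantic similarity, so it is useful mainly for tests and dependency-free setups.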
Code Extraction Support
THOUGHT can parse and extract entities from multiple programming languages:
| Language | Extractor | Key Features |
|---|---|---|
| Python | python_extractor.py | AST-based import tracking, class/function detection |
| TypeScript | typescript_extractor.py | Tree-sitter parsing, heritage analysis |
| Rust | rust_extractor.py | Module system, impl block handling |
| PHP | php_extractor.py | Namespace handling, method visibility |
All extractors produce consistent CodeEntity and CodeEdge objects that integrate with the unified graph model. Sources: src/thought/ingest/code/python_extractor.py:1-50
Getting Started
Initialization
thought init --db-path .thought/thought.db --embedder auto
This creates:
- The SQLite database file
- A `thought.toml` configuration file
- A `CLAUDE.md` file for MCP client integration
Quick Start Commands
# Ingest a git repository
thought ingest-git ./my-project --mode snapshot
# Recall something semantically
thought recall "authentication middleware"
# Ask a natural language question
thought ask "what calls the authenticate_user function?"
# Find impact of changing a function
thought impact MyClass.my_method
Configuration
THOUGHT uses a thought.toml file for configuration:
| Section | Option | Default | Description |
|---|---|---|---|
| `database` | `path` | `.thought/thought.db` | Database file path |
| `llm` | `provider` | `auto` | LLM provider selection |
| `embedder` | `model` | `auto` | Embedding model |
| `scopes` | `default` | `shared` | Default scope for new entities |
Configuration can be overridden via CLI flags or environment variables.
Sources: src/thought/models.py:50-100
Quickstart Guide
Related topics: Introduction to THOUGHT, Installation and Setup
Overview
THOUGHT is a local AI memory tool for managing knowledge bases. It runs on local models and can be queried with graph queries or natural language. It provides a comprehensive CLI for ingesting information, recalling facts, and performing code analysis with graph-based relationships.
Sources: CHANGELOG.md
Architecture Overview
graph TD
subgraph "THOUGHT Core"
CLI[CLI Interface]
DB[(SQLite Database)]
EMB[Embedder Layer]
GRAPH[Graph Layer]
end
subgraph "Ingestion Sources"
CODE[Code Ingest]
PROSE[Prose Ingest]
LEGAL[Legal Ingest]
end
subgraph "Query Interface"
RECALL[Recall Command]
REPL[Interactive REPL]
MCP[MCP Server]
end
CLI --> DB
CLI --> EMB
EMB --> DB
CODE --> CLI
PROSE --> CLI
LEGAL --> CLI
RECALL --> GRAPH
REPL --> GRAPH
MCP --> GRAPH
GRAPH --> DB
Sources: src/thought/cli.py
Installation and Initialization
Initial Setup
Run the init command to create the database, configuration file, and CLAUDE.md helper:
thought init
The init command accepts several options:
| Option | Default | Description |
|---|---|---|
| `--config` | `thought.toml` | Path to configuration file |
| `--db-path` | `.thought/thought.db` | SQLite database path |
| `--embedder` | `auto` | Embedder type: auto, sentence-transformers, or deterministic |
| `--write-claude-md` | `true` | Drop a CLAUDE.md for MCP clients |
| `--quick` | `false` | Skip first-run embedder warmup |
Sources: src/thought/cli.py:57-78
Configuration File
The init command creates a thought.toml configuration file with the following structure:
[database]
path = ".thought/thought.db"
[embedder]
type = "auto" # or "sentence-transformers", "deterministic"
[llm]
provider = "auto"
Core Commands
Ingest Commands
THOUGHT supports multiple ingestion modes:
| Command | Purpose |
|---|---|
| `thought ingest TEXT` | One-shot remember from command line |
| `thought ingest --file PATH` | Ingest a single file |
| `thought ingest --glob PAT` | Bulk-ingest matching files |
| `thought ingest --stdin` | Bulk-ingest one line-per-item from stdin |
Sources: src/thought/cli.py:30-42
Code Ingestion
The code ingest pipeline extracts entities and relationships from source files:
thought ingest --file src/main.py
thought ingest --glob "**/*.py"
The code extractor produces:
- Entities: modules, functions, classes, methods
- Edges: `IMPORTS`, `INHERITS_FROM`, `DEFINES`, `OVERRIDES`
Sources: src/thought/ingest/code/pipeline.py
Git-Aware Ingest
For bi-temporal code analysis:
thought ingest-git <repo> --mode snapshot # Fast: HEAD only
thought ingest-git <repo> --mode full # Walk every commit
This enables as_of queries against historical commits.
Sources: CHANGELOG.md
Recall and Query
thought recall "what did I learn about authentication?"
thought repl
The recall command returns up to 10 results with ranked relevance. Use as_of and scope to narrow results further.
Sources: src/thought/cli.py
Database Management
| Command | Description |
|---|---|
| `thought db size` | Disk usage + entity/edge counts |
| `thought db flush` | Wipe the KB (with backup) |
| `thought db backup <file>` | SQLite backup snapshot |
| `thought db load <file>` | Load a backup file |
| `thought db inspect <file>` | Inspect backup without loading |
Sources: CHANGELOG.md
Code Analysis Commands
Callers and Impact Analysis
# Find who calls a function (ranked by PageRank)
thought callers authenticate_user
# Transitive impact: what's affected if I change this?
thought impact JWTValidator
Sources: src/thought/layers/code.py
Diff Between Commits
thought diff --from abc1234 --to def5678
This shows entities added/removed between two ingested commits.
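Conceptually, an entity diff is a pair of set differences over the entity names recorded at each commit. A minimal sketch, assuming a hypothetical `entity_diff` helper and made-up entity names:

```python
def entity_diff(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Report entity names that appear or disappear between two snapshots."""
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical entity names at two ingested commits.
at_from = {"auth.login", "auth.logout", "db.connect"}
at_to = {"auth.login", "auth.refresh_token", "db.connect"}
diff = entity_diff(at_from, at_to)
```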
Built-in Demos
Run audience-specific walkthroughs:
thought demo code # Agent/developer flow (14-stage walkthrough)
thought demo writer # Novelist/paper author
thought demo legal # Investigator/paralegal
thought demo researcher # Academic use case
thought demo all # Run all demos sequentially
Each demo runs end-to-end in a self-cleaning temporary directory and produces a structured DemoReport.
Sources: src/thought/demo.py
Entity Data Model
@dataclass(frozen=True)
class CodeEntity:
    name: str  # Qualified name (e.g., "ClassName.method_name")
    type_: CodeEntityType  # "module" | "function" | "class" | "method" | "file"
    language: str  # Programming language
    file_path: str  # POSIX-style relative path
    line_start: int  # Starting line number
    line_end: int  # Ending line number
    signature: str  # Function/class signature
    docstring: str | None
    visibility: Literal["public", "private"]
Sources: src/thought/ingest/code/types.py
Supported Languages
The code ingestion pipeline supports:
| Language | Extractor | File Extension |
|---|---|---|
| Python | python_extractor.py | .py |
| TypeScript | typescript_extractor.py | .ts, .tsx |
| PHP | php_extractor.py | .php |
| Rust | rust_extractor.py | .rs |
MCP Server
Start the MCP server for integration with Claude Code:
thought serve # stdio transport (default)
thought serve --transport streamable-http # HTTP transport
Sources: src/thought/cli.py
Utility Commands
| Command | Description |
|---|---|
| `thought stats` | Display knowledge base statistics |
| `thought forget PATTERN` | Soft-delete entities matching SQL LIKE pattern |
| `thought consolidate` | Run one consolidation cycle |
| `thought doctor` | Environment health check |
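A soft delete driven by a SQL `LIKE` pattern can be sketched as below. This assumes, purely for illustration, that soft deletion is recorded by stamping an `unlearned_at` column; the table layout and the `forget` helper are hypothetical, not THOUGHT's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT PRIMARY KEY, unlearned_at TEXT)")
conn.executemany(
    "INSERT INTO entities (name) VALUES (?)",
    [("auth.login",), ("auth.logout",), ("db.connect",)],
)


def forget(db: sqlite3.Connection, pattern: str, now: str) -> int:
    """Mark matching rows as unlearned instead of deleting them."""
    cur = db.execute(
        "UPDATE entities SET unlearned_at = ?"
        " WHERE name LIKE ? AND unlearned_at IS NULL",
        (now, pattern),
    )
    db.commit()
    return cur.rowcount  # number of rows newly soft-deleted


forgotten = forget(conn, "auth.%", "2024-06-01T00:00:00")
active = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM entities WHERE unlearned_at IS NULL ORDER BY name"
    )
]
total = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
```

Because rows are only stamped, not removed, historical queries can still see what was known before the soft delete.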
Bi-Temporal Model
THOUGHT uses a bi-temporal model for knowledge tracking:
- `valid_from`/`valid_until`: When facts were true in reality
- `learned_at`/`unlearned_at`: When the system learned/corrected facts

Query variants:
- `as_of_kind='valid'`: "what was true on date X"
- `as_of_kind='learned'`: "what did the system know on date X"
Sources: src/thought/models.py
Sources: CHANGELOG.md
Installation and Setup
Related topics: Quickstart Guide, System Architecture
Overview
The thought-mcp project provides a comprehensive CLI tool and MCP (Model Context Protocol) server for AI-powered memory and knowledge management. The installation and setup process involves initializing the local SQLite database, configuring MCP clients (Claude Code, Cursor, etc.), and optionally setting up Claude Code hooks for automated memory operations.
The setup system is designed with idempotency in mind — installations can be safely re-run without disrupting existing configurations.
System Architecture
graph TD
A[User] --> B[thought CLI]
B --> C[init command]
C --> D[SQLite Database]
C --> E[thought.toml Config]
C --> F[CLAUDE.md Agent Hint]
B --> G[MCP Server]
G --> D
B --> H[Client Install]
H --> I[Claude Code]
H --> J[Cursor]
H --> K[VS Code]
B --> L[Hook Install]
L --> M[.claude/settings.json]
Prerequisites
| Component | Requirement | Notes |
|---|---|---|
| Python | >= 3.10 | Core runtime |
| Git | On PATH | Used by git pipeline for code ingestion |
| SQLite | 3.x | Bundled with Python stdlib |
| pip/pipx | Latest | Package installation |
Sources: CONTRIBUTING.md
Installation Methods
Standard Installation
pip install thought-mcp
Development Installation
git clone https://github.com/RNBBarrett/thought-mcp.git
cd thought-mcp
pip install -e ".[dev]"
CLI Initialization
The thought init command establishes the complete working environment. It creates three essential components in sequence.
Init Command Signature
@app.command()
def init(
    config: Path = typer.Option("thought.toml", help="Path to config file."),
    db_path: str = typer.Option(".thought/thought.db", help="SQLite database path."),
    embedder: str = typer.Option(
        "auto", help="'auto' picks sentence-transformers if available, else deterministic.",
    ),
    write_claude_md: bool = typer.Option(
        True, "--write-claude-md/--no-claude-md",
        help="Drop a CLAUDE.md so MCP clients learn how to use the tool.",
    ),
    quick: bool = typer.Option(
        False, "--quick", help="Skip first-run embedder warmup.",
    ),
) -> None:
    ...  # body omitted in this excerpt
Sources: src/thought/cli.py:35-56
What Init Creates
graph LR
A[thought init] --> B[Create .thought/ directory]
A --> C[Create SQLite DB file]
A --> D[Write thought.toml config]
A --> E[Write CLAUDE.md]
B --> F[parents=True<br/>exist_ok=True]
C --> G[DB auto-backed up<br/>before destructive ops]
#### 1. Database Initialization
The command creates the SQLite database at the specified path. Parent directories are created automatically using parents=True to ensure the path exists.
Path(db_path).parent.mkdir(parents=True, exist_ok=True)
Sources: src/thought/cli.py:52-53
#### 2. Configuration File
The thought.toml file contains runtime configuration including embedder settings and database paths.
#### 3. CLAUDE.md Agent Hint
When write_claude_md=True (default), the init command drops a CLAUDE.md file that teaches MCP clients how to interact with the tool.
Embedder Configuration
The init command supports three embedder modes:
| Mode | Behavior | Dependencies |
|---|---|---|
| `auto` (default) | Uses sentence-transformers if available, falls back to deterministic embeddings | Optional: sentence-transformers |
| `sentence-transformers` | Uses local transformer models for embeddings | Required: sentence-transformers |
| `deterministic` | Uses hash-based embeddings, no ML dependencies | None |
The --quick flag skips the first-run embedder warmup process.
MCP Client Installation
The thought clients install command merges a thought MCP server entry into your client's configuration file.
Supported Clients
| Client | Config Location |
|---|---|
| Claude Code | .claude/settings.json |
| Cursor | ~/.cursor/settings.json |
| VS Code | ~/.cursor/settings.json |
Installation Workflow
graph TD
A[thought clients install] --> B{Check config exists?}
B -->|No| C[Create new config file]
B -->|Yes| D[Read existing JSON]
C --> E{Valid JSON object?}
D --> E
E -->|Yes| F[Merge mcpServers entry]
E -->|No| G[Return error]
F --> H{Backup enabled?}
H -->|Yes| I[Create .thought.bak backup]
H -->|No| J[Write merged config]
I --> J
J --> K[Return ClientInstallResult]
Client Install Result States
@dataclass(frozen=True)
class ClientInstallResult:
    client: ClientName
    path: Path
    status: Literal["installed", "already_present", "no_path", "error"]
    detail: str = ""
Sources: src/thought/clients.py
Server Block Structure
The MCP server configuration block includes:
- Server name (`thought`)
- Command to execute
- Server arguments
- Environment variables for database path
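The merge-with-backup workflow can be sketched as an idempotent JSON update. This is not the code in `src/thought/clients.py`: the `merge_server_entry` helper, the shape of the server block, and the backup file name are assumptions chosen to mirror the workflow and status values described above.

```python
import json
import shutil
import tempfile
from pathlib import Path


def merge_server_entry(config_path: Path, name: str, block: dict, backup: bool = True) -> str:
    """Merge one MCP server entry into a JSON config; return a status string."""
    if config_path.exists():
        try:
            data = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return "error"
        if not isinstance(data, dict):
            return "error"
    else:
        data = {}
    servers = data.setdefault("mcpServers", {})
    if not isinstance(servers, dict):
        return "error"
    if servers.get(name) == block:
        return "already_present"  # re-running changes nothing
    if backup and config_path.exists():
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".thought.bak"))
    servers[name] = block
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return "installed"


# Demo against a temporary settings file that already has unrelated keys.
settings = Path(tempfile.mkdtemp()) / ".claude" / "settings.json"
settings.parent.mkdir(parents=True)
settings.write_text(json.dumps({"theme": "dark"}), encoding="utf-8")
server_block = {"command": "thought", "args": ["serve"]}  # illustrative block
first = merge_server_entry(settings, "thought", server_block)
second = merge_server_entry(settings, "thought", server_block)
```

Returning `already_present` on the second call is what makes re-running the install safe.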
Claude Code Hook Installation
The thought hooks install command adds hook entries to Claude Code's settings for automated memory operations.
Hook Types
| Hook Kind | Claude Code Event | Command |
|---|---|---|
| `recall` | `UserPromptSubmit` | `thought hook recall` |
| `write` | `Stop` | `thought hook write` |
| `context` | `SessionStart` | `thought hook context` |
Sources: src/thought/hooks/install.py:17-22
Hook Installation Options
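from pathlib import Path
from typing import Literal

def settings_path(*, scope: Literal["project", "user"] = "project") -> Path:
    """Return the ``.claude/settings.json`` path for the requested scope.

    Project scope is the recommended default: it travels with the repo and
    is what most users actually want for THOUGHT-flavoured auto-memory.
    """
    if scope == "project":
        return Path.cwd() / ".claude" / "settings.json"
    # Assumption: the excerpt omits the user-scope branch; the home directory is used here.
    return Path.home() / ".claude" / "settings.json"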
Hook Install Process
graph TD
A[thought hooks install recall] --> B{Backup enabled?}
B -->|Yes| C[Create settings.json.thought.bak]
B -->|No| D[Read settings.json]
C --> D
D --> E{Valid JSON?}
E -->|Yes| F[Merge recall hook entry]
E -->|No| G[Return error]
F --> H{Entry exists?}
H -->|Yes| I[Return already_present]
H -->|No| J[Write updated settings.json]
J --> K[Return HookInstallResult]
Hook Install Result
@dataclass(frozen=True)
class HookInstallResult:
    kind: HookKind
    path: Path
    status: Literal["installed", "already_present", "error"]
    detail: str = ""
Sources: src/thought/hooks/install.py:28-32
Quick Start Guide
Step 1: Initialize the Environment
# Standard initialization
thought init
# Skip embedder warmup for faster startup
thought init --quick
# Custom database location
thought init --db-path /path/to/custom.db
Step 2: Install MCP Client
# Install for Claude Code
thought clients install claude_code
# Install for Cursor
thought clients install cursor
Step 3: Install Claude Code Hooks (Optional)
# Install recall hook (automatic memory on user input)
thought hooks install recall
# Install write hook (save memory on session stop)
thought hooks install write
# Install context hook (load memory on session start)
thought hooks install context
# Install all hooks
thought hooks install recall --kind write --kind context
Database Lifecycle Management
Database Size Check
thought db size
Shows disk usage of main + WAL + SHM sidecars plus entity/edge counts.
Database Backup
thought db backup <file>
Creates an SQLite online-backup snapshot. Date filters produce a clean, self-contained subset file with DELETE + VACUUM after backup.
Database Restore
thought db load <file>
Atomically replaces the active database with the backup file. Use --merge to INSERT-OR-IGNORE rows from the snapshot instead of replacing.
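The merge behaviour can be approximated in plain SQLite by attaching the snapshot and running `INSERT OR IGNORE`. The sketch below assumes a single hypothetical `entities` table with a primary key; it is not the actual `thought db load --merge` implementation.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_db(path: Path, rows: list[tuple[str, str]]) -> None:
    """Create a small demo database with a keyed entities table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO entities VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def merge_snapshot(active_path: Path, snapshot_path: Path) -> None:
    """Copy rows from the snapshot that do not collide with existing keys."""
    conn = sqlite3.connect(active_path)
    try:
        conn.execute("ATTACH DATABASE ? AS snap", (str(snapshot_path),))
        # Rows whose primary key already exists in the active DB are skipped.
        conn.execute("INSERT OR IGNORE INTO entities SELECT * FROM snap.entities")
        conn.commit()
        conn.execute("DETACH DATABASE snap")
    finally:
        conn.close()


workdir = Path(tempfile.mkdtemp())
active, snapshot = workdir / "active.db", workdir / "snapshot.db"
make_db(active, [("e1", "login"), ("e2", "old_name")])
make_db(snapshot, [("e2", "new_name"), ("e3", "logout")])
merge_snapshot(active, snapshot)

conn = sqlite3.connect(active)
merged_rows = conn.execute("SELECT id, name FROM entities ORDER BY id").fetchall()
conn.close()
```

Existing rows win on key collisions, so a merge never overwrites what the active database already holds.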
Database Flush
# Full flush with confirmation
thought db flush
# Skip confirmation
thought db flush --yes
# Date-bounded flush
thought db flush --before 2024-01-01
thought db flush --since 2024-06-01
Note: All destructive operations auto-backup to <db>.bak.<timestamp> before proceeding.
Verifying Installation
Run the Demo
# Run code audience demo
thought demo code
# Run all demos
thought demo all
The demo runs an audience-specific walkthrough end-to-end in a self-cleaning temporary directory, verifying the installation works correctly.
Health Check
thought doctor
Performs an environment health check to verify all dependencies and configurations are correct.
Configuration File Format
thought.toml
[database]
path = ".thought/thought.db"
[embedder]
type = "auto" # or "sentence-transformers", "deterministic"
[server]
name = "thought"
transport = "stdio" # or "streamable-http"
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| Config file not found | Run thought init first |
| Database locked | Check for other thought processes |
| Embedder initialization slow | Use --quick flag or deterministic embedder |
| MCP client not connecting | Verify client config has correct server entry |
Reset Installation
# Backup current database
thought db backup /path/to/backup.db
# Flush and reinitialize
thought db flush --yes
thought init --db-path .thought/thought.db
Next Steps
After installation and setup, users typically:
- Ingest code: `thought ingest-git <repo>` to analyze repository code
- Recall information: `thought recall <query>` to query the knowledge base
- Run agents: Use reference agents like the vulnerability scanner or OSINT aggregator
Sources: CONTRIBUTING.md
System Architecture
Related topics: Introduction to THOUGHT, Storage and Database Layer, Memory Model and Data Structures
Overview
The thought-mcp project is a Model Context Protocol (MCP) server implementation that provides an intelligent memory and code analysis system for AI-assisted development. The system combines semantic memory storage with code graph analysis, enabling natural language queries against codebases through a bi-temporal knowledge graph.
High-Level Architecture
graph TD
subgraph "Client Layer"
MCP[MCP Client]
CLI[Thought CLI]
Hooks[Claude Code Hooks]
end
subgraph "Server Layer"
Server[MCP Server]
Router[Query Router]
Classifier[Query Classifier]
end
subgraph "Memory Layer"
Memory[Memory Manager]
Recall[Recall Engine]
Ask[Ask - NL to Cypher]
end
subgraph "Storage Layer"
Backend[SQLite Backend]
Entities[Entity Store]
Edges[Edge Store]
Embeddings[Vector Embeddings]
end
subgraph "Ingest Layer"
CodePipeline[Code Pipeline]
GitPipeline[Git Pipeline]
Extractors[Language Extractors]
end
MCP --> Server
CLI --> Server
Hooks --> Server
Server --> Router
Router --> Classifier
Classifier --> Memory
Memory --> Backend
CodePipeline --> Backend
GitPipeline --> Backend
Ask --> Recall
Core Components
MCP Server (`src/thought/server.py`)
The MCP server exposes the primary tool interface for AI clients. It implements async tool handlers that delegate to the memory layer.
Key Tools:
| Tool | Purpose |
|---|---|
| `recall` | Semantic recall of entities using embeddings |
| `ask` | Natural language queries translated to Cypher |
| `working_context` | Context primitive for agent awareness |
| `scan` | Incremental code scanning with change detection |
Sources: src/thought/server.py:1-100
Query Router and Classifier
The system routes queries through a classification system that detects:
- CODE queries: Triggered by code-shaped keywords (`function`, `class`, `caller`, `callee`, file extensions) plus camelCase/snake_case identifiers
- CHANGE queries: Historical or diff-based queries
- HYBRID combinations: CODE × CHANGE patterns like "what changed in auth.middleware since v1.0"
graph LR
Q[Query] --> C[Classifier]
C --> |CODE| CR[Code Route]
C --> |CHANGE| CH[Change Route]
C --> |HYBRID| HY[Hybrid Route]
C --> |DEFAULT| DF[Default Recall]
Sources: CHANGELOG.md:1-80
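A keyword-and-pattern classifier along these lines might look like the sketch below. The CODE keywords come from the description above; the CHANGE keyword list, the regular expressions, and the `classify` function are illustrative guesses rather than THOUGHT's actual rules.

```python
import re

CODE_KEYWORDS = {"function", "class", "caller", "callee"}
CHANGE_KEYWORDS = {"changed", "since", "diff", "history"}  # assumed, not from the source
# camelCase (fooBar) or snake_case (foo_bar) identifiers.
IDENTIFIER = re.compile(r"\b(?:[a-z]+(?:[A-Z][a-z0-9]*)+|[a-z0-9]+(?:_[a-z0-9]+)+)\b")
# File names ending in one of the supported source extensions.
FILE_EXT = re.compile(r"\b[\w/.-]+\.(?:py|ts|tsx|rs|php)\b")


def classify(query: str) -> str:
    """Return CODE, CHANGE, HYBRID, or DEFAULT for a free-text query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    is_code = bool(
        words & CODE_KEYWORDS or IDENTIFIER.search(query) or FILE_EXT.search(query)
    )
    is_change = bool(words & CHANGE_KEYWORDS)
    if is_code and is_change:
        return "HYBRID"
    if is_code:
        return "CODE"
    if is_change:
        return "CHANGE"
    return "DEFAULT"
```

A heuristic like this is cheap to run on every query, at the cost of occasional misroutes that fall through to default recall.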
Code Layer (`src/thought/layers/code.py`)
The code layer provides a high-level API for code-specific graph queries against the currently-valid view of the code graph.
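# Interface outline only; parameters and bodies are abbreviated.
class CodeLayer:
    def callers_of(self, name): ...  # Who calls this function
    def callees_of(self, name): ...  # What this function calls
    def impact_set(self, name): ...  # Transitive callers, ranked
    def defines_in_file(self): ...  # Entities in a given file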
Sources: src/thought/layers/code.py:1-60
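The difference between direct callers and a transitive impact set can be shown on an in-memory edge list. This sketch uses standalone functions over hypothetical `CALLS` edges and omits the PageRank-based ranking; it is not the `CodeLayer` implementation.

```python
from collections import deque

Edge = tuple[str, str]  # (caller, callee), standing in for a CALLS edge


def callers_of(edges: list[Edge], name: str) -> set[str]:
    """Direct callers: sources of CALLS edges that point at `name`."""
    return {src for src, dst in edges if dst == name}


def impact_set(edges: list[Edge], name: str) -> set[str]:
    """Transitive callers: everything that reaches `name` through CALLS edges."""
    seen: set[str] = set()
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for caller in callers_of(edges, current):
            if caller not in seen and caller != name:
                seen.add(caller)  # the seen set also guards against cycles
                queue.append(caller)
    return seen


# Hypothetical call graph.
calls = [
    ("login", "authenticate_user"),
    ("api_handler", "login"),
    ("cli_main", "login"),
    ("helper", "unrelated"),
]
```

Here `login` is the only direct caller of `authenticate_user`, while the impact set also includes everything that calls `login`.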
Storage Architecture
SQLite Backend
The system uses SQLite as its primary storage with the following schema features:
- Bi-temporal model: Tracks `valid_from`/`valid_until` (business time) and `learned_at` (system knowledge time)
- Entity/Edge tables with code-specific columns (`code_file`, `code_language`, `code_commit_sha`)
- Partial indexes for efficient queries
- WAL mode with checkpointing for consistent backups
Data Models
Entity Structure:
@dataclass
class CodeEntity:
    name: str
    type_: str  # function, class, module, method
    language: str  # python, typescript, rust, php
    file_path: str
    line_start: int
    line_end: int
    signature: str
    docstring: str
    visibility: str  # public, private, protected
    attrs: dict
Edge Types:
- `CALLS` - Function/method invocations
- `INHERITS_FROM` - Class inheritance
- `IMPORTS` - Module imports
- `DEFINES` - Member definitions within classes
- `OVERRIDES` - Method overrides (TypeScript)
Sources: src/thought/ingest/code/pipeline.py:1-100
Code Ingestion Pipeline
Language Extractors
The system uses tree-sitter parsers for multi-language code extraction:
| Language | File | Capabilities |
|---|---|---|
| Python | python_extractor.py | Functions, classes, imports, inheritance |
| TypeScript | typescript_extractor.py | Functions, classes, imports, exports, inheritance, overrides |
| Rust | rust_extractor.py | Functions, impl blocks, traits |
| PHP | php_extractor.py | Functions, classes, methods, namespaces |
All extractors output CodeEntity and CodeEdge tuples parsed from AST nodes.
Sources: src/thought/ingest/code/python_extractor.py:1-80
Code Pipeline Flow
graph TD
F[File Input] --> LD[Language Detection]
LD --> EX[Extract Entities/Edges]
EX --> SI[Upsert Source]
SI --> WE[_write_entities]
WE --> EE[Embed Signatures]
EE --> WEd[_write_edges]
WEd --> CM[Commit Transaction]
subgraph "Entities Processing"
WE --> |"name_to_id map"| WEd
end
Sources: src/thought/ingest/code/pipeline.py:100-200
Git Pipeline (`src/thought/ingest/code/git_pipeline.py`)
The git pipeline enables historical code analysis with two modes:
| Mode | Behavior |
|---|---|
| snapshot | Fast - ingest HEAD only, stamp entities with HEAD SHA |
| full | Walk every commit chronologically, stamp each entity with its commit SHA |
The full mode enables bi-temporal as_of queries against historical commits.
Sources: src/thought/ingest/code/git_pipeline.py:1-50
Query System
Recall Engine
Semantic recall uses vector embeddings to find entities by intent rather than exact name:
def recall(
query: str,
scope: str = "all",
owner_id: str | None = None,
limit: int = 10,
) -> list[RecallHit]
The system embeds entity signatures and docstrings during ingestion, enabling natural queries like "who calls authenticate_user".
Ask Engine (`src/thought/query/ask.py`)
Natural language to Cypher translation with validation:
graph LR
NL[Natural Language] --> PROMPT[Build Prompt]
PROMPT --> LLM[LLM Provider]
LLM --> CY[Cypher Query]
CY --> VAL[Validate]
VAL --> |Valid| EXE[Execute]
VAL --> |Invalid| FB[Fallback to Recall]
Constraint System:
- Read-only Cypher features only (MATCH, WHERE, RETURN)
- Validates against actual schema before execution
- Falls back to recall() on translation failures
Sources: src/thought/query/ask.py:1-80
Integration Points
MCP Client Installation (`src/thought/clients.py`)
The system installs as an MCP server for AI coding tools:
def install(client: ClientName, *, server_name: str = "thought")
Supported clients include Claude Code and other MCP-compatible tools. Installation merges configuration without disturbing existing settings.
Claude Code Hooks (`src/thought/hooks/install.py`)
Hooks provide automatic memory integration:
| Hook | Event | Action |
|---|---|---|
| recall | UserPromptSubmit | Memory recall on user input |
| write | Stop | Context capture on completion |
| context | SessionStart | Session initialization |
Sources: src/thought/hooks/install.py:1-50
CLI Architecture (`src/thought/cli.py`)
The command-line interface provides database lifecycle management:
| Command | Function |
|---|---|
| thought init | Create database + config + CLAUDE.md |
| thought db size | Disk usage + entity/edge counts |
| thought db flush | Wipe KB with backup |
| thought db backup | SQLite online-backup snapshot |
| thought db load | Load snapshot atomically |
| thought db inspect | Count + schema summary |
| thought ingest-git | Git-history-aware ingestion |
| thought callers | Direct callers via PageRank |
| thought impact | Transitive impact set |
| thought diff | Entity diff between commits |
Sources: src/thought/cli.py:1-100
Demo System (`src/thought/demo.py`)
The built-in demo provides audience-specific walkthroughs:
| Audience | Purpose |
|---|---|
| code | Agent/developer flow - 14-stage code vertical |
| writer | Novelist/paper author - bi-temporal recall |
| legal | Investigator - contradiction detection |
| researcher | Academic - claim/source relationships |
Sources: src/thought/demo.py:1-50
Configuration
Database Initialization
# thought.toml
[database]
path = ".thought/thought.db"
[llm]
provider = "anthropic" # or ollama, lmstudio, openai-compat
[embedder]
type = "auto" # sentence-transformers if available, else deterministic
Sources: src/thought/cli.py:50-80
Summary
The thought-mcp architecture combines:
- MCP Server - Tool interface for AI clients
- Bi-temporal Storage - SQLite with code-specific schema
- Multi-language Extractors - Tree-sitter based AST parsing
- Git Integration - Historical code analysis
- Query Routing - Classification-based query dispatch
- Natural Language Interface - NL to Cypher translation
This design enables both real-time code assistance and deep historical analysis of codebases through a unified query interface.
Sources: src/thought/server.py:1-100
Storage and Database Layer
Related topics: System Architecture, Memory Model and Data Structures
Overview
The Storage and Database Layer is the persistence backbone of the THOUGHT system, providing a structured SQLite-based knowledge base (KB) for storing entities, edges, embeddings, and operational metadata. This layer abstracts database operations through a modular backend interface, enabling CRUD operations, bi-temporal data tracking, and specialized queries for code analysis.
The architecture supports:
- Entity/Edge persistence with bi-temporal validity tracking (valid_from, valid_until, learned_at)
- Vector embeddings for semantic recall operations
- Source tracking for ingested content provenance
- Code-specific metadata including language, file path, and commit SHA
- Agent and scan logging for operational auditability
Sources: src/thought/storage/__init__.py
Memory Model and Data Structures
Related topics: System Architecture, Storage and Database Layer, Query and Retrieval System
Overview
The thought-mcp repository implements a multi-layered memory architecture designed for AI-assisted knowledge management. The memory model combines vector embeddings for semantic search, graph relationships for structural querying, and temporal versioning for historical analysis. This hybrid approach enables both intuitive natural-language recall and precise code-intent queries.
The core memory system operates as a knowledge base (KB) with bi-temporal semantics, tracking when facts became true (valid_from) versus when the system learned them (learned_at). This design supports time-travel queries that answer "what was true on date X" or "what did the system know on date X".
Architecture Layers
The memory system is organized into three distinct but interconnected layers:
graph TD
A[User Input] --> B[Memory Layer]
B --> C[Vector Layer]
B --> D[Graph Layer]
B --> E[Temporal Layer]
C --> F[SQLite Backend]
D --> F
E --> F
G[Query/Recall] --> B
Vector Layer (`src/thought/layers/vector.py`)
The vector layer handles semantic embedding and similarity search. It stores dense vector representations of entities enabling natural-language recall based on meaning rather than exact keyword matching.
Core Responsibilities:
- Embed text content (entity names, signatures, docstrings) into high-dimensional vectors
- Store embeddings with model metadata (name, version, dimensions)
- Perform similarity searches against the embedded corpus
- Support fallback to deterministic embeddings when ML models are unavailable
Key Components:
| Component | Purpose |
|---|---|
| VectorStore | Persists embeddings in SQLite with metadata |
| Embedder | Base protocol for embedding models |
| OllamaEmbedder | Integration with Ollama's /api/embed endpoint |
| DeterministicEmbedder | Fallback using hash-based vectors |
Sources: src/thought/layers/vector.py
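To make the idea of a hash-based fallback concrete, here is a minimal sketch of a deterministic embedder. It is an illustration under assumptions, not the project's `DeterministicEmbedder`: the function names, the use of SHA-256, and the default dimension are all hypothetical.

```python
import hashlib
import math


def deterministic_embed(text: str, dims: int = 8) -> list[float]:
    """Map text to a unit-length vector derived from SHA-256 digests."""
    values: list[float] = []
    counter = 0
    while len(values) < dims:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Each digest byte (0..255) becomes a value in [-1.0, 1.0].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    values = values[:dims]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


vec = deterministic_embed("def authenticate_user(name, password)")
```

Vectors produced this way are stable across runs and machines, which keeps tests reproducible, but they carry no semantic signal: two differently worded descriptions of the same function will not land near each other.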
Graph Layer (`src/thought/layers/graph.py`)
The graph layer manages entity-relationship data structures and supports Cypher-style traversals. It maintains the structural knowledge of how entities connect to each other.
Entity Types Supported:
| Type | Description |
|---|---|
| module | Source file or namespace unit |
| class | Class or type declarations |
| function | Function definitions |
| method | Class methods |
| fact | General knowledge facts |
| claim | Academic/research claims |
| source | Citation or reference |
| witness | Legal testimony statements |
Edge Relation Types:
| Relation | Meaning |
|---|---|
| IMPORTS | Module dependency relationship |
| INHERITS_FROM | Class inheritance |
| DEFINES | Container defines a member |
| OVERRIDES | Method overrides parent |
| CALLS | Function invocation |
| REFERS_TO | General reference |
| CONTRADICTS | Logical opposition between facts |
Sources: src/thought/layers/graph.py
Temporal Layer (`src/thought/layers/temporal.py`)
The temporal layer implements bi-temporal data modeling, tracking both valid time and learned time for all entities. This enables sophisticated time-travel queries and contradiction detection.
Bi-Temporal Model:
graph LR
A[Entity] --> B[valid_from<br/>When fact became true]
A --> C[learned_at<br/>When KB learned fact]
D[as_of valid] --> E[Historical state query]
D --> F[as_of learned<br/>System knowledge query]
Key Temporal Features:
- valid_from: Timestamp when the fact became true in reality
- learned_at: Timestamp when the system recorded the fact
- valid_until: Optional expiration of fact validity
- CONTRADICTS edges: Automatically surface when facts conflict across time axes
Sources: src/thought/layers/temporal.py
Core Data Models
Entity Model (`src/thought/models.py`)
The base Entity model represents all stored knowledge items in the system.
class Entity:
id: str # Unique identifier
name: str # Canonical name
type: str # Entity type (see table above)
scope: str # "shared" or "private"
owner_id: str | None # Owner for private entities
valid_from: datetime # When fact became true
learned_at: datetime # When system learned it
source_ref: str | None # Reference to source document
tier: str # "hot", "warm", "cold" (access frequency)
attrs: dict # Type-specific attributes
Entity Attributes by Type:
| Entity Type | Key Attributes |
|---|---|
| code_* | code_file, code_language, code_commit_sha, signature, visibility, line_start, line_end |
| fact | predicates, unique_predicates, source_doc |
| claim | citation_key, reliability_score |
Sources: src/thought/models.py
Code Entities (`src/thought/ingest/entities.py`)
Code-specific entities extend the base model with language-aware attributes:
class CodeEntity:
name: str
type_: str # "module", "class", "function", "method"
language: str # "python", "typescript", "rust", "php"
file_path: str
line_start: int
line_end: int
signature: str # Function/class signature
visibility: str # "public", "private", "protected"
docstring: str | None
attrs: dict # Language-specific (e.g., `class` for methods)
Edge Model (`src/thought/ingest/entities.py`)
Relationships between entities are modeled as typed, directed edges:
class CodeEdge:
source_name: str # Origin entity
target_name: str # Destination entity
relation_type: str # IMPORTS, DEFINES, INHERITS_FROM, etc.
line_number: int | None
attrs: dict
Sources: src/thought/ingest/entities.py
Consolidation Engine (`src/thought/consolidation/engine.py`)
The consolidation engine handles fact deduplication, merging, and contradiction detection. It processes incoming data through a pipeline that ensures data quality and consistency.
graph TD
A[Raw Input] --> B[Jaccard Deduplication]
B --> C[Fact Extraction]
C --> D[Predicate Matching]
D --> E{Conflict?}
E -->|Yes| F[Create CONTRADICTS Edge]
E -->|No| G[Merge into KB]
F --> G
Consolidation Pipeline Steps:
- Jaccard Deduplication: Skip content with >50% overlap with existing facts
- Fact Extraction: Parse structured predicates from unstructured text
- Predicate Matching: Match against existing knowledge using unique predicates
- Contradiction Detection: Create CONTRADICTS edges when facts conflict
- Entity Merging: Upsert with identity (name, code_file, code_commit_sha)
Sources: src/thought/consolidation/engine.py
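The deduplication step can be sketched as a Jaccard similarity check over token sets. This is a minimal illustration: the whitespace tokenization, the function names, and the strict `>` comparison against 0.5 are assumptions, and the engine may tokenize or compare differently.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts, using lowercase whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.5) -> bool:
    """True when the candidate overlaps any stored fact by more than the threshold."""
    return any(jaccard(candidate, fact) > threshold for fact in existing)


known_facts = ["the cat sat"]
near_copy = is_duplicate("the cat sat down", known_facts)   # overlap 3/4
different = is_duplicate("the cat ran", known_facts)        # overlap 2/4
```

With a strict comparison, an overlap of exactly 50% is kept rather than skipped, so the second candidate above is still ingested.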
Storage Backend
The system uses SQLite as its primary storage engine with the following schema:
graph TD
A[SQLite Database] --> B[entities table]
A --> C[edges table]
A --> D[embeddings table]
A --> E[applied_migrations table]
B --> F[code_file<br/>code_language<br/>code_commit_sha]
C --> G[relation_type<br/>source_name<br/>target_name]
D --> H[model_name<br/>model_version<br/>vector BLOB]
Key Backend Classes:
| Class | Responsibility |
|---|---|
| Backend | Core CRUD operations on entities/edges |
| find_code_entity() | Fast lookup by name + file/commit disambiguators |
| upsert_entity() | Insert or update with identity awareness |
| store_embedding() | Persist vectors with model metadata |
Sources: src/thought/storage/sqlite/backend.py (inferred from CHANGELOG.md)
Query Pathways
The memory system supports multiple query mechanisms:
Recall (Vector Search)
def recall(
query: str,
scope: str = "all",
owner_id: str | None = None,
max_results: int = 10,
) -> list[RecallResult]
Returns up to 10 semantically similar entities based on embedding similarity.
Ask (Natural Language to Cypher)
Routes natural-language questions through an LLM to generate Cypher queries:
QUESTION: "who calls authenticate_user"
→ CYPHER: MATCH (caller)-[:CALLS]->(f:Function {name: 'authenticate_user'})
RETURN caller.name
Sources: src/thought/query/ask.py
Code Intelligence Queries
| Command | Purpose |
|---|---|
| thought callers <name> | Direct callers via Personalized PageRank |
| thought impact <name> | Transitive impact set (what breaks if changed) |
| thought diff --from SHA1 --to SHA2 | Entity diff between commits |
Ingest Pipelines
Code ingestion follows a standardized pipeline:
graph TD
A[Source File] --> B[Language Detection]
B --> C[AST Parser<br/>tree-sitter]
C --> D[Extractor<br/>Language-specific]
D --> E[CodeEntity list]
D --> F[CodeEdge list]
E --> G[Embedding]
G --> H[Backend upsert]
F --> H
H --> I[Call Graph Builder<br/>optional]
Supported Languages:
- Python (.py) - via tree-sitter-python
- TypeScript (.ts, .tsx) - via tree-sitter-typescript
- Rust (.rs) - via tree-sitter-rust
- PHP (.php) - via tree-sitter-php
Extracted Metadata:
- Module/namespace names
- Class declarations with heritage (extends, implements)
- Function and method definitions
- Import/use declarations
- Visibility modifiers (public, private, protected)
Sources: src/thought/ingest/code/python_extractor.py, src/thought/ingest/code/typescript_extractor.py
Auto-Memory Hooks
The system integrates with Claude Code via hooks for automatic memory management:
| Hook | Event | Action |
|---|---|---|
| recall | UserPromptSubmit | Embeds prompt, recalls relevant context |
| write | Stop | Extracts facts from session transcript |
| context | SessionStart | Loads relevant context for new session |
Sources: src/thought/hooks/install.py
Versioning and Snapshots
The storage layer supports full database lifecycle management:
| Operation | Description |
|---|---|
| db size | Disk usage + entity/edge counts |
| db flush | Wipe KB with date-bounded options |
| db backup <file> | SQLite online backup snapshot |
| db load <file> | Restore or merge from snapshot |
| db inspect <file> | Preview backup without loading |
WAL (Write-Ahead Logging) checkpoints ensure consistent backups.
Summary
The thought-mcp memory model implements a production-grade knowledge management system with:
- Three-layer architecture: Vector for semantics, Graph for structure, Temporal for history
- Bi-temporal semantics: Tracks both validity and knowledge acquisition times
- Code-aware extraction: AST-based parsing for multiple programming languages
- Contradiction detection: Automatic CONTRADICTS edges between conflicting facts
- Multiple query pathways: Semantic recall, natural-language Cypher, and code-intelligence commands
- Git-aware versioning: Commits can be stamped on entities for historical queries
This architecture enables sophisticated AI memory capabilities while maintaining query performance through strategic use of SQLite with proper indexing.
Sources: src/thought/layers/vector.py
Query and Retrieval System
Related topics: Memory Model and Data Structures, Storage and Database Layer, Agent Adapters and SDK Integration
The Query and Retrieval System is a core subsystem within the thought-mcp project that enables users to query the knowledge graph using natural language. It translates human-readable questions into structured Cypher queries or SQL statements, executes them against the underlying SQLite backend, and returns ranked, relevant results. The system serves as the primary interface for retrieving facts, code entities, relationships, and historical data stored in the memory database.
Architecture Overview
The Query and Retrieval System is composed of several interconnected modules that work together to process, route, and execute queries. At its core, the system leverages a Router to classify incoming queries into semantic categories, then delegates processing to specialized handlers based on the query type.
graph TD
A[User Query] --> B[Router]
B --> C{Code Query?}
B --> D{Natural Language?}
B --> E{Search Query?}
C --> F[Code Layer]
D --> G[Ask Module]
G --> H[Cypher Translator]
H --> I[Query Validator]
I --> J[SQLite Backend]
E --> K[Recall Hook]
K --> J
J --> L[Results]
F --> L
The system follows a layered approach where queries are first classified by intent, then transformed into appropriate database queries. Natural language queries are translated to Cypher through an LLM-based translator, while code-specific queries bypass translation and directly execute predefined graph traversal operations.
Query Classification
The Router module plays a critical role in determining how each query should be processed. Based on keyword detection and pattern matching, queries are classified into distinct types that trigger different handling paths.
Query Types
| Query Type | Trigger Keywords | Handler | Use Case |
|---|---|---|---|
| CODE | function, class, caller, callee, impact, file extensions, camelCase identifiers | CodeLayer | Code graph traversal |
| CHANGE | since v1.0, before this commit, diff | GitIngestReport | Version-aware queries |
| HYBRID | CODE × CHANGE combinations | GraphLayer + GitWalker | Historical code analysis |
| SEARCH | General text | Recall Hook | Semantic search |
| ASK | Natural language questions | Ask Module | Natural language to Cypher |
Sources: src/thought/query/ask.py:1-30
CODE Query Detection
The CODE query class is triggered by code-shaped keywords and identifier patterns. This includes function names, class declarations, caller/callee relationships, file extensions such as .py or .ts, and version-related phrases like since v1.0 or before this commit. Additionally, camelCase and snake_case identifiers automatically route to the CODE handler, enabling queries like "who calls authenticate_user" to be processed through the call-graph machinery without explicit CLI invocation.
Sources: src/thought/query/ask.py:1-30
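A keyword-and-pattern classifier of this kind can be sketched with a few regular expressions. Everything below is hypothetical: the regexes, the `classify` function, and the decision order are assumptions for illustration, and the sketch does not try to separate ASK from SEARCH.

```python
import re

CODE_KEYWORDS = {"function", "class", "caller", "callee", "impact"}
FILE_EXT = re.compile(r"\b[\w/]+\.(py|ts|tsx|rs|php)\b")
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")
SNAKE_CASE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")
CHANGE_PHRASE = re.compile(r"\bsince v\d|\bbefore this commit\b|\bdiff\b", re.IGNORECASE)


def classify(query: str) -> str:
    """Return CODE, CHANGE, HYBRID, or SEARCH for a free-text query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    is_code = bool(
        words & CODE_KEYWORDS
        or FILE_EXT.search(query)
        or CAMEL_CASE.search(query)
        or SNAKE_CASE.search(query)
    )
    is_change = bool(CHANGE_PHRASE.search(query))
    if is_code and is_change:
        return "HYBRID"
    if is_code:
        return "CODE"
    if is_change:
        return "CHANGE"
    return "SEARCH"


label = classify("who calls authenticate_user")
```

Pattern-based routing is cheap and predictable, at the cost of false positives: a plain-English use of "class" or "impact" would also be routed to the code path in this sketch.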
Natural Language to Cypher Translation
The Ask module (src/thought/query/ask.py) is responsible for translating natural language questions into Cypher queries. This translation is performed by an LLM provider configured in the [llm] section of the configuration file, supporting multiple backends through a unified interface.
Translation Process
sequenceDiagram
participant U as User
participant A as Ask Module
participant L as LLM Provider
participant V as Cypher Validator
participant B as SQLite Backend
U->>A: "What functions call authenticate_user?"
A->>A: Build Prompt with Schema
A->>L: Send Prompt
L-->>A: Cypher Query
A->>V: Validate Cypher
alt Valid
V->>B: Execute Query
B-->>V: Results
V-->>U: Ranked Results
else Invalid
A->>A: Fallback to Recall
A-->>U: Semantic Search Results
end
The translation process begins with constructing a prompt that includes the database schema, entity types, and relationship types. The LLM generates a Cypher query that is then validated against a parser before execution. If validation fails or the query cannot be executed, the system gracefully falls back to a plain recall() call, ensuring the user always receives some response.
Prompt Constraints
The Ask module enforces strict constraints on generated queries to maintain system safety and performance:
- Only read-only Cypher features are permitted, including MATCH, WHERE, RETURN, LIMIT, and AS_OF
- Query types are restricted, with MERGE, CREATE, DELETE, SET, and WITH explicitly forbidden
- All entity types and relationship types must come from the defined schema
- The output must be a single Cypher query, without explanations or markdown formatting
Sources: src/thought/query/ask.py:1-50
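The forbidden-clause rule can be approximated with a keyword scan. This is a rough sketch only: the text above says the real validation goes through a parser, and the function name `check_read_only` and the "must start with MATCH" rule are assumptions made here.

```python
import re

FORBIDDEN = ("MERGE", "CREATE", "DELETE", "SET", "WITH")


def check_read_only(cypher: str) -> tuple[bool, str]:
    """Return (ok, reason); reason is empty when the query passes."""
    # Blank out quoted literals so keywords inside values are not flagged.
    stripped = re.sub(r"'[^']*'|\"[^\"]*\"", "''", cypher)
    for keyword in FORBIDDEN:
        if re.search(rf"\b{keyword}\b", stripped, re.IGNORECASE):
            return False, f"forbidden clause: {keyword}"
    if not re.match(r"\s*MATCH\b", stripped, re.IGNORECASE):
        return False, "query must start with MATCH"
    return True, ""


ok, reason = check_read_only(
    "MATCH (caller)-[:CALLS]->(f {name: 'authenticate_user'}) RETURN caller.name"
)
```

A scan like this is deliberately conservative. It would also reject the `STARTS WITH` string operator, which is one reason a grammar-based validator is the better long-term choice.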
AskResult Data Model
The AskResult dataclass encapsulates the outcome of a query translation and execution attempt:
| Field | Type | Description |
|---|---|---|
| cypher | `str \| None` | The generated Cypher query |
| sql | `str \| None` | Alternative SQL query if applicable |
| rows | `list[dict[str, Any]] \| None` | Query results |
| fallback_used | bool | Whether fallback to recall was triggered |
| fallback_reason | str | Explanation if fallback occurred |
Sources: src/thought/query/ask.py:1-50
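The fields above, together with the fallback behaviour, can be sketched as a dataclass and a small driver function. The field names follow the table; the defaults, the `ask` signature, and the four injected callables (`translate`, `validate`, `execute`, `recall`) are hypothetical stand-ins, not the module's real API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]


@dataclass
class AskResult:
    cypher: Optional[str] = None
    sql: Optional[str] = None
    rows: Optional[Rows] = None
    fallback_used: bool = False
    fallback_reason: str = ""


def ask(
    question: str,
    translate: Callable[[str], str],
    validate: Callable[[str], None],
    execute: Callable[[str], Rows],
    recall: Callable[[str], Rows],
) -> AskResult:
    """Translate, validate, and execute; on any failure fall back to recall."""
    cypher: Optional[str] = None
    try:
        cypher = translate(question)
        validate(cypher)
        rows = execute(cypher)
    except Exception as exc:
        return AskResult(
            cypher=cypher,
            rows=recall(question),
            fallback_used=True,
            fallback_reason=str(exc),
        )
    return AskResult(cypher=cypher, rows=rows)


def failing_translate(question: str) -> str:
    raise ValueError("LLM returned no Cypher")


hits = [{"name": "login"}]
direct = ask("who calls authenticate_user", lambda q: "MATCH (n) RETURN n.name",
             lambda c: None, lambda c: hits, lambda q: [])
fallback = ask("who calls authenticate_user", failing_translate,
               lambda c: None, lambda c: [], lambda q: hits)
```

Recording `fallback_reason` alongside the rows lets a caller distinguish a graph answer from a semantic-search answer without inspecting the rows themselves.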
Recall Hook
The recall hook (src/thought/hooks/recall.py) provides semantic search functionality as a fallback mechanism and primary retrieval method. It uses embedding vectors to find semantically similar entities in the knowledge graph, supporting the core recall operation used throughout the system.
Recall Behavior
Recall operations are bounded by design to prevent overwhelming the user with too many results. The system never returns more than 10 hits regardless of knowledge base size, encouraging users to narrow their queries using as_of and scope parameters for more targeted retrieval.
The recall mechanism supports bi-temporal queries through the as_of_kind parameter:
- valid: Returns what was true on a given date, answering "what was true on date X"
- learned: Returns what the system knew on a given date, answering "what did the system know on date X"
These two modes differ when facts are corrected after the fact, enabling users to perform historical analysis of their knowledge graph.
Sources: src/thought/query/views.py
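The difference between the two modes shows up when a fact is recorded after it became true. The sketch below is a plain-Python illustration of that filter; the `Fact` class, the `as_of` function, and the half-open validity interval are assumptions, not the project's query code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    text: str
    valid_from: date
    valid_until: Optional[date]  # None means still valid
    learned_at: date


def as_of(facts: list[Fact], when: date, kind: str = "valid") -> list[Fact]:
    """Filter facts on one of the two time axes."""
    if kind == "valid":
        return [
            f for f in facts
            if f.valid_from <= when and (f.valid_until is None or when < f.valid_until)
        ]
    if kind == "learned":
        return [f for f in facts if f.learned_at <= when]
    raise ValueError(f"unknown as_of kind: {kind}")


# True since January, but only recorded in March.
facts = [Fact("Alice leads the team", date(2024, 1, 1), None, date(2024, 3, 1))]
february = date(2024, 2, 1)
valid_hits = as_of(facts, february, "valid")
learned_hits = as_of(facts, february, "learned")
```

For February, the `valid` query returns the fact because it was already true, while the `learned` query returns nothing because the knowledge base had not recorded it yet.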
Code Layer
The Code Layer (src/thought/layers/code.py) provides a specialized interface for code-specific graph queries. It wraps the GraphLayer with operations native to programmers, operating against the currently-valid view of the code graph using the valid_until IS NULL filter.
Core Operations
| Method | Description | Use Case |
|---|---|---|
| callers_of(name) | Direct callers ranked by PageRank | Finding who uses a function |
| callees_of(name) | Direct callees within the package | Finding what a function calls |
| impact_set(name) | Transitive callers ranked | Dependency analysis |
| defines_in_file() | All entities in a file | File-level inspection |
All four operations support optional as_of parameters to query historical snapshots when bi-temporal git ingest has been configured. The code_commit_sha field enables time-travel queries against the code graph.
Sources: src/thought/layers/code.py:1-50
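A transitive caller set can be computed with a breadth-first walk over reversed CALLS edges. This sketch is an assumption-laden stand-in: it takes edges as `(caller, callee)` tuples and ranks by call distance, whereas the table above says the real layer ranks with PageRank.

```python
from collections import deque


def impact_set(calls: list[tuple[str, str]], name: str) -> list[tuple[str, int]]:
    """Return (caller, distance) pairs for everything that reaches `name`."""
    callers: dict[str, set[str]] = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)

    depth = {name: 0}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for caller in sorted(callers.get(current, ())):
            if caller not in depth:  # also guards against call cycles
                depth[caller] = depth[current] + 1
                queue.append(caller)

    del depth[name]
    return sorted(depth.items(), key=lambda item: (item[1], item[0]))


calls = [
    ("login", "authenticate_user"),
    ("api_handler", "login"),
    ("cli_main", "login"),
    ("authenticate_user", "hash_password"),
]
affected = impact_set(calls, "authenticate_user")
```

Here `login` is a direct caller and the two entry points reach the function through it, while `hash_password` is a callee and therefore not part of the impact set.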
Entity Resolution
The _resolve_entity_id method handles name resolution with multiple fallback strategies:
- Intra-file match with exact name
- Cross-file match with unique qualified suffix
- Cross-file bare-name match for top-level functions
- Stub creation for unresolved references
This multi-stage resolution ensures that queries like obj.method() can resolve to ClassName.method when it is unique in the knowledge base, and that bare function names can be found across different files.
Sources: src/thought/query/cypher.py
Cypher Query Engine
The Cypher module (src/thought/query/cypher.py) handles the parsing, validation, and execution of Cypher queries against the SQLite backend. It provides a bridge between the graph query language and the relational database storage.
Query Validation
Before executing any Cypher query, the system validates it against the defined grammar to prevent malformed queries from reaching the database. This validation step catches syntax errors, unsupported features, and schema violations before they can cause runtime errors.
Execution Model
Cypher queries are translated into equivalent SQL statements that operate against the SQLite schema. The translation preserves the semantic meaning of graph patterns while adapting them to the relational storage model used by the backend.
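As an illustration of the idea, the sketch below translates one fixed single-hop pattern into a parameterized SQL query and runs it with `sqlite3`. It is not the project's translator: the regex, the `translate` function, and the simplified `edges` table with `source_name`/`target_name` columns are assumptions for this example.

```python
import re
import sqlite3

# Supports only: MATCH (a)-[:REL]->(b[:Label] {name: 'x'}) RETURN a.name
SINGLE_HOP = re.compile(
    r"MATCH \((\w+)\)-\[:(\w+)\]->\((\w+)(?::\w+)? \{name: '([^']*)'\}\) RETURN \1\.name",
    re.IGNORECASE,
)


def translate(cypher: str) -> tuple[str, tuple[str, str]]:
    """Turn the supported pattern into SQL plus bound parameters."""
    match = SINGLE_HOP.fullmatch(" ".join(cypher.split()))
    if match is None:
        raise ValueError("unsupported Cypher pattern")
    relation, target = match.group(2), match.group(4)
    sql = (
        "SELECT source_name FROM edges "
        "WHERE relation_type = ? AND target_name = ? "
        "ORDER BY source_name"
    )
    return sql, (relation.upper(), target)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source_name TEXT, target_name TEXT, relation_type TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("login", "authenticate_user", "CALLS"),
        ("api_handler", "login", "CALLS"),
        ("auth", "authenticate_user", "DEFINES"),
    ],
)
sql, params = translate(
    "MATCH (caller)-[:CALLS]->(f:Function {name: 'authenticate_user'})\nRETURN caller.name"
)
rows = conn.execute(sql, params).fetchall()
```

Binding the relation and target as parameters keeps text taken from the generated query out of the SQL string itself.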
Views and Data Models
The views module (src/thought/query/views.py) defines the data structures and return formats used throughout the Query and Retrieval System.
Entity Model
The Entity model represents nodes in the knowledge graph with the following key attributes:
| Attribute | Type | Description |
|---|---|---|
| id | str | Unique identifier |
| type | str | Entity type (PERSON, function, class, etc.) |
| name | str | Display name |
| canonical_name | str | Normalized name for matching |
| scope | ScopeName | shared, private, or all |
| tier | Tier | hot, warm, or cold |
| valid_from | datetime | Start of validity period |
| valid_until | `datetime \| None` | End of validity period |
| attrs | dict[str, object] | Additional attributes |
Scope Filter
The ScopeFilter class determines visibility of entities based on ownership and scope:
- shared: All entities with scope = "shared"
- private: Entities matching both scope = "private" AND owner_id
- all: Shared entities plus private entities owned by the requesting user
The scope filter generates SQL fragments that join against the entity table aliased as e, enabling fine-grained access control across the query system.
Sources: src/thought/models.py:1-80
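The three scope rules map naturally onto a SQL fragment with bound parameters. The following is a sketch under assumptions: `scope_filter`, the demo table, and the treatment of `all` without an owner are hypothetical, and only the `e` alias comes from the description above.

```python
import sqlite3
from typing import Optional


def scope_filter(scope: str, owner_id: Optional[str] = None) -> tuple[str, list[str]]:
    """Return a WHERE fragment over alias `e` and its parameters."""
    if scope == "shared":
        return "e.scope = 'shared'", []
    if scope == "private":
        if owner_id is None:
            raise ValueError("private scope requires an owner_id")
        return "e.scope = 'private' AND e.owner_id = ?", [owner_id]
    if scope == "all":
        if owner_id is None:
            return "e.scope = 'shared'", []
        return "(e.scope = 'shared' OR (e.scope = 'private' AND e.owner_id = ?))", [owner_id]
    raise ValueError(f"unknown scope: {scope}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, scope TEXT, owner_id TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [
        ("glossary", "shared", None),
        ("alice_notes", "private", "alice"),
        ("bob_notes", "private", "bob"),
    ],
)


def visible(scope: str, owner_id: Optional[str] = None) -> list[str]:
    clause, params = scope_filter(scope, owner_id)
    query = f"SELECT e.name FROM entities AS e WHERE {clause} ORDER BY e.name"
    return [row[0] for row in conn.execute(query, params)]


alice_view = visible("all", "alice")
```

Only the fixed fragment is interpolated into the query; the owner identifier always travels as a bound parameter.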
CLI Commands
The Query and Retrieval System is exposed through several CLI commands under the thought command group:
| Command | Description |
|---|---|
| thought recall <query> | Semantic search across the knowledge graph |
| thought ask <question> | Natural language query with Cypher translation |
| thought callers <name> | Find direct callers ranked by PageRank |
| thought callees <name> | Find direct callees within the package |
| thought impact <name> | Transitive impact set analysis |
| thought browse <name> | Drill into a topic with PPR-ranked neighborhood |
| thought diff --from <sha1> --to <sha2> | Compare entities between commits |
Browse Command
The browse command (mcp__thought__browse_topic) implements a two-step resolution process. First, the name is matched against entity types for a type facet. If no type matches, the name is resolved as an entity using canonical-name matching, and the PPR-ranked neighborhood is returned. The via field in results indicates whether the hit came from type_facet, ppr, or bfs matching.
Sources: src/thought/cli.py
Configuration
The Query and Retrieval System respects configuration from the thought.toml file and environment variables:
| Option | Default | Description |
|---|---|---|
| embedder | auto | Embedder selection: auto, sentence-transformers, or deterministic |
| llm.provider | openai | LLM provider for Ask module |
| llm.model | varies | Model name for translation |
| db_path | .thought/thought.db | SQLite database path |
The auto embedder selector probes the sentence_transformers package via importlib.util.find_spec before returning the wrapper, falling back to the deterministic embedder when the optional dependency is missing.
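The probing step can be sketched with the standard library alone. The function name `pick_embedder` and its string return values are assumptions for illustration; only the use of `importlib.util.find_spec` on `sentence_transformers` comes from the description above.

```python
import importlib.util


def pick_embedder(setting: str = "auto") -> str:
    """Resolve the configured embedder name without importing heavy packages."""
    if setting != "auto":
        return setting
    # find_spec locates the package without importing it.
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "sentence-transformers"
    return "deterministic"


chosen = pick_embedder()
```

Checking for the package spec avoids paying the import cost of a large ML dependency just to discover that it is missing.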
Integration Points
The Query and Retrieval System integrates with several other subsystems:
- Storage Layer: SQLite backend provides entity and edge persistence
- Ingest System: Code extractors populate entities that are later queried
- Memory Module: Coordinates between recall, browse, and scan operations
- Server: Exposes query functionality via MCP protocol
The bidirectional relationship between the Code Layer and the Cypher query engine enables both natural language queries like "who calls authenticate_user" and structured queries using the CODE query class, providing flexibility for different user interaction patterns.
Error Handling
The system implements graceful degradation throughout the query pipeline. If Cypher translation fails or validation rejects the generated query, execution falls back to the recall hook, ensuring users always receive results. Bounded result sets prevent resource exhaustion, and the contradiction detection mechanism surfaces conflicts as CONTRADICTS edges in the graph rather than throwing errors, allowing downstream applications to handle them as data.
Sources: src/thought/query/ask.py:1-30
Multi-Language Code Parsing
Related topics: Git History Integration, Storage and Database Layer
The Multi-Language Code Parsing system is the foundational code-vertical layer in THOUGHT. It provides language-agnostic AST extraction across six programming languages using tree-sitter grammars, produces standardized code entities and relationship edges, and enables downstream features like caller analysis, impact queries, and cross-file call-graph resolution.
Overview
The parsing system operates in two phases:
- Phase 1 – AST Extraction: Each language has a dedicated extractor that walks the tree-sitter parse tree and emits CodeEntity and CodeEdge objects.
- Phase 2 – Call Graph Resolution: After all files are ingested, a separate pass resolves CALLS edges by matching callee names against the entity index.
Sources: src/thought/ingest/code/ast_extractor.py:1-15
Supported Languages
The system supports six languages through language-specific extractors:
| Language | Extractor File | Tree-sitter Grammar |
|---|---|---|
| Python | python_extractor.py | tree-sitter-python |
| TypeScript / TSX / JSX | typescript_extractor.py | tree-sitter-typescript |
| Go | go_extractor.py | tree-sitter-go |
| Rust | rust_extractor.py | tree-sitter-rust |
| Java | java_extractor.py | tree-sitter-java |
| PHP | php_extractor.py | tree-sitter-php |
Sources: src/thought/ingest/code/ast_extractor.py:30-55
Architecture
graph TD
A[Code File] --> B[Language Detection]
B --> C[ast_extractor.py Dispatcher]
C --> D{Python?}
C --> E{TypeScript?}
C --> F{Go?}
C --> G{Rust?}
C --> H{Java?}
C --> I{PHP?}
D --> J[python_extractor.extract]
E --> K[typescript_extractor.extract]
F --> L[go_extractor.extract]
G --> M[rust_extractor.extract]
H --> N[java_extractor.extract]
I --> O[php_extractor.extract]
J --> P[(CodeEntity, CodeEdge)]
K --> P
L --> P
M --> P
N --> P
O --> P
P --> Q[CodeIngestPipeline]
Q --> R[build_call_graph]
R --> S[(CALLS Edges)]
Dispatcher Pattern
The ast_extractor.py module uses lazy loading to avoid importing heavy tree-sitter C extensions at module load time:
_REGISTRY: dict[str, Callable[[str, str], tuple[list[CodeEntity], list[CodeEdge]]]] = {}
def _python_extractor():
from . import python_extractor
return python_extractor.extract
Each language loader is registered in _LOADERS and invoked only when that language is first requested. Sources: src/thought/ingest/code/ast_extractor.py:9-35
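The lazy-loading pattern can be sketched as a loader table plus a cache. This is a self-contained illustration, not the module's code: `get_extractor` is a hypothetical name, and the loader returns a stub where the real one would perform the deferred `from . import python_extractor`.

```python
from typing import Callable

Extractor = Callable[[str, str], tuple[list, list]]

_REGISTRY: dict[str, Extractor] = {}  # cache of already-loaded extractors


def _load_python() -> Extractor:
    # Stand-in for the deferred import of the real extractor module.
    def extract(source: str, file_path: str) -> tuple[list, list]:
        return [], []
    return extract


_LOADERS: dict[str, Callable[[], Extractor]] = {"python": _load_python}


def get_extractor(language: str) -> Extractor:
    """Load the extractor for `language` on first use, then reuse it."""
    if language not in _REGISTRY:
        try:
            loader = _LOADERS[language]
        except KeyError:
            raise ValueError(f"unsupported language: {language}") from None
        _REGISTRY[language] = loader()
    return _REGISTRY[language]


first = get_extractor("python")
second = get_extractor("python")
```

Because the loader runs only on the first request, a process that never touches a given language never imports that language's grammar.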
Data Models
CodeEntity
Represents a code element extracted from the AST:
| Field | Type | Description |
|---|---|---|
| name | str | Canonical identifier (module, function, class, method) |
| type_ | str | Entity kind: module, function, class, method |
| language | str | Source language: python, typescript, go, rust, java, php |
| file_path | str | Path to source file (relative to repo root) |
| line_start | int | 1-indexed start line |
| line_end | int | 1-indexed end line |
| signature | str | Declaration signature (e.g., module foo, def bar(self, x)) |
| docstring | `str \| None` | Extracted docstring text |
| visibility | str | public or private based on naming conventions |
| attrs | dict | Language-specific metadata |
Sources: src/thought/ingest/code/python_extractor.py:14-25
CodeEdge
Represents a relationship between entities:
| Field | Type | Description |
|---|---|---|
| `source_name` | str | Entity that is the subject of the relation |
| `target_name` | str | Entity that is the object of the relation |
| `relation_type` | str | One of: IMPORTS, INHERITS_FROM, DEFINES, OVERRIDES, CALLS |
| `line_number` | int | Source line where the relationship was discovered |
| `attrs` | dict | Additional metadata (e.g., from_import: true) |
Sources: src/thought/ingest/code/typescript_extractor.py:110-115
Extractor Interface
All language extractors share a common signature:
def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
...
This uniform interface allows the dispatcher to route to any language without knowing implementation details. Sources: src/thought/ingest/code/python_extractor.py:28-40
Supported Edge Types
| Relation | Source | Target | Languages |
|---|---|---|---|
| IMPORTS | module | imported module | Python, TypeScript, PHP, Go, Rust, Java |
| INHERITS_FROM | class | parent class | Python, TypeScript, Java, PHP |
| DEFINES | class/module | contained member | All languages |
| OVERRIDES | method | overridden method | TypeScript (currently) |
| CALLS | function/method | called function | All (via call-graph pass) |
Sources: src/thought/ingest/code/python_extractor.py:1-15, src/thought/ingest/code/typescript_extractor.py:1-20
Language-Specific Extractors
Python Extractor
The Python extractor uses tree-sitter-python and handles:
- Module entities as the root node
- Function definitions (`function_item`)
- Class declarations (`class_declaration`)
- Method definitions within classes
- Import statements (`import_from_statement`, `import_statement`)
- Class inheritance via the `base` field
def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    parser = _get_parser()
    source_bytes = source.encode("utf-8")
    tree = parser.parse(source_bytes)
    root = tree.root_node
    module_name = _module_name_from_path(file_path)
    entities: list[CodeEntity] = []
    edges: list[CodeEdge] = []
    entities.append(CodeEntity(
        name=module_name,
        type_="module",
        language="python",
        # ... remaining fields elided in the source excerpt
    ))
Sources: src/thought/ingest/code/python_extractor.py:28-50
TypeScript Extractor
The TypeScript extractor supports both .ts and .tsx files using separate tree-sitter grammars. In the excerpt below, the TSX grammar is selected for paths ending in .tsx or .jsx:
def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    use_tsx = file_path.endswith((".tsx", ".jsx"))
    parser = _get_parser(use_tsx=use_tsx)
    ...
Node types processed include function_declaration, arrow_function, class_declaration, method_definition, import_statement, and export_statement. Sources: src/thought/ingest/code/typescript_extractor.py:120-145
PHP Extractor
The PHP extractor handles files starting with <?php and recursively scans for definitions nested under namespace_definition blocks:
def _scan(node: Node) -> None:
    for child in node.named_children:
        ...
Sources: src/thought/ingest/code/php_extractor.py:45-60
Rust Extractor
The Rust extractor uses tree-sitter-rust and tracks method visibility through impl_type attributes:
out_entities.append(CodeEntity(
    name=qualified, type_="method", language="rust",
    visibility=_rust_visibility(child, source_bytes),
    attrs={"impl_type": type_name},
))
Sources: src/thought/ingest/code/rust_extractor.py:1-30
Call Graph Resolution
The call graph is built in a separate Phase 2 pass after all files are ingested. The build_call_graph function resolves callee references using a cascade of strategies:
- Exact match within same file: direct intra-file resolution
- Qualified suffix match: `obj.method()` resolves to `ClassName.method`
- Cross-file bare-name match: top-level functions defined elsewhere
- Stub creation: synthetic stub for unknown callees (filtered from impact graphs)
tgt_id = backend.find_code_entity(
    canonical_name=callee_name, scope_filter=sf, code_file=file_path,
)
if tgt_id is None and "." not in callee_name:
    # Unique qualified suffix match.
    rows = backend._conn.execute(
        "SELECT id FROM entities "
        "WHERE type IN ('method','function') AND valid_until IS NULL "
        "AND canonical_name LIKE ? ...",
        (f"%.{callee_name.lower()}", commit_sha),
    ).fetchall()
Sources: src/thought/ingest/code/call_graph.py:1-60
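The four-step cascade can be illustrated in memory, without a database. This is a simplified sketch, not the project's `build_call_graph`: `resolve_callee` and the `stub:` prefix are hypothetical, and entities are reduced to `(canonical_name, file_path)` pairs.

```python
def resolve_callee(callee: str, caller_file: str,
                   entities: list[tuple[str, str]]) -> str:
    """Resolve a callee name against (canonical_name, file_path) pairs."""
    # 1. Exact match within the same file.
    for name, path in entities:
        if name == callee and path == caller_file:
            return name

    # 2. Qualified suffix match: "obj.method" or "method" resolves to
    #    "ClassName.method", but only when exactly one entity matches.
    suffix = "." + callee.rsplit(".", 1)[-1]
    matches = [name for name, _ in entities if name.endswith(suffix)]
    if len(matches) == 1:
        return matches[0]

    # 3. Cross-file bare-name match for top-level functions.
    matches = [name for name, _ in entities if name == callee]
    if len(matches) == 1:
        return matches[0]

    # 4. Nothing found: return a synthetic stub so the edge is still recorded.
    return f"stub:{callee}"
```

Requiring a unique match in steps 2 and 3 is a design choice in this sketch: an ambiguous name falls through to a stub rather than being attached to an arbitrary candidate.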
CodeIngestPipeline
The CodeIngestPipeline orchestrates the full ingest workflow:
- Reads source file content
- Detects or validates language
- Calls the appropriate extractor
- Creates a source reference record
- Writes entities within a single transaction
- Embeds entity signatures and docstrings for VIBE recall
- Writes edges and resolves call graph
graph LR
A[Source File] --> B[detect_language]
B --> C[extract entities/edges]
C --> D[upsert_source]
D --> E[begin transaction]
E --> F[_write_entities + embed]
F --> G[_write_edges]
G --> H[build_call_graph]
H --> I[commit]
The pipeline embeds entity signatures and docstrings so that queries like "who calls authenticate_user" can find functions by intent rather than exact name. Sources: src/thought/ingest/code/pipeline.py:1-80
CodeLayer API
The CodeLayer provides a high-level interface for code graph queries:
| Method | Description |
|---|---|
| `callers_of(name)` | Direct callers, ranked by Personalized PageRank |
| `callees_of(name)` | Direct callees (intra-package) |
| `impact_set(name)` | Transitive callers, ranked; used by the thought impact command |
| `defines_in_file(path)` | All entities discovered in a file |
All methods operate against the currently-valid view (valid_until IS NULL). Pass as_of= for historical snapshots. Sources: src/thought/layers/code.py:1-40
Git-Aware Ingest
The GitWalker enables two ingestion modes:
| Mode | Behavior |
|---|---|
| snapshot (default) | Ingest HEAD only, stamp every entity with HEAD SHA |
| full | Walk every commit chronologically, stamp each entity with its commit SHA |
This enables bi-temporal as_of queries against historical commits. Sources: src/thought/ingest/code/git_pipeline.py:1-50
Configuration
Language is auto-detected by file extension when language=None:
| Extension | Language |
|---|---|
| .py | python |
| .ts, .tsx, .js, .jsx | typescript |
| .go | go |
| .rs | rust |
| .java | java |
| .php | php |
Pass language= explicitly to override detection. Sources: src/thought/ingest/code/pipeline.py:25-35
Git History Integration
Related topics: Multi-Language Code Parsing, Memory Model and Data Structures
Overview
Git History Integration enables thought-mcp to ingest source code with full commit-level provenance, allowing bi-temporal queries that can reconstruct what a codebase looked like at any point in its history. This feature stamps every extracted code entity (functions, classes, modules) with the exact git commit SHA where it was discovered, creating a temporal graph that supports "as-of" queries.
The system provides two ingestion modes: a fast snapshot mode for current-state analysis and a comprehensive full-history mode for complete historical reconstruction.
Sources: CHANGELOG.md
Architecture
Component Overview
graph TD
subgraph "Git History Integration"
CLI["thought ingest-git CLI"]
Pipeline["GitIngestPipeline"]
Walker["GitWalker"]
Storage["SQLite Backend"]
end
subgraph "Git Operations"
Git["git executable"]
RevParse["rev-parse HEAD"]
Log["log --format"]
LsTree["ls-tree -r"]
Show["show <sha>:<path>"]
end
CLI --> Pipeline
Pipeline --> Walker
Walker --> Git
Git --> RevParse
Git --> Log
Git --> LsTree
Git --> Show
Pipeline --> Storage
Data Flow
sequenceDiagram
participant User
participant CLI
participant Pipeline
participant Walker
participant Extractor
participant Backend
User->>CLI: thought ingest-git /repo --mode full
CLI->>Pipeline: run(repo_path, mode)
alt snapshot mode
Pipeline->>Walker: get_head_commit()
Walker->>Git: rev-parse HEAD
Git-->>Walker: sha
Pipeline->>Pipeline: ingest single snapshot
else full mode
Pipeline->>Walker: get_all_commits()
Walker->>Git: log --format
Git-->>Walker: commit list
loop for each commit
Pipeline->>Git: ls-tree -r sha
Pipeline->>Git: show sha:path
Git-->>Pipeline: file content
Pipeline->>Extractor: extract(source, file_path)
Extractor-->>Pipeline: CodeEntity[], CodeEdge[]
Pipeline->>Backend: upsert with commit_sha
end
end
Pipeline-->>User: GitIngestReport
Sources: src/thought/ingest/code/git_pipeline.py:1-95, src/thought/ingest/code/git_walker.py:1-60
Core Components
GitWalker
The GitWalker class provides a read-only interface to git repositories using pure subprocess calls. It deliberately avoids native dependencies like pygit2 to minimize installation footprint.
| Method | Git Command | Purpose |
|---|---|---|
| `get_head_sha()` | rev-parse HEAD | Get current HEAD commit SHA |
| `get_all_commits()` | log --format=... | List all commits chronologically |
| `get_files_at_commit(sha)` | ls-tree -r <sha> | List files in tree at commit |
| `get_file_at_commit(sha, path)` | show <sha>:<path> | Get file content at commit |
#### Commit Data Model
@dataclass(frozen=True)
class Commit:
    sha: str               # Full commit SHA
    author: str            # Author name
    author_email: str      # Author email
    author_date: datetime  # Commit timestamp
    subject: str           # Commit message first line
Sources: src/thought/ingest/code/git_walker.py:24-31
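As an illustration of how `git log` output could be turned into `Commit` records, here is a small parser. The format string, the unit-separator delimiter, and the helper names `log_command` and `parse_log` are assumptions; the source elides the actual `--format` value.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    author_email: str
    author_date: datetime
    subject: str


# Assumed format: sha, author name, author email, strict ISO 8601 author
# date, and subject, separated by the ASCII unit separator (0x1f).
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s"


def log_command(repo: str) -> list[str]:
    """Build the argv for listing commits oldest-first."""
    return ["git", "-C", repo, "log", "--reverse", f"--format={LOG_FORMAT}"]


def parse_log(output: str) -> list[Commit]:
    """Parse the text produced by the command from log_command()."""
    commits = []
    for line in output.split("\n"):
        if not line:
            continue
        sha, author, email, date, subject = line.split("\x1f", 4)
        # Normalise a trailing "Z" so fromisoformat works on older Pythons.
        when = datetime.fromisoformat(date.replace("Z", "+00:00"))
        commits.append(Commit(sha, author, email, when, subject))
    return commits
```

The output of `subprocess.run(log_command(repo), capture_output=True, text=True, check=True).stdout` can be passed straight to `parse_log`. A control-character delimiter is used because commit subjects may contain spaces, pipes, or commas.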
#### Initialization Validation
def __init__(self, repo_path: Path | str) -> None:
    self.repo = Path(repo_path)
    if shutil.which("git") is None:
        raise RuntimeError("git executable not on PATH")
    if not (self.repo / ".git").exists():
        raise ValueError(f"not a git repository: {self.repo}")
The walker validates that:
- The `git` executable exists on PATH
- The target path is a valid git repository (contains a `.git` directory)
Sources: src/thought/ingest/code/git_walker.py:35-42
GitIngestPipeline
The pipeline orchestrates the complete ingestion process, coordinating between git history traversal and code extraction.
| Parameter | Type | Description |
|---|---|---|
| `repo_path` | Path | Path to git repository |
| `mode` | GitMode | "snapshot" (HEAD only) or "full" (all commits) |
| `patterns` | tuple[str, ...] | Glob patterns to filter files (e.g., *.py) |
#### Ingestion Report
@dataclass(frozen=True)
class GitIngestReport:
    head_sha: str          # SHA of HEAD at time of ingest
    mode: GitMode          # Mode used for ingestion
    commits_visited: int   # Number of commits processed
    files_ingested: int    # Total files ingested
    call_edges: int        # Call graph edges created
Sources: src/thought/ingest/code/git_pipeline.py:35-41
Ingestion Modes
Snapshot Mode (Default)
Snapshot mode ingests only the current HEAD commit. This is the recommended mode for:
- Initial repository ingestion
- Quick code analysis workflows
- When historical queries are not needed
Performance characteristics:
- Single-pass through current tree
- No duplicate processing
- Typical runtime: seconds to minutes depending on repository size
Entity stamping: All extracted entities receive the HEAD SHA as their code_commit_sha attribute, enabling queries like "what did auth.middleware look like at HEAD?" or future comparisons.
Sources: src/thought/ingest/code/git_pipeline.py:7-16
Full History Mode
Full mode walks every commit in chronological order, ingesting the file tree at each point. This enables:
- Historical queries: "what did function X look like at commit Y?"
- Diff analysis between any two commits
- Complete temporal reconstruction of code evolution
Performance considerations:
| Repository Size | Estimated Commits | Estimated Time |
|---|---|---|
| Small (<100 files) | ~100 | ~30 seconds |
| Medium (500 files) | ~1000 | ~5 minutes |
| Large (1000+ files) | ~5000+ | ~25+ minutes |
Note: Full-history ingest is bounded by file count × commits. The per-commit cost is dominated by tree-sitter parsing, not git operations.
Sources: src/thought/ingest/code/git_pipeline.py:16-25
CLI Usage
Command Syntax
thought ingest-git <repo_path> [OPTIONS]
#### Options
| Option | Example | Default | Description |
|---|---|---|---|
| `--mode` | `--mode snapshot` or `--mode full` | snapshot | Ingestion mode |
| `--paths` | `--paths "*.py,*.js"` | `*.py` | Comma-separated glob patterns |
| `--config` | `--config path/to/config` | thought.toml | Configuration file |
Examples
# Ingest current directory as git repo (HEAD only)
thought ingest-git .
# Ingest specific repository with full history
thought ingest-git /path/to/repo --mode full
# Ingest Python and TypeScript files only
thought ingest-git . --paths "*.py,*.ts,*.tsx"
# Ingest with full git history, multiple file types
thought ingest-git /project --mode full --paths "*.py,*.js,*.go"
Sources: src/thought/cli.py:90-120
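The effect of a comma-separated `--paths` value can be sketched as below. The helper names are hypothetical, and matching each pattern against the file's base name is an assumption; the real pipeline may match against the full relative path instead.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_patterns(raw: str) -> tuple[str, ...]:
    """Split a comma-separated option value into clean glob patterns."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def filter_paths(paths: list[str], patterns: tuple[str, ...]) -> list[str]:
    """Keep paths whose base name matches at least one pattern."""
    return [
        path for path in paths
        if any(fnmatchcase(PurePosixPath(path).name, pat) for pat in patterns)
    ]
```

For example, `filter_paths(files, parse_patterns("*.py,*.ts,*.tsx"))` keeps Python and TypeScript sources and drops everything else before any parsing work is done.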
Code Commit Stamping
Every extracted code entity receives metadata linking it to its source commit:
eid = self._backend.upsert_entity(
    # ... other fields ...
    code_file=ent.file_path,
    code_language=language,
    code_commit_sha=commit_sha,  # Links entity to specific commit
)
The database schema includes:
| Column | Type | Purpose |
|---|---|---|
| `code_file` | TEXT | File path relative to repo root |
| `code_language` | TEXT | Programming language detected |
| `code_commit_sha` | TEXT | Git commit where entity was found |
These columns have partial indexes for fast lookups by commit.
Sources: CHANGELOG.md, src/thought/ingest/code/pipeline.py:60-75
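A partial index on the commit column could be declared as in the following sketch. The three `code_*` column names come from the table above; the rest of the schema, the index name `idx_entities_commit`, and the helper `entities_at_commit` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    "id INTEGER PRIMARY KEY, canonical_name TEXT, "
    "code_file TEXT, code_language TEXT, code_commit_sha TEXT)"
)
# Partial index: only rows that carry a commit stamp are indexed, so
# non-code entities (NULL sha) add nothing to the index size.
conn.execute(
    "CREATE INDEX idx_entities_commit ON entities(code_commit_sha) "
    "WHERE code_commit_sha IS NOT NULL"
)
conn.executemany(
    "INSERT INTO entities (canonical_name, code_file, code_language, code_commit_sha) "
    "VALUES (?, ?, ?, ?)",
    [
        ("auth.login", "auth.py", "python", "abc123"),
        ("a free-form note", None, None, None),
    ],
)


def entities_at_commit(conn: sqlite3.Connection, sha: str) -> list[str]:
    """Return names of entities stamped with the given commit SHA."""
    rows = conn.execute(
        "SELECT canonical_name FROM entities "
        "WHERE code_commit_sha = ? ORDER BY canonical_name",
        (sha,),
    )
    return [row[0] for row in rows]
```

The trade-off of a partial index is that it only helps queries whose conditions imply the indexed column is not NULL, which an equality lookup by SHA does.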
CodeLayer Query Interface
The CodeLayer class provides convenience methods for querying the code graph with temporal awareness:
class CodeLayer:
    def callers_of(self, name, *, code_commit_sha=None): ...  # Find who calls this function
    def callees_of(self, name, *, code_commit_sha=None): ...  # Find what this function calls
    def impact_set(self, name): ...                           # Transitive callers, ranked
    def defines_in_file(self, path): ...                      # Entities in a file
Temporal Queries
All lookups operate against the currently-valid view of the code graph. To query historical snapshots, pass the as_of parameter or filter by code_commit_sha:
# Query current state
impact = code_layer.impact_set("authenticate_user")
# Query historical state (when full-history ingest was used)
callers_historical = code_layer.callers_of(
    "authenticate_user",
    code_commit_sha="abc123..."
)
Sources: src/thought/layers/code.py:1-50
Diff Between Commits
The system supports computing the difference between any two ingested commits:
thought diff --from <sha1> --to <sha2>
This returns:
- Added entities: Entities present at `--to` but not at `--from`
- Removed entities: Entities present at `--from` but not at `--to`
The diff operates on the set of entities by name, comparing their commit stamps.
Sources: CHANGELOG.md
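The name-based comparison described above reduces to two set differences. The sketch below assumes the entity names at each commit have already been loaded into sets; `diff_entities` is a hypothetical helper, not the CLI's implementation.

```python
def diff_entities(from_names: set[str], to_names: set[str]) -> dict[str, list[str]]:
    """Compare entity names at two commits."""
    return {
        # Present at --to but not at --from.
        "added": sorted(to_names - from_names),
        # Present at --from but not at --to.
        "removed": sorted(from_names - to_names),
    }
```

Because the comparison is by name only, a renamed function shows up as one removal plus one addition, and a function whose body changed but whose name did not is not reported.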
Supported Languages
The git ingestion pipeline uses language-specific extractors:
| Language | Extractor | Extensions |
|---|---|---|
| Python | python_extractor.py | .py |
| Rust | rust_extractor.py | .rs |
| TypeScript | typescript_extractor.py | .ts, .tsx |
| PHP | php_extractor.py | .php |
Each extractor uses tree-sitter for AST parsing, extracting:
- Entities: modules, functions, classes, methods
- Edges: IMPORTS, DEFINES, CALLS, INHERITS_FROM, OVERRIDES
Sources: src/thought/ingest/code/python_extractor.py, src/thought/ingest/code/rust_extractor.py, src/thought/ingest/code/typescript_extractor.py, src/thought/ingest/code/php_extractor.py
Configuration
Thought Configuration (thought.toml)
[embedder]
type = "auto" # or "ollama", "openai", "deterministic"
[storage]
path = "thought.db"
Environment Variables
| Variable | Description |
|---|---|
| `OLLAMA_BASE_URL` | Ollama server URL for embeddings |
| `OPENAI_API_KEY` | OpenAI API key for embeddings |
Best Practices
Initial Ingestion
- Start with snapshot mode to verify the setup works
- Run `thought stats` to confirm entities were created
- Query a function to verify call graph edges exist
Full History Ingestion
- Ensure adequate disk space (full mode creates temporary copies)
- Use `--paths` to filter to relevant file types on large repos
- Consider running during off-peak hours for large repositories
Query Optimization
- Use the `code_file` filter when querying specific files
- Use the `code_commit_sha` filter for historical lookups
- Combine with vector similarity for intent-based queries
Troubleshooting
"git executable not on PATH"
Solution: Install git or ensure it's in your system PATH.
# Verify git is available
git --version
"not a git repository"
Solution: Ensure the path contains a .git directory:
# Initialize if needed
git init
Slow Full-History Ingestion
Mitigation:
- Use `--paths` to filter file types
- Use snapshot mode for initial setup
- Consider parallelizing with multiple `--paths` passes
Summary
Git History Integration transforms thought-mcp from a current-state code analysis tool into a full temporal code repository that can answer questions about code at any point in history. By combining git's commit tracking with bi-temporal database queries, users can reconstruct how functions evolved, who called what across commits, and the complete impact chain of changes over time.
The architecture prioritizes:
- No native dependencies: Pure subprocess git operations
- Two-mode flexibility: Fast snapshots or complete history
- Temporal provenance: Every entity stamped with its commit SHA
- Language generality: Support for multiple programming languages via tree-sitter
Sources: CHANGELOG.md
Agent Adapters and SDK Integration
Related topics: Query and Retrieval System
Overview
The Agent Adapters and SDK Integration subsystem provides a seamless bridge between THOUGHT's knowledge base and external AI agent frameworks. This system enables any Claude-Agent-SDK-shaped agent to interact with THOUGHT's memory, context retrieval, and code analysis capabilities through a standardized adapter interface.
The integration layer consists of three primary components:
- Claude SDK Adapter (`ThoughtMemoryProvider`): a drop-in memory adapter for Claude Agent SDK
- MCP Server Surface: exposes core primitives via the Model Context Protocol
- Claude Code Hook Installer: integrates THOUGHT directly into Claude Code's event loop
Sources: CHANGELOG.md
Architecture Overview
graph TD
subgraph "Agent Frameworks"
ClaudeSDK[Claude Agent SDK]
ClaudeCode[Claude Code CLI]
MCPClients[MCP-Compatible Clients]
end
subgraph "THOUGHT Integration Layer"
ClaudeSDKAdapter[ThoughtMemoryProvider]
MCPServer[MCP Server Surface]
HookInstaller[Claude Code Hook Installer]
end
subgraph "Core THOUGHT"
Memory[Memory / Knowledge Base]
Embedder[Embedder Service]
CodeAnalysis[Code Analysis Engine]
Backend[SQLite Backend]
end
ClaudeSDK --> ClaudeSDKAdapter
ClaudeSDKAdapter --> Memory
ClaudeSDKAdapter --> Embedder
ClaudeCode --> HookInstaller
HookInstaller --> Memory
MCPClients --> MCPServer
MCPServer --> Memory
MCPServer --> CodeAnalysis
MCPServer --> Backend
Memory --> Backend
Embedder --> Backend
CodeAnalysis --> Backend
The Claude SDK Adapter
Purpose and Scope
The ThoughtMemoryProvider class serves as a drop-in memory adapter for any Claude-Agent-SDK-shaped agent. It wraps THOUGHT's core memory primitives and exposes them through a familiar interface that agent developers expect.
Sources: src/thought/adapters/claude_sdk.py
Core Methods
The adapter implements four primary methods that cover the complete agent loop:
| Method | Purpose | Returns |
|---|---|---|
| `context_for(target, role)` | Returns a working-context dict for a specific target entity and role | dict with anchor, neighbours, recent_contradictions, role_view |
| `render_context(target)` | Returns the same payload as a plain-text system-prompt augmentation | str formatted for LLM consumption |
| `record(content)` | Persists what the agent learned to the knowledge base | str: source ID of recorded content |
| `scan(repo_path)` | Runs an incremental scan under the agent's name | dict with scan results |
Sources: src/thought/adapters/claude_sdk.py
Working Context Structure
The context_for() method returns a ranked, role-aware payload containing:
{
    "anchor": "<entity-name>",          # The target entity
    "neighbours": [...],                # Top-K related entities
    "recent_contradictions": [...],     # Entities that contradict this one
    "role_view": "<saved-view-name>"    # Optional named view for the role
}
The context is token-budgeted to prevent overwhelming the agent's context window.
Sources: CHANGELOG.md
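One simple way to enforce a token budget over a ranked list is a greedy cut-off, sketched below. This is not THOUGHT's actual budgeting logic: `fit_to_budget`, the `summary` key, and the four-characters-per-token estimate are all assumptions made for illustration.

```python
def fit_to_budget(ranked: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the highest-ranked items whose estimated cost fits the budget."""
    kept: list[dict] = []
    used = 0
    for item in ranked:
        # Crude estimate: roughly four characters per token.
        cost = max(1, len(item["summary"]) // 4)
        if used + cost > budget_tokens:
            # Stop at the first item that does not fit, so the result is
            # always a prefix of the ranking.
            break
        kept.append(item)
        used += cost
    return kept
```

Stopping at the first overflow, instead of skipping ahead to smaller items, keeps the output a strict prefix of the ranking, which makes the result easier to reason about at the cost of sometimes leaving budget unused.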
Integration Flow
sequenceDiagram
participant Agent as Claude Agent SDK
participant Adapter as ThoughtMemoryProvider
participant Memory as THOUGHT Memory
participant Embedder as Embedder Service
participant Backend as SQLite Backend
Agent->>Adapter: context_for("authenticate", role="code")
Adapter->>Memory: working_context(target, role, budget_tokens)
Memory->>Embedder: embed("authenticate")
Embedder->>Memory: vector embedding
Memory->>Backend: query similar entities
Backend-->>Memory: ranked entity results
Memory-->>Adapter: structured context dict
Adapter-->>Agent: context payload
Agent->>Adapter: record("Learned: auth uses JWT")
Adapter->>Backend: upsert_source(content, mime_type)
Adapter->>Backend: store entity + edges
Backend-->>Adapter: source_id
Adapter-->>Agent: source_id
MCP Server Surface
The MCP (Model Context Protocol) server exposes THOUGHT's primitives as tools that any MCP-compatible client can invoke.
Sources: src/thought/server.py
Available Tools
#### working_context
Universal "what does my agent need to know about X right now" primitive.
@app.tool()
async def working_context(
    target: str,                # "function:authenticate" / "chapter:5" / entity name
    role: str = "default",      # Contextual role for view filtering
    budget_tokens: int = 1024,
    scope: str | None = None,
    owner_id: str | None = None,
) -> dict: ...
Returns:
{
    "anchor": str,
    "neighbours": list[dict],
    "recent_contradictions": list[dict],
    "role_view": str | None
}
Sources: src/thought/server.py:48-63
#### scan
Incremental code-scan primitive for keeping the knowledge base current.
@app.tool()
async def scan(
    repo_path: str,             # Repository to scan
    agent: str | None = None,   # Agent name for scan attribution
    since: str | None = None,   # Only files changed since this time/commit
    max_files: int | None = None,
    note: str | None = None,
) -> dict: ...
Sources: src/thought/server.py:65-78
#### scan_log_list
Lists recent scan runs for tracking incremental progress.
@app.tool()
async def scan_log_list(
    agent: str | None = None,
    limit: int = 10,
) -> dict: ...
Returns:
{
    "scans": [
        {
            "scan_id": str,
            "agent": str,
            "timestamp": str,
            "files_processed": int,
            "note": str | None
        },
        ...
    ]
}
Sources: src/thought/server.py:80-91
Client Installation
THOUGHT supports installation into multiple MCP-compatible clients. The installation process merges a thought MCP server entry into the client's configuration file.
Sources: src/thought/clients.py
Supported Clients
| Client | Configuration Path |
|---|---|
| Project | .claude/settings.json |
| User | ~/.claude/settings.json |
Sources: src/thought/clients.py
Installation Function
def install(
    client: ClientName,
    *,
    server_name: str = "thought",
    block: dict | None = None,
    backup: bool = True,
) -> ClientInstallResult: ...
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `client` | ClientName | Required | Target client name |
| `server_name` | str | "thought" | Name for the server entry |
| `block` | `dict \| None` | None | Custom server block; defaults to server_block() |
| `backup` | bool | True | Backup existing config before modification |
Return Type: ClientInstallResult
@dataclass
class ClientInstallResult:
    client: ClientName
    path: Path | None
    status: Literal["installed", "already_present", "error", "no_path"]
    detail: str = ""
Sources: src/thought/clients.py
Installation Behavior
The install() function performs the following:
- Read existing config: parses the client's JSON configuration
- Merge server entry: adds the `thought` server block under `mcpServers`
- Backup: creates `settings.json.thought.bak` before any write
- Idempotency check: returns `already_present` if the entry exists and matches
Sources: src/thought/clients.py
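The read, check, backup, and merge steps can be sketched as a single function. This is an illustrative approximation, not the code in `clients.py`: `merge_server` is a hypothetical helper, it returns only a status string, and it backs up any existing file before writing.

```python
import json
import shutil
from pathlib import Path


def merge_server(config_path: Path, server_name: str, block: dict) -> str:
    """Merge a server block into a JSON config; return a status string."""
    config: dict = {}
    if config_path.exists():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return "error"
        if not isinstance(config, dict):
            return "error"
        # Idempotency: nothing to do if the same entry is already there.
        if config.get("mcpServers", {}).get(server_name) == block:
            return "already_present"
        # Keep a copy of the previous file before touching it.
        backup = config_path.with_name(config_path.name + ".thought.bak")
        shutil.copy2(config_path, backup)
    config.setdefault("mcpServers", {})[server_name] = block
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return "installed"
```

Running the function twice with the same block is safe: the second call returns `already_present` without rewriting the file or creating another backup.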
graph TD
A[install called] --> B{Config exists?}
B -->|No| C[Create new config]
B -->|Yes| D{Valid JSON?}
D -->|No| E[Return error]
D -->|Yes| F{Server entry exists?}
F -->|Yes, matches| G[Return already_present]
F -->|Yes, differs| H[Backup config]
F -->|No| I[Add server entry]
H --> J[Write merged config]
I --> J
C --> J
J --> K[Return installed]
Claude Code Hook Integration
The hook installer provides Claude Code event-driven integration, enabling THOUGHT to automatically capture context at key points in the development workflow.
Sources: src/thought/hooks/install.py
Hook Kinds
| Hook Kind | Claude Code Event | Command | Trigger |
|---|---|---|---|
| recall | UserPromptSubmit | thought hook recall | After user submits a prompt |
| write | Stop | thought hook write | After agent completes work |
| context | SessionStart | thought hook context | When session begins |
Sources: src/thought/hooks/install.py:15-22
Hook Installation Result
@dataclass(frozen=True)
class HookInstallResult:
    kind: HookKind
    path: Path
    status: Literal["installed", "already_present", "error"]
    detail: str = ""
Settings Path Resolution
def settings_path(*, scope: Literal["project", "user"] = "project") -> Path
- Project scope: `.claude/settings.json` (recommended default)
- User scope: `~/.claude/settings.json`
Sources: src/thought/hooks/install.py:41-50
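A resolver with that signature could be as small as the sketch below. Resolving the project scope relative to the current working directory is an assumption; the actual function may locate the project root differently.

```python
from pathlib import Path
from typing import Literal


def settings_path(*, scope: Literal["project", "user"] = "project") -> Path:
    """Return the Claude Code settings file for the given scope."""
    # Assumption: "project" means the current working directory.
    base = Path.cwd() if scope == "project" else Path.home()
    return base / ".claude" / "settings.json"
```

Making `scope` keyword-only means a call site always reads as `settings_path(scope="user")`, which avoids ambiguity about which file is being modified.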
Demo Integration
The thought demo command includes a built-in walkthrough specifically for the Claude Agent SDK adapter:
- ``code`` Agent / developer flow — the 14-stage code-vertical
walkthrough including agent identity, ``thought scan``,
``working_context``, 4 new-language extractors, and the
Claude Agent SDK adapter.
Sources: src/thought/demo.py
Demo Audiences
| Audience | Purpose | Key Features |
|---|---|---|
| code | Agent/developer flow | SDK adapter, scan, working_context |
| writer | Novelist/paper author | Bi-temporal model, contradiction detection |
| legal | Investigator/paralegal | unique_predicates, CONTRADICTS edges |
| researcher | Academic | Claim/source pairs, Cypher queries |
| all | Sequential all audiences | Full demonstration suite |
Configuration
Environment Variables
The integration layer respects the following environment variables for database and embedder configuration:
| Variable | Purpose |
|---|---|
| `THOUGHT_DB_PATH` | Override database path |
| `THOUGHT_EMBEDDER` | Embedder choice (auto, sentence-transformers, etc.) |
| `THOUGHT_OLLAMA_HOST` | Ollama server host |
| `THOUGHT_OLLAMA_MODEL` | Ollama model name |
| `THOUGHT_LMSTUDIO_URL` | LM Studio server URL |
| `THOUGHT_LMSTUDIO_MODEL` | LM Studio model name |
| `THOUGHT_OPENAI_COMPAT_URL` | OpenAI-compatible API URL |
| `THOUGHT_OPENAI_COMPAT_MODEL` | OpenAI-compatible model name |
| `THOUGHT_OPENAI_COMPAT_API_KEY` | API key for OpenAI-compatible endpoints |
Sources: src/thought/config.py
Config File (`thought.toml`)
[embedding]
choice = "auto" # or specific embedder name
[db]
path = ".thought/thought.db"
Dependencies
The adapter package requires the following extras:
[project.optional-dependencies]
adapters = ["httpx>=0.27"]
Sources: CHANGELOG.md
Usage Example
from thought.adapters.claude_sdk import ThoughtMemoryProvider
# Initialize adapter
memory = ThoughtMemoryProvider()
# Get working context for a function
context = memory.context_for(
    target="authenticate_user",
    role="security-reviewer",
    budget_tokens=2048,
)
# Record what the agent learned
source_id = memory.record(
    "Session token validation happens in this function. "
    "Uses HMAC-SHA256 for signature verification."
)
# Run incremental scan
result = memory.scan(
    repo_path="/path/to/project",
    agent="security-audit",
    note="Weekly security review scan"
)
Summary
The Agent Adapters and SDK Integration system provides three complementary pathways for integrating THOUGHT with external agents:
- Direct SDK Integration: `ThoughtMemoryProvider` for Claude Agent SDK agents
- MCP Protocol: standard tool interface for any MCP-compatible client
- Claude Code Hooks: event-driven integration for Claude Code CLI users
All pathways share the same underlying memory primitives, ensuring consistent behavior regardless of how the agent connects to THOUGHT.
Sources: CHANGELOG.md
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 8 source-linked risk signals. Review them before installing or handing real data to the project.
1. Configuration risk: Configuration risk needs validation
- Severity: medium
- Finding: Configuration risk is backed by a source signal: Configuration risk needs validation. Treat it as a review item until the current version is checked.
- User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.host_targets | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | host_targets=mcp_host, claude, claude_code, chatgpt
2. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | README/documentation is current enough for a first validation pass.
3. Maintenance risk: v0.2.1 — thought upgrade + mcp-extras fix
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: v0.2.1 — thought upgrade + mcp-extras fix. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/RNBBarrett/thought-mcp/releases/tag/v0.2.1
4. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | last_activity_observed missing
5. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | no_demo; severity=medium
6. Security or permission risk: no_demo
- Severity: medium
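- Finding: The scoring pass raised the same no_demo flag, which appears to mean that no runnable demo was found for this project.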
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | no_demo; severity=medium
7. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | issue_or_pr_quality=unknown
8. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
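Every item above recommends keeping the first run isolated. The following is a minimal sketch of one way to do that, not a procedure taken from the project: it runs a command inside a throwaway directory with a private home directory, then reports which files the command created. The helper name run_isolated and the directory layout are hypothetical. The sketch also assumes that the tool writes its database, config, and CLAUDE.md under the working directory or the home directory, which should be confirmed against the current version.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(command):
    """Run `command` in a throwaway directory with a private HOME.

    Returns (return_code, stdout, created_files). Everything is deleted
    when the function returns, so nothing leaks into real projects.
    """
    with tempfile.TemporaryDirectory(prefix="thought-trial-") as tmp:
        root = Path(tmp)
        work = root / "work"   # scratch working directory for the command
        home = root / "home"   # stand-in home directory
        work.mkdir()
        home.mkdir()

        env = dict(os.environ)
        env["HOME"] = str(home)         # POSIX home directory
        env["USERPROFILE"] = str(home)  # Windows equivalent

        result = subprocess.run(
            command, cwd=work, env=env,
            capture_output=True, text=True, check=False,
        )
        # List what the command left behind, relative to the sandbox root.
        created = sorted(
            p.relative_to(root).as_posix()
            for p in root.rglob("*") if p.is_file()
        )
        return result.returncode, result.stdout, created


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; once the tool is
    # installed, ["thought", "init"] could be passed instead (assumption:
    # the executable is on PATH).
    probe = [sys.executable, "-c", "open('probe.txt', 'w').write('ok')"]
    code, out, files = run_isolated(probe)
    print(code, files)
```

This only redirects the working directory and the home directory. It does not restrict network access or writes to absolute paths, so a container or virtual machine remains the stronger choice when credentials or sensitive data are nearby.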
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using thought-mcp with real data or production workflows.
- v0.2.2 — MCP stdio transport fix - github / github_release
- v0.2.1 — thought upgrade + mcp-extras fix - github / github_release
- Configuration risk needs validation - GitHub / issue
Source: Project Pack community evidence and pitfall evidence