thought-mcp Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

thought-mcp

Introduction to THOUGHT

Related topics: Quickstart Guide, Installation and Setup, System Architecture

THOUGHT is a local AI memory tool designed to help developers, researchers, writers, and investigators maintain persistent, queryable knowledge graphs of their work. It combines graph database technology with natural language processing to create a bi-temporal knowledge base that tracks information across time—answering questions like "what was true on date X" and "what did the system know on date X." Sources: README.md

What is THOUGHT?

THOUGHT operates as a self-hosted memory layer that runs entirely on your local machine. Unlike cloud-based AI memory solutions, THOUGHT stores everything in a local SQLite database, giving you full control over your data while still providing powerful querying capabilities through natural language or Cypher graph queries. Sources: src/thought/cli.py:1-50

The core philosophy is to treat memory as a first-class citizen in the development workflow—something that persists across sessions, understands context, and can be queried like a real database rather than a simple key-value store.

Core Architecture

THOUGHT's architecture consists of several interconnected layers that work together to provide a complete memory solution.

graph TD
    A[CLI / MCP Server] --> B[Query Layer]
    B --> C[Graph Layer]
    B --> D[Code Layer]
    C --> E[Storage Backend]
    D --> E
    E --> F[SQLite Database]
    B --> G[LLM Providers]
    G --> H[Ollama / LM Studio / OpenAI]

Storage Layer

The storage layer uses SQLite with a carefully designed schema that supports bi-temporal modeling. Every entity and edge in the knowledge graph has timestamps tracking when facts became valid and when they were learned. Sources: src/thought/storage/sqlite/backend.py:1-100

Component	Purpose
`SQLiteBackend`	Core database operations with upsert, query, and embedding storage
WAL Mode	Write-Ahead Logging for crash recovery and concurrent reads
Migration System	Tracks applied migrations in `applied_migrations` table
Bi-temporal Columns	`valid_from`, `valid_until`, `learned_at`, `unlearned_at`
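
As a rough illustration of how the rows in this table fit together, the sketch below opens a SQLite file in WAL mode and creates a table carrying the four bi-temporal columns. The helper name `open_store`, the table name `entities`, and the column types are assumptions made for this example; the real schema lives in the backend module and may differ.

```python
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite file with WAL enabled and a toy bi-temporal table."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active. It only applies to
    # on-disk databases; an in-memory database reports "memory" instead.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS entities (
            id           TEXT PRIMARY KEY,
            name         TEXT NOT NULL,
            valid_from   TEXT,   -- when the fact became true
            valid_until  TEXT,   -- NULL means "still true"
            learned_at   TEXT,   -- when the store recorded the fact
            unlearned_at TEXT    -- NULL means "not retracted"
        )
        """
    )
    conn.commit()
    return conn
```

Because the statement uses `IF NOT EXISTS`, calling the helper twice against the same file is harmless.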

Query Layer

The query layer provides multiple interfaces for accessing your memory:

Natural Language: Ask questions in plain English, translated to Cypher
Code Queries: Find callers, callees, and impact sets
Recall: Semantic search using embeddings
Cypher Direct: Execute graph queries directly

Sources: src/thought/query/ask.py:1-50

Graph Layer

The graph layer provides the core graph operations that power all THOUGHT functionality. It handles entity and edge management with support for scopes (shared/private) and owner-based access control. Sources: src/thought/layers/graph.py

Entity Model

THOUGHT uses a flexible entity model that can represent code elements, prose content, legal documents, and research claims.

classDiagram
    class Entity {
        +str id
        +str type
        +str name
        +str canonical_name
        +ScopeName scope
        +Tier tier
        +float importance
        +datetime valid_from
        +datetime valid_until
        +datetime learned_at
        +dict~str, object~ attrs
    }
    
    class Edge {
        +str id
        +str source_id
        +str target_id
        +str relation_type
    }
    
    Entity "1" --> "*" Edge : source
    Entity "1" --> "*" Edge : target

Sources: src/thought/models.py:50-100

Entity Attributes

Field	Type	Description
`id`	str	Unique identifier
`type`	str	Entity type (function, class, module, claim, etc.)
`name`	str	Human-readable name
`canonical_name`	str	Fully qualified name for disambiguation
`scope`	ScopeName	"shared" or "private"
`owner_id`	str	Owner for private entities
`tier`	Tier	"hot", "warm", or "cold"
`valid_from`	datetime	When this fact became true
`valid_until`	datetime	When this fact stopped being true (null = current)
`learned_at`	datetime	When THOUGHT learned this fact
`attrs`	dict	Additional type-specific metadata

Edge Relations

Edges represent relationships between entities with the following relation types:

Relation Type	Description
`CALLS`	Function/method invocation
`INHERITS_FROM`	Class inheritance
`DEFINES`	Container defines member
`IMPORTS`	Module import statement
`CONTRADICTS`	Logical contradiction between facts
`CITES`	Source citation relationship

Audience Verticals

THOUGHT is designed to serve multiple audiences, each with specialized commands and entity taxonomies optimized for their use case. Sources: src/thought/demo.py:1-80

graph LR
    A[THOUGHT] --> B[Code Developers]
    A --> C[Writers]
    A --> D[Legal Investigators]
    A --> E[Researchers]
    
    B --> B1[thought scan]
    B --> B2[thought impact]
    B --> B3[thought callers]
    
    C --> C1[thought ingest-prose]
    C --> C2[thought timeline]
    C --> C3[contradiction-check]
    
    D --> D1[thought ingest-legal]
    D --> D2[unique_predicates]
    D --> D3[contradiction-graph]
    
    E --> E1[thought ingest-claim]
    E --> E2[citation-analysis]
    E --> E3[reliability-filter]

Code Developers

The code vertical provides tools for understanding, navigating, and analyzing source code:

thought scan: Incremental code scanning with change detection
thought impact <name>: Transitive impact set—what's affected if I change this?
thought callers <name>: Direct callers ranked by Personalized PageRank
thought recall: Semantic search across code by intent

Sources: src/thought/layers/code.py:1-50
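
To make "ranked by Personalized PageRank" concrete, here is a minimal power-iteration sketch. It is not taken from the THOUGHT source: the function names, the damping factor, the iteration count, and the choice to walk the reversed call graph from the target are all assumptions for illustration.

```python
def personalized_pagerank(
    edges: list[tuple[str, str]],
    seed: str,
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Power-iteration PageRank whose restart mass always returns to `seed`."""
    nodes = {seed} | {n for edge in edges for n in edge}
    out: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, score in rank.items():
            targets = out[node] or [seed]  # dangling nodes hand their mass back to the seed
            share = damping * score / len(targets)
            for target in targets:
                nxt[target] += share
        nxt[seed] += 1.0 - damping  # restart mass, so the ranks keep summing to 1
        rank = nxt
    return rank


def rank_callers(calls: list[tuple[str, str]], target: str) -> list[str]:
    """Order the direct callers of `target` by PPR on the reversed call graph."""
    reversed_edges = [(callee, caller) for caller, callee in calls]
    scores = personalized_pagerank(reversed_edges, seed=target)
    direct = {caller for caller, callee in calls if callee == target}
    return sorted(direct, key=lambda name: (-scores[name], name))
```

In this sketch a caller scores higher when more call paths lead from it to the target, which is one plausible reading of the ranking described above.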

Writers

The writing vertical supports fiction and academic prose:

Ingest chapter/section facts about characters
Detect contradictions via the bi-temporal model
Query chronological mentions across documents
Time-travel as_of recall for historical consistency

Legal Investigators

The legal vertical is designed for investigation workflows:

thought ingest-legal: Ingest witness statements with unique predicates
thought contradiction-graph: Trigger CONTRADICTS edges between testimonies
Query the contradiction graph for investigation leads

Researchers

The research vertical supports academic workflows:

thought ingest-claim: Ingest claim/source pairs
Cypher queries to find uncited claims
Most-cited source identification
Citation reliability filtering

CLI Commands Overview

Command	Description
`thought init`	Create database file + config + CLAUDE.md
`thought recall <query>`	Semantic recall with embeddings
`thought ask <question>`	Natural language query → Cypher → results
`thought scan <repo>`	Incremental code scan with change detection
`thought callers <name>`	Find direct callers ranked by PageRank
`thought impact <name>`	Transitive impact set
`thought db size`	Disk usage + entity/edge counts
`thought db flush`	Wipe the knowledge base
`thought db backup <file>`	SQLite online-backup snapshot
`thought db load <file>`	Load backup file
`thought hook install`	Install Claude Code hooks
`thought diff --from <sha1> --to <sha2>`	Entity diff between commits

Sources: src/thought/cli.py:50-150

Database Lifecycle Management

THOUGHT provides comprehensive database management commands under thought db:

Backup and Restore

graph LR
    A[Production DB] -->|thought db backup| B[backup.db]
    B -->|thought db load| C[Production DB]
    B -->|thought db inspect| D[Inspection Report]

The backup system uses SQLite's online backup API, ensuring consistent snapshots even during active writes. Date filters can produce clean, self-contained subset files. Sources: src/thought/storage/sqlite/backend.py:100-200
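
Python's standard `sqlite3` module exposes the same online backup API through `Connection.backup`. The sketch below shows the general shape of such a snapshot; the function name `backup_database` is hypothetical and is not THOUGHT's own implementation.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to `dest_path` using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies the database page by page; SQLite keeps the
        # copy consistent even if other connections write in the meantime.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```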

Flush Operations

Flush commands support date-bounded deletion:

--before X: Delete facts valid before date X
--since X: Delete facts learned since date X
--time-axis valid|learned|created: Choose which time axis to filter

All destructive operations automatically back up to <db>.bak.<timestamp> before proceeding.

Git History Integration

THOUGHT can ingest git repositories with two modes:

Mode	Behavior	Use Case
`snapshot` (default)	Ingest HEAD only, stamp with HEAD SHA	Fast code analysis
`full`	Walk every commit, stamp with commit SHA	Bi-temporal historical queries

The GitWalker class shells out to git commands rather than using native libraries, avoiding C extension dependencies while maintaining cross-platform compatibility. Sources: src/thought/ingest/code/git_walker.py:1-50

graph TD
    A[thought ingest-git] --> B{Snapshot Mode?}
    B -->|Yes| C[Ingest HEAD only]
    B -->|No| D[Walk all commits]
    C --> E[Stamp with HEAD SHA]
    D --> F[Stamp each entity with commit SHA]
    E --> G[Enable as_of queries]
    F --> G
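
The shell-out approach can be sketched with `subprocess` alone. The helper names `parse_log` and `list_commits` below are hypothetical and the real `GitWalker` interface is not shown in this manual; only the `git log` options (`-C`, `--format`, `-1`, `--reverse`) are standard git.

```python
import subprocess


def parse_log(output: str) -> list[tuple[str, int]]:
    """Parse lines of '<sha> <unix-timestamp>' into (sha, timestamp) pairs."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, timestamp = line.split()
        commits.append((sha, int(timestamp)))
    return commits


def list_commits(repo: str, mode: str = "snapshot") -> list[tuple[str, int]]:
    """Return HEAD only for 'snapshot', or every commit oldest-first for 'full'."""
    cmd = ["git", "-C", repo, "log", "--format=%H %ct"]
    if mode == "snapshot":
        cmd.append("-1")          # HEAD only
    else:
        cmd.append("--reverse")   # oldest commit first
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_log(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the parser testable without a repository on disk.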

Bi-temporal Model

THOUGHT's bi-temporal model tracks two independent timelines for every fact:

Time Axis	Description	Question Answered
Valid Time	When a fact was true in reality	"What was true on date X?"
Learned Time	When THOUGHT learned the fact	"What did the system know on date X?"

This distinction enables sophisticated queries like:

MATCH (e:Entity)
WHERE e.valid_from <= date('2024-01-01')
  AND (e.valid_until IS NULL OR e.valid_until > date('2024-01-01'))
RETURN e

Contradictions surface as CONTRADICTS edges—they're treated as data rather than warnings, allowing you to query them directly. Sources: src/thought/cli.py:1-50
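
The same interval test can be written in plain Python for either time axis. This is a minimal sketch: the `Fact` class and `visible_as_of` function are illustrative names, not part of THOUGHT's API, and the half-open interval mirrors the Cypher query above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional


@dataclass(frozen=True)
class Fact:
    name: str
    valid_from: datetime
    valid_until: Optional[datetime]           # None = still true
    learned_at: datetime
    unlearned_at: Optional[datetime] = None   # None = not retracted


def visible_as_of(
    fact: Fact, when: datetime, kind: Literal["valid", "learned"] = "valid"
) -> bool:
    """Half-open interval test (start <= when < end) on the chosen time axis."""
    if kind == "valid":
        start, end = fact.valid_from, fact.valid_until
    else:
        start, end = fact.learned_at, fact.unlearned_at
    return start <= when and (end is None or when < end)
```

A fact that was true in January but only recorded in March is visible on the valid axis for a January date and invisible on the learned axis for the same date.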

LLM Provider Integration

THOUGHT supports multiple LLM providers for natural language processing:

Provider	Features
Ollama	Native `/api/embed` (batched), OpenAI-compatible fallback
LM Studio	OpenAI-compatible API
Any OpenAI-compatible server	Standard embedding endpoints

The embedder selection defaults to auto, which probes for sentence_transformers and falls back to a deterministic embedder when the optional dependency is unavailable. Sources: src/thought/storage/sqlite/backend.py:200-300
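
One way to implement such a probe is `importlib.util.find_spec`, which checks whether a package is installed without importing it. The function name `resolve_embedder` is an assumption for this sketch; the returned strings follow the embedder names used elsewhere in this manual.

```python
import importlib.util


def resolve_embedder(requested: str = "auto") -> str:
    """Map 'auto' to a concrete embedder name based on what is importable."""
    if requested != "auto":
        return requested
    # find_spec returns None when the package is not installed; the package
    # itself is not imported, so the probe stays cheap.
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "sentence-transformers"
    return "deterministic"
```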

Code Extraction Support

THOUGHT can parse and extract entities from multiple programming languages:

Language	Extractor	Key Features
Python	`python_extractor.py`	AST-based import tracking, class/function detection
TypeScript	`typescript_extractor.py`	Tree-sitter parsing, heritage analysis
Rust	`rust_extractor.py`	Module system, impl block handling
PHP	`php_extractor.py`	Namespace handling, method visibility

All extractors produce consistent CodeEntity and CodeEdge objects that integrate with the unified graph model. Sources: src/thought/ingest/code/python_extractor.py:1-50

Getting Started

Initialization

thought init --db-path .thought/thought.db --embedder auto

This creates:

The SQLite database file
A thought.toml configuration file
A CLAUDE.md file for MCP client integration

Quick Start Commands

# Ingest a git repository
thought ingest-git ./my-project --mode snapshot

# Recall something semantically
thought recall "authentication middleware"

# Ask a natural language question
thought ask "what calls the authenticate_user function?"

# Find impact of changing a function
thought impact MyClass.my_method

Configuration

THOUGHT uses a thought.toml file for configuration:

Section	Option	Default	Description
`database`	`path`	`.thought/thought.db`	Database file path
`llm`	`provider`	`auto`	LLM provider selection
`embedder`	`model`	`auto`	Embedding model
`scopes`	`default`	`shared`	Default scope for new entities

Configuration can be overridden via CLI flags or environment variables.

Sources: src/thought/models.py:50-100

Quickstart Guide

Related topics: Introduction to THOUGHT, Installation and Setup

Overview

THOUGHT is a local AI memory tool for managing knowledge bases. It runs against local models and can be queried either with graph queries or in natural language. It provides a CLI for ingesting information, recalling facts, and analyzing code through graph-based relationships.

Sources: CHANGELOG.md

Architecture Overview

graph TD
    subgraph "THOUGHT Core"
        CLI[CLI Interface]
        DB[(SQLite Database)]
        EMB[Embedder Layer]
        GRAPH[Graph Layer]
    end
    
    subgraph "Ingestion Sources"
        CODE[Code Ingest]
        PROSE[Prose Ingest]
        LEGAL[Legal Ingest]
    end
    
    subgraph "Query Interface"
        RECALL[Recall Command]
        REPL[Interactive REPL]
        MCP[MCP Server]
    end
    
    CLI --> DB
    CLI --> EMB
    EMB --> DB
    CODE --> CLI
    PROSE --> CLI
    LEGAL --> CLI
    RECALL --> GRAPH
    REPL --> GRAPH
    MCP --> GRAPH
    GRAPH --> DB

Sources: src/thought/cli.py

Installation and Initialization

Initial Setup

Run the init command to create the database, configuration file, and CLAUDE.md helper:

thought init

The init command accepts several options:

Option	Default	Description
`--config`	`thought.toml`	Path to configuration file
`--db-path`	`.thought/thought.db`	SQLite database path
`--embedder`	`auto`	Embedder type: `auto`, `sentence-transformers`, or `deterministic`
`--write-claude-md`	`true`	Drop a CLAUDE.md for MCP clients
`--quick`	`false`	Skip first-run embedder warmup

Sources: src/thought/cli.py:57-78

Configuration File

The init command creates a thought.toml configuration file with the following structure:

[database]
path = ".thought/thought.db"

[embedder]
type = "auto"  # or "ollama", "lm_studio", "openai_compatible"

[llm]
provider = "auto"

Core Commands

Ingest Commands

THOUGHT supports multiple ingestion modes:

Command	Purpose
`thought ingest TEXT`	One-shot remember from command line
`thought ingest --file PATH`	Ingest a single file
`thought ingest --glob PAT`	Bulk-ingest matching files
`thought ingest --stdin`	Bulk-ingest one line-per-item from stdin

Sources: src/thought/cli.py:30-42

Code Ingestion

The code ingest pipeline extracts entities and relationships from source files:

thought ingest --file src/main.py
thought ingest --glob "**/*.py"

The code extractor produces:

Entities: modules, functions, classes, methods
Edges: IMPORTS, INHERITS_FROM, DEFINES, OVERRIDES

Sources: src/thought/ingest/code/pipeline.py

Git-Aware Ingest

For bi-temporal code analysis:

thought ingest-git <repo> --mode snapshot  # Fast: HEAD only
thought ingest-git <repo> --mode full      # Walk every commit

This enables as_of queries against historical commits.

Sources: CHANGELOG.md

Recall and Query

thought recall "what did I learn about authentication?"
thought repl

The recall command returns up to 10 results, ranked by relevance. Use as_of and scope to narrow the results further.

Sources: src/thought/cli.py

Database Management

Command	Description
`thought db size`	Disk usage + entity/edge counts
`thought db flush`	Wipe the KB (with backup)
`thought db backup <file>`	SQLite backup snapshot
`thought db load <file>`	Load a backup file
`thought db inspect <file>`	Inspect backup without loading

Sources: CHANGELOG.md

Code Analysis Commands

Callers and Impact Analysis

# Find who calls a function (ranked by PageRank)
thought callers authenticate_user

# Transitive impact: what's affected if I change this?
thought impact JWTValidator

Sources: src/thought/layers/code.py

Diff Between Commits

thought diff --from abc1234 --to def5678

This shows entities added/removed between two ingested commits.
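
Conceptually, an entity diff is a pair of set differences over the entity names recorded for each commit. The sketch below assumes the two name sets have already been loaded; `diff_entities` is a hypothetical helper, not the command's actual implementation.

```python
def diff_entities(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Return entity names added and removed between two ingested commits."""
    return {
        "added": sorted(after - before),     # present only in the newer commit
        "removed": sorted(before - after),   # present only in the older commit
    }
```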

Built-in Demos

Run audience-specific walkthroughs:

thought demo code        # Agent/developer flow (14-stage walkthrough)
thought demo writer       # Novelist/paper author
thought demo legal        # Investigator/paralegal
thought demo researcher   # Academic use case
thought demo all          # Run all demos sequentially

Each demo runs end-to-end in a self-cleaning temporary directory and produces a structured DemoReport.

Sources: src/thought/demo.py

Entity Data Model

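from dataclasses import dataclass
from typing import Literal

# Alias spelled out from the inline comment below; assumed to match the real definition.
CodeEntityType = Literal["module", "function", "class", "method", "file"]


@dataclass(frozen=True)
class CodeEntity:
    name: str           # Qualified name (e.g., "ClassName.method_name")
    type_: CodeEntityType  # "module" | "function" | "class" | "method" | "file"
    language: str       # Programming language
    file_path: str      # POSIX-style relative path
    line_start: int     # Starting line number
    line_end: int       # Ending line number
    signature: str      # Function/class signature
    docstring: str | None
    visibility: Literal["public", "private"]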

Sources: src/thought/ingest/code/types.py

Supported Languages

The code ingestion pipeline supports:

Language	Extractor	File Extension
Python	`python_extractor.py`	`.py`
TypeScript	`typescript_extractor.py`	`.ts`, `.tsx`
PHP	`php_extractor.py`	`.php`
Rust	`rust_extractor.py`	`.rs`

MCP Server

Start the MCP server for integration with Claude Code:

thought serve                          # stdio transport (default)
thought serve --transport streamable-http  # HTTP transport

Sources: src/thought/cli.py

Utility Commands

Command	Description
`thought stats`	Display knowledge base statistics
`thought forget PATTERN`	Soft-delete entities matching SQL LIKE pattern
`thought consolidate`	Run one consolidation cycle
`thought doctor`	Environment health check
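
A soft delete driven by a SQL LIKE pattern might look like the sketch below. It assumes, purely for illustration, an `entities` table with `name` and `unlearned_at` columns and treats "soft-delete" as stamping `unlearned_at`; the actual behavior of `thought forget` is not documented here.

```python
import sqlite3
from datetime import datetime, timezone


def soft_forget(conn: sqlite3.Connection, pattern: str) -> int:
    """Mark matching, not-yet-retracted rows as unlearned; return how many changed."""
    now = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "UPDATE entities SET unlearned_at = ? "
        "WHERE name LIKE ? AND unlearned_at IS NULL",
        (now, pattern),   # bound parameters keep the pattern out of the SQL text
    )
    conn.commit()
    return cursor.rowcount
```

In LIKE patterns `%` matches any run of characters and `_` matches exactly one, so `auth.%` covers every name under the `auth.` prefix. The rows stay in the table, which keeps "what did the system know on date X" queries answerable.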

Bi-Temporal Model

THOUGHT uses a bi-temporal model for knowledge tracking:

valid_from / valid_until: When facts were true in reality
learned_at / unlearned_at: When the system learned/corrected facts

Query variants:

as_of_kind='valid' — "what was true on date X"
as_of_kind='learned' — "what did the system know on date X"

Sources: src/thought/models.py

Sources: CHANGELOG.md

Installation and Setup

Related topics: Quickstart Guide, System Architecture

Overview

The thought-mcp project provides a comprehensive CLI tool and MCP (Model Context Protocol) server for AI-powered memory and knowledge management. The installation and setup process involves initializing the local SQLite database, configuring MCP clients (Claude Code, Cursor, etc.), and optionally setting up Claude Code hooks for automated memory operations.

The setup system is designed with idempotency in mind — installations can be safely re-run without disrupting existing configurations.

System Architecture

graph TD
    A[User] --> B[thought CLI]
    B --> C[init command]
    C --> D[SQLite Database]
    C --> E[thought.toml Config]
    C --> F[CLAUDE.md Agent Hint]
    B --> G[MCP Server]
    G --> D
    B --> H[Client Install]
    H --> I[Claude Code]
    H --> J[Cursor]
    H --> K[VS Code]
    B --> L[Hook Install]
    L --> M[.claude/settings.json]

Prerequisites

Component	Requirement	Notes
Python	>= 3.10	Core runtime
Git	On PATH	Used by git pipeline for code ingestion
SQLite	3.x	Bundled with Python stdlib
pip/pipx	Latest	Package installation

Sources: CONTRIBUTING.md

Installation Methods

Standard Installation

pip install thought-mcp

Development Installation

git clone https://github.com/RNBBarrett/thought-mcp.git
cd thought-mcp
pip install -e ".[dev]"

CLI Initialization

The thought init command establishes the complete working environment. It creates three essential components in sequence.

Init Command Signature

from pathlib import Path

import typer

app = typer.Typer()  # assumed application object; not shown in the source excerpt


@app.command()
def init(
    config: Path = typer.Option("thought.toml", help="Path to config file."),
    db_path: str = typer.Option(".thought/thought.db", help="SQLite database path."),
    embedder: str = typer.Option(
        "auto", help="'auto' picks sentence-transformers if available, else deterministic.",
    ),
    write_claude_md: bool = typer.Option(
        True, "--write-claude-md/--no-claude-md",
        help="Drop a CLAUDE.md so MCP clients learn how to use the tool.",
    ),
    quick: bool = typer.Option(
        False, "--quick", help="Skip first-run embedder warmup.",
    ),
) -> None:
    ...  # body omitted in this excerpt

Sources: src/thought/cli.py:35-56

What Init Creates

graph LR
    A[thought init] --> B[Create .thought/ directory]
    A --> C[Create SQLite DB file]
    A --> D[Write thought.toml config]
    A --> E[Write CLAUDE.md]
    
    B --> F[parents=True<br/>exist_ok=True]
    C --> G[DB auto-backed up<br/>before destructive ops]

1. Database Initialization

The command creates the SQLite database at the specified path. Parent directories are created automatically: parents=True creates any missing ancestors, and exist_ok=True makes the call safe to repeat when the directory already exists.

Path(db_path).parent.mkdir(parents=True, exist_ok=True)

Sources: src/thought/cli.py:52-53

2. Configuration File

The thought.toml file contains runtime configuration including embedder settings and database paths.

3. CLAUDE.md Agent Hint

When write_claude_md=True (default), the init command drops a CLAUDE.md file that teaches MCP clients how to interact with the tool.

Embedder Configuration

The init command supports three embedder modes:

Mode	Behavior	Dependencies
`auto` (default)	Uses `sentence-transformers` if available, falls back to deterministic embeddings	Optional: sentence-transformers
`sentence-transformers`	Uses local transformer models for embeddings	Required: sentence-transformers
`deterministic`	Uses hash-based embeddings, no ML dependencies	None

The --quick flag skips the first-run embedder warmup process.
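
For intuition, a hash-based embedder can be built from the standard library alone. This is a sketch of the general feature-hashing technique, not THOUGHT's deterministic embedder; the function name, the 64-dimension default, and the use of SHA-256 are assumptions.

```python
import hashlib
import math


def deterministic_embed(text: str, dim: int = 64) -> list[float]:
    """Hash each token into a bucket, then L2-normalise the bucket counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0  # signed hashing reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

The same text always yields the same vector, with no model download. The trade-off is that such vectors capture token overlap rather than meaning, which is why it is a fallback rather than the preferred mode.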

MCP Client Installation

The thought clients install command merges a thought MCP server entry into your client's configuration file.

Supported Clients

Client	Config Location
Claude Code	`.claude/settings.json`
Cursor	`~/.cursor/settings.json`
VS Code	`~/.cursor/settings.json`

Installation Workflow

graph TD
    A[thought clients install] --> B{Check config exists?}
    B -->|No| C[Create new config file]
    B -->|Yes| D[Read existing JSON]
    C --> E{Valid JSON object?}
    D --> E
    E -->|Yes| F[Merge mcpServers entry]
    E -->|No| G[Return error]
    F --> H{Backup enabled?}
    H -->|Yes| I[Create .thought.bak backup]
    H -->|No| J[Write merged config]
    I --> J
    J --> K[Return ClientInstallResult]
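
The workflow above can be sketched as a small idempotent JSON merge. The function name `merge_mcp_server`, its parameters, and the plain-string return values are assumptions for illustration; the real command returns a `ClientInstallResult` and may differ in detail.

```python
import json
import shutil
from pathlib import Path


def merge_mcp_server(config_path: Path, name: str, block: dict, backup: bool = True) -> str:
    """Merge `block` under mcpServers[name] and return a status string."""
    if config_path.exists():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return "error"
        if not isinstance(config, dict):
            return "error"  # a JSON array or scalar cannot hold mcpServers
    else:
        config = {}

    servers = config.setdefault("mcpServers", {})
    if not isinstance(servers, dict):
        return "error"
    if servers.get(name) == block:
        return "already_present"  # re-running changes nothing

    if backup and config_path.exists():
        # Keep the previous file next to the original before overwriting it.
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".thought.bak"))

    servers[name] = block
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return "installed"
```

Unrelated keys in the existing file are left untouched, which is what makes repeated installs safe.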

Client Install Result States

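from dataclasses import dataclass
from pathlib import Path
from typing import Literal

ClientName = str  # placeholder for the project's own ClientName type (assumption)


@dataclass(frozen=True)
class ClientInstallResult:
    client: ClientName
    path: Path
    status: Literal["installed", "already_present", "no_path", "error"]
    detail: str = ""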

Sources: src/thought/clients.py

Server Block Structure

The MCP server configuration block includes:

Server name (thought)
Command to execute
Server arguments
Environment variables for database path

Claude Code Hook Installation

The thought hooks install command adds hook entries to Claude Code's settings for automated memory operations.

Hook Types

Hook Kind	Claude Code Event	Command
`recall`	UserPromptSubmit	`thought hook recall`
`write`	Stop	`thought hook write`
`context`	SessionStart	`thought hook context`

Sources: src/thought/hooks/install.py:17-22

Hook Installation Options

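from pathlib import Path
from typing import Literal


def settings_path(*, scope: Literal["project", "user"] = "project") -> Path:
    """Return the ``.claude/settings.json`` path for the requested scope.

    Project scope is the recommended default: it travels with the repo and
    is what most users actually want for THOUGHT-flavoured auto-memory.
    """
    if scope == "project":
        return Path.cwd() / ".claude" / "settings.json"
    # The source excerpt ends here. The user-scope branch below is an
    # assumption: it resolves the same file under the home directory.
    return Path.home() / ".claude" / "settings.json"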

Hook Install Process

graph TD
    A[thought hooks install recall] --> B{Backup enabled?}
    B -->|Yes| C[Create settings.json.thought.bak]
    B -->|No| D[Read settings.json]
    C --> D
    D --> E{Valid JSON?}
    E -->|Yes| F[Merge recall hook entry]
    E -->|No| G[Return error]
    F --> H{Entry exists?}
    H -->|Yes| I[Return already_present]
    H -->|No| J[Write updated settings.json]
    J --> K[Return HookInstallResult]

Hook Install Result

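from dataclasses import dataclass
from pathlib import Path
from typing import Literal

HookKind = Literal["recall", "write", "context"]  # assumed from the Hook Types table


@dataclass(frozen=True)
class HookInstallResult:
    kind: HookKind
    path: Path
    status: Literal["installed", "already_present", "error"]
    detail: str = ""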

Sources: src/thought/hooks/install.py:28-32

Quick Start Guide

Step 1: Initialize the Environment

# Standard initialization
thought init

# Skip embedder warmup for faster startup
thought init --quick

# Custom database location
thought init --db-path /path/to/custom.db

Step 2: Install MCP Client

# Install for Claude Code
thought clients install claude_code

# Install for Cursor
thought clients install cursor

Step 3: Install Claude Code Hooks (Optional)

# Install recall hook (automatic memory on user input)
thought hooks install recall

# Install write hook (save memory on session stop)
thought hooks install write

# Install context hook (load memory on session start)
thought hooks install context

# Install all hooks
thought hooks install recall --kind write --kind context

Database Lifecycle Management

Database Size Check

thought db size

Shows the disk usage of the main database file and its WAL and SHM sidecar files, plus entity and edge counts.

Database Backup

thought db backup <file>

Creates an SQLite online-backup snapshot. When date filters are given, the snapshot is trimmed with DELETE and VACUUM after the backup, producing a clean, self-contained subset file.

Database Restore

thought db load <file>

Atomically replaces the active database with the backup file. Use --merge to INSERT-OR-IGNORE rows from the snapshot instead of replacing.

Database Flush

# Full flush with confirmation
thought db flush

# Skip confirmation
thought db flush --yes

# Date-bounded flush
thought db flush --before 2024-01-01
thought db flush --since 2024-06-01

Note: All destructive operations auto-backup to <db>.bak.<timestamp> before proceeding.

Verifying Installation

Run the Demo

# Run code audience demo
thought demo code

# Run all demos
thought demo all

The demo runs an audience-specific walkthrough end-to-end in a self-cleaning temporary directory, verifying the installation works correctly.

Health Check

thought doctor

Performs an environment health check to verify all dependencies and configurations are correct.

Configuration File Format

thought.toml

[database]
path = ".thought/thought.db"

[embedder]
type = "auto"  # or "sentence-transformers", "deterministic"

[server]
name = "thought"
transport = "stdio"  # or "streamable-http"

Troubleshooting

Common Issues

Issue	Solution
Config file not found	Run `thought init` first
Database locked	Check for other `thought` processes
Embedder initialization slow	Use `--quick` flag or `deterministic` embedder
MCP client not connecting	Verify client config has correct server entry

Reset Installation

# Backup current database
thought db backup /path/to/backup.db

# Flush and reinitialize
thought db flush --yes
thought init --db-path .thought/thought.db

Next Steps

After installation and setup, users typically:

Ingest code: thought ingest-git <repo> to analyze repository code
Recall information: thought recall <query> to query the knowledge base
Run agents: Use reference agents like the vulnerability scanner or OSINT aggregator

Sources: CONTRIBUTING.md

System Architecture

Related topics: Introduction to THOUGHT, Storage and Database Layer, Memory Model and Data Structures

Overview

The thought-mcp project is a Model Context Protocol (MCP) server implementation that provides an intelligent memory and code analysis system for AI-assisted development. The system combines semantic memory storage with code graph analysis, enabling natural language queries against codebases through a bi-temporal knowledge graph.

High-Level Architecture

graph TD
    subgraph "Client Layer"
        MCP[MCP Client]
        CLI[Thought CLI]
        Hooks[Claude Code Hooks]
    end
    
    subgraph "Server Layer"
        Server[MCP Server]
        Router[Query Router]
        Classifier[Query Classifier]
    end
    
    subgraph "Memory Layer"
        Memory[Memory Manager]
        Recall[Recall Engine]
        Ask[Ask - NL to Cypher]
    end
    
    subgraph "Storage Layer"
        Backend[SQLite Backend]
        Entities[Entity Store]
        Edges[Edge Store]
        Embeddings[Vector Embeddings]
    end
    
    subgraph "Ingest Layer"
        CodePipeline[Code Pipeline]
        GitPipeline[Git Pipeline]
        Extractors[Language Extractors]
    end
    
    MCP --> Server
    CLI --> Server
    Hooks --> Server
    Server --> Router
    Router --> Classifier
    Classifier --> Memory
    Memory --> Backend
    CodePipeline --> Backend
    GitPipeline --> Backend
    Ask --> Recall

Core Components

MCP Server (`src/thought/server.py`)

The MCP server exposes the primary tool interface for AI clients. It implements async tool handlers that delegate to the memory layer.

Key Tools:

Tool	Purpose
`recall`	Semantic recall of entities using embeddings
`ask`	Natural language queries translated to Cypher
`working_context`	Context primitive for agent awareness
`scan`	Incremental code scanning with change detection

Sources: src/thought/server.py:1-100

Query Router and Classifier

The system routes queries through a classification system that detects:

CODE queries: Triggered by code-shaped keywords (function, class, caller, callee, file extensions) plus camelCase/snake_case identifiers
CHANGE queries: Historical or diff-based queries
HYBRID combinations: CODE × CHANGE patterns like "what changed in auth.middleware since v1.0"

graph LR
    Q[Query] --> C[Classifier]
    C --> |CODE| CR[Code Route]
    C --> |CHANGE| CH[Change Route]
    C --> |HYBRID| HY[Hybrid Route]
    C --> |DEFAULT| DF[Default Recall]

Sources: CHANGELOG.md:1-80
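
A keyword-and-pattern classifier in this spirit can be sketched with a few regular expressions. The keyword lists, the patterns, and the function name `classify` are assumptions chosen to reproduce the categories above; they are not the project's actual rules.

```python
import re

CODE_KEYWORDS = re.compile(r"\b(function|class|callers?|callees?)\b", re.IGNORECASE)
CHANGE_KEYWORDS = re.compile(r"\b(changed?|since|diff|history)\b", re.IGNORECASE)
# snake_case, camelCase, or a dotted path such as auth.middleware or main.py
IDENTIFIER = re.compile(
    r"\b(?:[a-z]+(?:_[a-z0-9]+)+"
    r"|[a-z]+(?:[A-Z][a-z0-9]+)+"
    r"|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)\b"
)


def classify(query: str) -> str:
    """Return CODE, CHANGE, HYBRID, or DEFAULT for a free-text query."""
    is_code = bool(CODE_KEYWORDS.search(query) or IDENTIFIER.search(query))
    is_change = bool(CHANGE_KEYWORDS.search(query))
    if is_code and is_change:
        return "HYBRID"
    if is_code:
        return "CODE"
    if is_change:
        return "CHANGE"
    return "DEFAULT"
```

Checking both signals before either single route is what lets a query such as "what changed in auth.middleware since v1.0" land on the hybrid route.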

Code Layer (`src/thought/layers/code.py`)

The code layer provides a high-level API for code-specific graph queries against the currently-valid view of the code graph.

class CodeLayer:
    def callers_of(self, name): ...     # Who calls this function
    def callees_of(self, name): ...     # What this function calls
    def impact_set(self, name): ...     # Transitive callers, ranked
    def defines_in_file(self): ...      # Entities in a given file

Sources: src/thought/layers/code.py:1-60
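
A transitive impact set is essentially a breadth-first walk over reversed CALLS edges. The sketch below is a standalone illustration under that assumption: it ranks by hop distance only, whereas the real `impact_set` ranking is not specified in this excerpt.

```python
from collections import deque


def impact_set(calls: list[tuple[str, str]], name: str) -> list[tuple[str, int]]:
    """Return transitive callers of `name` with their hop distance, nearest first."""
    callers: dict[str, set[str]] = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)

    distance = {name: 0}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for caller in sorted(callers.get(current, ())):
            if caller not in distance:  # also guards against call cycles
                distance[caller] = distance[current] + 1
                queue.append(caller)
    del distance[name]  # the starting function is not part of its own impact set
    return sorted(distance.items(), key=lambda item: (item[1], item[0]))
```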

Storage Architecture

SQLite Backend

The system uses SQLite as its primary storage with the following schema features:

Bi-temporal model: Tracks valid_from/valid_until (business time) and learned_at (system knowledge time)
Entity/Edge tables with code-specific columns (code_file, code_language, code_commit_sha)
Partial indexes for efficient queries
WAL mode with checkpointing for consistent backups

Data Models

Entity Structure:

from dataclasses import dataclass


@dataclass
class CodeEntity:
    name: str
    type_: str           # function, class, module, method
    language: str        # python, typescript, rust, php
    file_path: str
    line_start: int
    line_end: int
    signature: str
    docstring: str
    visibility: str      # public, private, protected
    attrs: dict

Edge Types:

CALLS - Function/method invocations
INHERITS_FROM - Class inheritance
IMPORTS - Module imports
DEFINES - Member definitions within classes
OVERRIDES - Method overrides (TypeScript)

Sources: src/thought/ingest/code/pipeline.py:1-100

Code Ingestion Pipeline

Language Extractors

The system uses tree-sitter parsers for multi-language code extraction:

Language	File	Capabilities
Python	`python_extractor.py`	Functions, classes, imports, inheritance
TypeScript	`typescript_extractor.py`	Functions, classes, imports, exports, inheritance, overrides
Rust	`rust_extractor.py`	Functions, impl blocks, traits
PHP	`php_extractor.py`	Functions, classes, methods, namespaces

All extractors output CodeEntity and CodeEdge tuples parsed from AST nodes.

Sources: src/thought/ingest/code/python_extractor.py:1-80

Code Pipeline Flow

graph TD
    F[File Input] --> LD[Language Detection]
    LD --> EX[Extract Entities/Edges]
    EX --> SI[Upsert Source]
    SI --> WE[_write_entities]
    WE --> EE[Embed Signatures]
    EE --> WEd[_write_edges]
    WEd --> CM[Commit Transaction]
    
    subgraph "Entities Processing"
        WE --> |"name_to_id map"| WEd
    end

Sources: src/thought/ingest/code/pipeline.py:100-200

Git Pipeline (`src/thought/ingest/code/git_pipeline.py`)

The git pipeline enables historical code analysis with two modes:

Mode	Behavior
`snapshot`	Fast - ingest HEAD only, stamp entities with HEAD SHA
`full`	Walk every commit chronologically, stamp each entity with its commit SHA

The full mode enables bi-temporal as_of queries against historical commits.

Sources: src/thought/ingest/code/git_pipeline.py:1-50

Query System

Recall Engine

Semantic recall uses vector embeddings to find entities by intent rather than exact name:

def recall(
    query: str,
    scope: str = "all",
    owner_id: str | None = None,
    limit: int = 10,
) -> list[RecallHit]

The system embeds entity signatures and docstrings during ingestion, enabling natural queries like "who calls authenticate_user".

Ask Engine (`src/thought/query/ask.py`)

Natural language to Cypher translation with validation:

graph LR
    NL[Natural Language] --> PROMPT[Build Prompt]
    PROMPT --> LLM[LLM Provider]
    LLM --> CY[Cypher Query]
    CY --> VAL[Validate]
    VAL --> |Valid| EXE[Execute]
    VAL --> |Invalid| FB[Fallback to Recall]

Constraint System:

Read-only Cypher features only (MATCH, WHERE, RETURN)
Validates against actual schema before execution
Falls back to recall() on translation failures

Sources: src/thought/query/ask.py:1-80

Integration Points

MCP Client Installation (`src/thought/clients.py`)

The system installs as an MCP server for AI coding tools:

def install(client: ClientName, *, server_name: str = "thought")

Supported clients include Claude Code and other MCP-compatible tools. Installation merges configuration without disturbing existing settings.

Claude Code Hooks (`src/thought/hooks/install.py`)

Hooks provide automatic memory integration:

Hook	Event	Action
`recall`	UserPromptSubmit	Memory recall on user input
`write`	Stop	Context capture on completion
`context`	SessionStart	Session initialization

Sources: src/thought/hooks/install.py:1-50

CLI Architecture (`src/thought/cli.py`)

The command-line interface provides database lifecycle management:

Command	Function
`thought init`	Create database + config + CLAUDE.md
`thought db size`	Disk usage + entity/edge counts
`thought db flush`	Wipe KB with backup
`thought db backup`	SQLite online-backup snapshot
`thought db load`	Load snapshot atomically
`thought db inspect`	Count + schema summary
`thought ingest-git`	Git-history-aware ingestion
`thought callers`	Direct callers via PageRank
`thought impact`	Transitive impact set
`thought diff`	Entity diff between commits

Sources: src/thought/cli.py:1-100

Demo System (`src/thought/demo.py`)

The built-in demo provides audience-specific walkthroughs:

Audience	Purpose
`code`	Agent/developer flow - 14-stage code vertical
`writer`	Novelist/paper author - bi-temporal recall
`legal`	Investigator - contradiction detection
`researcher`	Academic - claim/source relationships

Sources: src/thought/demo.py:1-50

Configuration

Database Initialization

# thought.toml
[database]
path = ".thought/thought.db"

[llm]
provider = "anthropic"  # or ollama, lmstudio, openai-compat

[embedder]
type = "auto"  # sentence-transformers if available, else deterministic

Sources: src/thought/cli.py:50-80

Summary

The thought-mcp architecture combines:

MCP Server - Tool interface for AI clients
Bi-temporal Storage - SQLite with code-specific schema
Multi-language Extractors - Tree-sitter based AST parsing
Git Integration - Historical code analysis
Query Routing - Classification-based query dispatch
Natural Language Interface - NL to Cypher translation

This design enables both real-time code assistance and deep historical analysis of codebases through a unified query interface.

Sources: src/thought/server.py:1-100

Storage and Database Layer

Related topics: System Architecture, Memory Model and Data Structures

Overview

The Storage and Database Layer is the persistence backbone of the THOUGHT system, providing a structured SQLite-based knowledge base (KB) for storing entities, edges, embeddings, and operational metadata. This layer abstracts database operations through a modular backend interface, enabling CRUD operations, bi-temporal data tracking, and specialized queries for code analysis.

The architecture supports:

Entity/Edge persistence with bi-temporal validity tracking (valid_from, valid_until, learned_at)
Vector embeddings for semantic recall operations
Source tracking for ingested content provenance
Code-specific metadata including language, file path, and commit SHA
Agent and scan logging for operational auditability

Sources: src/thought/storage/__init__.py

Memory Model and Data Structures

Related topics: System Architecture, Storage and Database Layer, Query and Retrieval System

Overview

The thought-mcp repository implements a multi-layered memory architecture designed for AI-assisted knowledge management. The memory model combines vector embeddings for semantic search, graph relationships for structural querying, and temporal versioning for historical analysis. This hybrid approach enables both intuitive natural-language recall and precise code-intent queries.

The core memory system operates as a knowledge base (KB) with bi-temporal semantics, tracking when facts became true (valid_from) versus when the system learned them (learned_at). This design supports time-travel queries that answer "what was true on date X" or "what did the system know on date X".

Architecture Layers

The memory system is organized into three distinct but interconnected layers:

graph TD
    A[User Input] --> B[Memory Layer]
    B --> C[Vector Layer]
    B --> D[Graph Layer]
    B --> E[Temporal Layer]
    C --> F[SQLite Backend]
    D --> F
    E --> F
    G[Query/Recall] --> B

Vector Layer (`src/thought/layers/vector.py`)

The vector layer handles semantic embedding and similarity search. It stores dense vector representations of entities enabling natural-language recall based on meaning rather than exact keyword matching.

Core Responsibilities:

Embed text content (entity names, signatures, docstrings) into high-dimensional vectors
Store embeddings with model metadata (name, version, dimensions)
Perform similarity searches against the embedded corpus
Support fallback to deterministic embeddings when ML models are unavailable

Key Components:

Component	Purpose
`VectorStore`	Persists embeddings in SQLite with metadata
`Embedder`	Base protocol for embedding models
`OllamaEmbedder`	Integration with Ollama's `/api/embed` endpoint
`DeterministicEmbedder`	Fallback using hash-based vectors

Sources: src/thought/layers/vector.py
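To make the idea of a hash-based fallback concrete, here is a minimal sketch of a deterministic embedder using feature hashing. It is not the project's `DeterministicEmbedder`; the function names, the 64-dimension default, and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import math


def deterministic_embed(text: str, dims: int = 64) -> list[float]:
    """Hash each token into one of `dims` buckets, then L2-normalize."""
    vec = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        # A second hash byte picks the sign, which reduces collision bias.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))
```

The same text always produces the same vector, with no model download. The trade-off is that similarity reflects shared tokens only, not meaning, so a fallback of this kind supports reproducible tests better than high-quality recall.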

Graph Layer (`src/thought/layers/graph.py`)

The graph layer manages entity-relationship data structures and supports Cypher-style traversals. It maintains the structural knowledge of how entities connect to each other.

Entity Types Supported:

Type	Description
`module`	Source file or namespace unit
`class`	Class or type declarations
`function`	Function definitions
`method`	Class methods
`fact`	General knowledge facts
`claim`	Academic/research claims
`source`	Citation or reference
`witness`	Legal testimony statements

Edge Relation Types:

Relation	Meaning
`IMPORTS`	Module dependency relationship
`INHERITS_FROM`	Class inheritance
`DEFINES`	Container defines a member
`OVERRIDES`	Method overrides parent
`CALLS`	Function invocation
`REFERS_TO`	General reference
`CONTRADICTS`	Logical opposition between facts

Sources: src/thought/layers/graph.py

Temporal Layer (`src/thought/layers/temporal.py`)

The temporal layer implements bi-temporal data modeling, tracking both valid time and learned time for all entities. This enables sophisticated time-travel queries and contradiction detection.

Bi-Temporal Model:

graph LR
    A[Entity] --> B[valid_from<br/>When fact became true]
    A --> C[learned_at<br/>When KB learned fact]
    D[as_of valid] --> E[Historical state query]
    D --> F[as_of learned<br/>System knowledge query]

Key Temporal Features:

valid_from: Timestamp when the fact became true in reality
learned_at: Timestamp when the system recorded the fact
valid_until: Optional expiration of fact validity
CONTRADICTS edges: Automatically surface when facts conflict across time axes

Sources: src/thought/layers/temporal.py

Core Data Models

Entity Model (`src/thought/models.py`)

The base Entity model represents all stored knowledge items in the system.

class Entity:
    id: str                     # Unique identifier
    name: str                   # Canonical name
    type: str                   # Entity type (see table above)
    scope: str                  # "shared" or "private"
    owner_id: str | None        # Owner for private entities
    valid_from: datetime        # When fact became true
    learned_at: datetime        # When system learned it
    source_ref: str | None      # Reference to source document
    tier: str                   # "hot", "warm", "cold" (access frequency)
    attrs: dict                 # Type-specific attributes

Entity Attributes by Type:

Entity Type	Key Attributes
`code_*`	`code_file`, `code_language`, `code_commit_sha`, `signature`, `visibility`, `line_start`, `line_end`
`fact`	`predicates`, `unique_predicates`, `source_doc`
`claim`	`citation_key`, `reliability_score`

Sources: src/thought/models.py

Code Entities (`src/thought/ingest/entities.py`)

Code-specific entities extend the base model with language-aware attributes:

class CodeEntity:
    name: str
    type_: str                  # "module", "class", "function", "method"
    language: str               # "python", "typescript", "rust", "php"
    file_path: str
    line_start: int
    line_end: int
    signature: str              # Function/class signature
    visibility: str             # "public", "private", "protected"
    docstring: str | None
    attrs: dict                 # Language-specific (e.g., `class` for methods)

Edge Model (`src/thought/ingest/entities.py`)

Relationships between entities are modeled as typed, directed edges:

class CodeEdge:
    source_name: str             # Origin entity
    target_name: str             # Destination entity
    relation_type: str           # IMPORTS, DEFINES, INHERITS_FROM, etc.
    line_number: int | None
    attrs: dict

Sources: src/thought/ingest/entities.py

Consolidation Engine (`src/thought/consolidation/engine.py`)

The consolidation engine handles fact deduplication, merging, and contradiction detection. It processes incoming data through a pipeline that ensures data quality and consistency.

graph TD
    A[Raw Input] --> B[Jaccard Deduplication]
    B --> C[Fact Extraction]
    C --> D[Predicate Matching]
    D --> E{Conflict?}
    E -->|Yes| F[Create CONTRADICTS Edge]
    E -->|No| G[Merge into KB]
    F --> G

Consolidation Pipeline Steps:

Jaccard Deduplication: Skip content with >50% overlap with existing facts
Fact Extraction: Parse structured predicates from unstructured text
Predicate Matching: Match against existing knowledge using unique predicates
Contradiction Detection: Create CONTRADICTS edges when facts conflict
Entity Merging: Upsert with identity (name, code_file, code_commit_sha)

Sources: src/thought/consolidation/engine.py
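The Jaccard deduplication step can be illustrated with a short sketch. It assumes whitespace tokenization and a strict greater-than comparison against the 50% threshold; the engine's actual tokenizer and boundary handling are not shown in the source, and both function names are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' lowercase token sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.5) -> bool:
    """True when the candidate overlaps any stored fact by more than the threshold."""
    return any(jaccard(candidate, fact) > threshold for fact in existing)
```

For example, "alice lives in paris now" shares four of five distinct tokens with "alice lives in paris" (similarity 0.8), so it would be skipped, while "the cat ran" against "the cat sat" scores exactly 0.5 and would pass through under the strict comparison assumed here.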

Storage Backend

The system uses SQLite as its primary storage engine with the following schema:

graph TD
    A[SQLite Database] --> B[entities table]
    A --> C[edges table]
    A --> D[embeddings table]
    A --> E[applied_migrations table]
    B --> F[code_file<br/>code_language<br/>code_commit_sha]
    C --> G[relation_type<br/>source_name<br/>target_name]
    D --> H[model_name<br/>model_version<br/>vector BLOB]

Key Backend Class and Methods:

Class / Method	Responsibility
`Backend`	Core CRUD operations on entities/edges
`find_code_entity()`	Fast lookup by name + file/commit disambiguators
`upsert_entity()`	Insert or update with identity awareness
`store_embedding()`	Persist vectors with model metadata

Sources: src/thought/storage/sqlite/backend.py (inferred from CHANGELOG.md)

Query Pathways

The memory system supports multiple query mechanisms:

Recall (Vector Search)

def recall(
    query: str,
    scope: str = "all",
    owner_id: str | None = None,
    max_results: int = 10,
) -> list[RecallResult]

Returns up to max_results entities (10 by default), ranked by embedding similarity to the query.

Ask (Natural Language to Cypher)

Routes natural-language questions through an LLM to generate Cypher queries:

QUESTION: "who calls authenticate_user"
→ CYPHER: MATCH (caller)-[:CALLS]->(f:Function {name: 'authenticate_user'}) 
          RETURN caller.name

Sources: src/thought/query/ask.py

Code Intelligence Queries

Command	Purpose
`thought callers <name>`	Direct callers via Personalized PageRank
`thought impact <name>`	Transitive impact set (what breaks if changed)
`thought diff --from SHA1 --to SHA2`	Entity diff between commits

Ingest Pipelines

Code ingestion follows a standardized pipeline:

graph TD
    A[Source File] --> B[Language Detection]
    B --> C[AST Parser<br/>tree-sitter]
    C --> D[Extractor<br/>Language-specific]
    D --> E[CodeEntity list]
    D --> F[CodeEdge list]
    E --> G[Embedding]
    G --> H[Backend upsert]
    F --> H
    H --> I[Call Graph Builder<br/>optional]

Supported Languages:

Python (.py) - via tree-sitter-python
TypeScript (.ts, .tsx) - via tree-sitter-typescript
Rust (.rs) - via tree-sitter-rust
PHP (.php) - via tree-sitter-php

Extracted Metadata:

Module/namespace names
Class declarations with heritage (extends, implements)
Function and method definitions
Import/use declarations
Visibility modifiers (public, private, protected)

Sources: src/thought/ingest/code/python_extractor.py, src/thought/ingest/code/typescript_extractor.py

Auto-Memory Hooks

The system integrates with Claude Code via hooks for automatic memory management:

Hook	Event	Action
`recall`	`UserPromptSubmit`	Embeds prompt, recalls relevant context
`write`	`Stop`	Extracts facts from session transcript
`context`	`SessionStart`	Loads relevant context for new session

Sources: src/thought/hooks/install.py

Versioning and Snapshots

The storage layer supports full database lifecycle management:

Operation	Description
`db size`	Disk usage + entity/edge counts
`db flush`	Wipe KB with date-bounded options
`db backup <file>`	SQLite online backup snapshot
`db load <file>`	Restore or merge from snapshot
`db inspect <file>`	Preview backup without loading

WAL (Write-Ahead Logging) checkpoints ensure consistent backups.

Summary

The thought-mcp memory model implements a knowledge management system with:

Three-layer architecture: Vector for semantics, Graph for structure, Temporal for history
Bi-temporal semantics: Tracks both validity and knowledge acquisition times
Code-aware extraction: AST-based parsing for multiple programming languages
Contradiction detection: Automatic CONTRADICTS edges between conflicting facts
Multiple query pathways: Semantic recall, natural-language Cypher, and code-intelligence commands
Git-aware versioning: Commits can be stamped on entities for historical queries

This architecture enables sophisticated AI memory capabilities while maintaining query performance through strategic use of SQLite with proper indexing.

Sources: src/thought/layers/vector.py

Query and Retrieval System

Related topics: Memory Model and Data Structures, Storage and Database Layer, Agent Adapters and SDK Integration

The Query and Retrieval System is a core subsystem within the thought-mcp project that enables users to query the knowledge graph using natural language. It translates human-readable questions into structured Cypher queries or SQL statements, executes them against the underlying SQLite backend, and returns ranked, relevant results. The system serves as the primary interface for retrieving facts, code entities, relationships, and historical data stored in the memory database.

Architecture Overview

The Query and Retrieval System is composed of several interconnected modules that work together to process, route, and execute queries. At its core, the system leverages a Router to classify incoming queries into semantic categories, then delegates processing to specialized handlers based on the query type.

graph TD
    A[User Query] --> B[Router]
    B --> C{Code Query?}
    B --> D{Natural Language?}
    B --> E{Search Query?}
    C --> F[Code Layer]
    D --> G[Ask Module]
    G --> H[Cypher Translator]
    H --> I[Query Validator]
    I --> J[SQLite Backend]
    E --> K[Recall Hook]
    K --> J
    J --> L[Results]
    F --> L

The system follows a layered approach where queries are first classified by intent, then transformed into appropriate database queries. Natural language queries are translated to Cypher through an LLM-based translator, while code-specific queries bypass translation and directly execute predefined graph traversal operations.

Query Classification

The Router module plays a critical role in determining how each query should be processed. Based on keyword detection and pattern matching, queries are classified into distinct types that trigger different handling paths.

Query Types

Query Type	Trigger Keywords	Handler	Use Case
CODE	`function`, `class`, `caller`, `callee`, `impact`, file extensions, camelCase identifiers	CodeLayer	Code graph traversal
CHANGE	`since v1.0`, `before this commit`, `diff`	GitIngestReport	Version-aware queries
HYBRID	CODE × CHANGE combinations	GraphLayer + GitWalker	Historical code analysis
SEARCH	General text	Recall Hook	Semantic search
ASK	Natural language questions	Ask Module	Natural language to Cypher

Sources: src/thought/query/ask.py:1-30

CODE Query Detection

The CODE query class is triggered by code-shaped keywords and identifier patterns: function names, class declarations, caller/callee relationships, and file extensions such as .py or .ts. camelCase and snake_case identifiers also route to the CODE handler, so a query like "who calls authenticate_user" is processed through the call-graph machinery without explicit CLI invocation. Version-related phrases such as since v1.0 or before this commit are CHANGE triggers (see the Query Types table); when they appear alongside CODE triggers, the query is classified as HYBRID.

Sources: src/thought/query/ask.py:1-30

Natural Language to Cypher Translation

The Ask module (src/thought/query/ask.py) is responsible for translating natural language questions into Cypher queries. This translation is performed by an LLM provider configured in the [llm] section of the configuration file, supporting multiple backends through a unified interface.

Translation Process

sequenceDiagram
    participant U as User
    participant A as Ask Module
    participant L as LLM Provider
    participant V as Cypher Validator
    participant B as SQLite Backend
    
    U->>A: "What functions call authenticate_user?"
    A->>A: Build Prompt with Schema
    A->>L: Send Prompt
    L-->>A: Cypher Query
    A->>V: Validate Cypher
    alt Valid
        V->>B: Execute Query
        B-->>V: Results
        V-->>U: Ranked Results
    else Invalid
        A->>A: Fallback to Recall
        A-->>U: Semantic Search Results
    end

The translation process begins with constructing a prompt that includes the database schema, entity types, and relationship types. The LLM generates a Cypher query that is then validated against a parser before execution. If validation fails or the query cannot be executed, the system gracefully falls back to a plain recall() call, ensuring the user always receives some response.

Prompt Constraints

The Ask module enforces strict constraints on generated queries to maintain system safety and performance:

Only read-only Cypher features are permitted: MATCH, WHERE, RETURN, LIMIT, and AS_OF
Clauses outside this subset are explicitly forbidden: MERGE, CREATE, DELETE, SET, and WITH
All entity types and relationship types must come from the defined schema
The LLM must return a single Cypher query, with no explanation or markdown formatting

Sources: src/thought/query/ask.py:1-50
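A keyword-level check for the constraints above might look like the following sketch. It is deliberately simpler than a grammar-based validator and is not the Ask module's implementation: `check_read_only` and its regular expressions are assumptions, and the scan has known blind spots (for instance, it would also reject the `STARTS WITH` string operator).

```python
import re

FORBIDDEN = ("MERGE", "CREATE", "DELETE", "SET", "WITH")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)
_STRING_LITERAL_RE = re.compile(r"'[^']*'|\"[^\"]*\"")


def check_read_only(cypher: str) -> tuple[bool, str]:
    """Return (ok, reason). Keyword scan only; not a real Cypher parser."""
    # Blank out string literals so keywords inside quoted values are ignored.
    stripped = _STRING_LITERAL_RE.sub("''", cypher)
    match = _FORBIDDEN_RE.search(stripped)
    if match:
        return False, f"forbidden clause: {match.group(1).upper()}"
    if not re.match(r"\s*MATCH\b", stripped, re.IGNORECASE):
        return False, "query must start with MATCH"
    if not re.search(r"\bRETURN\b", stripped, re.IGNORECASE):
        return False, "query must contain RETURN"
    return True, ""
```

A caller could use the returned reason string to populate something like a fallback explanation before handing the question to recall instead of executing the query.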

AskResult Data Model

The AskResult dataclass encapsulates the outcome of a query translation and execution attempt:

Field	Type	Description
`cypher`	`str | None`	The generated Cypher query
`sql`	`str | None`	Alternative SQL query if applicable
`rows`	`list[dict[str, Any]] | None`	Query results
`fallback_used`	`bool`	Whether fallback to recall was triggered
`fallback_reason`	`str`	Explanation if fallback occurred

Sources: src/thought/query/ask.py:1-50

Recall Hook

The recall hook (src/thought/hooks/recall.py) provides semantic search functionality as a fallback mechanism and primary retrieval method. It uses embedding vectors to find semantically similar entities in the knowledge graph, supporting the core recall operation used throughout the system.

Recall Behavior

Recall operations are bounded by design to prevent overwhelming the user with too many results. The system never returns more than 10 hits regardless of knowledge base size, encouraging users to narrow their queries using as_of and scope parameters for more targeted retrieval.

The recall mechanism supports bi-temporal queries through the as_of_kind parameter:

valid: Returns what was true on a given date, answering "what was true on date X"
learned: Returns what the system knew on a given date, answering "what did the system know on date X"

These two modes differ when facts are corrected after the fact, enabling users to perform historical analysis of their knowledge graph.

Sources: src/thought/query/views.py
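The difference between the two modes is easiest to see with a fact that was corrected late. The sketch below is a simplified model, not the project's query code: `FactVersion`, `as_of`, and the sample history are hypothetical, and "learned" mode is approximated as "the most recently learned version recorded on or before the date".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FactVersion:
    value: str
    valid_from: date             # when the fact became true
    valid_until: Optional[date]  # None while still true
    learned_at: date             # when the KB recorded this version


def as_of(versions, when, kind="valid"):
    """Answer 'what was true' (valid) or 'what did the KB know' (learned)."""
    if kind == "valid":
        hits = [
            v for v in versions
            if v.valid_from <= when and (v.valid_until is None or when < v.valid_until)
        ]
    elif kind == "learned":
        hits = [v for v in versions if v.learned_at <= when]
    else:
        raise ValueError(f"unknown as_of kind: {kind}")
    # Prefer the most recently learned version among the candidates.
    return max(hits, key=lambda v: v.learned_at).value if hits else None


# Alice moved on 2024-03-01, but the KB only learned about it on 2024-05-10.
history = [
    FactVersion("Paris", date(2023, 1, 1), date(2024, 3, 1), date(2023, 1, 5)),
    FactVersion("Berlin", date(2024, 3, 1), None, date(2024, 5, 10)),
]
```

Querying this history for 2024-04-01 gives "Berlin" in `valid` mode (what was actually true) and "Paris" in `learned` mode (what the system believed at the time), which is the divergence described above.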

Code Layer

The Code Layer (src/thought/layers/code.py) provides a specialized interface for code-specific graph queries. It wraps the GraphLayer with programmer-oriented operations and runs them against the currently valid view of the code graph by filtering on valid_until IS NULL.

Core Operations

Method	Description	Use Case
`callers_of(name)`	Direct callers ranked by PageRank	Finding who uses a function
`callees_of(name)`	Direct callees within the package	Finding what a function calls
`impact_set(name)`	Transitive callers ranked	Dependency analysis
`defines_in_file()`	All entities in a file	File-level inspection

All four operations support optional as_of parameters to query historical snapshots when bi-temporal git ingest has been configured. The code_commit_sha field enables time-travel queries against the code graph.

Sources: src/thought/layers/code.py:1-50
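The shape of `callers_of` and `impact_set` can be sketched over an in-memory list of CALLS edges. This is an illustration only: it ranks the impact set by breadth-first distance rather than PageRank (a deliberate simplification), and the edge data and the `edges` parameter are assumptions, not the Code Layer's real signatures.

```python
from collections import defaultdict, deque

# (caller, callee) pairs; hypothetical sample data.
CALLS = [
    ("handle_login", "authenticate_user"),
    ("api_login", "handle_login"),
    ("cli_login", "handle_login"),
    ("authenticate_user", "hash_password"),
]


def build_reverse_index(edges):
    """Map each callee to the set of functions that call it."""
    callers = defaultdict(set)
    for src, dst in edges:
        callers[dst].add(src)
    return callers


def callers_of(name, edges):
    """Direct callers only, in alphabetical order."""
    return sorted(build_reverse_index(edges).get(name, set()))


def impact_set(name, edges):
    """Transitive callers, nearest first (BFS distance stands in for ranking)."""
    callers = build_reverse_index(edges)
    distance = {}
    seen = {name}
    queue = deque([(name, 0)])
    while queue:
        node, depth = queue.popleft()
        for caller in sorted(callers.get(node, ())):
            if caller not in seen:
                seen.add(caller)
                distance[caller] = depth + 1
                queue.append((caller, depth + 1))
    return sorted(distance, key=lambda n: (distance[n], n))
```

The `seen` set keeps the traversal finite when the call graph contains cycles, such as mutually recursive functions.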

Entity Resolution

The _resolve_entity_id method handles name resolution with multiple fallback strategies:

Intra-file match with exact name
Cross-file match with unique qualified suffix
Cross-file bare-name match for top-level functions
Stub creation for unresolved references

This multi-stage resolution ensures that queries like obj.method() can resolve to ClassName.method when it is unique in the knowledge base, and that bare function names can be found across different files.

Sources: src/thought/query/cypher.py
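The four fallback stages listed above can be sketched as a single function that tries each strategy in order. This is a hypothetical reconstruction: the `Ent` record, the `resolve` signature, the uniqueness requirement in stages two and three, and the use of negative ids for stubs are all assumptions made for the example.

```python
from typing import NamedTuple


class Ent(NamedTuple):
    id: int
    name: str   # qualified name, e.g. "Session.refresh"
    file: str


def resolve(ref, from_file, entities, stubs):
    """Return (entity_id, stage) for a reference seen in `from_file`."""
    # 1. Intra-file match with exact name.
    for ent in entities:
        if ent.file == from_file and ent.name == ref:
            return ent.id, "intra_file"
    # 2. Cross-file match on a unique qualified suffix (obj.method -> Class.method).
    member = ref.rsplit(".", 1)[-1]
    suffix_hits = [e for e in entities if e.name.endswith("." + member)]
    if len(suffix_hits) == 1:
        return suffix_hits[0].id, "qualified_suffix"
    # 3. Cross-file bare-name match for top-level functions.
    bare_hits = [e for e in entities if e.name == member]
    if len(bare_hits) == 1:
        return bare_hits[0].id, "cross_file_bare"
    # 4. Stub creation for unresolved references (negative ids mark stubs here).
    stub_id = stubs.setdefault(ref, -(len(stubs) + 1))
    return stub_id, "stub"
```

Requiring a unique hit before accepting a cross-file match trades recall for precision: an ambiguous name becomes a stub rather than a possibly wrong edge.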

Cypher Query Engine

The Cypher module (src/thought/query/cypher.py) handles the parsing, validation, and execution of Cypher queries against the SQLite backend. It provides a bridge between the graph query language and the relational database storage.

Query Validation

Before executing any Cypher query, the system validates it against the defined grammar to prevent malformed queries from reaching the database. This validation step catches syntax errors, unsupported features, and schema violations before they can cause runtime errors.

Execution Model

Cypher queries are translated into equivalent SQL statements that operate against the SQLite schema. The translation preserves the semantic meaning of graph patterns while adapting them to the relational storage model used by the backend.
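To show what such a translation involves, the sketch below handles exactly one Cypher shape (a single-hop pattern with a name filter) and turns it into a parameterized SQL join. It is not the project's Cypher engine: the regular expression, the `translate` and `run` helpers, and the id-based `entities`/`edges` tables are assumptions for this example.

```python
import re
import sqlite3

# Supports only: MATCH (a)-[:REL]->(b[:Label] {name: '...'}) RETURN a.name
PATTERN = re.compile(
    r"MATCH \((\w+)\)-\[:(\w+)\]->\((\w+)(?::\w+)? "
    r"\{name: '([^']*)'\}\) RETURN \1\.name",
    re.IGNORECASE,
)


def translate(cypher: str):
    """Translate the one supported Cypher shape into (sql, params)."""
    match = PATTERN.fullmatch(" ".join(cypher.split()))  # collapse whitespace
    if match is None:
        raise ValueError("unsupported Cypher shape")
    _, relation, _, target_name = match.groups()
    sql = (
        "SELECT src.name FROM edges AS e "
        "JOIN entities AS src ON src.id = e.source_id "
        "JOIN entities AS dst ON dst.id = e.target_id "
        "WHERE e.relation_type = ? AND dst.name = ? "
        "AND src.valid_until IS NULL AND dst.valid_until IS NULL "
        "ORDER BY src.name"
    )
    return sql, (relation.upper(), target_name)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, valid_until TEXT);
    CREATE TABLE edges (source_id INTEGER, target_id INTEGER, relation_type TEXT);
    INSERT INTO entities VALUES
        (1, 'authenticate_user', NULL), (2, 'handle_login', NULL), (3, 'api_login', NULL);
    INSERT INTO edges VALUES (2, 1, 'CALLS'), (3, 2, 'CALLS');
    """
)


def run(cypher: str):
    sql, params = translate(cypher)
    return conn.execute(sql, params).fetchall()
```

Each node in the graph pattern becomes a join against the entity table and each relationship becomes a row in the edge table. User-supplied values travel as bound parameters rather than being spliced into the SQL text.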

Views and Data Models

The views module (src/thought/query/views.py) defines the data structures and return formats used throughout the Query and Retrieval System.

Entity Model

The Entity model represents nodes in the knowledge graph with the following key attributes:

Attribute	Type	Description
`id`	`str`	Unique identifier
`type`	`str`	Entity type (PERSON, function, class, etc.)
`name`	`str`	Display name
`canonical_name`	`str`	Normalized name for matching
`scope`	`ScopeName`	shared, private, or all
`tier`	`Tier`	hot, warm, or cold
`valid_from`	`datetime`	Start of validity period
`valid_until`	`datetime | None`	End of validity period
`attrs`	`dict[str, object]`	Additional attributes

Scope Filter

The ScopeFilter class determines visibility of entities based on ownership and scope:

shared: All entities with scope = "shared"
private: Entities matching both scope = "private" AND owner_id
all: Shared entities plus private entities owned by the requesting user

The scope filter generates SQL fragments that join against the entity table aliased as e, enabling fine-grained access control across the query system.

Sources: src/thought/models.py:1-80
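A scope filter of this kind can be sketched as a function that returns a SQL fragment plus bound parameters. The example is an assumption-laden illustration rather than the real `ScopeFilter`: `scope_clause`, `visible_names`, the three-column table, and the behavior of `all` without an owner are all hypothetical.

```python
import sqlite3
from typing import Optional


def scope_clause(scope: str, owner_id: Optional[str] = None) -> tuple[str, list]:
    """Return a WHERE fragment (entity table aliased as e) and its parameters."""
    if scope == "shared":
        return "e.scope = 'shared'", []
    if scope == "private":
        if owner_id is None:
            raise ValueError("private scope requires an owner_id")
        return "(e.scope = 'private' AND e.owner_id = ?)", [owner_id]
    if scope == "all":
        if owner_id is None:
            return "e.scope = 'shared'", []  # assumed: no owner means shared only
        return (
            "(e.scope = 'shared' OR (e.scope = 'private' AND e.owner_id = ?))",
            [owner_id],
        )
    raise ValueError(f"unknown scope: {scope}")


def visible_names(conn, scope, owner_id=None):
    clause, params = scope_clause(scope, owner_id)
    sql = f"SELECT e.name FROM entities AS e WHERE {clause} ORDER BY e.name"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, scope TEXT, owner_id TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [
        ("team_glossary", "shared", None),
        ("alice_notes", "private", "alice"),
        ("bob_notes", "private", "bob"),
    ],
)
```

Only the fixed fragment text is interpolated into the query; the owner id is always passed as a bound parameter, so the filter can be composed into larger queries without exposing them to injection through `owner_id`.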

CLI Commands

The Query and Retrieval System is exposed through several CLI commands under the thought command group:

Command	Description
`thought recall <query>`	Semantic search across the knowledge graph
`thought ask <question>`	Natural language query with Cypher translation
`thought callers <name>`	Find direct callers ranked by PageRank
`thought callees <name>`	Find direct callees within the package
`thought impact <name>`	Transitive impact set analysis
`thought browse <name>`	Drill into a topic with PPR-ranked neighborhood
`thought diff --from <sha1> --to <sha2>`	Compare entities between commits

Browse Command

The browse command (mcp__thought__browse_topic) implements a two-step resolution process. First, the name is matched against entity types for a type facet. If no type matches, the name is resolved as an entity using canonical-name matching, and the PPR-ranked neighborhood is returned. The via field in results indicates whether the hit came from type_facet, ppr, or bfs matching.

Sources: src/thought/cli.py

Configuration

The Query and Retrieval System respects configuration from the thought.toml file and environment variables:

Option	Default	Description
`embedder`	`auto`	Embedder selection: auto, sentence-transformers, or deterministic
`llm.provider`	`openai`	LLM provider for Ask module
`llm.model`	varies	Model name for translation
`db_path`	`.thought/thought.db`	SQLite database path

The auto embedder selector probes the sentence_transformers package via importlib.util.find_spec before returning the wrapper, falling back to the deterministic embedder when the optional dependency is missing.
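The probing step can be sketched with the standard library's `importlib.util.find_spec`, which reports whether a top-level package is importable without importing it. The `select_embedder` function and its `module` parameter (added here so the probe target can be swapped in tests) are hypothetical; only the `find_spec` probe itself is described in the source.

```python
import importlib.util

VALID_CHOICES = ("auto", "sentence-transformers", "deterministic")


def select_embedder(requested: str = "auto", module: str = "sentence_transformers") -> str:
    """Resolve the configured embedder name to a concrete choice."""
    if requested not in VALID_CHOICES:
        raise ValueError(f"unknown embedder: {requested}")
    if requested == "deterministic":
        return "deterministic"
    # find_spec returns None when the package is not installed.
    available = importlib.util.find_spec(module) is not None
    if requested == "auto":
        return "sentence-transformers" if available else "deterministic"
    if not available:
        raise RuntimeError(f"{module} is not installed")
    return "sentence-transformers"
```

Probing with `find_spec` avoids paying the import cost of a large ML package just to learn whether it exists, and it keeps `auto` from failing on machines where the optional dependency is missing.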

Integration Points

The Query and Retrieval System integrates with several other subsystems:

Storage Layer: SQLite backend provides entity and edge persistence
Ingest System: Code extractors populate entities that are later queried
Memory Module: Coordinates between recall, browse, and scan operations
Server: Exposes query functionality via MCP protocol

The bidirectional relationship between the Code Layer and the Cypher query engine enables both natural language queries like "who calls authenticate_user" and structured queries using the CODE query class, providing flexibility for different user interaction patterns.

Error Handling

The system implements graceful degradation throughout the query pipeline. If Cypher translation fails or validation rejects the generated query, execution falls back to the recall hook, ensuring users always receive results. Bounded result sets prevent resource exhaustion, and the contradiction detection mechanism surfaces conflicts as CONTRADICTS edges in the graph rather than throwing errors, allowing downstream applications to handle them as data.

Sources: src/thought/query/ask.py:1-30

Multi-Language Code Parsing

Related topics: Git History Integration, Storage and Database Layer

The Multi-Language Code Parsing system is the foundational code-vertical layer in THOUGHT. It provides language-agnostic AST extraction across six programming languages using tree-sitter grammars, produces standardized code entities and relationship edges, and enables downstream features like caller analysis, impact queries, and cross-file call-graph resolution.

Overview

The parsing system operates in two phases:

Phase 1 – AST Extraction: Each language has a dedicated extractor that walks the tree-sitter parse tree and emits CodeEntity and CodeEdge objects.
Phase 2 – Call Graph Resolution: After all files are ingested, a separate pass resolves CALLS edges by matching callee names against the entity index.

Sources: src/thought/ingest/code/ast_extractor.py:1-15

Supported Languages

The system supports six languages through language-specific extractors:

Language	Extractor File	Tree-sitter Grammar
Python	`python_extractor.py`	`tree-sitter-python`
TypeScript / TSX / JSX	`typescript_extractor.py`	`tree-sitter-typescript`
Go	`go_extractor.py`	`tree-sitter-go`
Rust	`rust_extractor.py`	`tree-sitter-rust`
Java	`java_extractor.py`	`tree-sitter-java`
PHP	`php_extractor.py`	`tree-sitter-php`

Sources: src/thought/ingest/code/ast_extractor.py:30-55

Architecture

graph TD
    A[Code File] --> B[Language Detection]
    B --> C[ast_extractor.py Dispatcher]
    C --> D{Python?}
    C --> E{TypeScript?}
    C --> F{Go?}
    C --> G{Rust?}
    C --> H{Java?}
    C --> I{PHP?}
    D --> J[python_extractor.extract]
    E --> K[typescript_extractor.extract]
    F --> L[go_extractor.extract]
    G --> M[rust_extractor.extract]
    H --> N[java_extractor.extract]
    I --> O[php_extractor.extract]
    J --> P[(CodeEntity, CodeEdge)]
    K --> P
    L --> P
    M --> P
    N --> P
    O --> P
    P --> Q[CodeIngestPipeline]
    Q --> R[build_call_graph]
    R --> S[(CALLS Edges)]

Dispatcher Pattern

The ast_extractor.py module uses lazy loading to avoid importing heavy tree-sitter C extensions at module load time:

_REGISTRY: dict[str, Callable[[str, str], tuple[list[CodeEntity], list[CodeEdge]]]] = {}

def _python_extractor():
    from . import python_extractor
    return python_extractor.extract

Each language loader is registered in _LOADERS and invoked only when that language is first requested. Sources: src/thought/ingest/code/ast_extractor.py:9-35

Data Models

CodeEntity

Represents a code element extracted from the AST:

Field	Type	Description
`name`	`str`	Canonical identifier (module, function, class, method)
`type_`	`str`	Entity kind: `module`, `function`, `class`, `method`
`language`	`str`	Source language: `python`, `typescript`, `go`, `rust`, `java`, `php`
`file_path`	`str`	Path to source file (relative to repo root)
`line_start`	`int`	1-indexed start line
`line_end`	`int`	1-indexed end line
`signature`	`str`	Declaration signature (e.g., `module foo`, `def bar(self, x)`)
`docstring`	`str | None`	Extracted docstring text
`visibility`	`str`	`public` or `private` based on naming conventions
`attrs`	`dict`	Language-specific metadata

Sources: src/thought/ingest/code/python_extractor.py:14-25

CodeEdge

Represents a relationship between entities:

Field	Type	Description
`source_name`	`str`	Entity that is the subject of the relation
`target_name`	`str`	Entity that is the object of the relation
`relation_type`	`str`	One of: `IMPORTS`, `INHERITS_FROM`, `DEFINES`, `OVERRIDES`, `CALLS`
`line_number`	`int`	Source line where the relationship was discovered
`attrs`	`dict`	Additional metadata (e.g., `from_import: true`)

Sources: src/thought/ingest/code/typescript_extractor.py:110-115
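For orientation, the two tables above could map onto dataclasses roughly like this. The field names and types come from the tables; the default values and the use of plain (non-frozen) dataclasses are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeEntity:
    name: str                         # canonical identifier
    type_: str                        # "module", "function", "class", or "method"
    language: str                     # e.g. "python"
    file_path: str                    # relative to the repo root
    line_start: int                   # 1-indexed
    line_end: int                     # 1-indexed
    signature: str
    docstring: Optional[str] = None   # assumed default
    visibility: str = "public"        # assumed default
    attrs: dict = field(default_factory=dict)


@dataclass
class CodeEdge:
    source_name: str
    target_name: str
    relation_type: str                # e.g. "DEFINES", "IMPORTS", "CALLS"
    line_number: int
    attrs: dict = field(default_factory=dict)


entity = CodeEntity(
    name="auth.login", type_="function", language="python",
    file_path="auth.py", line_start=10, line_end=24,
    signature="def login(user, password)",
)
edge = CodeEdge("auth", "auth.login", "DEFINES", line_number=10)
```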

Extractor Interface

All language extractors share a common signature:

def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    ...

This uniform interface allows the dispatcher to route to any language without knowing implementation details. Sources: src/thought/ingest/code/python_extractor.py:28-40
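To make the shared signature concrete, here is a toy extractor with the same shape. It deliberately swaps tree-sitter for Python's standard-library `ast` module so that it runs without extra dependencies, and it returns plain dicts instead of `CodeEntity` and `CodeEdge` objects. The `module.member` naming scheme is an assumption for illustration.

```python
import ast
from pathlib import PurePosixPath


def extract(source: str, file_path: str) -> tuple[list[dict], list[dict]]:
    """Toy extractor: top-level functions and classes of one Python file."""
    module_name = PurePosixPath(file_path).stem
    entities = [{"name": module_name, "type_": "module", "line_start": 1}]
    edges = []

    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # ignore statements that do not define anything

        qualified = f"{module_name}.{node.name}"
        entities.append({"name": qualified, "type_": kind, "line_start": node.lineno})
        # The module DEFINES each top-level member it contains.
        edges.append({
            "source_name": module_name,
            "target_name": qualified,
            "relation_type": "DEFINES",
            "line_number": node.lineno,
        })
    return entities, edges


entities, edges = extract("def f():\n    pass\n\nclass C:\n    pass\n", "pkg/demo.py")
```

Because every extractor returns the same pair of lists, the caller can treat the result identically regardless of which language produced it.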

Supported Edge Types

Relation	Source	Target	Languages
`IMPORTS`	module	imported module	Python, TypeScript, PHP, Go, Rust, Java
`INHERITS_FROM`	class	parent class	Python, TypeScript, Java, PHP
`DEFINES`	class/module	contained member	All languages
`OVERRIDES`	method	overridden method	TypeScript (currently)
`CALLS`	function/method	called function	All (via call-graph pass)

Sources: src/thought/ingest/code/python_extractor.py:1-15, src/thought/ingest/code/typescript_extractor.py:1-20

Language-Specific Extractors

Python Extractor

The Python extractor uses tree-sitter-python and handles:

Module entities as the root node
Function definitions (function_definition)
Class definitions (class_definition)
Method definitions within classes
Import statements (import_from_statement, import_statement)
Class inheritance via the superclasses field

def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    parser = _get_parser()
    source_bytes = source.encode("utf-8")
    tree = parser.parse(source_bytes)
    root = tree.root_node

    module_name = _module_name_from_path(file_path)
    entities: list[CodeEntity] = []
    edges: list[CodeEdge] = []

    entities.append(CodeEntity(
        name=module_name,
        type_="module",
        language="python",
        ...
    ))

Sources: src/thought/ingest/code/python_extractor.py:28-50

TypeScript Extractor

The TypeScript extractor parses .ts files with the TypeScript grammar and switches to the separate TSX grammar for .tsx and .jsx files:

def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    use_tsx = file_path.endswith((".tsx", ".jsx"))
    parser = _get_parser(use_tsx=use_tsx)
    ...

Node types processed include function_declaration, arrow_function, class_declaration, method_definition, import_statement, and export_statement. Sources: src/thought/ingest/code/typescript_extractor.py:120-145

PHP Extractor

The PHP extractor handles files starting with <?php and recursively scans for definitions nested under namespace_definition blocks:

def _scan(node: Node) -> None:
    for child in node.named_children:
        ...

Sources: src/thought/ingest/code/php_extractor.py:45-60

Rust Extractor

The Rust extractor uses tree-sitter-rust. Methods are emitted with a visibility computed by _rust_visibility, and the type that owns the method is recorded in the impl_type attribute:

out_entities.append(CodeEntity(
    name=qualified, type_="method", language="rust",
    visibility=_rust_visibility(child, source_bytes),
    attrs={"impl_type": type_name},
))

Sources: src/thought/ingest/code/rust_extractor.py:1-30

Call Graph Resolution

The call graph is built in a separate Phase 2 pass after all files are ingested. The build_call_graph function resolves callee references using a cascade of strategies:

Exact match within same file — direct intra-file resolution
Qualified suffix match — obj.method() resolves to ClassName.method
Cross-file bare-name match — top-level functions defined elsewhere
Stub creation — synthetic stub for unknown callees (filtered from impact graphs)

tgt_id = backend.find_code_entity(
    canonical_name=callee_name, scope_filter=sf, code_file=file_path,
)
if tgt_id is None and "." not in callee_name:
    # Unique qualified suffix match.
    rows = backend._conn.execute(
        "SELECT id FROM entities "
        "WHERE type IN ('method','function') AND valid_until IS NULL "
        "AND canonical_name LIKE ? ...",
        (f"%.{callee_name.lower()}", commit_sha),
    ).fetchall()

Sources: src/thought/ingest/code/call_graph.py:1-60
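The four-step cascade can be illustrated with an in-memory index instead of SQL. This is a sketch under stated assumptions: `Entity`, `resolve_callee`, and the strategy labels are hypothetical, and the real implementation queries the backend and applies scope and commit filters that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str       # canonical name, e.g. "UserService.login"
    file_path: str


def resolve_callee(callee: str, caller_file: str, index: list[Entity]) -> tuple[str, str]:
    """Return (resolved_name, strategy) by trying each strategy in order."""
    # 1. Exact match within the same file.
    for ent in index:
        if ent.name == callee and ent.file_path == caller_file:
            return ent.name, "same_file"

    # 2. Qualified suffix match, accepted only when it is unique.
    suffix = [e for e in index if e.name.endswith("." + callee)]
    if len(suffix) == 1:
        return suffix[0].name, "qualified_suffix"

    # 3. Cross-file bare-name match, accepted only when it is unique.
    bare = [e for e in index if e.name == callee]
    if len(bare) == 1:
        return bare[0].name, "cross_file"

    # 4. Unknown callee: fall back to a synthetic stub.
    return callee, "stub"


index = [
    Entity("helper", "a.py"),
    Entity("UserService.login", "b.py"),
    Entity("format_date", "c.py"),
]
results = [
    resolve_callee("helper", "a.py", index),
    resolve_callee("login", "a.py", index),
    resolve_callee("format_date", "a.py", index),
    resolve_callee("print", "a.py", index),
]
```

Requiring uniqueness in steps 2 and 3 avoids attaching a CALLS edge to an arbitrary candidate when several entities share a name.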

CodeIngestPipeline

The CodeIngestPipeline orchestrates the full ingest workflow:

Reads source file content
Detects or validates language
Calls the appropriate extractor
Creates a source reference record
Writes entities within a single transaction
Embeds entity signatures and docstrings for VIBE recall
Writes edges and resolves call graph

graph LR
    A[Source File] --> B[detect_language]
    B --> C[extract entities/edges]
    C --> D[upsert_source]
    D --> E[begin transaction]
    E --> F[_write_entities + embed]
    F --> G[_write_edges]
    G --> H[build_call_graph]
    H --> I[commit]

The pipeline embeds entity signatures and docstrings so that queries like "who calls authenticate_user" can find functions by intent rather than exact name. Sources: src/thought/ingest/code/pipeline.py:1-80

CodeLayer API

The CodeLayer provides a high-level interface for code graph queries:

Method	Description
`callers_of(name)`	Direct callers, ranked by Personalized PageRank
`callees_of(name)`	Direct callees (intra-package)
`impact_set(name)`	Transitive callers, ranked — for `thought impact` command
`defines_in_file(path)`	All entities discovered in a file

All methods operate against the currently-valid view (valid_until IS NULL). Pass as_of= for historical snapshots. Sources: src/thought/layers/code.py:1-40
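The difference between direct callers and the transitive impact set can be shown with a small breadth-first traversal over reversed CALLS edges. This sketch orders results by distance from the target; it does not implement the Personalized PageRank ranking described above, and the function here is a standalone illustration rather than the `CodeLayer` method.

```python
from collections import deque


def impact_set(target: str, calls: list[tuple[str, str]]) -> list[str]:
    """Return every direct and transitive caller of `target`, nearest first."""
    # Reverse index: callee -> list of callers.
    callers_of: dict[str, list[str]] = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, []).append(caller)

    seen = {target}
    ordered: list[str] = []
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers_of.get(current, []):
            if caller not in seen:  # guards against cycles and duplicates
                seen.add(caller)
                ordered.append(caller)
                queue.append(caller)
    return ordered


# Each pair is (caller, callee).
calls = [
    ("login", "authenticate_user"),
    ("api_handler", "login"),
    ("cli_main", "login"),
    ("authenticate_user", "hash_password"),
]
result = impact_set("authenticate_user", calls)
```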

Git-Aware Ingest

The GitWalker enables two ingestion modes:

Mode	Behavior
`snapshot` (default)	Ingest HEAD only, stamp every entity with HEAD SHA
`full`	Walk every commit chronologically, stamp each entity with its commit SHA

This enables bi-temporal as_of queries against historical commits. Sources: src/thought/ingest/code/git_pipeline.py:1-50

Configuration

Language is auto-detected by file extension when language=None:

Extension	Language
`.py`	`python`
`.ts`, `.tsx`, `.js`, `.jsx`	`typescript`
`.go`	`go`
`.rs`	`rust`
`.java`	`java`
`.php`	`php`

Pass language= explicitly to override detection. Sources: src/thought/ingest/code/pipeline.py:25-35
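The extension table translates directly into a lookup. The following is a minimal sketch: the mapping mirrors the table above, while the function body, its handling of upper-case extensions, and the `None` result for unknown extensions are assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional

_EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "typescript",
    ".jsx": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".java": "java",
    ".php": "php",
}


def detect_language(file_path: str, language: Optional[str] = None) -> Optional[str]:
    """Return the explicit language if given, else guess from the extension."""
    if language is not None:
        return language  # an explicit override always wins
    suffix = PurePosixPath(file_path).suffix.lower()
    return _EXTENSION_TO_LANGUAGE.get(suffix)  # None for unsupported files
```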


Git History Integration

Related topics: Multi-Language Code Parsing, Memory Model and Data Structures

Overview

Git History Integration enables thought-mcp to ingest source code with full commit-level provenance, allowing bi-temporal queries that can reconstruct what a codebase looked like at any point in its history. This feature stamps every extracted code entity (functions, classes, modules) with the exact git commit SHA where it was discovered, creating a temporal graph that supports "as-of" queries.

The system provides two ingestion modes: a fast snapshot mode for current-state analysis and a comprehensive full-history mode for complete historical reconstruction.

Sources: CHANGELOG.md

Architecture

Component Overview

graph TD
    subgraph "Git History Integration"
        CLI["thought ingest-git CLI"]
        Pipeline["GitIngestPipeline"]
        Walker["GitWalker"]
        Storage["SQLite Backend"]
    end
    
    subgraph "Git Operations"
        Git["git executable"]
        RevParse["rev-parse HEAD"]
        Log["log --format"]
        LsTree["ls-tree -r"]
        Show["show <sha>:<path>"]
    end
    
    CLI --> Pipeline
    Pipeline --> Walker
    Walker --> Git
    Git --> RevParse
    Git --> Log
    Git --> LsTree
    Git --> Show
    Pipeline --> Storage

Data Flow

sequenceDiagram
    participant User
    participant CLI
    participant Pipeline
    participant Walker
    participant Extractor
    participant Backend
    
    User->>CLI: thought ingest-git /repo --mode full
    CLI->>Pipeline: run(repo_path, mode)
    
    alt snapshot mode
        Pipeline->>Walker: get_head_commit()
        Walker->>Git: rev-parse HEAD
        Git-->>Walker: sha
        Pipeline->>Pipeline: ingest single snapshot
    else full mode
        Pipeline->>Walker: get_all_commits()
        Walker->>Git: log --format
        Git-->>Walker: commit list
        Loop for each commit
            Pipeline->>Git: ls-tree -r sha
            Pipeline->>Git: show sha:path
            Git-->>Pipeline: file content
            Pipeline->>Extractor: extract(source, file_path)
            Extractor-->>Pipeline: CodeEntity[], CodeEdge[]
            Pipeline->>Backend: upsert with commit_sha
        end
    end
    
    Pipeline-->>User: GitIngestReport

Sources: src/thought/ingest/code/git_pipeline.py:1-95, src/thought/ingest/code/git_walker.py:1-60

Core Components

GitWalker

The GitWalker class provides a read-only interface to git repositories using pure subprocess calls. It deliberately avoids native dependencies like pygit2 to minimize installation footprint.

Method	Git Command	Purpose
`get_head_sha()`	`rev-parse HEAD`	Get current HEAD commit SHA
`get_all_commits()`	`log --format=...`	List all commits chronologically
`get_files_at_commit(sha)`	`ls-tree -r <sha>`	List files in tree at commit
`get_file_at_commit(sha, path)`	`show <sha>:<path>`	Get file content at commit

Commit Data Model

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Commit:
    sha: str                    # Full commit SHA
    author: str                 # Author name
    author_email: str           # Author email
    author_date: datetime       # Author timestamp of the commit
    subject: str                # Commit message first line

Sources: src/thought/ingest/code/git_walker.py:24-31
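A subprocess-based walker needs to turn `git log` output into `Commit` records. The sketch below shows one way to do that. The format string is an assumption (the source elides the actual one); it uses the standard git placeholders `%H`, `%an`, `%ae`, `%aI`, and `%s`, separated by the `%x1f` unit-separator byte so that subjects containing spaces or punctuation parse safely. `parse_log` and this version of `get_all_commits` are illustrative, not the project's code.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime

_SEP = "\x1f"  # unit separator, unlikely to appear in commit metadata
_LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s"  # assumed format string


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    author_email: str
    author_date: datetime
    subject: str


def parse_log(output: str) -> list[Commit]:
    """Parse one commit per line from `git log --format=<_LOG_FORMAT>` output."""
    commits = []
    for line in output.split("\n"):
        if not line:
            continue
        sha, author, email, date, subject = line.split(_SEP, 4)
        commits.append(Commit(sha, author, email, datetime.fromisoformat(date), subject))
    return commits


def get_all_commits(repo: str) -> list[Commit]:
    """Run git and return commits oldest first (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", f"--format={_LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


sample = "a1b2c3\x1fAda\x1fada@example.com\x1f2024-01-02T03:04:05+00:00\x1fInitial commit\n"
commits = parse_log(sample)
```

Separating parsing from the subprocess call keeps the parsing logic testable without a repository on disk.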

Initialization Validation

def __init__(self, repo_path: Path | str) -> None:
    self.repo = Path(repo_path)
    if shutil.which("git") is None:
        raise RuntimeError("git executable not on PATH")
    if not (self.repo / ".git").exists():
        raise ValueError(f"not a git repository: {self.repo}")

The walker validates that:

The git executable exists on PATH
The target path is a git repository (it contains a .git entry)

Sources: src/thought/ingest/code/git_walker.py:35-42

GitIngestPipeline

The pipeline orchestrates the complete ingestion process, coordinating between git history traversal and code extraction.

Parameter	Type	Description
`repo_path`	`Path`	Path to git repository
`mode`	`GitMode`	`"snapshot"` (HEAD only) or `"full"` (all commits)
`patterns`	`tuple[str, ...]`	Glob patterns to filter files (e.g., `*.py`)

Ingestion Report

@dataclass(frozen=True)
class GitIngestReport:
    head_sha: str           # SHA of HEAD at time of ingest
    mode: GitMode           # Mode used for ingestion
    commits_visited: int    # Number of commits processed
    files_ingested: int     # Total files ingested
    call_edges: int         # Call graph edges created

Sources: src/thought/ingest/code/git_pipeline.py:35-41

Ingestion Modes

Snapshot Mode (Default)

Snapshot mode ingests only the current HEAD commit. This is the recommended mode for:

Initial repository ingestion
Quick code analysis workflows
When historical queries are not needed

Performance characteristics:

Single-pass through current tree
No duplicate processing
Typical runtime: seconds to minutes depending on repository size

Entity stamping: All extracted entities receive the HEAD SHA as their code_commit_sha attribute, enabling queries like "what did auth.middleware look like at HEAD?" or future comparisons.

Sources: src/thought/ingest/code/git_pipeline.py:7-16

Full History Mode

Full mode walks every commit in chronological order, ingesting the file tree at each point. This enables:

Historical queries: "what did function X look like at commit Y?"
Diff analysis between any two commits
Complete temporal reconstruction of code evolution

Performance considerations:

Repository Size	Estimated Commits	Estimated Time
Small (<100 files)	~100	~30 seconds
Medium (500 files)	~1000	~5 minutes
Large (1000+ files)	~5000+	~25+ minutes

Note: The cost of a full-history ingest scales with file count × commit count. The per-commit cost is dominated by tree-sitter parsing, not git operations.

Sources: src/thought/ingest/code/git_pipeline.py:16-25

CLI Usage

Command Syntax

thought ingest-git <repo_path> [OPTIONS]

Options

Option	Usage	Default	Description
`--mode`	`--mode snapshot` or `--mode full`	`snapshot`	Ingestion mode
`--paths`	`--paths "*.py,*.js"`	`*.py`	Comma-separated glob patterns
`--config`	`--config path/to/config`	`thought.toml`	Configuration file

Examples

# Ingest current directory as git repo (HEAD only)
thought ingest-git .

# Ingest specific repository with full history
thought ingest-git /path/to/repo --mode full

# Ingest Python and TypeScript files only
thought ingest-git . --paths "*.py,*.ts,*.tsx"

# Ingest with full git history, multiple file types
thought ingest-git /project --mode full --paths "*.py,*.js,*.go"

Sources: src/thought/cli.py:90-120
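How a comma-separated `--paths` value might be turned into a file filter can be sketched with the standard-library `fnmatch` module. Both helper names are hypothetical, and whether the tool matches patterns against the full path or only the file name is not stated in the source; this sketch matches against the full relative path.

```python
from fnmatch import fnmatchcase


def parse_patterns(raw: str) -> tuple[str, ...]:
    """Split a comma-separated option value into clean glob patterns."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def matches(path: str, patterns: tuple[str, ...]) -> bool:
    """True if the path matches at least one pattern (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = parse_patterns("*.py, *.ts,*.tsx")
files = ["src/app.py", "web/index.tsx", "README.md"]
selected = [f for f in files if matches(f, patterns)]
```

In `fnmatch`, `*` also matches path separators, so `*.py` selects Python files at any depth.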

Code Commit Stamping

Every extracted code entity receives metadata linking it to its source commit:

eid = self._backend.upsert_entity(
    # ... other fields ...
    code_file=ent.file_path,
    code_language=language,
    code_commit_sha=commit_sha,  # Links entity to specific commit
)

The database schema includes:

Column	Type	Purpose
`code_file`	`TEXT`	File path relative to repo root
`code_language`	`TEXT`	Programming language detected
`code_commit_sha`	`TEXT`	Git commit where entity was found

These columns have partial indexes for fast lookups by commit.

Sources: CHANGELOG.md, src/thought/ingest/code/pipeline.py:60-75
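A partial index covers only the rows that satisfy a `WHERE` clause, which suits columns that are `NULL` for most non-code entities. The sketch below demonstrates the idea with Python's built-in `sqlite3` module. The table layout, the index name, and the sample rows are assumptions; only the three `code_*` column names come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    " id INTEGER PRIMARY KEY,"
    " canonical_name TEXT NOT NULL,"
    " code_file TEXT,"
    " code_language TEXT,"
    " code_commit_sha TEXT)"
)
# Partial index: rows without a commit stamp are left out of the index.
conn.execute(
    "CREATE INDEX idx_entities_commit ON entities(code_commit_sha)"
    " WHERE code_commit_sha IS NOT NULL"
)
conn.executemany(
    "INSERT INTO entities (canonical_name, code_file, code_language, code_commit_sha)"
    " VALUES (?, ?, ?, ?)",
    [
        ("auth.login", "auth.py", "python", "sha-old"),
        ("auth.login", "auth.py", "python", "sha-new"),
        ("note:meeting", None, None, None),  # non-code entity, not indexed
    ],
)
rows = conn.execute(
    "SELECT canonical_name FROM entities WHERE code_commit_sha = ?",
    ("sha-new",),
).fetchall()
```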

CodeLayer Query Interface

The CodeLayer class provides convenience methods for querying the code graph with temporal awareness:

class CodeLayer:
    def callers_of(self, name, *, code_commit_sha=None): ...  # Find who calls this function
    def callees_of(self, name, *, code_commit_sha=None): ...  # Find what this function calls
    def impact_set(self, name): ...                           # Transitive callers, ranked
    def defines_in_file(self, path): ...                      # Entities in a file

Temporal Queries

All lookups operate against the currently-valid view of the code graph. To query historical snapshots, pass the as_of parameter or filter by code_commit_sha:

# Query current state
impact = code_layer.impact_set("authenticate_user")

# Query historical state (when full-history ingest was used)
callers_historical = code_layer.callers_of(
    "authenticate_user",
    code_commit_sha="abc123..."
)

Sources: src/thought/layers/code.py:1-50

Diff Between Commits

The system supports computing the difference between any two ingested commits:

thought diff --from <sha1> --to <sha2>

This returns:

Added entities: Entities present at --to but not at --from
Removed entities: Entities present at --from but not at --to

The diff operates on the set of entities by name, comparing their commit stamps.

Sources: CHANGELOG.md
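Because the diff compares entity names between two commits, it reduces to two set differences. A minimal sketch, assuming the entity names for each commit have already been loaded into sets (the function name and result shape are hypothetical):

```python
def diff_entities(from_names: set[str], to_names: set[str]) -> dict[str, list[str]]:
    """Compare the entity names present at two commits."""
    return {
        "added": sorted(to_names - from_names),    # present at --to only
        "removed": sorted(from_names - to_names),  # present at --from only
    }


at_from = {"auth.login", "auth.logout", "auth.legacy_check"}
at_to = {"auth.login", "auth.logout", "auth.refresh_token"}
result = diff_entities(at_from, at_to)
```

A name-based comparison does not detect an entity whose body changed while its name stayed the same.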

Supported Languages

The git ingestion pipeline uses language-specific extractors:

Language	Extractor	Extensions
Python	`python_extractor.py`	`.py`
Rust	`rust_extractor.py`	`.rs`
TypeScript	`typescript_extractor.py`	`.ts`, `.tsx`
PHP	`php_extractor.py`	`.php`

Each extractor uses tree-sitter for AST parsing, extracting:

Entities: modules, functions, classes, methods
Edges: IMPORTS, DEFINES, CALLS, INHERITS_FROM, OVERRIDES

Sources: src/thought/ingest/code/python_extractor.py, src/thought/ingest/code/rust_extractor.py, src/thought/ingest/code/typescript_extractor.py, src/thought/ingest/code/php_extractor.py

Configuration

Thought Configuration (thought.toml)

[embedder]
type = "auto"  # or "ollama", "openai", "deterministic"

[storage]
path = "thought.db"

Environment Variables

Variable	Description
`OLLAMA_BASE_URL`	Ollama server URL for embeddings
`OPENAI_API_KEY`	OpenAI API key for embeddings

Best Practices

Initial Ingestion

Start with snapshot mode to verify the setup works
Run thought stats to confirm entities were created
Query a function to verify call graph edges exist

Full History Ingestion

Ensure adequate disk space (full mode creates temporary copies)
Use --paths to filter to relevant file types on large repos
Consider running during off-peak hours for large repositories

Query Optimization

Use code_file filter when querying specific files
Use code_commit_sha filter for historical lookups
Combine with vector similarity for intent-based queries

Troubleshooting

"git executable not on PATH"

Solution: Install git or ensure it's in your system PATH.

# Verify git is available
git --version

"not a git repository"

Solution: Ensure the path contains a .git directory:

# Initialize if needed
git init

Slow Full-History Ingestion

Mitigation:

Use --paths to filter file types
Use snapshot mode for initial setup
Consider parallelizing with multiple --paths passes

Summary

Git History Integration transforms thought-mcp from a current-state code analysis tool into a full temporal code repository that can answer questions about code at any point in history. By combining git's commit tracking with bi-temporal database queries, users can reconstruct how functions evolved, who called what across commits, and the complete impact chain of changes over time.

The architecture prioritizes:

No native dependencies: Pure subprocess git operations
Two-mode flexibility: Fast snapshots or complete history
Temporal provenance: Every entity stamped with its commit SHA
Language generality: Support for multiple programming languages via tree-sitter

Sources: CHANGELOG.md

Agent Adapters and SDK Integration

Related topics: Query and Retrieval System

Overview

The Agent Adapters and SDK Integration subsystem provides a seamless bridge between THOUGHT's knowledge base and external AI agent frameworks. This system enables any Claude-Agent-SDK-shaped agent to interact with THOUGHT's memory, context retrieval, and code analysis capabilities through a standardized adapter interface.

The integration layer consists of three primary components:

Claude SDK Adapter (ThoughtMemoryProvider) — A drop-in memory adapter for Claude Agent SDK
MCP Server Surface — Exposes core primitives via the Model Context Protocol
Claude Code Hook Installer — Integrates THOUGHT directly into Claude Code's event loop

Sources: CHANGELOG.md

Architecture Overview

graph TD
    subgraph "Agent Frameworks"
        ClaudeSDK[Claude Agent SDK]
        ClaudeCode[Claude Code CLI]
        MCPClients[MCP-Compatible Clients]
    end

    subgraph "THOUGHT Integration Layer"
        ClaudeSDKAdapter[ThoughtMemoryProvider]
        MCPServer[MCP Server Surface]
        HookInstaller[Claude Code Hook Installer]
    end

    subgraph "Core THOUGHT"
        Memory[Memory / Knowledge Base]
        Embedder[Embedder Service]
        CodeAnalysis[Code Analysis Engine]
        Backend[SQLite Backend]
    end

    ClaudeSDK --> ClaudeSDKAdapter
    ClaudeSDKAdapter --> Memory
    ClaudeSDKAdapter --> Embedder
    
    ClaudeCode --> HookInstaller
    HookInstaller --> Memory
    
    MCPClients --> MCPServer
    MCPServer --> Memory
    MCPServer --> CodeAnalysis
    MCPServer --> Backend

    Memory --> Backend
    Embedder --> Backend
    CodeAnalysis --> Backend

The Claude SDK Adapter

Purpose and Scope

The ThoughtMemoryProvider class serves as a drop-in memory adapter for any Claude-Agent-SDK-shaped agent. It wraps THOUGHT's core memory primitives and exposes them through a familiar interface that agent developers expect.

Sources: src/thought/adapters/claude_sdk.py

Core Methods

The adapter implements four primary methods that cover the complete agent loop:

Method	Purpose	Returns
`context_for(target, role)`	Returns a working-context dict for a specific target entity and role	`dict` with anchor, neighbours, recent_contradictions, role_view
`render_context(target)`	Returns the same payload as a plain-text system-prompt augmentation	`str` formatted for LLM consumption
`record(content)`	Persists what the agent learned to the knowledge base	`str` — source ID of recorded content
`scan(repo_path)`	Runs an incremental scan under the agent's name	`dict` with scan results

Sources: src/thought/adapters/claude_sdk.py

Working Context Structure

The context_for() method returns a ranked, role-aware payload containing:

{
    "anchor": "<entity-name>",           # The target entity
    "neighbours": [...],                  # Top-K related entities
    "recent_contradictions": [...],       # Entities that contradict this one
    "role_view": "<saved-view-name>"      # Optional named view for the role
}

The context is token-budgeted to prevent overwhelming the agent's context window.

Sources: CHANGELOG.md
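Token budgeting can be pictured as a greedy selection over ranked neighbours. The sketch below is an illustration only: the `score` and `text` keys, the four-characters-per-token estimate, and the skip-and-continue policy are all assumptions, since the source does not describe how the budget is enforced.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fit_to_budget(neighbours: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the highest-scoring neighbours whose combined size fits the budget."""
    kept: list[dict] = []
    used = 0
    for item in sorted(neighbours, key=lambda n: n["score"], reverse=True):
        cost = estimate_tokens(item["text"])
        if used + cost > budget_tokens:
            continue  # too large for the remaining budget; try smaller items
        kept.append(item)
        used += cost
    return kept


neighbours = [
    {"name": "login", "score": 0.9, "text": "x" * 40},   # about 10 tokens
    {"name": "logout", "score": 0.5, "text": "y" * 40},  # about 10 tokens
    {"name": "hash", "score": 0.7, "text": "z" * 16},    # about 4 tokens
]
kept_names = [n["name"] for n in fit_to_budget(neighbours, budget_tokens=15)]
```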

Integration Flow

sequenceDiagram
    participant Agent as Claude Agent SDK
    participant Adapter as ThoughtMemoryProvider
    participant Memory as THOUGHT Memory
    participant Embedder as Embedder Service
    participant Backend as SQLite Backend

    Agent->>Adapter: context_for("authenticate", role="code")
    Adapter->>Memory: working_context(target, role, budget_tokens)
    Memory->>Embedder: embed("authenticate")
    Embedder->>Memory: vector embedding
    Memory->>Backend: query similar entities
    Backend-->>Memory: ranked entity results
    Memory-->>Adapter: structured context dict
    Adapter-->>Agent: context payload

    Agent->>Adapter: record("Learned: auth uses JWT")
    Adapter->>Backend: upsert_source(content, mime_type)
    Adapter->>Backend: store entity + edges
    Backend-->>Adapter: source_id
    Adapter-->>Agent: source_id

MCP Server Surface

The MCP (Model Context Protocol) server exposes THOUGHT's primitives as tools that any MCP-compatible client can invoke.

Sources: src/thought/server.py

Available Tools

working_context

Universal "what does my agent need to know about X right now" primitive.

@app.tool()
async def working_context(
    target: str,           # "function:authenticate" / "chapter:5" / entity name
    role: str = "default", # Contextual role for view filtering
    budget_tokens: int = 1024,
    scope: str | None = None,
    owner_id: str | None = None,
) -> dict

Returns:

{
    "anchor": str,
    "neighbours": list[dict],
    "recent_contradictions": list[dict],
    "role_view": str | None
}

Sources: src/thought/server.py:48-63

scan

Incremental code-scan primitive for keeping the knowledge base current.

@app.tool()
async def scan(
    repo_path: str,           # Repository to scan
    agent: str | None = None, # Agent name for scan attribution
    since: str | None = None, # Only files changed since this time/commit
    max_files: int | None = None,
    note: str | None = None,
) -> dict

Sources: src/thought/server.py:65-78

scan_log_list

Lists recent scan runs for tracking incremental progress.

@app.tool()
async def scan_log_list(
    agent: str | None = None,
    limit: int = 10,
) -> dict

Returns:

{
    "scans": [
        {
            "scan_id": str,
            "agent": str,
            "timestamp": str,
            "files_processed": int,
            "note": str | None
        },
        ...
    ]
}

Sources: src/thought/server.py:80-91

Client Installation

THOUGHT supports installation into multiple MCP-compatible clients. The installation process merges a thought MCP server entry into the client's configuration file.

Sources: src/thought/clients.py

Supported Clients

Client	Configuration Path
Project	`.claude/settings.json`
User	`~/.claude/settings.json`

Sources: src/thought/clients.py

Installation Function

def install(
    client: ClientName,
    *,
    server_name: str = "thought",
    block: dict | None = None,
    backup: bool = True,
) -> ClientInstallResult

Parameters:

Parameter	Type	Default	Description
`client`	`ClientName`	Required	Target client name
`server_name`	`str`	`"thought"`	Name for the server entry
`block`	`dict | None`	`None`	Custom server block; defaults to `server_block()`
`backup`	`bool`	`True`	Backup existing config before modification

Return Type: ClientInstallResult

@dataclass
class ClientInstallResult:
    client: ClientName
    path: Path | None
    status: Literal["installed", "already_present", "error", "no_path"]
    detail: str = ""

Sources: src/thought/clients.py

Installation Behavior

The install() function performs the following:

Read existing config: parses the client's JSON configuration and returns an error status if the JSON is invalid
Idempotency check: returns already_present if the entry exists and matches
Backup: when backup is enabled, creates settings.json.thought.bak before the existing file is overwritten
Merge server entry: adds the thought server block under mcpServers and writes the merged config

Sources: src/thought/clients.py

graph TD
    A[install called] --> B{Config exists?}
    B -->|No| C[Create new config]
    B -->|Yes| D{Valid JSON?}
    D -->|No| E[Return error]
    D -->|Yes| F{Server entry exists?}
    F -->|Yes, matches| G[Return already_present]
    F -->|Yes, differs| H[Backup config]
    F -->|No| I[Add server entry]
    H --> J[Write merged config]
    I --> J
    C --> J
    J --> K[Return installed]
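The flow in the diagram can be approximated in a few lines. This is a sketch, not the project's `install()`: `install_server` is a hypothetical helper that takes the config path directly, returns a plain status string, and is exercised here with a placeholder server block inside a temporary directory.

```python
import json
import shutil
import tempfile
from pathlib import Path


def install_server(config_path: Path, server_name: str, block: dict, backup: bool = True) -> str:
    """Merge `block` under mcpServers[server_name]; return a status string."""
    if config_path.exists():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return "error"
        if not isinstance(config, dict):
            return "error"
    else:
        config = {}

    servers = config.setdefault("mcpServers", {})
    if servers.get(server_name) == block:
        return "already_present"  # nothing to write

    if backup and config_path.exists():
        # Keep a copy of the previous file next to the original.
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".thought.bak"))

    servers[server_name] = block
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return "installed"


# Demo in a throwaway directory; the block contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".claude" / "settings.json"
    block = {"command": "thought"}
    statuses = [
        install_server(path, "thought", block),
        install_server(path, "thought", block),
        install_server(path, "thought", {"command": "thought", "args": ["placeholder"]}),
    ]
    backup_created = path.with_name("settings.json.thought.bak").exists()
```

Checking for an identical entry before touching the file is what makes repeated installs safe to run.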

Claude Code Hook Integration

The hook installer provides Claude Code event-driven integration, enabling THOUGHT to automatically capture context at key points in the development workflow.

Sources: src/thought/hooks/install.py

Hook Kinds

Hook Kind	Claude Code Event	Command	Trigger
`recall`	`UserPromptSubmit`	`thought hook recall`	After user submits a prompt
`write`	`Stop`	`thought hook write`	After agent completes work
`context`	`SessionStart`	`thought hook context`	When session begins

Sources: src/thought/hooks/install.py:15-22

Hook Installation Result

@dataclass(frozen=True)
class HookInstallResult:
    kind: HookKind
    path: Path
    status: Literal["installed", "already_present", "error"]
    detail: str = ""

Settings Path Resolution

def settings_path(*, scope: Literal["project", "user"] = "project") -> Path

Project scope — .claude/settings.json (recommended default)
User scope — ~/.claude/settings.json

Sources: src/thought/hooks/install.py:41-50
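The two scopes resolve to fixed locations, which a short helper can express. This is a sketch: `resolve_settings_path` and its `project_root` parameter are hypothetical additions for testability, and treating the current working directory as the project root is an assumption.

```python
from pathlib import Path
from typing import Literal, Optional


def resolve_settings_path(
    scope: Literal["project", "user"] = "project",
    project_root: Optional[Path] = None,
) -> Path:
    """Return the Claude Code settings file for the given scope."""
    if scope == "project":
        root = project_root if project_root is not None else Path.cwd()
        return root / ".claude" / "settings.json"
    if scope == "user":
        return Path.home() / ".claude" / "settings.json"
    raise ValueError(f"unknown scope: {scope}")


project_file = resolve_settings_path("project", project_root=Path("/repo"))
user_file = resolve_settings_path("user")
```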

Demo Integration

The thought demo command includes a built-in walkthrough specifically for the Claude Agent SDK adapter:

- ``code``  Agent / developer flow — the 14-stage code-vertical
            walkthrough including agent identity, ``thought scan``,
            ``working_context``, 4 new-language extractors, and the
            Claude Agent SDK adapter.

Sources: src/thought/demo.py

Demo Audiences

Audience	Purpose	Key Features
`code`	Agent/developer flow	SDK adapter, scan, working_context
`writer`	Novelist/paper author	Bi-temporal model, contradiction detection
`legal`	Investigator/paralegal	`unique_predicates`, CONTRADICTS edges
`researcher`	Academic	Claim/source pairs, Cypher queries
`all`	Sequential all audiences	Full demonstration suite

Configuration

Environment Variables

The integration layer respects the following environment variables for database and embedder configuration:

Variable	Purpose
`THOUGHT_DB_PATH`	Override database path
`THOUGHT_EMBEDDER`	Embedder choice (`auto`, `sentence-transformers`, etc.)
`THOUGHT_OLLAMA_HOST`	Ollama server host
`THOUGHT_OLLAMA_MODEL`	Ollama model name
`THOUGHT_LMSTUDIO_URL`	LM Studio server URL
`THOUGHT_LMSTUDIO_MODEL`	LM Studio model name
`THOUGHT_OPENAI_COMPAT_URL`	OpenAI-compatible API URL
`THOUGHT_OPENAI_COMPAT_MODEL`	OpenAI-compatible model name
`THOUGHT_OPENAI_COMPAT_API_KEY`	API key for OpenAI-compatible endpoints

Sources: src/thought/config.py

Config File (`thought.toml`)

[embedding]
choice = "auto"  # or specific embedder name

[db]
path = ".thought/thought.db"

Dependencies

The adapter package requires the following extras:

[project.optional-dependencies]
adapters = ["httpx>=0.27"]

Sources: CHANGELOG.md

Usage Example

from thought.adapters.claude_sdk import ThoughtMemoryProvider

# Initialize adapter
memory = ThoughtMemoryProvider()

# Get working context for a function
context = memory.context_for(
    target="authenticate_user",
    role="security-reviewer",
    budget_tokens=2048,
)

# Record what the agent learned
source_id = memory.record(
    "Session token validation happens in this function. "
    "Uses HMAC-SHA256 for signature verification."
)

# Run incremental scan
result = memory.scan(
    repo_path="/path/to/project",
    agent="security-audit",
    note="Weekly security review scan"
)

Summary

The Agent Adapters and SDK Integration system provides three complementary pathways for integrating THOUGHT with external agents:

Direct SDK Integration — ThoughtMemoryProvider for Claude Agent SDK agents
MCP Protocol — Standard tool interface for any MCP-compatible client
Claude Code Hooks — Event-driven integration for Claude Code CLI users

All pathways share the same underlying memory primitives, ensuring consistent behavior regardless of how the agent connects to THOUGHT.

Sources: CHANGELOG.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 8 source-linked risk signals. Review them before installing or handing real data to the project.

1. Configuration risk: Configuration risk needs validation

Severity: medium
Finding: Configuration risk is backed by a source signal: Configuration risk needs validation. Treat it as a review item until the current version is checked.
User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.host_targets | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | host_targets=mcp_host, claude, claude_code, chatgpt

2. Capability assumption: README/documentation is current enough for a first validation pass.

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.assumptions | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | README/documentation is current enough for a first validation pass.

3. Maintenance risk: v0.2.1 — thought upgrade + mcp-extras fix

Severity: medium
Finding: Maintenance risk is backed by a source signal: v0.2.1 — thought upgrade + mcp-extras fix. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/RNBBarrett/thought-mcp/releases/tag/v0.2.1

4. Maintenance risk: Maintainer activity is unknown

Severity: medium
Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | last_activity_observed missing

5. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: downstream_validation.risk_items | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | no_demo; severity=medium

6. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: risks.scoring_risks | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | no_demo; severity=medium

7. Maintenance risk: issue_or_pr_quality=unknown

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | issue_or_pr_quality=unknown

8. Maintenance risk: release_recency=unknown

Severity: low
Finding: release_recency=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 3 (count of project-level external discussion links exposed on this manual page)

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using thought-mcp with real data or production workflows.

v0.2.2 — MCP stdio transport fix - github / github_release
v0.2.1 — thought upgrade + mcp-extras fix - github / github_release
Configuration risk needs validation - GitHub / issue

Source: Project Pack community evidence and pitfall evidence