Doramagic Project Pack · Human Manual

thought-mcp

Introduction to THOUGHT

Related topics: Quickstart Guide, Installation and Setup, System Architecture

THOUGHT is a local AI memory tool designed to help developers, researchers, writers, and investigators maintain persistent, queryable knowledge graphs of their work. It combines graph database technology with natural language processing to create a bi-temporal knowledge base that tracks information across time—answering questions like "what was true on date X" and "what did the system know on date X." Sources: README.md

What is THOUGHT?

THOUGHT operates as a self-hosted memory layer that runs entirely on your local machine. Unlike cloud-based AI memory solutions, THOUGHT stores everything in a local SQLite database, giving you full control over your data while still providing powerful querying capabilities through natural language or Cypher graph queries. Sources: src/thought/cli.py:1-50

The core philosophy is to treat memory as a first-class citizen in the development workflow—something that persists across sessions, understands context, and can be queried like a real database rather than a simple key-value store.

Core Architecture

THOUGHT's architecture consists of several interconnected layers that work together to provide a complete memory solution.

graph TD
    A[CLI / MCP Server] --> B[Query Layer]
    B --> C[Graph Layer]
    B --> D[Code Layer]
    C --> E[Storage Backend]
    D --> E
    E --> F[SQLite Database]
    B --> G[LLM Providers]
    G --> H[Ollama / LM Studio / OpenAI]

Storage Layer

The storage layer uses SQLite with a carefully designed schema that supports bi-temporal modeling. Every entity and edge in the knowledge graph has timestamps tracking when facts became valid and when they were learned. Sources: src/thought/storage/sqlite/backend.py:1-100

| Component | Purpose |
|-----------|---------|
| SQLiteBackend | Core database operations with upsert, query, and embedding storage |
| WAL Mode | Write-Ahead Logging for crash recovery and concurrent reads |
| Migration System | Tracks applied migrations in applied_migrations table |
| Bi-temporal Columns | valid_from, valid_until, learned_at, unlearned_at |
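
The WAL and migration-tracking rows above can be illustrated with Python's standard sqlite3 module. This is a minimal sketch, not THOUGHT's SQLiteBackend: the helper names open_db and apply_migration and the column layout of applied_migrations are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open a database in WAL mode and make sure the migration ledger exists."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep working while a single writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical ledger layout; only the table name comes from the manual.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations ("
        " name TEXT PRIMARY KEY,"
        " applied_at TEXT NOT NULL DEFAULT (datetime('now')))"
    )
    conn.commit()
    return conn


def apply_migration(conn: sqlite3.Connection, name: str, statement: str) -> bool:
    """Run one migration statement once; return False if it was already recorded."""
    already = conn.execute(
        "SELECT 1 FROM applied_migrations WHERE name = ?", (name,)
    ).fetchone()
    if already:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute(statement)
        conn.execute("INSERT INTO applied_migrations (name) VALUES (?)", (name,))
    return True
```

Recording each migration by name is what makes re-running the setup safe: a second call with the same name is a no-op.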

Query Layer

The query layer provides multiple interfaces for accessing your memory:

  • Natural Language: Ask questions in plain English, translated to Cypher
  • Code Queries: Find callers, callees, and impact sets
  • Recall: Semantic search using embeddings
  • Cypher Direct: Execute graph queries directly

Sources: src/thought/query/ask.py:1-50

Graph Layer

The graph layer provides the core graph operations that power all THOUGHT functionality. It handles entity and edge management with support for scopes (shared/private) and owner-based access control. Sources: src/thought/layers/graph.py

Entity Model

THOUGHT uses a flexible entity model that can represent code elements, prose content, legal documents, and research claims.

classDiagram
    class Entity {
        +str id
        +str type
        +str name
        +str canonical_name
        +ScopeName scope
        +Tier tier
        +float importance
        +datetime valid_from
        +datetime valid_until
        +datetime learned_at
        +dict~str, object~ attrs
    }
    
    class Edge {
        +str id
        +str source_id
        +str target_id
        +str relation_type
    }
    
    Entity "1" --> "*" Edge : source
    Entity "1" --> "*" Edge : target

Sources: src/thought/models.py:50-100

Entity Attributes

| Field | Type | Description |
|-------|------|-------------|
| id | str | Unique identifier |
| type | str | Entity type (function, class, module, claim, etc.) |
| name | str | Human-readable name |
| canonical_name | str | Fully qualified name for disambiguation |
| scope | ScopeName | "shared" or "private" |
| owner_id | str | Owner for private entities |
| tier | Tier | "hot", "warm", or "cold" |
| valid_from | datetime | When this fact became true |
| valid_until | datetime | When this fact stopped being true (null = current) |
| learned_at | datetime | When THOUGHT learned this fact |
| attrs | dict | Additional type-specific metadata |

Edge Relations

Edges represent relationships between entities with the following relation types:

| Relation Type | Description |
|---------------|-------------|
| CALLS | Function/method invocation |
| INHERITS_FROM | Class inheritance |
| DEFINES | Container defines member |
| IMPORTS | Module import statement |
| CONTRADICTS | Logical contradiction between facts |
| CITES | Source citation relationship |

Audience Verticals

THOUGHT is designed to serve multiple audiences, each with specialized commands and entity taxonomies optimized for their use case. Sources: src/thought/demo.py:1-80

graph LR
    A[THOUGHT] --> B[Code Developers]
    A --> C[Writers]
    A --> D[Legal Investigators]
    A --> E[Researchers]
    
    B --> B1[thought scan]
    B --> B2[thought impact]
    B --> B3[thought callers]
    
    C --> C1[thought ingest-prose]
    C --> C2[thought timeline]
    C --> C3[contradiction-check]
    
    D --> D1[thought ingest-legal]
    D --> D2[unique_predicates]
    D --> D3[contradiction-graph]
    
    E --> E1[thought ingest-claim]
    E --> E2[citation-analysis]
    E --> E3[reliability-filter]

Code Developers

The code vertical provides tools for understanding, navigating, and analyzing source code:

  • thought scan: Incremental code scanning with change detection
  • thought impact <name>: Transitive impact set—what's affected if I change this?
  • thought callers <name>: Direct callers ranked by Personalized PageRank
  • thought recall: Semantic search across code by intent

Sources: src/thought/layers/code.py:1-50
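
The idea behind a transitive impact set can be sketched as a breadth-first walk over reversed CALLS edges. This is an illustration only: the function name impact_set and the edge representation as (caller, callee) tuples are assumptions, and THOUGHT's actual traversal and ranking are not shown in this manual.

```python
from collections import defaultdict, deque


def impact_set(calls: list[tuple[str, str]], target: str) -> set[str]:
    """Return every function that directly or transitively calls `target`."""
    # Index the edges by callee so we can walk "who calls this?" quickly.
    callers_of = defaultdict(set)
    for caller, callee in calls:
        callers_of[callee].add(caller)

    affected: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers_of[current]:
            # Skip nodes already visited so call cycles terminate.
            if caller != target and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected
```

Direct callers are the first layer of this walk; the impact set is everything the walk reaches.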

Writers

The writing vertical supports fiction and academic prose:

  • Ingest chapter/section facts about characters
  • Detect contradictions via the bi-temporal model
  • Query chronological mentions across documents
  • Time-travel as_of recall for historical consistency

Legal Investigators

The legal vertical is designed for investigation workflows:

  • thought ingest-legal: Ingest witness statements with unique predicates
  • thought contradiction-graph: Trigger CONTRADICTS edges between testimonies
  • Query the contradiction graph for investigation leads

Researchers

The research vertical supports academic workflows:

  • thought ingest-claim: Ingest claim/source pairs
  • Cypher queries to find uncited claims
  • Most-cited source identification
  • Citation reliability filtering

CLI Commands Overview

| Command | Description |
|---------|-------------|
| thought init | Create database file + config + CLAUDE.md |
| thought recall <query> | Semantic recall with embeddings |
| thought ask <question> | Natural language query → Cypher → results |
| thought scan <repo> | Incremental code scan with change detection |
| thought callers <name> | Find direct callers ranked by PageRank |
| thought impact <name> | Transitive impact set |
| thought db size | Disk usage + entity/edge counts |
| thought db flush | Wipe the knowledge base |
| thought db backup <file> | SQLite online-backup snapshot |
| thought db load <file> | Load backup file |
| thought hook install | Install Claude Code hooks |
| thought diff --from <sha1> --to <sha2> | Entity diff between commits |

Sources: src/thought/cli.py:50-150

Database Lifecycle Management

THOUGHT provides comprehensive database management commands under thought db:

Backup and Restore

graph LR
    A[Production DB] -->|thought db backup| B[backup.db]
    B -->|thought db load| C[Production DB]
    B -->|thought db inspect| D[Inspection Report]

The backup system uses SQLite's online backup API, ensuring consistent snapshots even during active writes. Date filters can produce clean, self-contained subset files. Sources: src/thought/storage/sqlite/backend.py:100-200
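
Python exposes SQLite's online backup API as sqlite3.Connection.backup, which copies a live database page by page. The sketch below shows that call in isolation; the function name snapshot is an assumption, and this is not THOUGHT's own backup code.

```python
import os
import sqlite3
import tempfile


def snapshot(src_path: str, dst_path: str) -> None:
    """Copy a live SQLite database to dst_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # backup() produces a consistent copy even if other connections
        # are writing to the source while it runs.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```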

Flush Operations

Flush commands support date-bounded deletion:

  • --before X: Delete facts valid before date X
  • --since X: Delete facts learned since date X
  • --time-axis valid|learned|created: Choose which time axis to filter

All destructive operations automatically back up to <db>.bak.<timestamp> before proceeding.

Git History Integration

THOUGHT can ingest git repositories with two modes:

| Mode | Behavior | Use Case |
|------|----------|----------|
| snapshot (default) | Ingest HEAD only, stamp with HEAD SHA | Fast code analysis |
| full | Walk every commit, stamp with commit SHA | Bi-temporal historical queries |

The GitWalker class shells out to git commands rather than using native libraries, avoiding C extension dependencies while maintaining cross-platform compatibility. Sources: src/thought/ingest/code/git_walker.py:1-50
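
Shelling out to git can be sketched with subprocess. The exact commands GitWalker runs are not shown in this manual, so the two below (git rev-parse HEAD and git rev-list --reverse HEAD) are assumptions chosen to match the two modes, and the function names are hypothetical.

```python
import subprocess


def rev_command(repo: str, mode: str = "snapshot") -> list[str]:
    """Build the git command that lists the commits to ingest for a mode."""
    if mode == "snapshot":
        # Only the current HEAD commit.
        return ["git", "-C", repo, "rev-parse", "HEAD"]
    if mode == "full":
        # Every commit reachable from HEAD, oldest first.
        return ["git", "-C", repo, "rev-list", "--reverse", "HEAD"]
    raise ValueError(f"unknown mode: {mode!r}")


def list_commits(repo: str, mode: str = "snapshot") -> list[str]:
    """Run git and return the commit SHAs it prints, one per line."""
    result = subprocess.run(
        rev_command(repo, mode), capture_output=True, text=True, check=True
    )
    return result.stdout.split()
```

Passing the arguments as a list avoids shell quoting issues, and check=True turns a failing git call into an exception instead of silently empty output.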

graph TD
    A[thought ingest-git] --> B{Snapshot Mode?}
    B -->|Yes| C[Ingest HEAD only]
    B -->|No| D[Walk all commits]
    C --> E[Stamp with HEAD SHA]
    D --> F[Stamp each entity with commit SHA]
    E --> G[Enable as_of queries]
    F --> G

Bi-temporal Model

THOUGHT's bi-temporal model tracks two independent timelines for every fact:

| Time Axis | Description | Question Answered |
|-----------|-------------|-------------------|
| Valid Time | When a fact was true in reality | "What was true on date X?" |
| Learned Time | When THOUGHT learned the fact | "What did the system know on date X?" |

This distinction enables sophisticated queries like:

MATCH (e:Entity)
WHERE e.valid_from <= date('2024-01-01')
  AND (e.valid_until IS NULL OR e.valid_until > date('2024-01-01'))
RETURN e

Contradictions surface as CONTRADICTS edges—they're treated as data rather than warnings, allowing you to query them directly. Sources: src/thought/cli.py:1-50
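
The two time axes in this section can give different answers for the same date. The sketch below demonstrates that with an in-memory SQLite table; the table name facts, the sample rows, and the helper as_of are assumptions for illustration, with only the four column names taken from this manual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, valid_from TEXT, valid_until TEXT,"
    " learned_at TEXT, unlearned_at TEXT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        # True from 2023-01 to 2024-03, recorded in 2023-02.
        ("alice works at Acme", "2023-01-01", "2024-03-01", "2023-02-01", None),
        # True from 2024-03 onward, but only recorded in 2024-06.
        ("alice works at Globex", "2024-03-01", None, "2024-06-01", None),
    ],
)

# Column pairs per axis; the axis name is looked up here, never interpolated raw.
AXES = {"valid": ("valid_from", "valid_until"), "learned": ("learned_at", "unlearned_at")}


def as_of(conn: sqlite3.Connection, date: str, axis: str) -> list[str]:
    """Return fact names whose interval on the chosen axis contains `date`."""
    start, end = AXES[axis]
    sql = (
        f"SELECT name FROM facts WHERE {start} <= ?"
        f" AND ({end} IS NULL OR {end} > ?) ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, (date, date))]
```

On 2024-04-01 the valid axis reports the Globex fact, while the learned axis still reports Acme, because the move had not been recorded yet.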

LLM Provider Integration

THOUGHT supports multiple LLM providers for natural language processing:

| Provider | Features |
|----------|----------|
| Ollama | Native /api/embed (batched), OpenAI-compatible fallback |
| LM Studio | OpenAI-compatible API |
| Any OpenAI-compatible server | Standard embedding endpoints |

The embedder selection defaults to auto, which probes for sentence_transformers and falls back to a deterministic embedder when the optional dependency is unavailable. Sources: src/thought/storage/sqlite/backend.py:200-300
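
Probing for an optional dependency without importing it can be done with importlib.util.find_spec. The following is a sketch of that pattern under stated assumptions: the function name resolve_embedder and its error handling are hypothetical, and only the mode names and the fallback rule come from this manual.

```python
import importlib.util


def resolve_embedder(requested: str = "auto") -> str:
    """Map a requested embedder mode to the one that will actually be used."""
    # find_spec returns None when the top-level package is not installed.
    has_st = importlib.util.find_spec("sentence_transformers") is not None
    if requested == "auto":
        return "sentence-transformers" if has_st else "deterministic"
    if requested == "deterministic":
        return requested
    if requested == "sentence-transformers":
        if not has_st:
            raise RuntimeError("sentence-transformers is not installed")
        return requested
    raise ValueError(f"unknown embedder: {requested!r}")
```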

Code Extraction Support

THOUGHT can parse and extract entities from multiple programming languages:

| Language | Extractor | Key Features |
|----------|-----------|--------------|
| Python | python_extractor.py | AST-based import tracking, class/function detection |
| TypeScript | typescript_extractor.py | Tree-sitter parsing, heritage analysis |
| Rust | rust_extractor.py | Module system, impl block handling |
| PHP | php_extractor.py | Namespace handling, method visibility |

All extractors produce consistent CodeEntity and CodeEdge objects that integrate with the unified graph model. Sources: src/thought/ingest/code/python_extractor.py:1-50
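
AST-based extraction for Python can be sketched with the standard ast module. This is not the project's python_extractor.py: the function name extract and its return shape are assumptions, and it reports plain tuples rather than CodeEntity and CodeEdge objects.

```python
import ast


def extract(source: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (entities, imports) found in a Python source string."""
    entities: list[tuple[str, str]] = []
    imports: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Methods are reported as functions too; this sketch does not
            # track which class a definition belongs to.
            entities.append(("function", node.name))
        elif isinstance(node, ast.ClassDef):
            entities.append(("class", node.name))
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    return sorted(entities), sorted(imports)


SAMPLE = (
    "import os\n"
    "from pathlib import Path\n"
    "\n"
    "class Repo:\n"
    "    def scan(self):\n"
    "        pass\n"
    "\n"
    "def main():\n"
    "    pass\n"
)
```

Calling extract(SAMPLE) yields one class, two functions, and the two imported module names.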

Getting Started

Initialization

thought init --db-path .thought/thought.db --embedder auto

This creates:

  1. The SQLite database file
  2. A thought.toml configuration file
  3. A CLAUDE.md file for MCP client integration

Quick Start Commands

# Ingest a git repository
thought ingest-git ./my-project --mode snapshot

# Recall something semantically
thought recall "authentication middleware"

# Ask a natural language question
thought ask "what calls the authenticate_user function?"

# Find impact of changing a function
thought impact MyClass.my_method

Configuration

THOUGHT uses a thought.toml file for configuration:

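| Section | Option | Default | Description |
|---------|--------|---------|-------------|
| database | path | .thought/thought.db | Database file path |
| llm | provider | auto | LLM provider selection |
| embedder | model | auto | Embedding model |
| scopes | default | shared | Default scope for new entities |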

Configuration can be overridden via CLI flags or environment variables.

Sources: src/thought/models.py:50-100

Quickstart Guide

Related topics: Introduction to THOUGHT, Installation and Setup

Overview

THOUGHT is a local AI memory tool for managing knowledge bases. It runs on local models and can be queried in natural language or with graph queries. Its CLI covers ingesting information, recalling facts, and analyzing code through graph-based relationships.

Sources: CHANGELOG.md

Architecture Overview

graph TD
    subgraph "THOUGHT Core"
        CLI[CLI Interface]
        DB[(SQLite Database)]
        EMB[Embedder Layer]
        GRAPH[Graph Layer]
    end
    
    subgraph "Ingestion Sources"
        CODE[Code Ingest]
        PROSE[Prose Ingest]
        LEGAL[Legal Ingest]
    end
    
    subgraph "Query Interface"
        RECALL[Recall Command]
        REPL[Interactive REPL]
        MCP[MCP Server]
    end
    
    CLI --> DB
    CLI --> EMB
    EMB --> DB
    CODE --> CLI
    PROSE --> CLI
    LEGAL --> CLI
    RECALL --> GRAPH
    REPL --> GRAPH
    MCP --> GRAPH
    GRAPH --> DB

Sources: src/thought/cli.py

Installation and Initialization

Initial Setup

Run the init command to create the database, configuration file, and CLAUDE.md helper:

thought init

The init command accepts several options:

| Option | Default | Description |
|--------|---------|-------------|
| --config | thought.toml | Path to configuration file |
| --db-path | .thought/thought.db | SQLite database path |
| --embedder | auto | Embedder type: auto, sentence-transformers, or deterministic |
| --write-claude-md | true | Drop a CLAUDE.md for MCP clients |
| --quick | false | Skip first-run embedder warmup |

Sources: src/thought/cli.py:57-78

Configuration File

The init command creates a thought.toml configuration file with the following structure:

[database]
path = ".thought/thought.db"

[embedder]
type = "auto"  # or "sentence-transformers", "deterministic"

[llm]
provider = "auto"

Core Commands

Ingest Commands

THOUGHT supports multiple ingestion modes:

| Command | Purpose |
|---------|---------|
| thought ingest TEXT | One-shot remember from command line |
| thought ingest --file PATH | Ingest a single file |
| thought ingest --glob PAT | Bulk-ingest matching files |
| thought ingest --stdin | Bulk-ingest one line-per-item from stdin |

Sources: src/thought/cli.py:30-42

Code Ingestion

The code ingest pipeline extracts entities and relationships from source files:

thought ingest --file src/main.py
thought ingest --glob "**/*.py"

The code extractor produces:

  • Entities: modules, functions, classes, methods
  • Edges: IMPORTS, INHERITS_FROM, DEFINES, OVERRIDES

Sources: src/thought/ingest/code/pipeline.py

Git-Aware Ingest

For bi-temporal code analysis:

thought ingest-git <repo> --mode snapshot  # Fast: HEAD only
thought ingest-git <repo> --mode full      # Walk every commit

This enables as_of queries against historical commits.

Sources: CHANGELOG.md

Recall and Query

thought recall "what did I learn about authentication?"
thought repl

By default, the recall command returns up to 10 results ranked by relevance. Use as_of and scope to narrow results further.

Sources: src/thought/cli.py

Database Management

| Command | Description |
|---------|-------------|
| thought db size | Disk usage + entity/edge counts |
| thought db flush | Wipe the KB (with backup) |
| thought db backup <file> | SQLite backup snapshot |
| thought db load <file> | Load a backup file |
| thought db inspect <file> | Inspect backup without loading |

Sources: CHANGELOG.md

Code Analysis Commands

Callers and Impact Analysis

# Find who calls a function (ranked by PageRank)
thought callers authenticate_user

# Transitive impact: what's affected if I change this?
thought impact JWTValidator

Sources: src/thought/layers/code.py
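
How THOUGHT computes its caller ranking is not shown in this manual. As a rough illustration of the general technique, the sketch below runs a textbook Personalized PageRank (power iteration with restart at a seed node) over reversed call edges and orders the direct callers by score. All function names and parameter defaults here are assumptions.

```python
def personalized_pagerank(edges, seed, damping=0.85, iterations=50):
    """Power-iteration PageRank whose restart mass always returns to `seed`."""
    nodes = {n for edge in edges for n in edge} | {seed}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) if n == seed else 0.0 for n in nodes}
        for n in nodes:
            # Nodes without outgoing edges hand their mass back to the seed.
            targets = out[n] or [seed]
            share = damping * rank[n] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank


def rank_callers(calls, target):
    """Order the direct callers of `target` by score, highest first."""
    # Reverse (caller, callee) so the walk moves from the target to its callers.
    reversed_edges = [(callee, caller) for caller, callee in calls]
    scores = personalized_pagerank(reversed_edges, target)
    direct = {caller for caller, callee in calls if callee == target}
    return sorted(direct, key=lambda c: (-scores[c], c))
```

Seeding the walk at the function being inspected biases the scores toward code near that function, which is the point of the "personalized" variant.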

Diff Between Commits

thought diff --from abc1234 --to def5678

This shows entities added/removed between two ingested commits.

Built-in Demos

Run audience-specific walkthroughs:

thought demo code        # Agent/developer flow (14-stage walkthrough)
thought demo writer       # Novelist/paper author
thought demo legal        # Investigator/paralegal
thought demo researcher   # Academic use case
thought demo all          # Run all demos sequentially

Each demo runs end-to-end in a self-cleaning temporary directory and produces a structured DemoReport.

Sources: src/thought/demo.py

Entity Data Model

@dataclass(frozen=True)
class CodeEntity:
    name: str           # Qualified name (e.g., "ClassName.method_name")
    type_: CodeEntityType  # "module" | "function" | "class" | "method" | "file"
    language: str       # Programming language
    file_path: str      # POSIX-style relative path
    line_start: int     # Starting line number
    line_end: int       # Ending line number
    signature: str      # Function/class signature
    docstring: str | None
    visibility: Literal["public", "private"]

Sources: src/thought/ingest/code/types.py

Supported Languages

The code ingestion pipeline supports:

| Language | Extractor | File Extension |
|----------|-----------|----------------|
| Python | python_extractor.py | .py |
| TypeScript | typescript_extractor.py | .ts, .tsx |
| PHP | php_extractor.py | .php |
| Rust | rust_extractor.py | .rs |

MCP Server

Start the MCP server for integration with Claude Code:

thought serve                          # stdio transport (default)
thought serve --transport streamable-http  # HTTP transport

Sources: src/thought/cli.py

Utility Commands

| Command | Description |
|---------|-------------|
| thought stats | Display knowledge base statistics |
| thought forget PATTERN | Soft-delete entities matching SQL LIKE pattern |
| thought consolidate | Run one consolidation cycle |
| thought doctor | Environment health check |

Bi-Temporal Model

THOUGHT uses a bi-temporal model for knowledge tracking:

  • valid_from / valid_until: When facts were true in reality
  • learned_at / unlearned_at: When the system learned/corrected facts

Query variants:

  • as_of_kind='valid' — "what was true on date X"
  • as_of_kind='learned' — "what did the system know on date X"

Sources: src/thought/models.py

Sources: CHANGELOG.md

Installation and Setup

Related topics: Quickstart Guide, System Architecture

Overview

The thought-mcp project provides a comprehensive CLI tool and MCP (Model Context Protocol) server for AI-powered memory and knowledge management. The installation and setup process involves initializing the local SQLite database, configuring MCP clients (Claude Code, Cursor, etc.), and optionally setting up Claude Code hooks for automated memory operations.

The setup system is designed with idempotency in mind — installations can be safely re-run without disrupting existing configurations.

System Architecture

graph TD
    A[User] --> B[thought CLI]
    B --> C[init command]
    C --> D[SQLite Database]
    C --> E[thought.toml Config]
    C --> F[CLAUDE.md Agent Hint]
    B --> G[MCP Server]
    G --> D
    B --> H[Client Install]
    H --> I[Claude Code]
    H --> J[Cursor]
    H --> K[VS Code]
    B --> L[Hook Install]
    L --> M[.claude/settings.json]

Prerequisites

| Component | Requirement | Notes |
|-----------|-------------|-------|
| Python | >= 3.10 | Core runtime |
| Git | On PATH | Used by git pipeline for code ingestion |
| SQLite | 3.x | Bundled with Python stdlib |
| pip/pipx | Latest | Package installation |

Sources: CONTRIBUTING.md

Installation Methods

Standard Installation

pip install thought-mcp

Development Installation

git clone https://github.com/RNBBarrett/thought-mcp.git
cd thought-mcp
pip install -e ".[dev]"

CLI Initialization

The thought init command establishes the complete working environment. It creates three essential components in sequence.

Init Command Signature

@app.command()
def init(
    config: Path = typer.Option("thought.toml", help="Path to config file."),
    db_path: str = typer.Option(".thought/thought.db", help="SQLite database path."),
    embedder: str = typer.Option(
        "auto", help="'auto' picks sentence-transformers if available, else deterministic.",
    ),
    write_claude_md: bool = typer.Option(
        True, "--write-claude-md/--no-claude-md",
        help="Drop a CLAUDE.md so MCP clients learn how to use the tool.",
    ),
    quick: bool = typer.Option(
        False, "--quick", help="Skip first-run embedder warmup.",
    ),
) -> None:

Sources: src/thought/cli.py:35-56

What Init Creates

graph LR
    A[thought init] --> B[Create .thought/ directory]
    A --> C[Create SQLite DB file]
    A --> D[Write thought.toml config]
    A --> E[Write CLAUDE.md]
    
    B --> F[parents=True<br/>exist_ok=True]
    C --> G[DB auto-backed up<br/>before destructive ops]

1. Database Initialization

The command creates the SQLite database at the specified path. Parent directories are created automatically using parents=True to ensure the path exists.

Path(db_path).parent.mkdir(parents=True, exist_ok=True)

Sources: src/thought/cli.py:52-53

2. Configuration File

The thought.toml file contains runtime configuration including embedder settings and database paths.

3. CLAUDE.md Agent Hint

When write_claude_md=True (default), the init command drops a CLAUDE.md file that teaches MCP clients how to interact with the tool.

Embedder Configuration

The init command supports three embedder modes:

| Mode | Behavior | Dependencies |
|------|----------|--------------|
| auto (default) | Uses sentence-transformers if available, falls back to deterministic embeddings | Optional: sentence-transformers |
| sentence-transformers | Uses local transformer models for embeddings | Required: sentence-transformers |
| deterministic | Uses hash-based embeddings, no ML dependencies | None |

The --quick flag skips the first-run embedder warmup process.
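
To make the idea of hash-based embeddings concrete, here is a small sketch of the hashing trick: each token is hashed to a fixed slot and sign, so the same text always maps to the same vector without any model. The function name deterministic_embed, the 64-dimension default, and the whitespace tokenization are assumptions; THOUGHT's deterministic embedder may work differently.

```python
import hashlib
import math


def deterministic_embed(text: str, dim: int = 64) -> list[float]:
    """Embed text by hashing each token into one signed slot of a vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # which slot
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # which direction
        vec[index] += sign
    # Normalize to unit length so cosine similarity reduces to a dot product.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Vectors like this capture token overlap rather than meaning, which is why they suit tests and dependency-free installs more than high-quality recall.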

MCP Client Installation

The thought clients install command merges a thought MCP server entry into your client's configuration file.

Supported Clients

| Client | Config Location |
|--------|-----------------|
| Claude Code | .claude/settings.json |
| Cursor | ~/.cursor/settings.json |
| VS Code | ~/.cursor/settings.json |

Installation Workflow

graph TD
    A[thought clients install] --> B{Check config exists?}
    B -->|No| C[Create new config file]
    B -->|Yes| D[Read existing JSON]
    C --> E{Valid JSON object?}
    D --> E
    E -->|Yes| F[Merge mcpServers entry]
    E -->|No| G[Return error]
    F --> H{Backup enabled?}
    H -->|Yes| I[Create .thought.bak backup]
    H -->|No| J[Write merged config]
    I --> J
    J --> K[Return ClientInstallResult]

Client Install Result States

@dataclass(frozen=True)
class ClientInstallResult:
    client: ClientName
    path: Path
    status: Literal["installed", "already_present", "no_path", "error"]
    detail: str = ""

Sources: src/thought/clients.py

Server Block Structure

The MCP server configuration block includes:

  • Server name (thought)
  • Command to execute
  • Server arguments
  • Environment variables for database path

Claude Code Hook Installation

The thought hooks install command adds hook entries to Claude Code's settings for automated memory operations.

Hook Types

| Hook Kind | Claude Code Event | Command |
|-----------|-------------------|---------|
| recall | UserPromptSubmit | thought hook recall |
| write | Stop | thought hook write |
| context | SessionStart | thought hook context |

Sources: src/thought/hooks/install.py:17-22

Hook Installation Options

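from pathlib import Path
from typing import Literal


def settings_path(*, scope: Literal["project", "user"] = "project") -> Path:
    """Return the ``.claude/settings.json`` path for the requested scope.

    Project scope is the recommended default — it travels with the repo and
    is what most users actually want for THOUGHT-flavoured auto-memory.
    """
    if scope == "project":
        return Path.cwd() / ".claude" / "settings.json"
    # The excerpt ends above. The user-scope branch below is an assumption
    # based on Claude Code's per-user settings location.
    return Path.home() / ".claude" / "settings.json"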

Hook Install Process

graph TD
    A[thought hooks install recall] --> B{Backup enabled?}
    B -->|Yes| C[Create settings.json.thought.bak]
    B -->|No| D[Read settings.json]
    C --> D
    D --> E{Valid JSON?}
    E -->|Yes| F[Merge recall hook entry]
    E -->|No| G[Return error]
    F --> H{Entry exists?}
    H -->|Yes| I[Return already_present]
    H -->|No| J[Write updated settings.json]
    J --> K[Return HookInstallResult]

Hook Install Result

@dataclass(frozen=True)
class HookInstallResult:
    kind: HookKind
    path: Path
    status: Literal["installed", "already_present", "error"]
    detail: str = ""

Sources: src/thought/hooks/install.py:28-32

Quick Start Guide

Step 1: Initialize the Environment

# Standard initialization
thought init

# Skip embedder warmup for faster startup
thought init --quick

# Custom database location
thought init --db-path /path/to/custom.db

Step 2: Install MCP Client

# Install for Claude Code
thought clients install claude_code

# Install for Cursor
thought clients install cursor

Step 3: Install Claude Code Hooks (Optional)

# Install recall hook (automatic memory on user input)
thought hooks install recall

# Install write hook (save memory on session stop)
thought hooks install write

# Install context hook (load memory on session start)
thought hooks install context

# Install all hooks
thought hooks install recall --kind write --kind context

Database Lifecycle Management

Database Size Check

thought db size

Shows disk usage of main + WAL + SHM sidecars plus entity/edge counts.

Database Backup

thought db backup <file>

Creates an SQLite online-backup snapshot. When date filters are given, the backup is followed by a DELETE + VACUUM pass so the output is a clean, self-contained subset file.

Database Restore

thought db load <file>

Atomically replaces the active database with the backup file. Use --merge to INSERT-OR-IGNORE rows from the snapshot instead of replacing.

Database Flush

# Full flush with confirmation
thought db flush

# Skip confirmation
thought db flush --yes

# Date-bounded flush
thought db flush --before 2024-01-01
thought db flush --since 2024-06-01

Note: All destructive operations auto-backup to <db>.bak.<timestamp> before proceeding.

Verifying Installation

Run the Demo

# Run code audience demo
thought demo code

# Run all demos
thought demo all

The demo runs an audience-specific walkthrough end-to-end in a self-cleaning temporary directory, verifying the installation works correctly.

Health Check

thought doctor

Performs an environment health check to verify all dependencies and configurations are correct.

Configuration File Format

thought.toml

[database]
path = ".thought/thought.db"

[embedder]
type = "auto"  # or "sentence-transformers", "deterministic"

[server]
name = "thought"
transport = "stdio"  # or "streamable-http"

Troubleshooting

Common Issues

| Issue | Solution |
|-------|----------|
| Config file not found | Run thought init first |
| Database locked | Check for other thought processes |
| Embedder initialization slow | Use --quick flag or deterministic embedder |
| MCP client not connecting | Verify client config has correct server entry |

Reset Installation

# Backup current database
thought db backup /path/to/backup.db

# Flush and reinitialize
thought db flush --yes
thought init --db-path .thought/thought.db

Next Steps

After installation and setup, users typically:

  1. Ingest code: thought ingest-git <repo> to analyze repository code
  2. Recall information: thought recall <query> to query the knowledge base
  3. Run agents: Use reference agents like the vulnerability scanner or OSINT aggregator

Sources: CONTRIBUTING.md

System Architecture

Related topics: Introduction to THOUGHT, Storage and Database Layer, Memory Model and Data Structures

Overview

The thought-mcp project is a Model Context Protocol (MCP) server implementation that provides an intelligent memory and code analysis system for AI-assisted development. The system combines semantic memory storage with code graph analysis, enabling natural language queries against codebases through a bi-temporal knowledge graph.

High-Level Architecture

graph TD
    subgraph "Client Layer"
        MCP[MCP Client]
        CLI[Thought CLI]
        Hooks[Claude Code Hooks]
    end
    
    subgraph "Server Layer"
        Server[MCP Server]
        Router[Query Router]
        Classifier[Query Classifier]
    end
    
    subgraph "Memory Layer"
        Memory[Memory Manager]
        Recall[Recall Engine]
        Ask[Ask - NL to Cypher]
    end
    
    subgraph "Storage Layer"
        Backend[SQLite Backend]
        Entities[Entity Store]
        Edges[Edge Store]
        Embeddings[Vector Embeddings]
    end
    
    subgraph "Ingest Layer"
        CodePipeline[Code Pipeline]
        GitPipeline[Git Pipeline]
        Extractors[Language Extractors]
    end
    
    MCP --> Server
    CLI --> Server
    Hooks --> Server
    Server --> Router
    Router --> Classifier
    Classifier --> Memory
    Memory --> Backend
    CodePipeline --> Backend
    GitPipeline --> Backend
    Ask --> Recall

Core Components

MCP Server (`src/thought/server.py`)

The MCP server exposes the primary tool interface for AI clients. It implements async tool handlers that delegate to the memory layer.

Key Tools:

| Tool | Purpose |
|------|---------|
| recall | Semantic recall of entities using embeddings |
| ask | Natural language queries translated to Cypher |
| working_context | Context primitive for agent awareness |
| scan | Incremental code scanning with change detection |

Sources: src/thought/server.py:1-100

Query Router and Classifier

The system routes queries through a classification system that detects:

  • CODE queries: Triggered by code-shaped keywords (function, class, caller, callee, file extensions) plus camelCase/snake_case identifiers
  • CHANGE queries: Historical or diff-based queries
  • HYBRID combinations: CODE × CHANGE patterns like "what changed in auth.middleware since v1.0"

graph LR
    Q[Query] --> C[Classifier]
    C --> |CODE| CR[Code Route]
    C --> |CHANGE| CH[Change Route]
    C --> |HYBRID| HY[Hybrid Route]
    C --> |DEFAULT| DF[Default Recall]

Sources: CHANGELOG.md:1-80
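
A keyword-and-pattern classifier along these lines can be sketched with regular expressions. The patterns below are assumptions for illustration: the code keywords and identifier shapes follow the description above, but the CHANGE keyword list, the dotted-name pattern, and the function name classify are not taken from the project.

```python
import re

CODE_WORDS = re.compile(r"\b(?:function|class|callers?|callees?)\b", re.IGNORECASE)
FILE_EXT = re.compile(r"\b\w+\.(?:py|ts|tsx|rs|php)\b")
IDENTIFIER = re.compile(
    r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b"   # snake_case
    r"|\b[a-z]+[A-Z][A-Za-z0-9]*\b"        # camelCase
    r"|\b[A-Za-z_]\w*\.[A-Za-z_]\w*\b"     # dotted.name (assumed code-shaped)
)
# Assumed vocabulary for historical or diff-style questions.
CHANGE_WORDS = re.compile(r"\b(?:changed?|changes|since|diff|history)\b", re.IGNORECASE)


def classify(query: str) -> str:
    """Return CODE, CHANGE, HYBRID, or DEFAULT for a free-text query."""
    is_code = bool(
        CODE_WORDS.search(query) or FILE_EXT.search(query) or IDENTIFIER.search(query)
    )
    is_change = bool(CHANGE_WORDS.search(query))
    if is_code and is_change:
        return "HYBRID"
    if is_code:
        return "CODE"
    if is_change:
        return "CHANGE"
    return "DEFAULT"
```

Queries that match neither family fall through to DEFAULT, which corresponds to the default recall route in the diagram.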

Code Layer (`src/thought/layers/code.py`)

The code layer provides a high-level API for code-specific graph queries against the currently-valid view of the code graph.

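class CodeLayer:
    # Method outline only; bodies and full signatures are omitted.
    def callers_of(self, name): ...     # Who calls this function
    def callees_of(self, name): ...     # What this function calls
    def impact_set(self, name): ...     # Transitive callers, ranked
    def defines_in_file(self, file_path): ...  # Entities in a given file (parameter name assumed)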

Sources: src/thought/layers/code.py:1-60

Storage Architecture

SQLite Backend

The system uses SQLite as its primary storage with the following schema features:

  • Bi-temporal model: Tracks valid_from/valid_until (business time) and learned_at (system knowledge time)
  • Entity/Edge tables with code-specific columns (code_file, code_language, code_commit_sha)
  • Partial indexes for efficient queries
  • WAL mode with checkpointing for consistent backups

Data Models

Entity Structure:

@dataclass
class CodeEntity:
    name: str
    type_: str           # function, class, module, method
    language: str        # python, typescript, rust, php
    file_path: str
    line_start: int
    line_end: int
    signature: str
    docstring: str
    visibility: str      # public, private, protected
    attrs: dict

Edge Types:

  • CALLS - Function/method invocations
  • INHERITS_FROM - Class inheritance
  • IMPORTS - Module imports
  • DEFINES - Member definitions within classes
  • OVERRIDES - Method overrides (TypeScript)

Sources: src/thought/ingest/code/pipeline.py:1-100

Code Ingestion Pipeline

Language Extractors

The system uses tree-sitter parsers for multi-language code extraction:

| Language | File | Capabilities |
|----------|------|--------------|
| Python | python_extractor.py | Functions, classes, imports, inheritance |
| TypeScript | typescript_extractor.py | Functions, classes, imports, exports, inheritance, overrides |
| Rust | rust_extractor.py | Functions, impl blocks, traits |
| PHP | php_extractor.py | Functions, classes, methods, namespaces |

All extractors output CodeEntity and CodeEdge tuples parsed from AST nodes.

Sources: src/thought/ingest/code/python_extractor.py:1-80

Code Pipeline Flow

graph TD
    F[File Input] --> LD[Language Detection]
    LD --> EX[Extract Entities/Edges]
    EX --> SI[Upsert Source]
    SI --> WE[_write_entities]
    WE --> EE[Embed Signatures]
    EE --> WEd[_write_edges]
    WEd --> CM[Commit Transaction]
    
    subgraph "Entities Processing"
        WE --> |"name_to_id map"| WEd
    end

Sources: src/thought/ingest/code/pipeline.py:100-200

Git Pipeline (`src/thought/ingest/code/git_pipeline.py`)

The git pipeline enables historical code analysis with two modes:

| Mode | Behavior |
|------|----------|
| snapshot | Fast - ingest HEAD only, stamp entities with HEAD SHA |
| full | Walk every commit chronologically, stamp each entity with its commit SHA |

The full mode enables bi-temporal as_of queries against historical commits.

Sources: src/thought/ingest/code/git_pipeline.py:1-50

Query System

Recall Engine

Semantic recall uses vector embeddings to find entities by intent rather than exact name:

def recall(
    query: str,
    scope: str = "all",
    owner_id: str | None = None,
    limit: int = 10,
) -> list[RecallHit]

The system embeds entity signatures and docstrings during ingestion, enabling natural queries like "who calls authenticate_user".

Ask Engine (`src/thought/query/ask.py`)

Natural language to Cypher translation with validation:

graph LR
    NL[Natural Language] --> PROMPT[Build Prompt]
    PROMPT --> LLM[LLM Provider]
    LLM --> CY[Cypher Query]
    CY --> VAL[Validate]
    VAL --> |Valid| EXE[Execute]
    VAL --> |Invalid| FB[Fallback to Recall]

Constraint System:

  • Read-only Cypher features only (MATCH, WHERE, RETURN)
  • Validates against actual schema before execution
  • Falls back to recall() on translation failures

Sources: src/thought/query/ask.py:1-80

Integration Points

MCP Client Installation (`src/thought/clients.py`)

The system installs as an MCP server for AI coding tools:

def install(client: ClientName, *, server_name: str = "thought")

Supported clients include Claude Code and other MCP-compatible tools. Installation merges configuration without disturbing existing settings.

Claude Code Hooks (`src/thought/hooks/install.py`)

Hooks provide automatic memory integration:

| Hook | Event | Action |
|------|-------|--------|
| recall | UserPromptSubmit | Memory recall on user input |
| write | Stop | Context capture on completion |
| context | SessionStart | Session initialization |

Sources: src/thought/hooks/install.py:1-50

CLI Architecture (`src/thought/cli.py`)

The command-line interface provides database lifecycle management:

| Command | Function |
|---------|----------|
| thought init | Create database + config + CLAUDE.md |
| thought db size | Disk usage + entity/edge counts |
| thought db flush | Wipe KB with backup |
| thought db backup | SQLite online-backup snapshot |
| thought db load | Load snapshot atomically |
| thought db inspect | Count + schema summary |
| thought ingest-git | Git-history-aware ingestion |
| thought callers | Direct callers via PageRank |
| thought impact | Transitive impact set |
| thought diff | Entity diff between commits |

Sources: src/thought/cli.py:1-100

Demo System (`src/thought/demo.py`)

The built-in demo provides audience-specific walkthroughs:

| Audience | Purpose |
|----------|---------|
| code | Agent/developer flow - 14-stage code vertical |
| writer | Novelist/paper author - bi-temporal recall |
| legal | Investigator - contradiction detection |
| researcher | Academic - claim/source relationships |

Sources: src/thought/demo.py:1-50

Configuration

Database Initialization

# thought.toml
[database]
path = ".thought/thought.db"

[llm]
provider = "anthropic"  # or ollama, lmstudio, openai-compat

[embedder]
type = "auto"  # sentence-transformers if available, else deterministic

Sources: src/thought/cli.py:50-80

Summary

The thought-mcp architecture combines:

  1. MCP Server - Tool interface for AI clients
  2. Bi-temporal Storage - SQLite with code-specific schema
  3. Multi-language Extractors - Tree-sitter based AST parsing
  4. Git Integration - Historical code analysis
  5. Query Routing - Classification-based query dispatch
  6. Natural Language Interface - NL to Cypher translation

This design enables both real-time code assistance and deep historical analysis of codebases through a unified query interface.

Sources: src/thought/server.py:1-100

Storage and Database Layer

Related topics: System Architecture, Memory Model and Data Structures

Overview

The Storage and Database Layer is the persistence backbone of the THOUGHT system, providing a structured SQLite-based knowledge base (KB) for storing entities, edges, embeddings, and operational metadata. This layer abstracts database operations through a modular backend interface, enabling CRUD operations, bi-temporal data tracking, and specialized queries for code analysis.

The architecture supports:

  • Entity/Edge persistence with bi-temporal validity tracking (valid_from, valid_until, learned_at)
  • Vector embeddings for semantic recall operations
  • Source tracking for ingested content provenance
  • Code-specific metadata including language, file path, and commit SHA
  • Agent and scan logging for operational auditability

Sources: src/thought/storage/__init__.py

Memory Model and Data Structures

Related topics: System Architecture, Storage and Database Layer, Query and Retrieval System

Overview

The thought-mcp repository implements a multi-layered memory architecture designed for AI-assisted knowledge management. The memory model combines vector embeddings for semantic search, graph relationships for structural querying, and temporal versioning for historical analysis. This hybrid approach enables both intuitive natural-language recall and precise code-intent queries.

The core memory system operates as a knowledge base (KB) with bi-temporal semantics, tracking when facts became true (valid_from) versus when the system learned them (learned_at). This design supports time-travel queries that answer "what was true on date X" or "what did the system know on date X".

Architecture Layers

The memory system is organized into three distinct but interconnected layers:

graph TD
    A[User Input] --> B[Memory Layer]
    B --> C[Vector Layer]
    B --> D[Graph Layer]
    B --> E[Temporal Layer]
    C --> F[SQLite Backend]
    D --> F
    E --> F
    G[Query/Recall] --> B

Vector Layer (`src/thought/layers/vector.py`)

The vector layer handles semantic embedding and similarity search. It stores dense vector representations of entities enabling natural-language recall based on meaning rather than exact keyword matching.

Core Responsibilities:

  • Embed text content (entity names, signatures, docstrings) into high-dimensional vectors
  • Store embeddings with model metadata (name, version, dimensions)
  • Perform similarity searches against the embedded corpus
  • Support fallback to deterministic embeddings when ML models are unavailable

Key Components:

| Component | Purpose |
|-----------|---------|
| VectorStore | Persists embeddings in SQLite with metadata |
| Embedder | Base protocol for embedding models |
| OllamaEmbedder | Integration with Ollama's /api/embed endpoint |
| DeterministicEmbedder | Fallback using hash-based vectors |

Sources: src/thought/layers/vector.py

Graph Layer (`src/thought/layers/graph.py`)

The graph layer manages entity-relationship data structures and supports Cypher-style traversals. It maintains the structural knowledge of how entities connect to each other.

Entity Types Supported:

| Type | Description |
|------|-------------|
| module | Source file or namespace unit |
| class | Class or type declarations |
| function | Function definitions |
| method | Class methods |
| fact | General knowledge facts |
| claim | Academic/research claims |
| source | Citation or reference |
| witness | Legal testimony statements |

Edge Relation Types:

| Relation | Meaning |
|----------|---------|
| IMPORTS | Module dependency relationship |
| INHERITS_FROM | Class inheritance |
| DEFINES | Container defines a member |
| OVERRIDES | Method overrides parent |
| CALLS | Function invocation |
| REFERS_TO | General reference |
| CONTRADICTS | Logical opposition between facts |

Sources: src/thought/layers/graph.py

Temporal Layer (`src/thought/layers/temporal.py`)

The temporal layer implements bi-temporal data modeling, tracking both valid time and learned time for all entities. This enables sophisticated time-travel queries and contradiction detection.

Bi-Temporal Model:

graph LR
    A[Entity] --> B[valid_from<br/>When fact became true]
    A --> C[learned_at<br/>When KB learned fact]
    D[as_of valid] --> E[Historical state query]
    D --> F[as_of learned<br/>System knowledge query]

Key Temporal Features:

  • valid_from: Timestamp when the fact became true in reality
  • learned_at: Timestamp when the system recorded the fact
  • valid_until: Optional expiration of fact validity
  • CONTRADICTS edges: Automatically surface when facts conflict across time axes

Sources: src/thought/layers/temporal.py
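
The difference between the two time axes is easiest to see on a fact that was recorded late. The sketch below is an illustrative model only: the `Fact` dataclass and the `as_of` helper are hypothetical names, and the real temporal layer works against SQLite rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    text: str
    valid_from: date             # when the fact became true
    valid_until: Optional[date]  # None means still valid
    learned_at: date             # when the KB recorded it


def as_of(facts: list[Fact], when: date, kind: str = "valid") -> list[Fact]:
    """Filter facts on one time axis: 'valid' (reality) or 'learned' (KB knowledge)."""
    if kind == "valid":
        return [
            f for f in facts
            if f.valid_from <= when and (f.valid_until is None or when < f.valid_until)
        ]
    if kind == "learned":
        return [f for f in facts if f.learned_at <= when]
    raise ValueError(f"unknown as_of kind: {kind!r}")


facts = [
    # True since January, but only recorded in March (a late correction).
    Fact("Alice leads the auth team", date(2024, 1, 1), None, date(2024, 3, 1)),
    Fact("Bob leads the auth team", date(2023, 1, 1), date(2024, 1, 1), date(2023, 1, 5)),
]
```

Asking about 1 February 2024 on the valid axis returns the Alice fact, because that is what was true. Asking on the learned axis returns the Bob fact, because the correction had not yet been recorded.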

Core Data Models

Entity Model (`src/thought/models.py`)

The base Entity model represents all stored knowledge items in the system.

from datetime import datetime

class Entity:
    id: str                  # Unique identifier
    name: str                # Canonical name
    type: str                # Entity type (see table above)
    scope: str               # "shared" or "private"
    owner_id: str | None     # Owner for private entities
    valid_from: datetime     # When fact became true
    learned_at: datetime     # When system learned it
    source_ref: str | None   # Reference to source document
    tier: str                # "hot", "warm", "cold" (access frequency)
    attrs: dict              # Type-specific attributes

Entity Attributes by Type:

| Entity Type | Key Attributes |
|-------------|----------------|
| code_* | code_file, code_language, code_commit_sha, signature, visibility, line_start, line_end |
| fact | predicates, unique_predicates, source_doc |
| claim | citation_key, reliability_score |

Sources: src/thought/models.py

Code Entities (`src/thought/ingest/entities.py`)

Code-specific entities extend the base model with language-aware attributes:

class CodeEntity:
    name: str
    type_: str                  # "module", "class", "function", "method"
    language: str               # "python", "typescript", "rust", "php"
    file_path: str
    line_start: int
    line_end: int
    signature: str              # Function/class signature
    visibility: str             # "public", "private", "protected"
    docstring: str | None
    attrs: dict                 # Language-specific (e.g., `class` for methods)

Edge Model (`src/thought/ingest/entities.py`)

Relationships between entities are modeled as typed, directed edges:

class CodeEdge:
    source_name: str             # Origin entity
    target_name: str             # Destination entity
    relation_type: str          # IMPORTS, DEFINES, INHERITS_FROM, etc.
    line_number: int | None
    attrs: dict

Sources: src/thought/ingest/entities.py

Consolidation Engine (`src/thought/consolidation/engine.py`)

The consolidation engine handles fact deduplication, merging, and contradiction detection. It processes incoming data through a pipeline that ensures data quality and consistency.

graph TD
    A[Raw Input] --> B[Jaccard Deduplication]
    B --> C[Fact Extraction]
    C --> D[Predicate Matching]
    D --> E{Conflict?}
    E -->|Yes| F[Create CONTRADICTS Edge]
    E -->|No| G[Merge into KB]
    F --> G

Consolidation Pipeline Steps:

  1. Jaccard Deduplication: Skip content whose Jaccard overlap with an existing fact exceeds 50%
  2. Fact Extraction: Parse structured predicates from unstructured text
  3. Predicate Matching: Match against existing knowledge using unique predicates
  4. Contradiction Detection: Create CONTRADICTS edges when facts conflict
  5. Entity Merging: Upsert with identity (name, code_file, code_commit_sha)

Sources: src/thought/consolidation/engine.py
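
The first pipeline step can be sketched as follows. This is an assumption-laden illustration: the source does not say how text is tokenized, so the sketch uses lowercase whitespace-separated words, and the `jaccard` and `is_duplicate` names are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.5) -> bool:
    """True when the candidate overlaps any existing fact by more than the threshold."""
    return any(jaccard(candidate, fact) > threshold for fact in existing)


known = ["the login handler validates the session token"]
```

A near-restatement such as "the login handler validates the token" shares five of six distinct words with the known fact and is skipped, while unrelated text passes through to fact extraction.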

Storage Backend

The system uses SQLite as its primary storage engine with the following schema:

graph TD
    A[SQLite Database] --> B[entities table]
    A --> C[edges table]
    A --> D[embeddings table]
    A --> E[applied_migrations table]
    B --> F[code_file<br/>code_language<br/>code_commit_sha]
    C --> G[relation_type<br/>source_name<br/>target_name]
    D --> H[model_name<br/>model_version<br/>vector BLOB]

Key Backend Classes:

| Class or method | Responsibility |
|-----------------|----------------|
| Backend | Core CRUD operations on entities/edges |
| find_code_entity() | Fast lookup by name + file/commit disambiguators |
| upsert_entity() | Insert or update with identity awareness |
| store_embedding() | Persist vectors with model metadata |

Sources: src/thought/storage/sqlite/backend.py (inferred from CHANGELOG.md)

Query Pathways

The memory system supports multiple query mechanisms:

Recall (Semantic Search)

def recall(
    query: str,
    scope: str = "all",
    owner_id: str | None = None,
    max_results: int = 10,
) -> list[RecallResult]

Returns up to max_results semantically similar entities (10 by default), based on embedding similarity.

Ask (Natural Language to Cypher)

Routes natural-language questions through an LLM to generate Cypher queries:

QUESTION: "who calls authenticate_user"
→ CYPHER: MATCH (caller)-[:CALLS]->(f:Function {name: 'authenticate_user'}) 
          RETURN caller.name

Sources: src/thought/query/ask.py

Code Intelligence Queries

| Command | Purpose |
|---------|---------|
| thought callers <name> | Direct callers via Personalized PageRank |
| thought impact <name> | Transitive impact set (what breaks if changed) |
| thought diff --from SHA1 --to SHA2 | Entity diff between commits |

Ingest Pipelines

Code ingestion follows a standardized pipeline:

graph TD
    A[Source File] --> B[Language Detection]
    B --> C[AST Parser<br/>tree-sitter]
    C --> D[Extractor<br/>Language-specific]
    D --> E[CodeEntity list]
    D --> F[CodeEdge list]
    E --> G[Embedding]
    G --> H[Backend upsert]
    F --> H
    H --> I[Call Graph Builder<br/>optional]

Supported Languages:

  • Python (.py) - via tree-sitter-python
  • TypeScript (.ts, .tsx) - via tree-sitter-typescript
  • Rust (.rs) - via tree-sitter-rust
  • PHP (.php) - via tree-sitter-php

Extracted Metadata:

  • Module/namespace names
  • Class declarations with heritage (extends, implements)
  • Function and method definitions
  • Import/use declarations
  • Visibility modifiers (public, private, protected)

Sources: src/thought/ingest/code/python_extractor.py, src/thought/ingest/code/typescript_extractor.py

Auto-Memory Hooks

The system integrates with Claude Code via hooks for automatic memory management:

| Hook | Event | Action |
|------|-------|--------|
| recall | UserPromptSubmit | Embeds prompt, recalls relevant context |
| write | Stop | Extracts facts from session transcript |
| context | SessionStart | Loads relevant context for new session |

Sources: src/thought/hooks/install.py

Versioning and Snapshots

The storage layer supports full database lifecycle management:

| Operation | Description |
|-----------|-------------|
| db size | Disk usage + entity/edge counts |
| db flush | Wipe KB with date-bounded options |
| db backup <file> | SQLite online backup snapshot |
| db load <file> | Restore or merge from snapshot |
| db inspect <file> | Preview backup without loading |

WAL (Write-Ahead Logging) checkpoints ensure consistent backups.

Summary

The thought-mcp memory model implements a production-grade knowledge management system with:

  1. Three-layer architecture: Vector for semantics, Graph for structure, Temporal for history
  2. Bi-temporal semantics: Tracks both validity and knowledge acquisition times
  3. Code-aware extraction: AST-based parsing for multiple programming languages
  4. Contradiction detection: Automatic CONTRADICTS edges between conflicting facts
  5. Multiple query pathways: Semantic recall, natural-language Cypher, and code-intelligence commands
  6. Git-aware versioning: Commits can be stamped on entities for historical queries

This architecture enables sophisticated AI memory capabilities while maintaining query performance through strategic use of SQLite with proper indexing.

Sources: src/thought/layers/vector.py

Query and Retrieval System

Related topics: Memory Model and Data Structures, Storage and Database Layer, Agent Adapters and SDK Integration

The Query and Retrieval System is a core subsystem within the thought-mcp project that enables users to query the knowledge graph using natural language. It translates human-readable questions into structured Cypher queries or SQL statements, executes them against the underlying SQLite backend, and returns ranked, relevant results. The system serves as the primary interface for retrieving facts, code entities, relationships, and historical data stored in the memory database.

Architecture Overview

The Query and Retrieval System is composed of several interconnected modules that work together to process, route, and execute queries. At its core, the system leverages a Router to classify incoming queries into semantic categories, then delegates processing to specialized handlers based on the query type.

graph TD
    A[User Query] --> B[Router]
    B --> C{Code Query?}
    B --> D{Natural Language?}
    B --> E{Search Query?}
    C --> F[Code Layer]
    D --> G[Ask Module]
    G --> H[Cypher Translator]
    H --> I[Query Validator]
    I --> J[SQLite Backend]
    E --> K[Recall Hook]
    K --> J
    J --> L[Results]
    F --> L

The system follows a layered approach where queries are first classified by intent, then transformed into appropriate database queries. Natural language queries are translated to Cypher through an LLM-based translator, while code-specific queries bypass translation and directly execute predefined graph traversal operations.

Query Classification

The Router module plays a critical role in determining how each query should be processed. Based on keyword detection and pattern matching, queries are classified into distinct types that trigger different handling paths.

Query Types

| Query Type | Trigger Keywords | Handler | Use Case |
|------------|------------------|---------|----------|
| CODE | function, class, caller, callee, impact, file extensions, camelCase identifiers | CodeLayer | Code graph traversal |
| CHANGE | since v1.0, before this commit, diff | GitIngestReport | Version-aware queries |
| HYBRID | CODE × CHANGE combinations | GraphLayer + GitWalker | Historical code analysis |
| SEARCH | General text | Recall Hook | Semantic search |
| ASK | Natural language questions | Ask Module | Natural language to Cypher |

Sources: src/thought/query/ask.py:1-30

CODE Query Detection

The CODE query class is triggered by code-shaped keywords and identifier patterns. This includes function names, class declarations, caller/callee relationships, and file extensions such as .py or .ts. Version-related phrases such as since v1.0 or before this commit are CHANGE triggers in the table above; paired with code-shaped terms they produce a HYBRID query. camelCase and snake_case identifiers also route to the CODE handler, so a query like "who calls authenticate_user" is processed through the call-graph machinery without explicit CLI invocation.

Sources: src/thought/query/ask.py:1-30

Natural Language to Cypher Translation

The Ask module (src/thought/query/ask.py) is responsible for translating natural language questions into Cypher queries. This translation is performed by an LLM provider configured in the [llm] section of the configuration file, supporting multiple backends through a unified interface.

Translation Process

sequenceDiagram
    participant U as User
    participant A as Ask Module
    participant L as LLM Provider
    participant V as Cypher Validator
    participant B as SQLite Backend
    
    U->>A: "What functions call authenticate_user?"
    A->>A: Build Prompt with Schema
    A->>L: Send Prompt
    L-->>A: Cypher Query
    A->>V: Validate Cypher
    alt Valid
        V->>B: Execute Query
        B-->>V: Results
        V-->>U: Ranked Results
    else Invalid
        A->>A: Fallback to Recall
        A-->>U: Semantic Search Results
    end

The translation process begins by constructing a prompt that includes the database schema, entity types, and relationship types. The LLM generates a Cypher query, which a parser validates before execution. If validation fails or the query cannot be executed, the system falls back to a plain recall() call, so the user always receives some response.

Prompt Constraints

The Ask module enforces strict constraints on generated queries to maintain system safety and performance:

  • Only read-only Cypher features are permitted, including MATCH, WHERE, RETURN, LIMIT, and AS_OF
  • MERGE, CREATE, DELETE, SET, and WITH are explicitly forbidden
  • All entity types and relationship types must come from the defined schema
  • The LLM must return a single Cypher query, with no explanation or markdown formatting

Sources: src/thought/query/ask.py:1-50
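
A keyword-level check along the lines of these constraints might look like the sketch below. It is a conservative approximation rather than the project's validator, which parses the query against a grammar; the `check_read_only` and `strip_string_literals` names are hypothetical.

```python
import re

# Clauses the prompt constraints forbid; the real validator may check more.
FORBIDDEN = ("MERGE", "CREATE", "DELETE", "SET", "WITH")


def strip_string_literals(cypher: str) -> str:
    """Blank out quoted strings so keywords inside literals are not flagged."""
    return re.sub(r"'[^']*'|\"[^\"]*\"", "''", cypher)


def check_read_only(cypher: str) -> tuple[bool, str]:
    """Return (ok, reason). Rejects multi-statement input and forbidden clauses."""
    body = strip_string_literals(cypher).strip().rstrip(";")
    if ";" in body:
        return False, "multiple statements"
    if not re.match(r"(?i)\s*MATCH\b", body):
        return False, "query must start with MATCH"
    for keyword in FORBIDDEN:
        if re.search(rf"\b{keyword}\b", body, flags=re.IGNORECASE):
            return False, f"forbidden clause: {keyword}"
    return True, ""
```

A rejected query would set fallback_used and carry the reason string into fallback_reason before recall() runs. Because this sketch scans for keywords instead of parsing, it also rejects harmless uses such as the STARTS WITH operator, which is one reason a grammar-based validator is preferable.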

AskResult Data Model

The AskResult dataclass encapsulates the outcome of a query translation and execution attempt:

| Field | Type | Description |
|-------|------|-------------|
| cypher | `str \| None` | The generated Cypher query |
| sql | `str \| None` | Alternative SQL query if applicable |
| rows | `list[dict[str, Any]] \| None` | Query results |
| fallback_used | bool | Whether fallback to recall was triggered |
| fallback_reason | str | Explanation if fallback occurred |

Sources: src/thought/query/ask.py:1-50

Recall Hook

The recall hook (src/thought/hooks/recall.py) provides semantic search functionality as a fallback mechanism and primary retrieval method. It uses embedding vectors to find semantically similar entities in the knowledge graph, supporting the core recall operation used throughout the system.

Recall Behavior

Recall operations are bounded by design to prevent overwhelming the user with too many results. The system never returns more than 10 hits regardless of knowledge base size, encouraging users to narrow their queries using as_of and scope parameters for more targeted retrieval.

The recall mechanism supports bi-temporal queries through the as_of_kind parameter:

  • valid: Returns what was true on a given date, answering "what was true on date X"
  • learned: Returns what the system knew on a given date, answering "what did the system know on date X"

These two modes differ when facts are corrected after the fact, enabling users to perform historical analysis of their knowledge graph.

Sources: src/thought/query/views.py

Code Layer

The Code Layer (src/thought/layers/code.py) provides a specialized interface for code-specific graph queries. It wraps the GraphLayer with operations native to programmers, operating against the currently-valid view of the code graph using the valid_until IS NULL filter.

Core Operations

| Method | Description | Use Case |
|--------|-------------|----------|
| callers_of(name) | Direct callers ranked by PageRank | Finding who uses a function |
| callees_of(name) | Direct callees within the package | Finding what a function calls |
| impact_set(name) | Transitive callers ranked | Dependency analysis |
| defines_in_file() | All entities in a file | File-level inspection |

All four operations support optional as_of parameters to query historical snapshots when bi-temporal git ingest has been configured. The code_commit_sha field enables time-travel queries against the code graph.

Sources: src/thought/layers/code.py:1-50
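
The idea behind a transitive impact set can be shown with a breadth-first walk over reversed CALLS edges. This is a sketch under stated assumptions: the edge list is invented sample data, and ranking by call distance is a stand-in, since the source does not say how the real impact_set orders its results.

```python
from collections import deque

# Hypothetical CALLS edges as (caller, callee) pairs.
CALLS = [
    ("handle_request", "authenticate_user"),
    ("login_view", "authenticate_user"),
    ("router", "handle_request"),
    ("main", "router"),
    ("main", "login_view"),
]


def impact_set(name: str, calls: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Return transitive callers of `name` as (caller, depth), nearest first."""
    callers_of: dict[str, list[str]] = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, []).append(caller)

    depth = {name: 0}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for caller in callers_of.get(current, []):
            if caller not in depth:  # also guards against call cycles
                depth[caller] = depth[current] + 1
                queue.append(caller)

    del depth[name]
    return sorted(depth.items(), key=lambda item: (item[1], item[0]))
```

Direct callers come back at depth 1 and everything that reaches the function through them at greater depths, which approximates "what breaks if this changes".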

Entity Resolution

The _resolve_entity_id method handles name resolution with multiple fallback strategies:

  1. Intra-file match with exact name
  2. Cross-file match with unique qualified suffix
  3. Cross-file bare-name match for top-level functions
  4. Stub creation for unresolved references

This multi-stage resolution ensures that queries like obj.method() can resolve to ClassName.method when it is unique in the knowledge base, and that bare function names can be found across different files.

Sources: src/thought/query/cypher.py
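
The four-stage order can be expressed as a chain of progressively looser lookups. The sketch below is illustrative only: the flat `INDEX` dictionary, the `resolve` function, and the stage labels are hypothetical, and the real _resolve_entity_id works against the database rather than an in-memory map.

```python
# Hypothetical index: qualified entity name -> file that defines it.
INDEX = {
    "Session.refresh": "auth/session.py",
    "authenticate_user": "auth/core.py",
    "helper": "auth/core.py",
}


def resolve(name: str, current_file: str, index: dict[str, str]) -> tuple[str, str]:
    """Return (resolved_name, stage) following the four-stage order above."""
    # 1. Intra-file match with exact name.
    if index.get(name) == current_file:
        return name, "intra_file"
    # 2. Cross-file match on a unique qualified suffix,
    #    e.g. "obj.refresh" -> "Session.refresh".
    suffix = "." + name.split(".")[-1]
    suffix_hits = [qualified for qualified in index if qualified.endswith(suffix)]
    if len(suffix_hits) == 1:
        return suffix_hits[0], "qualified_suffix"
    # 3. Cross-file bare-name match for top-level functions.
    if "." not in name and name in index:
        return name, "bare_name"
    # 4. Nothing matched: record a stub so the edge is not lost.
    return name, "stub"
```

Requiring the suffix match to be unique is the important design point: if two classes both define refresh, guessing would create a wrong CALLS edge, so the lookup falls through to a later stage instead.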

Cypher Query Engine

The Cypher module (src/thought/query/cypher.py) handles the parsing, validation, and execution of Cypher queries against the SQLite backend. It provides a bridge between the graph query language and the relational database storage.

Query Validation

Before executing any Cypher query, the system validates it against the defined grammar to prevent malformed queries from reaching the database. This validation step catches syntax errors, unsupported features, and schema violations before they can cause runtime errors.

Execution Model

Cypher queries are translated into equivalent SQL statements that operate against the SQLite schema. The translation preserves the semantic meaning of graph patterns while adapting them to the relational storage model used by the backend.

Views and Data Models

The views module (src/thought/query/views.py) defines the data structures and return formats used throughout the Query and Retrieval System.

Entity Model

The Entity model represents nodes in the knowledge graph with the following key attributes:

| Attribute | Type | Description |
|-----------|------|-------------|
| id | str | Unique identifier |
| type | str | Entity type (PERSON, function, class, etc.) |
| name | str | Display name |
| canonical_name | str | Normalized name for matching |
| scope | ScopeName | shared, private, or all |
| tier | Tier | hot, warm, or cold |
| valid_from | datetime | Start of validity period |
| valid_until | `datetime \| None` | End of validity period |
| attrs | dict[str, object] | Additional attributes |

Scope Filter

The ScopeFilter class determines visibility of entities based on ownership and scope:

  • shared: All entities with scope = "shared"
  • private: Entities matching both scope = "private" AND owner_id
  • all: Shared entities plus private entities owned by the requesting user

The scope filter generates SQL fragments that join against the entity table aliased as e, enabling fine-grained access control across the query system.

Sources: src/thought/models.py:1-80
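
The three scope rules translate naturally into parameterized WHERE fragments. The following is a minimal sketch, not the ScopeFilter class itself: the `scope_sql` function and the three-column `entities` table are assumptions, and treating "all" without an owner as shared-only is a guess based on recall() defaulting to scope="all" with owner_id=None.

```python
import sqlite3
from typing import Optional


def scope_sql(scope: str, owner_id: Optional[str]) -> tuple[str, list[str]]:
    """Return a WHERE fragment over alias `e` plus its bound parameters."""
    if scope == "shared" or (scope == "all" and owner_id is None):
        return "e.scope = 'shared'", []
    if scope == "private":
        if owner_id is None:
            raise ValueError("owner_id is required for the 'private' scope")
        return "(e.scope = 'private' AND e.owner_id = ?)", [owner_id]
    if scope == "all":
        return "(e.scope = 'shared' OR (e.scope = 'private' AND e.owner_id = ?))", [owner_id]
    raise ValueError(f"unknown scope: {scope!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, scope TEXT, owner_id TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [("readme", "shared", None), ("draft", "private", "u1"), ("diary", "private", "u2")],
)
fragment, params = scope_sql("all", "u1")
visible = [
    row[0]
    for row in conn.execute(
        f"SELECT e.name FROM entities e WHERE {fragment} ORDER BY e.name", params
    )
]
```

Only the fragment text is interpolated into the statement; the owner id is always passed as a bound parameter, so user-supplied identifiers never reach the SQL string.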

CLI Commands

The Query and Retrieval System is exposed through several CLI commands under the thought command group:

| Command | Description |
|---------|-------------|
| thought recall <query> | Semantic search across the knowledge graph |
| thought ask <question> | Natural language query with Cypher translation |
| thought callers <name> | Find direct callers ranked by PageRank |
| thought callees <name> | Find direct callees within the package |
| thought impact <name> | Transitive impact set analysis |
| thought browse <name> | Drill into a topic with PPR-ranked neighborhood |
| thought diff --from <sha1> --to <sha2> | Compare entities between commits |

Browse Command

The browse command (mcp__thought__browse_topic) implements a two-step resolution process. First, the name is matched against entity types for a type facet. If no type matches, the name is resolved as an entity using canonical-name matching, and the PPR-ranked neighborhood is returned. The via field in results indicates whether the hit came from type_facet, ppr, or bfs matching.

Sources: src/thought/cli.py

Configuration

The Query and Retrieval System respects configuration from the thought.toml file and environment variables:

| Option | Default | Description |
|--------|---------|-------------|
| embedder | auto | Embedder selection: auto, sentence-transformers, or deterministic |
| llm.provider | openai | LLM provider for Ask module |
| llm.model | varies | Model name for translation |
| db_path | .thought/thought.db | SQLite database path |

The auto embedder selector probes the sentence_transformers package via importlib.util.find_spec before returning the wrapper, falling back to the deterministic embedder when the optional dependency is missing.
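
The probing behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions: `select_embedder` and `deterministic_embed` are hypothetical names, and the SHA-256 construction is one possible hash-based scheme, not necessarily the one the project uses.

```python
import hashlib
import importlib.util
import struct


def deterministic_embed(text: str, dims: int = 8) -> list[float]:
    """Hash-based fallback: the same text always yields the same vector in [0, 1)."""
    vector = []
    for i in range(dims):
        digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
        (value,) = struct.unpack(">I", digest[:4])  # first 4 bytes as unsigned int
        vector.append(value / 2**32)
    return vector


def select_embedder(preference: str = "auto") -> str:
    """Return the embedder kind to use, without importing the heavy package."""
    if preference != "auto":
        return preference
    # find_spec only locates the package; it does not execute it.
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "sentence-transformers"
    return "deterministic"
```

Probing with find_spec keeps startup cheap, since the optional dependency is located but not loaded. Note that hash-based vectors are reproducible but carry no semantic signal, so recall quality with the deterministic fallback is limited to exact or near-exact text matches.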

Integration Points

The Query and Retrieval System integrates with several other subsystems:

  • Storage Layer: SQLite backend provides entity and edge persistence
  • Ingest System: Code extractors populate entities that are later queried
  • Memory Module: Coordinates between recall, browse, and scan operations
  • Server: Exposes query functionality via MCP protocol

The bidirectional relationship between the Code Layer and the Cypher query engine enables both natural language queries like "who calls authenticate_user" and structured queries using the CODE query class, providing flexibility for different user interaction patterns.

Error Handling

The system implements graceful degradation throughout the query pipeline. If Cypher translation fails or validation rejects the generated query, execution falls back to the recall hook, ensuring users always receive results. Bounded result sets prevent resource exhaustion, and the contradiction detection mechanism surfaces conflicts as CONTRADICTS edges in the graph rather than throwing errors, allowing downstream applications to handle them as data.

Sources: src/thought/query/ask.py:1-30

Multi-Language Code Parsing

Related topics: Git History Integration, Storage and Database Layer

The Multi-Language Code Parsing system is the foundational code-vertical layer in THOUGHT. It provides language-agnostic AST extraction across six programming languages using tree-sitter grammars, produces standardized code entities and relationship edges, and enables downstream features like caller analysis, impact queries, and cross-file call-graph resolution.

Overview

The parsing system operates in two phases:

  1. Phase 1 – AST Extraction: Each language has a dedicated extractor that walks the tree-sitter parse tree and emits CodeEntity and CodeEdge objects.
  2. Phase 2 – Call Graph Resolution: After all files are ingested, a separate pass resolves CALLS edges by matching callee names against the entity index.

Sources: src/thought/ingest/code/ast_extractor.py:1-15

Supported Languages

The system supports six languages through language-specific extractors:

| Language | Extractor File | Tree-sitter Grammar |
|----------|----------------|---------------------|
| Python | python_extractor.py | tree-sitter-python |
| TypeScript / TSX / JSX | typescript_extractor.py | tree-sitter-typescript |
| Go | go_extractor.py | tree-sitter-go |
| Rust | rust_extractor.py | tree-sitter-rust |
| Java | java_extractor.py | tree-sitter-java |
| PHP | php_extractor.py | tree-sitter-php |

Sources: src/thought/ingest/code/ast_extractor.py:30-55

Architecture

graph TD
    A[Code File] --> B[Language Detection]
    B --> C[ast_extractor.py Dispatcher]
    C --> D{Python?}
    C --> E{TypeScript?}
    C --> F{Go?}
    C --> G{Rust?}
    C --> H{Java?}
    C --> I{PHP?}
    D --> J[python_extractor.extract]
    E --> K[typescript_extractor.extract]
    F --> L[go_extractor.extract]
    G --> M[rust_extractor.extract]
    H --> N[java_extractor.extract]
    I --> O[php_extractor.extract]
    J --> P[(CodeEntity, CodeEdge)]
    K --> P
    L --> P
    M --> P
    N --> P
    O --> P
    P --> Q[CodeIngestPipeline]
    Q --> R[build_call_graph]
    R --> S[(CALLS Edges)]

Dispatcher Pattern

The ast_extractor.py module uses lazy loading to avoid importing heavy tree-sitter C extensions at module load time:

_REGISTRY: dict[str, Callable[[str, str], tuple[list[CodeEntity], list[CodeEdge]]]] = {}

def _python_extractor():
    from . import python_extractor
    return python_extractor.extract

Each language loader is registered in _LOADERS and invoked only when that language is first requested. Sources: src/thought/ingest/code/ast_extractor.py:9-35
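
The lazy-loading idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual module: `get_extractor`, `_load_python`, and the `LOAD_COUNT` counter are hypothetical names introduced for the example, and the stand-in extractor returns placeholder data instead of parsing with tree-sitter.

```python
from typing import Callable

Extractor = Callable[[str, str], tuple[list, list]]

# Cache of extractors that have already been loaded.
_REGISTRY: dict[str, Extractor] = {}

# Hypothetical counter, only here to make the laziness observable.
LOAD_COUNT: dict[str, int] = {}


def _load_python() -> Extractor:
    # In a real dispatcher the heavy import would happen here,
    # e.g. `from . import python_extractor`.
    LOAD_COUNT["python"] = LOAD_COUNT.get("python", 0) + 1

    def extract(source: str, file_path: str) -> tuple[list, list]:
        # Placeholder result: one fake entity, no edges.
        return ([f"module:{file_path}"], [])

    return extract


# Loader functions are cheap to register; nothing heavy runs until first use.
_LOADERS: dict[str, Callable[[], Extractor]] = {"python": _load_python}


def get_extractor(language: str) -> Extractor:
    """Return the extractor for a language, loading it on first request."""
    if language not in _REGISTRY:
        loader = _LOADERS.get(language)
        if loader is None:
            raise ValueError(f"unsupported language: {language}")
        _REGISTRY[language] = loader()
    return _REGISTRY[language]
```

Calling `get_extractor("python")` twice runs the loader once and returns the cached function the second time, so a language that is never requested never pays its import cost.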

Data Models

CodeEntity

Represents a code element extracted from the AST:

| Field | Type | Description |
|-------|------|-------------|
| name | str | Canonical identifier (module, function, class, method) |
| type_ | str | Entity kind: module, function, class, method |
| language | str | Source language: python, typescript, go, rust, java, php |
| file_path | str | Path to source file (relative to repo root) |
| line_start | int | 1-indexed start line |
| line_end | int | 1-indexed end line |
| signature | str | Declaration signature (e.g., module foo, def bar(self, x)) |
| docstring | `str \| None` | Extracted docstring text |
| visibility | str | public or private based on naming conventions |
| attrs | dict | Language-specific metadata |

Sources: src/thought/ingest/code/python_extractor.py:14-25

CodeEdge

Represents a relationship between entities:

| Field | Type | Description |
|-------|------|-------------|
| source_name | str | Entity that is the subject of the relation |
| target_name | str | Entity that is the object of the relation |
| relation_type | str | One of: IMPORTS, INHERITS_FROM, DEFINES, OVERRIDES, CALLS |
| line_number | int | Source line where the relationship was discovered |
| attrs | dict | Additional metadata (e.g., from_import: true) |

Sources: src/thought/ingest/code/typescript_extractor.py:110-115
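
The two field tables above map naturally onto dataclasses. The sketch below is an assumption about shape only: the field names and types come from the tables, while the default values and the use of plain (non-frozen) dataclasses are illustrative choices that the source does not confirm.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeEntity:
    name: str
    type_: str                      # module, function, class, or method
    language: str
    file_path: str
    line_start: int                 # 1-indexed
    line_end: int                   # 1-indexed
    signature: str
    docstring: Optional[str] = None          # default is an assumption
    visibility: str = "public"                # default is an assumption
    attrs: dict = field(default_factory=dict)


@dataclass
class CodeEdge:
    source_name: str
    target_name: str
    relation_type: str              # IMPORTS, INHERITS_FROM, DEFINES, OVERRIDES, CALLS
    line_number: int
    attrs: dict = field(default_factory=dict)


# Example: a module that defines one function.
module = CodeEntity("foo", "module", "python", "foo.py", 1, 10, "module foo")
func = CodeEntity("foo.bar", "function", "python", "foo.py", 3, 5, "def bar(x)")
defines = CodeEdge(module.name, func.name, "DEFINES", 3)
```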

Extractor Interface

All language extractors share a common signature:

def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    ...

This uniform interface allows the dispatcher to route to any language without knowing implementation details. Sources: src/thought/ingest/code/python_extractor.py:28-40
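
To make the contract concrete, here is a toy extractor that honours the same `extract(source, file_path)` shape. It deliberately swaps tree-sitter for Python's standard-library `ast` module so that it runs without extra dependencies, returns plain dicts instead of `CodeEntity` and `CodeEdge` objects, and assumes a `module.name` qualification scheme. All of these are simplifications for illustration, not the project's behaviour.

```python
import ast


def extract(source: str, file_path: str) -> tuple[list[dict], list[dict]]:
    """Toy extractor: top-level functions and classes of one Python file."""
    # Assumed naming scheme: "pkg/mod.py" becomes "pkg.mod".
    stem = file_path[:-3] if file_path.endswith(".py") else file_path
    module_name = stem.replace("/", ".")

    entities = [{"name": module_name, "type_": "module", "line_start": 1}]
    edges = []

    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports, assignments, etc. are ignored in this sketch
        qualified = f"{module_name}.{node.name}"
        entities.append(
            {"name": qualified, "type_": kind, "line_start": node.lineno}
        )
        # The module DEFINES each top-level member it contains.
        edges.append({
            "source_name": module_name,
            "target_name": qualified,
            "relation_type": "DEFINES",
            "line_number": node.lineno,
        })
    return entities, edges
```

Because every extractor returns the same pair of lists, a dispatcher can treat this function and a tree-sitter-based one interchangeably.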

Supported Edge Types

| Relation | Source | Target | Languages |
|----------|--------|--------|-----------|
| IMPORTS | module | imported module | Python, TypeScript, PHP, Go, Rust, Java |
| INHERITS_FROM | class | parent class | Python, TypeScript, Java, PHP |
| DEFINES | class/module | contained member | All languages |
| OVERRIDES | method | overridden method | TypeScript (currently) |
| CALLS | function/method | called function | All (via call-graph pass) |

Sources: src/thought/ingest/code/python_extractor.py:1-15, src/thought/ingest/code/typescript_extractor.py:1-20

Language-Specific Extractors

Python Extractor

The Python extractor uses tree-sitter-python and handles:

  • Module entities as the root node
  • Function definitions (function_definition)
  • Class definitions (class_definition)
  • Method definitions within classes
  • Import statements (import_from_statement, import_statement)
  • Class inheritance via the superclasses field

def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    parser = _get_parser()
    source_bytes = source.encode("utf-8")
    tree = parser.parse(source_bytes)
    root = tree.root_node

    module_name = _module_name_from_path(file_path)
    entities: list[CodeEntity] = []
    edges: list[CodeEdge] = []

    entities.append(CodeEntity(
        name=module_name,
        type_="module",
        language="python",
        ...
    ))

Sources: src/thought/ingest/code/python_extractor.py:28-50

TypeScript Extractor

The TypeScript extractor selects the TSX grammar for .tsx and .jsx files and the plain TypeScript grammar for everything else:

def extract(source: str, file_path: str) -> tuple[list[CodeEntity], list[CodeEdge]]:
    use_tsx = file_path.endswith((".tsx", ".jsx"))
    parser = _get_parser(use_tsx=use_tsx)
    ...

Node types processed include function_declaration, arrow_function, class_declaration, method_definition, import_statement, and export_statement. Sources: src/thought/ingest/code/typescript_extractor.py:120-145

PHP Extractor

The PHP extractor handles files starting with <?php and recursively scans for definitions nested under namespace_definition blocks:

def _scan(node: Node) -> None:
    for child in node.named_children:
        ...

Sources: src/thought/ingest/code/php_extractor.py:45-60

Rust Extractor

The Rust extractor uses tree-sitter-rust. For each method it derives visibility with _rust_visibility and records the implementing type in the impl_type attribute:

out_entities.append(CodeEntity(
    name=qualified, type_="method", language="rust",
    visibility=_rust_visibility(child, source_bytes),
    attrs={"impl_type": type_name},
))

Sources: src/thought/ingest/code/rust_extractor.py:1-30

Call Graph Resolution

The call graph is built in a separate Phase 2 pass after all files are ingested. The build_call_graph function resolves callee references using a cascade of strategies:

  1. Exact match within same file: direct intra-file resolution
  2. Qualified suffix match: obj.method() resolves to ClassName.method
  3. Cross-file bare-name match: top-level functions defined elsewhere
  4. Stub creation: synthetic stub for unknown callees (filtered from impact graphs)

tgt_id = backend.find_code_entity(
    canonical_name=callee_name, scope_filter=sf, code_file=file_path,
)
if tgt_id is None and "." not in callee_name:
    # Unique qualified suffix match.
    rows = backend._conn.execute(
        "SELECT id FROM entities "
        "WHERE type IN ('method','function') AND valid_until IS NULL "
        "AND canonical_name LIKE ? ...",
        (f"%.{callee_name.lower()}", commit_sha),
    ).fetchall()

Sources: src/thought/ingest/code/call_graph.py:1-60
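
The resolution cascade can be illustrated against an in-memory entity list instead of the SQLite backend. This is a sketch under assumptions: `Entity`, `resolve_callee`, and the strategy labels are hypothetical names, and requiring a unique hit in steps 2 and 3 is an illustrative choice (the excerpt above only confirms uniqueness for the suffix match).

```python
from typing import NamedTuple


class Entity(NamedTuple):
    canonical_name: str
    file_path: str


def resolve_callee(
    callee: str, caller_file: str, entities: list[Entity]
) -> tuple[str, str]:
    """Return (strategy, target_name) for one call site."""
    # 1. Exact match within the same file.
    for ent in entities:
        if ent.file_path == caller_file and ent.canonical_name == callee:
            return ("same_file", ent.canonical_name)

    # 2. Qualified suffix match: obj.method() -> ClassName.method,
    #    accepted only when exactly one entity ends with ".method".
    bare = callee.rsplit(".", 1)[-1]
    suffix_hits = [e for e in entities if e.canonical_name.endswith("." + bare)]
    if len(suffix_hits) == 1:
        return ("qualified_suffix", suffix_hits[0].canonical_name)

    # 3. Cross-file bare-name match: a top-level function defined elsewhere.
    bare_hits = [
        e for e in entities
        if e.canonical_name == bare and e.file_path != caller_file
    ]
    if len(bare_hits) == 1:
        return ("cross_file", bare_hits[0].canonical_name)

    # 4. Nothing matched: fall back to a synthetic stub.
    return ("stub", callee)


INDEX = [
    Entity("authenticate", "auth.py"),
    Entity("Session.refresh", "session.py"),
    Entity("helper", "util.py"),
]
```

Ordering the strategies from most to least specific keeps a local definition from being shadowed by a same-named function in another file.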

CodeIngestPipeline

The CodeIngestPipeline orchestrates the full ingest workflow:

  1. Reads source file content
  2. Detects or validates language
  3. Calls the appropriate extractor
  4. Creates a source reference record
  5. Writes entities within a single transaction
  6. Embeds entity signatures and docstrings for VIBE recall
  7. Writes edges and resolves call graph

graph LR
    A[Source File] --> B[detect_language]
    B --> C[extract entities/edges]
    C --> D[upsert_source]
    D --> E[begin transaction]
    E --> F[_write_entities + embed]
    F --> G[_write_edges]
    G --> H[build_call_graph]
    H --> I[commit]

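The pipeline embeds entity signatures and docstrings so that a natural-language query can find functions by intent rather than by exact name; for example, a question about login checks can surface authenticate_user even though the query never mentions that identifier. Sources: src/thought/ingest/code/pipeline.py:1-80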

CodeLayer API

The CodeLayer provides a high-level interface for code graph queries:

| Method | Description |
|--------|-------------|
| callers_of(name) | Direct callers, ranked by Personalized PageRank |
| callees_of(name) | Direct callees (intra-package) |
| impact_set(name) | Transitive callers, ranked; used by the thought impact command |
| defines_in_file(path) | All entities discovered in a file |

All methods operate against the currently-valid view (valid_until IS NULL). Pass as_of= for historical snapshots. Sources: src/thought/layers/code.py:1-40

Git-Aware Ingest

The GitWalker enables two ingestion modes:

| Mode | Behavior |
|------|----------|
| snapshot (default) | Ingest HEAD only, stamp every entity with HEAD SHA |
| full | Walk every commit chronologically, stamp each entity with its commit SHA |

This enables bi-temporal as_of queries against historical commits. Sources: src/thought/ingest/code/git_pipeline.py:1-50

Configuration

Language is auto-detected by file extension when language=None:

| Extension | Language |
|-----------|----------|
| .py | python |
| .ts, .tsx, .js, .jsx | typescript |
| .go | go |
| .rs | rust |
| .java | java |
| .php | php |

Pass language= explicitly to override detection. Sources: src/thought/ingest/code/pipeline.py:25-35

Sources: src/thought/ingest/code/ast_extractor.py:1-15

Git History Integration

Related topics: Multi-Language Code Parsing, Memory Model and Data Structures

Overview

Git History Integration enables thought-mcp to ingest source code with full commit-level provenance, allowing bi-temporal queries that can reconstruct what a codebase looked like at any point in its history. This feature stamps every extracted code entity (functions, classes, modules) with the exact git commit SHA where it was discovered, creating a temporal graph that supports "as-of" queries.

The system provides two ingestion modes: a fast snapshot mode for current-state analysis and a comprehensive full-history mode for complete historical reconstruction.

Sources: CHANGELOG.md

Architecture

Component Overview

graph TD
    subgraph "Git History Integration"
        CLI["thought ingest-git CLI"]
        Pipeline["GitIngestPipeline"]
        Walker["GitWalker"]
        Storage["SQLite Backend"]
    end
    
    subgraph "Git Operations"
        Git["git executable"]
        RevParse["rev-parse HEAD"]
        Log["log --format"]
        LsTree["ls-tree -r"]
        Show["show <sha>:<path>"]
    end
    
    CLI --> Pipeline
    Pipeline --> Walker
    Walker --> Git
    Git --> RevParse
    Git --> Log
    Git --> LsTree
    Git --> Show
    Pipeline --> Storage

Data Flow

sequenceDiagram
    participant User
    participant CLI
    participant Pipeline
    participant Walker
    participant Extractor
    participant Backend
    
    User->>CLI: thought ingest-git /repo --mode full
    CLI->>Pipeline: run(repo_path, mode)
    
    alt snapshot mode
        Pipeline->>Walker: get_head_commit()
        Walker->>Git: rev-parse HEAD
        Git-->>Walker: sha
        Pipeline->>Pipeline: ingest single snapshot
    else full mode
        Pipeline->>Walker: get_all_commits()
        Walker->>Git: log --format
        Git-->>Walker: commit list
        loop for each commit
            Pipeline->>Git: ls-tree -r sha
            Pipeline->>Git: show sha:path
            Git-->>Pipeline: file content
            Pipeline->>Extractor: extract(source, file_path)
            Extractor-->>Pipeline: CodeEntity[], CodeEdge[]
            Pipeline->>Backend: upsert with commit_sha
        end
    end
    
    Pipeline-->>User: GitIngestReport

Sources: src/thought/ingest/code/git_pipeline.py:1-95, src/thought/ingest/code/git_walker.py:1-60

Core Components

GitWalker

The GitWalker class provides a read-only interface to git repositories using pure subprocess calls. It deliberately avoids native dependencies like pygit2 to minimize installation footprint.

| Method | Git Command | Purpose |
|--------|-------------|---------|
| get_head_sha() | rev-parse HEAD | Get current HEAD commit SHA |
| get_all_commits() | log --format=... | List all commits chronologically |
| get_files_at_commit(sha) | ls-tree -r <sha> | List files in tree at commit |
| get_file_at_commit(sha, path) | show <sha>:<path> | Get file content at commit |

#### Commit Data Model

@dataclass(frozen=True)
class Commit:
    sha: str                    # Full commit SHA
    author: str                 # Author name
    author_email: str           # Author email
    author_date: datetime      # Commit timestamp
    subject: str                # Commit message first line

Sources: src/thought/ingest/code/git_walker.py:24-31
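
A subprocess-based commit listing might look like the sketch below. The `Commit` fields match the data model above, but the log format string is an assumption (the source elides the real one), as are the `parse_log` helper, the module-level `get_all_commits` function, and the use of `--reverse` to obtain oldest-first order. The placeholders `%H`, `%an`, `%ae`, `%aI`, `%s`, and `%x1f` are standard git pretty-format codes.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path

_SEP = "\x1f"  # unit separator: unlikely to appear in commit metadata
_LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s"  # assumed format


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    author_email: str
    author_date: datetime
    subject: str


def parse_log(output: str) -> list[Commit]:
    """Turn `git log` output in _LOG_FORMAT into Commit objects."""
    commits = []
    for line in output.split("\n"):
        if not line:
            continue
        sha, author, email, date, subject = line.split(_SEP, 4)
        commits.append(
            Commit(sha, author, email, datetime.fromisoformat(date), subject)
        )
    return commits


def get_all_commits(repo: Path) -> list[Commit]:
    """List commits oldest-first by shelling out to git."""
    result = subprocess.run(
        ["git", "log", "--reverse", f"--format={_LOG_FORMAT}"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Keeping the parsing in a pure function separate from the subprocess call makes it testable without a repository on disk.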

#### Initialization Validation

def __init__(self, repo_path: Path | str) -> None:
    self.repo = Path(repo_path)
    if shutil.which("git") is None:
        raise RuntimeError("git executable not on PATH")
    if not (self.repo / ".git").exists():
        raise ValueError(f"not a git repository: {self.repo}")

The walker validates that:

  1. The git executable exists on PATH
  2. The target path is a git repository (it contains a .git entry)

Sources: src/thought/ingest/code/git_walker.py:35-42

GitIngestPipeline

The pipeline orchestrates the complete ingestion process, coordinating between git history traversal and code extraction.

| Parameter | Type | Description |
|-----------|------|-------------|
| repo_path | Path | Path to git repository |
| mode | GitMode | "snapshot" (HEAD only) or "full" (all commits) |
| patterns | tuple[str, ...] | Glob patterns to filter files (e.g., *.py) |

#### Ingestion Report

@dataclass(frozen=True)
class GitIngestReport:
    head_sha: str           # SHA of HEAD at time of ingest
    mode: GitMode           # Mode used for ingestion
    commits_visited: int    # Number of commits processed
    files_ingested: int     # Total files ingested
    call_edges: int         # Call graph edges created

Sources: src/thought/ingest/code/git_pipeline.py:35-41

Ingestion Modes

Snapshot Mode (Default)

Snapshot mode ingests only the current HEAD commit. This is the recommended mode for:

  • Initial repository ingestion
  • Quick code analysis workflows
  • When historical queries are not needed

Performance characteristics:

  • Single-pass through current tree
  • No duplicate processing
  • Typical runtime: seconds to minutes depending on repository size

Entity stamping: All extracted entities receive the HEAD SHA as their code_commit_sha attribute, enabling queries like "what did auth.middleware look like at HEAD?" or future comparisons.

Sources: src/thought/ingest/code/git_pipeline.py:7-16

Full History Mode

Full mode walks every commit in chronological order, ingesting the file tree at each point. This enables:

  • Historical queries: "what did function X look like at commit Y?"
  • Diff analysis between any two commits
  • Complete temporal reconstruction of code evolution

Performance considerations:

| Repository Size | Estimated Commits | Estimated Time |
|-----------------|-------------------|----------------|
| Small (<100 files) | ~100 | ~30 seconds |
| Medium (500 files) | ~1000 | ~5 minutes |
| Large (1000+ files) | ~5000+ | ~25+ minutes |

Note: Full-history ingest is bounded by file count × commits. The per-commit cost is dominated by tree-sitter parsing, not git operations.

Sources: src/thought/ingest/code/git_pipeline.py:16-25

CLI Usage

Command Syntax

thought ingest-git <repo_path> [OPTIONS]

#### Options

| Option | Usage | Default | Description |
|--------|-------|---------|-------------|
| --mode | --mode snapshot or --mode full | snapshot | Ingestion mode |
| --paths | --paths "*.py,*.js" | *.py | Comma-separated glob patterns |
| --config | --config path/to/config | thought.toml | Configuration file |

Examples

# Ingest current directory as git repo (HEAD only)
thought ingest-git .

# Ingest specific repository with full history
thought ingest-git /path/to/repo --mode full

# Ingest Python and TypeScript files only
thought ingest-git . --paths "*.py,*.ts,*.tsx"

# Ingest with full git history, multiple file types
thought ingest-git /project --mode full --paths "*.py,*.js,*.go"

Sources: src/thought/cli.py:90-120
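
The comma-separated `--paths` value has to be split into patterns and applied to the file list at each commit. The sketch below shows one way to do that with the standard library. Both helper names are hypothetical, and matching with `fnmatchcase` (where `*` also matches `/`) is an assumption about semantics that the source does not specify.

```python
from fnmatch import fnmatchcase


def parse_patterns(paths_option: str) -> tuple[str, ...]:
    """Split a value like "*.py,*.ts" into a tuple of glob patterns."""
    return tuple(p.strip() for p in paths_option.split(",") if p.strip())


def filter_files(files: list[str], patterns: tuple[str, ...]) -> list[str]:
    """Keep files matching at least one pattern, preserving input order."""
    return [f for f in files if any(fnmatchcase(f, p) for p in patterns)]


tree = ["src/app.py", "web/index.ts", "README.md"]
selected = filter_files(tree, parse_patterns("*.py, *.ts"))
```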

Code Commit Stamping

Every extracted code entity receives metadata linking it to its source commit:

eid = self._backend.upsert_entity(
    # ... other fields ...
    code_file=ent.file_path,
    code_language=language,
    code_commit_sha=commit_sha,  # Links entity to specific commit
)

The database schema includes:

| Column | Type | Purpose |
|--------|------|---------|
| code_file | TEXT | File path relative to repo root |
| code_language | TEXT | Programming language detected |
| code_commit_sha | TEXT | Git commit where entity was found |

These columns have partial indexes for fast lookups by commit.
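
For readers unfamiliar with partial indexes, the following sketch shows the general technique in SQLite. The three `code_*` column names come from the table above; the rest of the table layout, the index name `idx_entities_commit`, and the `entities_at_commit` helper are assumptions, and the project's real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    "id INTEGER PRIMARY KEY, canonical_name TEXT NOT NULL, "
    "code_file TEXT, code_language TEXT, code_commit_sha TEXT)"
)
# A partial index covers only rows that satisfy the WHERE clause, so
# non-code entities (NULL commit SHA) add nothing to the index.
conn.execute(
    "CREATE INDEX idx_entities_commit ON entities(code_commit_sha) "
    "WHERE code_commit_sha IS NOT NULL"
)
conn.executemany(
    "INSERT INTO entities "
    "(canonical_name, code_file, code_language, code_commit_sha) "
    "VALUES (?, ?, ?, ?)",
    [
        ("auth.login", "auth.py", "python", "abc123"),
        ("auth.logout", "auth.py", "python", "def456"),
        ("meeting-notes", None, None, None),  # not a code entity
    ],
)
conn.commit()


def entities_at_commit(conn: sqlite3.Connection, sha: str) -> list[str]:
    """Names of entities stamped with the given commit SHA."""
    rows = conn.execute(
        "SELECT canonical_name FROM entities "
        "WHERE code_commit_sha = ? ORDER BY canonical_name",
        (sha,),
    )
    return [row[0] for row in rows]
```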

Sources: CHANGELOG.md, src/thought/ingest/code/pipeline.py:60-75

CodeLayer Query Interface

The CodeLayer class provides convenience methods for querying the code graph with temporal awareness:

class CodeLayer:
    def callers_of(self, name, *, code_commit_sha=None): ...  # Find who calls this function
    def callees_of(self, name, *, code_commit_sha=None): ...  # Find what this function calls
    def impact_set(self, name): ...                           # Transitive callers, ranked
    def defines_in_file(self, path): ...                      # Entities in a file

Temporal Queries

All lookups operate against the currently-valid view of the code graph. To query historical snapshots, pass the as_of parameter or filter by code_commit_sha:

# Query current state
impact = code_layer.impact_set("authenticate_user")

# Query historical state (when full-history ingest was used).
# callers_of accepts code_commit_sha; impact_set, as listed above, does not.
callers_historical = code_layer.callers_of(
    "authenticate_user",
    code_commit_sha="abc123...",
)

Sources: src/thought/layers/code.py:1-50

Diff Between Commits

The system supports computing the difference between any two ingested commits:

thought diff --from <sha1> --to <sha2>

This returns:

  • Added entities: Entities present at --to but not at --from
  • Removed entities: Entities present at --from but not at --to

The diff operates on the set of entities by name, comparing their commit stamps.
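
A name-based diff reduces to two set differences. The sketch below assumes the entity names at each commit have already been fetched; `diff_entities` is a hypothetical helper, not the CLI's implementation.

```python
def diff_entities(from_names: set[str], to_names: set[str]) -> dict[str, list[str]]:
    """Compare the entity names present at two commits."""
    return {
        # Present at --to but not at --from.
        "added": sorted(to_names - from_names),
        # Present at --from but not at --to.
        "removed": sorted(from_names - to_names),
    }


before = {"auth.login", "auth.logout"}
after = {"auth.logout", "auth.authenticate_user"}
changes = diff_entities(before, after)
```

Because the comparison is by name only, a function whose body changed but whose name did not appears in neither list, and a rename shows up as one removal plus one addition.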

Sources: CHANGELOG.md

Supported Languages

The git ingestion pipeline uses language-specific extractors:

| Language | Extractor | Extensions |
|----------|-----------|------------|
| Python | python_extractor.py | .py |
| Rust | rust_extractor.py | .rs |
| TypeScript | typescript_extractor.py | .ts, .tsx |
| PHP | php_extractor.py | .php |

Each extractor uses tree-sitter for AST parsing, extracting:

  • Entities: modules, functions, classes, methods
  • Edges: IMPORTS, DEFINES, CALLS, INHERITS_FROM, OVERRIDES

Sources: src/thought/ingest/code/python_extractor.py, src/thought/ingest/code/rust_extractor.py, src/thought/ingest/code/typescript_extractor.py, src/thought/ingest/code/php_extractor.py

Configuration

Thought Configuration (thought.toml)

[embedder]
type = "auto"  # or "ollama", "openai", "deterministic"

[storage]
path = "thought.db"

Environment Variables

| Variable | Description |
|----------|-------------|
| OLLAMA_BASE_URL | Ollama server URL for embeddings |
| OPENAI_API_KEY | OpenAI API key for embeddings |

Best Practices

Initial Ingestion

  1. Start with snapshot mode to verify the setup works
  2. Run thought stats to confirm entities were created
  3. Query a function to verify call graph edges exist

Full History Ingestion

  1. Ensure adequate disk space (full mode creates temporary copies)
  2. Use --paths to filter to relevant file types on large repos
  3. Consider running during off-peak hours for large repositories

Query Optimization

  • Use code_file filter when querying specific files
  • Use code_commit_sha filter for historical lookups
  • Combine with vector similarity for intent-based queries

Troubleshooting

"git executable not on PATH"

Solution: Install git or ensure it's in your system PATH.

# Verify git is available
git --version

"not a git repository"

Solution: Ensure the path contains a .git directory:

# Initialize if needed
git init

Slow Full-History Ingestion

Mitigation:

  • Use --paths to filter file types
  • Use snapshot mode for initial setup
  • Consider parallelizing with multiple --paths passes

Summary

Git History Integration transforms thought-mcp from a current-state code analysis tool into a full temporal code repository that can answer questions about code at any point in history. By combining git's commit tracking with bi-temporal database queries, users can reconstruct how functions evolved, who called what across commits, and the complete impact chain of changes over time.

The architecture prioritizes:

  • No native dependencies: Pure subprocess git operations
  • Two-mode flexibility: Fast snapshots or complete history
  • Temporal provenance: Every entity stamped with its commit SHA
  • Language generality: Support for multiple programming languages via tree-sitter

Sources: CHANGELOG.md

Agent Adapters and SDK Integration

Related topics: Query and Retrieval System

Overview

The Agent Adapters and SDK Integration subsystem connects THOUGHT's knowledge base to external AI agent frameworks. Any Claude-Agent-SDK-shaped agent can use THOUGHT's memory, context retrieval, and code analysis capabilities through a standardized adapter interface.

The integration layer consists of three primary components:

  1. Claude SDK Adapter (ThoughtMemoryProvider) — A drop-in memory adapter for Claude Agent SDK
  2. MCP Server Surface — Exposes core primitives via the Model Context Protocol
  3. Claude Code Hook Installer — Integrates THOUGHT directly into Claude Code's event loop

Sources: CHANGELOG.md

Architecture Overview

graph TD
    subgraph "Agent Frameworks"
        ClaudeSDK[Claude Agent SDK]
        ClaudeCode[Claude Code CLI]
        MCPClients[MCP-Compatible Clients]
    end

    subgraph "THOUGHT Integration Layer"
        ClaudeSDKAdapter[ThoughtMemoryProvider]
        MCPServer[MCP Server Surface]
        HookInstaller[Claude Code Hook Installer]
    end

    subgraph "Core THOUGHT"
        Memory[Memory / Knowledge Base]
        Embedder[Embedder Service]
        CodeAnalysis[Code Analysis Engine]
        Backend[SQLite Backend]
    end

    ClaudeSDK --> ClaudeSDKAdapter
    ClaudeSDKAdapter --> Memory
    ClaudeSDKAdapter --> Embedder
    
    ClaudeCode --> HookInstaller
    HookInstaller --> Memory
    
    MCPClients --> MCPServer
    MCPServer --> Memory
    MCPServer --> CodeAnalysis
    MCPServer --> Backend

    Memory --> Backend
    Embedder --> Backend
    CodeAnalysis --> Backend

The Claude SDK Adapter

Purpose and Scope

The ThoughtMemoryProvider class serves as a drop-in memory adapter for any Claude-Agent-SDK-shaped agent. It wraps THOUGHT's core memory primitives and exposes them through a familiar interface that agent developers expect.

Sources: src/thought/adapters/claude_sdk.py

Core Methods

The adapter implements four primary methods that cover the complete agent loop:

| Method | Purpose | Returns |
|--------|---------|---------|
| context_for(target, role) | Returns a working-context dict for a specific target entity and role | dict with anchor, neighbours, recent_contradictions, role_view |
| render_context(target) | Returns the same payload as a plain-text system-prompt augmentation | str formatted for LLM consumption |
| record(content) | Persists what the agent learned to the knowledge base | str: source ID of recorded content |
| scan(repo_path) | Runs an incremental scan under the agent's name | dict with scan results |

Sources: src/thought/adapters/claude_sdk.py

Working Context Structure

The context_for() method returns a ranked, role-aware payload containing:

{
    "anchor": "<entity-name>",           # The target entity
    "neighbours": [...],                  # Top-K related entities
    "recent_contradictions": [...],       # Entities that contradict this one
    "role_view": "<saved-view-name>"      # Optional named view for the role
}

The context is token-budgeted to prevent overwhelming the agent's context window.

Sources: CHANGELOG.md

Integration Flow

sequenceDiagram
    participant Agent as Claude Agent SDK
    participant Adapter as ThoughtMemoryProvider
    participant Memory as THOUGHT Memory
    participant Embedder as Embedder Service
    participant Backend as SQLite Backend

    Agent->>Adapter: context_for("authenticate", role="code")
    Adapter->>Memory: working_context(target, role, budget_tokens)
    Memory->>Embedder: embed("authenticate")
    Embedder->>Memory: vector embedding
    Memory->>Backend: query similar entities
    Backend-->>Memory: ranked entity results
    Memory-->>Adapter: structured context dict
    Adapter-->>Agent: context payload

    Agent->>Adapter: record("Learned: auth uses JWT")
    Adapter->>Backend: upsert_source(content, mime_type)
    Adapter->>Backend: store entity + edges
    Backend-->>Adapter: source_id
    Adapter-->>Agent: source_id

MCP Server Surface

The MCP (Model Context Protocol) server exposes THOUGHT's primitives as tools that any MCP-compatible client can invoke.

Sources: src/thought/server.py

Available Tools

#### working_context

Universal "what does my agent need to know about X right now" primitive.

@app.tool()
async def working_context(
    target: str,           # "function:authenticate" / "chapter:5" / entity name
    role: str = "default", # Contextual role for view filtering
    budget_tokens: int = 1024,
    scope: str | None = None,
    owner_id: str | None = None,
) -> dict

Returns:

{
    "anchor": str,
    "neighbours": list[dict],
    "recent_contradictions": list[dict],
    "role_view": str | None
}

Sources: src/thought/server.py:48-63

#### scan

Incremental code-scan primitive for keeping the knowledge base current.

@app.tool()
async def scan(
    repo_path: str,           # Repository to scan
    agent: str | None = None, # Agent name for scan attribution
    since: str | None = None, # Only files changed since this time/commit
    max_files: int | None = None,
    note: str | None = None,
) -> dict

Sources: src/thought/server.py:65-78

#### scan_log_list

Lists recent scan runs for tracking incremental progress.

@app.tool()
async def scan_log_list(
    agent: str | None = None,
    limit: int = 10,
) -> dict

Returns:

{
    "scans": [
        {
            "scan_id": str,
            "agent": str,
            "timestamp": str,
            "files_processed": int,
            "note": str | None
        },
        ...
    ]
}

Sources: src/thought/server.py:80-91

Client Installation

THOUGHT supports installation into multiple MCP-compatible clients. The installation process merges a thought MCP server entry into the client's configuration file.

Sources: src/thought/clients.py

Supported Clients

| Client | Configuration Path |
|--------|--------------------|
| Project | .claude/settings.json |
| User | ~/.claude/settings.json |

Sources: src/thought/clients.py

Installation Function

def install(
    client: ClientName,
    *,
    server_name: str = "thought",
    block: dict | None = None,
    backup: bool = True,
) -> ClientInstallResult

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| client | ClientName | Required | Target client name |
| server_name | str | "thought" | Name for the server entry |
| block | `dict \| None` | None | Custom server block; defaults to server_block() |
| backup | bool | True | Backup existing config before modification |

Return Type: ClientInstallResult

@dataclass
class ClientInstallResult:
    client: ClientName
    path: Path | None
    status: Literal["installed", "already_present", "error", "no_path"]
    detail: str = ""

Sources: src/thought/clients.py

Installation Behavior

The install() function performs the following:

  1. Read existing config: parses the client's JSON configuration
  2. Idempotency check: returns already_present if the entry exists and matches
  3. Backup: creates settings.json.thought.bak before any write
  4. Merge server entry: adds the thought server block under mcpServers

Sources: src/thought/clients.py

graph TD
    A[install called] --> B{Config exists?}
    B -->|No| C[Create new config]
    B -->|Yes| D{Valid JSON?}
    D -->|No| E[Return error]
    D -->|Yes| F{Server entry exists?}
    F -->|Yes, matches| G[Return already_present]
    F -->|Yes, differs| H[Backup config]
    F -->|No| I[Add server entry]
    H --> J[Write merged config]
    I --> J
    C --> J
    J --> K[Return installed]
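
The flow in the diagram can be approximated with the standard library. This is a simplified sketch, not the project's `install()`: it takes a config path directly instead of a client name, returns a bare status string instead of a `ClientInstallResult`, and `install_server` is a hypothetical name. The backup suffix `.thought.bak` and the `mcpServers` key follow the description above.

```python
import json
import shutil
from pathlib import Path


def install_server(
    config_path: Path, server_name: str, block: dict, backup: bool = True
) -> str:
    """Merge a server block into a JSON config; return a status string."""
    if config_path.exists():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return "error"  # never overwrite a file we cannot parse
        if not isinstance(config, dict):
            return "error"
    else:
        config = {}

    servers = config.setdefault("mcpServers", {})
    if servers.get(server_name) == block:
        return "already_present"  # idempotent: nothing to write

    if backup and config_path.exists():
        # Keep a copy of the previous file next to the original.
        backup_path = config_path.with_name(config_path.name + ".thought.bak")
        shutil.copy2(config_path, backup_path)

    servers[server_name] = block
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return "installed"
```

Checking for an identical entry before touching the file is what makes repeated installs safe: a second run with the same block performs no write and leaves any existing backup untouched.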

Claude Code Hook Integration

The hook installer provides Claude Code event-driven integration, enabling THOUGHT to automatically capture context at key points in the development workflow.

Sources: src/thought/hooks/install.py

Hook Kinds

| Hook Kind | Claude Code Event | Command | Trigger |
|-----------|-------------------|---------|---------|
| recall | UserPromptSubmit | thought hook recall | After user submits a prompt |
| write | Stop | thought hook write | After agent completes work |
| context | SessionStart | thought hook context | When session begins |

Sources: src/thought/hooks/install.py:15-22

Hook Installation Result

@dataclass(frozen=True)
class HookInstallResult:
    kind: HookKind
    path: Path
    status: Literal["installed", "already_present", "error"]
    detail: str = ""

Settings Path Resolution

def settings_path(*, scope: Literal["project", "user"] = "project") -> Path

Sources: src/thought/hooks/install.py:41-50
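
One plausible body for this signature, based on the two configuration paths listed under Supported Clients, is sketched below. Resolving the project scope against the current working directory is an assumption; the real function may locate the project root differently.

```python
from pathlib import Path
from typing import Literal


def settings_path(*, scope: Literal["project", "user"] = "project") -> Path:
    """Return the Claude Code settings file for the given scope."""
    if scope == "project":
        # Assumed: project settings live under the current directory.
        return Path.cwd() / ".claude" / "settings.json"
    if scope == "user":
        return Path.home() / ".claude" / "settings.json"
    raise ValueError(f"unknown scope: {scope!r}")
```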

Demo Integration

The thought demo command includes a built-in walkthrough specifically for the Claude Agent SDK adapter:

- ``code``  Agent / developer flow — the 14-stage code-vertical
            walkthrough including agent identity, ``thought scan``,
            ``working_context``, 4 new-language extractors, and the
            Claude Agent SDK adapter.

Sources: src/thought/demo.py

Demo Audiences

| Audience | Purpose | Key Features |
|----------|---------|--------------|
| code | Agent/developer flow | SDK adapter, scan, working_context |
| writer | Novelist/paper author | Bi-temporal model, contradiction detection |
| legal | Investigator/paralegal | unique_predicates, CONTRADICTS edges |
| researcher | Academic | Claim/source pairs, Cypher queries |
| all | Sequential all audiences | Full demonstration suite |

Configuration

Environment Variables

The integration layer respects the following environment variables for database and embedder configuration:

| Variable | Purpose |
|----------|---------|
| THOUGHT_DB_PATH | Override database path |
| THOUGHT_EMBEDDER | Embedder choice (auto, sentence-transformers, etc.) |
| THOUGHT_OLLAMA_HOST | Ollama server host |
| THOUGHT_OLLAMA_MODEL | Ollama model name |
| THOUGHT_LMSTUDIO_URL | LM Studio server URL |
| THOUGHT_LMSTUDIO_MODEL | LM Studio model name |
| THOUGHT_OPENAI_COMPAT_URL | OpenAI-compatible API URL |
| THOUGHT_OPENAI_COMPAT_MODEL | OpenAI-compatible model name |
| THOUGHT_OPENAI_COMPAT_API_KEY | API key for OpenAI-compatible endpoints |

Sources: src/thought/config.py

Config File (`thought.toml`)

[embedding]
choice = "auto"  # or specific embedder name

[db]
path = ".thought/thought.db"

Dependencies

The adapter package requires the following extras:

[project.optional-dependencies]
adapters = ["httpx>=0.27"]

Sources: CHANGELOG.md

Usage Example

from thought.adapters.claude_sdk import ThoughtMemoryProvider

# Initialize adapter
memory = ThoughtMemoryProvider()

# Get working context for a function
context = memory.context_for(
    target="authenticate_user",
    role="security-reviewer",
    budget_tokens=2048,
)

# Record what the agent learned
source_id = memory.record(
    "Session token validation happens in this function. "
    "Uses HMAC-SHA256 for signature verification."
)

# Run incremental scan
result = memory.scan(
    repo_path="/path/to/project",
    agent="security-audit",
    note="Weekly security review scan"
)

Summary

The Agent Adapters and SDK Integration system provides three complementary pathways for integrating THOUGHT with external agents:

  1. Direct SDK Integration: ThoughtMemoryProvider for Claude Agent SDK agents
  2. MCP Protocol: Standard tool interface for any MCP-compatible client
  3. Claude Code Hooks: Event-driven integration for Claude Code CLI users

All pathways share the same underlying memory primitives, ensuring consistent behavior regardless of how the agent connects to THOUGHT.

Sources: CHANGELOG.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 8 source-linked risk signals. Review them before installing or handing real data to the project.

1. Configuration risk: Configuration risk needs validation

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: Configuration risk needs validation. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.host_targets | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | host_targets=mcp_host, claude, claude_code, chatgpt

2. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | README/documentation is current enough for a first validation pass.

3. Maintenance risk: v0.2.1 — thought upgrade + mcp-extras fix

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: v0.2.1 — thought upgrade + mcp-extras fix. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/RNBBarrett/thought-mcp/releases/tag/v0.2.1

4. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | last_activity_observed missing

5. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | no_demo; severity=medium

6. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | no_demo; severity=medium

7. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | issue_or_pr_quality=unknown

8. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1238261514 | https://github.com/RNBBarrett/thought-mcp | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 3 (the count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using thought-mcp with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence