# https://github.com/AryanSaini26/CodeAtlas Project Manual

Generated at: 2026-06-21 00:30:17 UTC

## Table of Contents

- [Overview and System Architecture](#page-1)
- [Code Graph Engine, Parsers, and Search](#page-2)
- [Agent Interfaces, CLI, HTTP API, and Web UI](#page-3)
- [Hosted Gateway, Deployment, GitHub Integration, and Operations](#page-4)

<a id='page-1'></a>

## Overview and System Architecture

### Related Pages

Related topics: [Code Graph Engine, Parsers, and Search](#page-2), [Agent Interfaces, CLI, HTTP API, and Web UI](#page-3), [Hosted Gateway, Deployment, GitHub Integration, and Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)
- [frontend/index.html](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/index.html)
- [frontend/src/pages/Overview.tsx](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/src/pages/Overview.tsx)
- [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md)
- [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py)
- [src/codeatlas/parsers/python_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/python_parser.py)
- [src/codeatlas/parsers/java_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/java_parser.py)
- [src/codeatlas/parsers/kotlin_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/kotlin_parser.py)
- [src/codeatlas/parsers/ruby_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/ruby_parser.py)
- [src/codeatlas/parsers/scala_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/scala_parser.py)
- [src/codeatlas/parsers/swift_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/swift_parser.py)
</details>

# Overview and System Architecture

## Purpose and Scope

CodeAtlas is a code knowledge-graph platform that converts source files into a queryable graph of symbols and relationships, then exposes that graph to humans (via a web UI and CLI) and to AI coding agents (via an MCP server, FastAPI HTTP/JSON API, and semantic search). Its central claim is that it reduces the 60–80% of the context window that AI agents typically waste re-orienting themselves inside an unfamiliar codebase. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

The project ships in v1.0.0 with 24 supported languages, a SQLite + FTS5 graph store (zero external infrastructure), optional FAISS-based semantic search, a React web frontend, and a FastMCP server exposing 30 tools. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

## Layered Architecture

CodeAtlas follows a four-layer architecture: language parsers feed a graph database, which is wrapped by an API/MCP layer, which in turn drives both a human-facing web UI and an agent-facing tool surface.

```mermaid
flowchart TB
    subgraph Sources
        A[Source files: .py, .ts, .go, .rs, .java, .kt, .rb, .scala, .swift, ...]
    end

    subgraph Parsers["Language Parsers (Tree-sitter AST)"]
        P1[Python]
        P2[TypeScript / TSX]
        P3[Go / Rust]
        P4[Java / Kotlin]
        P5[Ruby / Scala / Swift]
    end

    subgraph Core["Graph Core"]
        DB[(SQLite + FTS5)]
        VEC[(FAISS vector index - optional)]
        ALGO[PageRank, communities, cycles, hotspots]
    end

    subgraph Interfaces["Interfaces"]
        CLI[codeatlas CLI]
        API[FastAPI HTTP/JSON]
        MCP[FastMCP server - 30 tools]
    end

    subgraph Clients["Clients"]
        WEB[React web UI]
        AGENT[Claude Code / Cursor / agents]
        CI[GitHub webhooks / pre-commit]
    end

    A --> Parsers
    Parsers --> DB
    DB <--> VEC
    DB --> ALGO
    DB --> CLI
    DB --> API
    DB --> MCP
    CLI --> WEB
    API --> WEB
    MCP --> AGENT
    CI --> DB
```

Each layer has a clear responsibility boundary:

- **Parsers** extract `Symbol` and `Relationship` records from tree-sitter ASTs and emit them into a uniform intermediate model regardless of the source language. The Python parser, for example, walks `function_definition` / `class_definition` / `decorated_definition` nodes and emits `CALLS`, `INHERITS`, and `DECORATES` relationships with explicit source spans. Source: [src/codeatlas/parsers/python_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/python_parser.py).
- **Graph core** persists everything into SQLite with FTS5 for keyword search and recursive-CTE traversals for graph queries. Optional FAISS indexes power semantic and hybrid search via reciprocal rank fusion. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Interfaces** expose the graph through three channels: a Click-based CLI, a FastAPI service with Pydantic schemas, and a FastMCP server for AI agents.
- **Clients** consume the interfaces: a React + react-force-graph UI for humans, agent hosts (Claude Code, Cursor) for AI, and CI integrations (watchdog file watcher, GitHub webhooks, pre-commit hooks) for continuous sync. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md), [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md).

## Parser Subsystem

The parser package follows a uniform contract: each language module produces a list of `Symbol` objects (id, name, qualified_name, kind, file_path, span, docstring, signature, decorators, language) and a list of `Relationship` objects (source_id, target_id, kind, file_path, span). The `RelationshipKind` enum covers at minimum `CALLS`, `INHERITS`, and `DECORATES`. A target that cannot be bound at parse time is not a separate kind; it is recorded as a placeholder id of the form `<unresolved>::Name`.
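
The contract above can be sketched with plain dataclasses. This is an illustration only: the field names come from the list above, but the types, the `Span` helper, and `is_unresolved` are assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RelationshipKind(str, Enum):
    # Only the kinds named on this page; the real enum may define more.
    CALLS = "CALLS"
    INHERITS = "INHERITS"
    DECORATES = "DECORATES"


@dataclass(frozen=True)
class Span:
    # Hypothetical helper: 1-based, inclusive line range.
    start_line: int
    end_line: int


@dataclass
class Symbol:
    id: str
    name: str
    qualified_name: str
    kind: str
    file_path: str
    span: Span
    language: str
    docstring: Optional[str] = None
    signature: Optional[str] = None
    decorators: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    source_id: str
    target_id: str
    kind: RelationshipKind
    file_path: str
    span: Span


def is_unresolved(rel: Relationship) -> bool:
    """True when the target is still a `<unresolved>::Name` placeholder."""
    return rel.target_id.startswith("<unresolved>::")
```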

Examples of language-specific extraction strategies:

- **Python** captures decorators via `decorated_definition`, extracts calls from function bodies, and recurses into nested definitions under the parent qualified name. Source: [src/codeatlas/parsers/python_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/python_parser.py).
- **Java** detects `final` modifier on fields to distinguish constants from variables, and reconstructs method signatures as `"returnType name(params)"` for richer display. Source: [src/codeatlas/parsers/java_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/java_parser.py).
- **Kotlin** handles `class_declaration`, `object_declaration`, and `companion_object`, threading an `owner` parameter so nested functions become qualified methods. Source: [src/codeatlas/parsers/kotlin_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/kotlin_parser.py).
- **Ruby** walks `class`, `module`, and method nodes, capturing the `superclass` field as an `INHERITS` edge. Source: [src/codeatlas/parsers/ruby_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/ruby_parser.py).
- **Scala** builds signatures including explicit return types (`name(params): ReturnType`). Source: [src/codeatlas/parsers/scala_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/scala_parser.py).
- **Swift** traverses `class_declaration`, `protocol_declaration`, and `inheritance_specifier` to capture both class and protocol conformance edges. Source: [src/codeatlas/parsers/swift_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/swift_parser.py).
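
To make the Python strategy concrete, here is a minimal sketch that uses the standard-library `ast` module as a stand-in for tree-sitter. It is not the project's parser: the id format (`file::qualified_name`), the dict and tuple shapes, and the direction of the `DECORATES` edge are all assumptions chosen for illustration.

```python
import ast

UNRESOLVED = "<unresolved>::"
DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def direct_calls(func_node):
    """Yield names called in func_node's own body, skipping nested definitions."""
    stack = list(func_node.body)
    while stack:
        node = stack.pop()
        if isinstance(node, DEFINITIONS):
            continue  # nested definitions report their own calls
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            yield node.func.id
        stack.extend(ast.iter_child_nodes(node))


def extract(source, file_path):
    """Return (symbols, relationships) for one Python source string."""
    symbols, relationships = [], []

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, DEFINITIONS):
                visit(child, scope)  # e.g. definitions inside if/try blocks
                continue
            qualified = f"{scope}.{child.name}" if scope else child.name
            symbol_id = f"{file_path}::{qualified}"
            is_class = isinstance(child, ast.ClassDef)
            symbols.append({
                "id": symbol_id,
                "qualified_name": qualified,
                "kind": "class" if is_class else "function",
                "span": (child.lineno, child.end_lineno),
            })
            for decorator in child.decorator_list:
                target = decorator.func if isinstance(decorator, ast.Call) else decorator
                if isinstance(target, ast.Name):
                    relationships.append((symbol_id, UNRESOLVED + target.id, "DECORATES"))
            if is_class:
                for base in child.bases:
                    if isinstance(base, ast.Name):
                        relationships.append((symbol_id, UNRESOLVED + base.id, "INHERITS"))
            else:
                for name in direct_calls(child):
                    relationships.append((symbol_id, UNRESOLVED + name, "CALLS"))
            visit(child, qualified)  # nested definitions get the parent's qualified name

    visit(ast.parse(source), "")
    return symbols, relationships
```

Every edge target is emitted as an `<unresolved>::Name` placeholder here, which keeps the per-file pass free of cross-file lookups.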

The frontend shell declares itself as a dark-themed SPA titled "CodeAtlas" with the description "Explore your code as a knowledge graph — PageRank, semantic search, diff, coverage gaps." Source: [frontend/index.html](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/index.html).

## API, MCP, and Web Surfaces

The HTTP/JSON API uses Pydantic schemas for every response. Representative schemas include `PageRankResponse`, `HotspotsResponse`, `CommunitiesResponse`, `CoverageGapsResponse`, `DiffResponse`, `ReindexResponse`, and `ErrorResponse`, each modeled with explicit field names and types so consumers can rely on stable contracts. Source: [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py).

The MCP server is configured for agent hosts by pointing the `mcpServers` block at `codeatlas serve --db <path>`. Once registered, agents can call tools such as `search_symbols`, `get_symbol_details`, `get_pagerank`, `find_path`, `get_dependencies`, `trace_call_chain`, `find_dead_code`, `analyze_complexity`, `get_hotspots`, `get_symbol_coverage`, and chain them automatically (for example, `search_symbols` → `get_symbol_details` → `get_dependencies`). Source: [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md).

The web frontend's Overview page surfaces the top symbols by PageRank, links into the Analysis tab for deeper views, and routes each result to a dedicated `/symbol/:id` page. Empty states explicitly instruct users to run `codeatlas index` first. Source: [frontend/src/pages/Overview.tsx](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/src/pages/Overview.tsx).

## Data Flow and Sync

A typical end-to-end flow looks like this:

1. A user runs `codeatlas index [path]` (optionally with `--incremental`, `--watch`, or `--workers N`). Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
2. Parsers emit `Symbol` and `Relationship` records per file; the graph core persists them into SQLite and, when installed, indexes embeddings into FAISS.
3. The graph is queried through the CLI, the FastAPI service, or the MCP server. The CLI offers `query`, `show`, `audit`, `find-path`, `coupling`, `hotspots`, `hubs`, `rank`, `communities`, `coverage-gaps`, `report`, and `agent-eval`. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
4. A background `codeatlas index --watch` process, a `codeatlas pre-commit install` hook, or a GitHub webhook keeps the graph fresh as files change.

## Failure Modes and Operational Notes

Common failure modes a technical reader should plan for:

- **Empty results after install**: the web UI's Overview page renders an explicit "Run `codeatlas index` first" hint when no PageRank data exists. Source: [frontend/src/pages/Overview.tsx](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/src/pages/Overview.tsx).
- **Unresolved references**: cross-file or cross-module references that cannot be resolved during parsing are stored as `<unresolved>::Name` placeholders and resolved later by a graph pass. Source: [src/codeatlas/parsers/python_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/python_parser.py).
- **Semantic search unavailable**: the `[search]` and `[all]` extras install sentence-transformers and FAISS; without them, only FTS5 keyword queries are available, because both `--semantic` and `--hybrid` depend on the vector index. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **MCP not registered**: until `codeatlas serve` is added to the host's `mcpServers` config and the host is restarted, agents cannot reach the graph. Source: [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md).

## See Also

- CLI Commands reference (in [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md))
- MCP tool catalog and Claude Code / Cursor setup (in [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md))
- API response schemas (in [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py))
- Language parser implementations (under `src/codeatlas/parsers/`)
- v1.0.0 release notes (in the [GitHub releases page](https://github.com/AryanSaini26/CodeAtlas/releases/tag/v1.0.0))

---

<a id='page-2'></a>

## Code Graph Engine, Parsers, and Search

### Related Pages

Related topics: [Overview and System Architecture](#page-1), [Agent Interfaces, CLI, HTTP API, and Web UI](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)
- [src/codeatlas/parsers/swift_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/swift_parser.py)
- [src/codeatlas/parsers/ruby_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/ruby_parser.py)
- [frontend/index.html](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/index.html)
- [frontend/package.json](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/package.json)
- [vscode-extension/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/vscode-extension/README.md)
- [vscode-extension/package.json](https://github.com/AryanSaini26/CodeAtlas/blob/main/vscode-extension/package.json)
- [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md)
</details>

# Code Graph Engine, Parsers, and Search

## Architecture and Data Flow

CodeAtlas is positioned as a knowledge-graph engine for source code: tree-sitter parses files into symbols and relationships, a SQLite-backed graph store (with FTS5 and recursive CTEs) persists them, and a search layer exposes them through the CLI, MCP, HTTP API, web UI, and VS Code extension. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

The README's stated pipeline is `Source Files → Tree-sitter AST → Symbols + Relationships → SQLite Graph`, fanning out to FTS5 keyword search, FAISS vector search, and graph analysis. The same store is read by CLI commands, the 30-tool MCP server, the FastAPI layer, and the React UI — so all surfaces see identical data. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

```mermaid
flowchart LR
    FS[Source Files] --> TS[Tree-sitter AST]
    TS --> SYM[Symbols + Relationships]
    SYM --> SQL[(SQLite + FTS5)]
    SQL --> CLI[CLI]
    SQL --> MCP[MCP Server - 30 tools]
    SQL --> API[FastAPI HTTP API]
    SQL --> UI[React Web UI]
    SQL --> VS[VS Code Extension]
    SYM --> VS2[FAISS Vectors]
    SQL --> ANALY[Graph Analysis: PageRank, cycles, hotspots]
```

Design choices highlighted in the README: SQLite over Neo4j (zero infrastructure, ships with Python, FTS5 + recursive CTEs), FAISS over pgvector (local, no DB server), and tree-sitter over regex (incremental, cross-language). Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

## Tree-sitter Parsers

CodeAtlas ships tree-sitter parsers for 24 languages, exposing classes, functions, interfaces, decorators, docstrings, imports, inheritance, generics, and language-specific constructs (JSDoc, Javadoc, KDoc, Scaladoc, XML doc comments, `///` Rust doc comments). Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

### Internal structure (example: Swift and Ruby)

The parsers share a common shape: locate the language-specific declaration node, read its `name` child, build a stable symbol id from the file path and qualified name, attach the source span and docstring, then walk the body for nested declarations. The Ruby parser handles top-level statements via `_walk_toplevel`, dispatches to `_handle_class` (which records the symbol plus an `INHERITS` relationship from the `superclass` field), `_handle_module`, and method handlers, and resolves nested classes/methods inside the class body. Source: [src/codeatlas/parsers/ruby_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/ruby_parser.py).

The Swift parser mirrors this for `class_declaration`, `protocol_declaration`, and friends: it locates the `type_identifier` for the name, emits the `Symbol`, then iterates `inheritance_specifier` children to emit one `INHERITS` relationship per parent type. Unresolved parents are written as `<unresolved>::ParentType`, which the resolver layer can later bind. It then recurses into `class_body` so nested declarations inherit the qualified scope. Source: [src/codeatlas/parsers/swift_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/swift_parser.py).
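
A later binding step for these placeholders might look like the sketch below. The source does not describe how the resolver layer works, so the name-based lookup, the tuple shapes, and the rule of leaving ambiguous names untouched are assumptions.

```python
PREFIX = "<unresolved>::"


def resolve(relationships, symbols):
    """Bind `<unresolved>::Name` targets to symbol ids when the name is unambiguous.

    relationships: iterable of (source_id, target_id, kind) tuples
    symbols: iterable of (symbol_id, name) tuples
    """
    by_name = {}
    for symbol_id, name in symbols:
        by_name.setdefault(name, []).append(symbol_id)

    resolved = []
    for source_id, target_id, kind in relationships:
        if target_id.startswith(PREFIX):
            candidates = by_name.get(target_id[len(PREFIX):], [])
            if len(candidates) == 1:
                target_id = candidates[0]
            # Zero or several candidates: keep the placeholder rather than guess.
        resolved.append((source_id, target_id, kind))
    return resolved
```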

### Indexing and parallelism

`codeatlas index` walks the repository, parses files, and writes them into the graph. Three flags shape behavior:

| Flag | Effect |
|---|---|
| `--incremental` | Only re-index files that changed since the last index |
| `--watch` | After indexing, keep watching for file changes (uses Watchdog) |
| `--workers N` | Parse files in parallel across N processes |

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md). The companion `codeatlas watch` command exists as a standalone equivalent, and `codeatlas diff [path]` reports which files changed since the last index. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
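
One common way to implement a flag like `--incremental` is to compare content digests against those saved by the previous run. The sketch below shows that idea only; how CodeAtlas actually detects changed files is not stated in the source, and the function names and the digest map are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change marker."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, previous: dict, suffixes=(".py",)):
    """Return (to_reindex, removed, current) given the digests saved by the last run."""
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix in suffixes
    }
    # New files have no previous digest; edited files have a different one.
    to_reindex = [name for name, digest in current.items() if previous.get(name) != digest]
    removed = [name for name in previous if name not in current]
    return to_reindex, removed, current
```

The returned `current` map would be persisted and passed back in as `previous` on the next run.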

## Knowledge Graph Store and Graph Analysis

The graph lives in a SQLite database (default `.codeatlas/graph.db`) with FTS5 for keyword search and recursive CTEs for traversals such as shortest path and call chain. The README states the design rationale explicitly: zero infrastructure, deterministic, and small enough to ship with Python. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
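
As an illustration of a recursive-CTE traversal, the sketch below follows `CALLS` edges outward from a starting symbol using Python's built-in `sqlite3` module. The `relationships` table and its column names are assumptions made for the example; the real schema in `graph.db` may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table; the last row closes a cycle (read_text -> run).
conn.executescript("""
    CREATE TABLE relationships (source_id TEXT, target_id TEXT, kind TEXT);
    INSERT INTO relationships VALUES
        ('main', 'load_config', 'CALLS'),
        ('main', 'run', 'CALLS'),
        ('run', 'parse_file', 'CALLS'),
        ('parse_file', 'read_text', 'CALLS'),
        ('read_text', 'run', 'CALLS');
""")

CALL_CHAIN = """
WITH RECURSIVE reachable(symbol_id, depth) AS (
    SELECT :start, 0
    UNION
    SELECT r.target_id, reachable.depth + 1
    FROM relationships AS r
    JOIN reachable ON r.source_id = reachable.symbol_id
    WHERE r.kind = 'CALLS' AND reachable.depth < :max_depth
)
SELECT symbol_id, MIN(depth) AS depth
FROM reachable
GROUP BY symbol_id
ORDER BY depth, symbol_id
"""


def call_chain(conn, start, max_depth=5):
    """Return (symbol_id, shortest depth) for everything reachable from start."""
    return conn.execute(CALL_CHAIN, {"start": start, "max_depth": max_depth}).fetchall()
```

The `max_depth` bound is what keeps the query finite when the call graph contains a cycle, since each pass around the cycle produces rows with a larger depth.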

Graph analysis primitives surfaced through both CLI and MCP include:

- **PageRank** — `codeatlas rank` ranks symbols weighted by caller importance rather than raw degree; `--kind class --json` restricts to a kind. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Communities** — `codeatlas communities` runs label propagation; nodes receive a `community_id`, and the UI offers a toggle between kind-coloring and community-coloring. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Cycles, dead code, hotspots, coverage gaps, coupling, shortest path** — exposed via `codeatlas audit`, `codeatlas coverage-gaps`, `codeatlas coupling`, `codeatlas find-path <src> <tgt>`, and `codeatlas hotspots [path]`. The `codeatlas report [path]` command composes them into a single health report, optionally as JSON. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Git-aware change impact** — the `codeatlas diff` command and the MCP `get_change_impact` tool surface which symbols and files are affected by a change between refs; the web UI's Diff tab renders the same data as added/removed/modified columns. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
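
To show why PageRank weights callers by importance rather than counting them, here is a textbook power-iteration sketch over caller-to-callee edges. The damping factor, iteration count, and dangling-node handling are conventional defaults and assumptions, not values taken from CodeAtlas.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (caller, callee) edges; expects at least one edge."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank
```

A symbol called by one highly ranked caller can outrank a symbol called by several obscure ones, which is the difference from a raw in-degree count.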

The graph can be exported for downstream tooling in DOT (Graphviz), JSON (D3.js), Mermaid, GraphML, CSV, and Cypher via `codeatlas export --format ...`. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

## Search Subsystem

CodeAtlas provides three search modes that share the same symbol store:

- **Full-text (FTS5)** — `codeatlas query <term>` runs a keyword search across the indexed corpus. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Semantic** — `codeatlas query <term> --semantic` runs natural-language vector search using sentence-transformers, requiring the `codeatlas[search]` extra. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Hybrid** — `codeatlas query <term> --hybrid` merges keyword and vector results with reciprocal rank fusion, giving the best of both lexical precision and semantic recall. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

All three return the same shape (consumable via `--json`) and feed the Web UI's Search pane, where results open into a detail view showing signature, docstring, and incoming/outgoing references. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
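
Reciprocal rank fusion itself is small enough to sketch. Each result earns `1 / (k + rank)` from every list it appears in, and the sums decide the merged order. The constant `k = 60` is the value commonly used in the literature; whether CodeAtlas uses it, and the alphabetical tie-break below, are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for position, symbol_id in enumerate(ranking, start=1):
            scores[symbol_id] = scores.get(symbol_id, 0.0) + 1.0 / (k + position)
    # Higher fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda symbol_id: (-scores[symbol_id], symbol_id))


keyword_hits = ["parse_file", "parse_args", "read_text"]
semantic_hits = ["load_source", "parse_file", "read_text"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Because only ranks are used, the keyword and vector scores never need to be put on a common scale.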

The MCP server exposes search as `search_symbols` (FTS5 with kind/file filters and query expansion) and `find_similar_code` (semantic); agents routinely chain `search_symbols → get_symbol_details → get_dependencies` to answer a single question — the key advantage over filesystem-only tools is querying a graph rather than a pile of text. Source: [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md).

## Downstream Surfaces

The same graph backs every consumer, which is the project's main architectural claim:

- **CLI** — `codeatlas show <symbol>`, `query`, `audit`, `find-path`, `hotspots`, `hubs`, `rank`, `communities`, `coverage-gaps`, `report`. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **MCP server** — 30 tools (29 cited in the Claude example, 30 in the feature table) for AI agents; example prompts include "Find every caller of `parse_file`" and "What symbols were added or modified between `HEAD` and `main`?". Source: [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md), [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **HTTP API + Web UI** — `codeatlas ui` serves both on `localhost:8080`; the frontend (`codeatlas-web` v1.0.0) is a Vite + React + Tailwind SPA built on `react-force-graph-2d`, `@tanstack/react-query`, `react-router-dom`, and `zustand`. Source: [frontend/package.json](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/package.json), [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md). The UI exposes Dashboard, Search, Analysis, Graph, Symbols, Diff, and Settings tabs. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **VS Code extension** — `codeatlas-vscode` v0.1.0 commands: Open Web UI, Search Symbols, Show Symbol at Cursor, Show Impact Radius, Build Agent Context. It targets `codeatlas.apiBase` (default `http://127.0.0.1:8080`) and optionally sends `X-API-Key` for servers started with `--api-key`. Source: [vscode-extension/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/vscode-extension/README.md), [vscode-extension/package.json](https://github.com/AryanSaini26/CodeAtlas/blob/main/vscode-extension/package.json).
- **Real-time sync** — Watchdog-based `codeatlas index --watch` plus a GitHub webhook handler (`codeatlas webhook /path/to/repo --port 9000 --secret YOUR_WEBHOOK_SECRET`) for push-driven incremental updates. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
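
The `--secret` option on the webhook command suggests that deliveries are checked against GitHub's signature. GitHub signs each payload with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. A minimal check is sketched below; it illustrates the GitHub scheme and is not the handler's actual code, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Compute the value GitHub would send in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

The comparison runs over the raw request bytes, so the body has to be read before any JSON parsing or re-serialization.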

## Common Failure Modes and Operational Notes

A few behaviors documented in the README and examples are worth keeping in mind when integrating:

- **Cross-encoder reranker is opt-in.** The README explicitly notes that the optional reranker "did **not** beat the graph/lexical baseline on the code-symbol suite" and was kept behind an opt-in flag, with results in `benchmarks/rerank-report.md`. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **First-time semantic search needs the extra.** `codeatlas[search]` installs sentence-transformers; without it, `--semantic` and `--hybrid` will fail to load embeddings. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).
- **Unresolved references.** Parsers emit targets as `<unresolved>::Name` when a parent/import can't be statically bound (e.g. Swift `inheritance_specifier`, Ruby `superclass`); the resolver layer is expected to bind these later. Source: [src/codeatlas/parsers/swift_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/swift_parser.py), [src/codeatlas/parsers/ruby_parser.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/parsers/ruby_parser.py).
- **Web UI depends on the API.** The VS Code extension and the React UI both assume `codeatlas server` (or `codeatlas ui`) is running at `http://127.0.0.1:8080`; configure with `codeatlas.apiBase` / `codeatlas.apiKey` for VS Code. Source: [vscode-extension/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/vscode-extension/README.md).
- **v1.0.0 release.** The v1.0.0 tag is the current stable line; the changelog is at `https://github.com/AryanSaini26/CodeAtlas/commits/v1.0.0`. Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md).

## See Also

- [Web UI and React Frontend](./Web-UI-and-React-Frontend.md) — `codeatlas-web` SPA, Dashboard, Graph, Diff tabs
- [MCP Server and Agent Integrations](./MCP-Server-and-Agent-Integrations.md) — 30 MCP tools, Claude/Cursor configuration, agent outcome A/B eval
- [CLI Reference](./CLI-Reference.md) — full command list, flags, and `--json` output conventions
- [Performance and Benchmarking](./Performance-and-Benchmarking.md) — `benchmarks/rerank-report.md`, scale report, retrieval V2 metrics

---

<a id='page-3'></a>

## Agent Interfaces, CLI, HTTP API, and Web UI

### Related Pages

Related topics: [Code Graph Engine, Parsers, and Search](#page-2), [Hosted Gateway, Deployment, GitHub Integration, and Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)
- [frontend/index.html](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/index.html)
- [frontend/package.json](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/package.json)
- [frontend/src/pages/Overview.tsx](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/src/pages/Overview.tsx)
- [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md)
- [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py)
- [src/codeatlas/api/hosted_routes.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/hosted_routes.py)
</details>

# Agent Interfaces, CLI, HTTP API, and Web UI

CodeAtlas exposes the same underlying knowledge graph through four complementary surfaces: a Python **CLI**, an **HTTP/JSON API**, an **MCP server** for AI agents, and a React-based **Web UI**. This page describes each surface, the data they share, and the practical entry points for human and machine consumers.

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)

## Architecture at a Glance

All four interfaces read from the same SQLite graph database, so a single index operation feeds every downstream consumer.

```mermaid
flowchart LR
    A[Source Files] --> B[Tree-sitter Parsers]
    B --> C[SQLite Graph DB]
    C --> D[CLI]
    C --> E[HTTP/JSON API]
    C --> F[MCP Server]
    C --> G[Web UI]
    D --> H[Humans]
    E --> I[Custom Tooling]
    F --> J[AI Agents]
    G --> K[Browser]
```

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)

## Command-Line Interface (CLI)

The CLI is the canonical entry point for indexing, querying, and analyzing a repository. It is installed automatically with `pip install codeatlas` and is implemented as the `codeatlas` console script.

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)

Common command categories include:

| Command | Purpose |
|---|---|
| `codeatlas index <path>` | Parse a repository and build the graph |
| `codeatlas stats` | Show file/symbol/relationship counts |
| `codeatlas query <term>` | Keyword search (with `--hybrid`, `--semantic`, `--json`) |
| `codeatlas show <symbol>` | Inspect signature, docs, deps, call chain |
| `codeatlas audit` | Cycles, dead code, complexity report |
| `codeatlas find-path <src> <tgt>` | Shortest dependency path |
| `codeatlas hotspots [path]` | Highest-risk files (churn × in-degree) |
| `codeatlas rank` | PageRank ranking |
| `codeatlas communities` | Subsystem detection via label propagation |
| `codeatlas coverage-gaps` | Public symbols with zero test coverage |
| `codeatlas viz --open` | Launch the interactive graph in a browser |
| `codeatlas export --format dot\|json\|mermaid` | Graph export |
| `codeatlas serve` | Start the MCP server (the command referenced in the `mcpServers` config) |
| `codeatlas watch <path>` | Incremental re-index on file change |
| `codeatlas webhook <path>` | GitHub webhook handler |
| `codeatlas agent-eval` / `eval` / `perf-report` / `doctor` / `data-lineage` | Reproducible benchmark and diagnostics |

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)
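
The `hotspots` row in the table above describes risk as churn × in-degree. A minimal sketch of that scoring follows, assuming churn is a per-file commit count and in-degree is the number of incoming references to the file. Both inputs and the function name are hypothetical, and the real command may normalize or weight them differently.

```python
def hotspot_scores(churn, in_degree):
    """Rank files by churn multiplied by in-degree, highest risk first."""
    paths = set(churn) | set(in_degree)
    scores = {path: churn.get(path, 0) * in_degree.get(path, 0) for path in paths}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A file scores high only when it is both frequently changed and widely depended on; either factor alone at zero yields zero.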

Most query/analysis commands accept a `--json` flag so the CLI can be embedded in shell pipelines or used as a backing command for custom frontends.

## HTTP/JSON API

The HTTP layer wraps the graph in a FastAPI application, giving non-Python clients a stable surface for integration. Pydantic models in `schemas.py` define the wire contract.

Source: [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py)

Key response shapes include:

| Model | Fields | Used By |
|---|---|---|
| `GraphNode` | `id`, `name`, `qualified_name`, `kind`, `file`, `community_id` | Graph view, exports |
| `GraphLink` | `source`, `target`, `kind`, `confidence` | Graph view |
| `GraphResponse` | `nodes`, `links`, `truncated` | `/graph` endpoint |
| `StatsResponse` | `files`, `symbols`, `relationships`, `languages`, `kinds` | `/stats` endpoint |
| `SymbolRef` | `id`, `name`, `qualified_name`, `kind`, `file`, `line` | Nested in detail models |
| `SymbolDetails` | `id`, `name`, `qualified_name`, `kind`, `file`, `start_line`, `end_line`, `signature`, `docstring`, `incoming`, `outgoing` | `/symbol/{id}` endpoint |
| `ImpactDepthGroup` | `depth`, `count` | Change-impact analysis |

Source: [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py)

A comment in the file makes the stability contract explicit: *"Keep these stable — the web UI and any third-party consumers key off them. Breaking changes go behind a new `/api/vN` prefix rather than mutating the existing shape."*

Source: [src/codeatlas/api/schemas.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/schemas.py)
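
A client can lean on that contract by checking for the documented fields before using a response. The sketch below builds a request for the `/stats` endpoint and validates a payload against the `StatsResponse` field names from the table. The exact URL path, the base address, and the use of an `X-API-Key` header on this endpoint are assumptions; no request is actually sent.

```python
import json
import urllib.request
from typing import Optional

STATS_FIELDS = ("files", "symbols", "relationships", "languages", "kinds")


def build_stats_request(api_base: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the stats endpoint."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # assumed header name for keyed servers
    return urllib.request.Request(f"{api_base.rstrip('/')}/stats", headers=headers)


def parse_stats(payload: bytes) -> dict:
    """Decode a stats response and fail loudly if a documented field is absent."""
    data = json.loads(payload)
    missing = [name for name in STATS_FIELDS if name not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return data
```

Passing the prepared request to `urllib.request.urlopen` and its body to `parse_stats` would complete the round trip against a running server.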

Beyond the core read API, `hosted_routes.py` exposes a local-dev hosted control plane with GitHub App OAuth, signed webhook handling, sync-worker dispatch, retrieval-eval endpoints, and security scanning.

Source: [src/codeatlas/api/hosted_routes.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/hosted_routes.py)

## MCP Server (AI Agent Interface)

The Model Context Protocol server exposes the graph to AI coding agents such as Claude Code and Cursor. With `pip install codeatlas[mcp]`, agents can call tools like `search_symbols`, `get_symbol_details`, `get_dependencies`, `trace_call_chain`, `get_impact_analysis`, `find_similar_code`, `detect_circular_dependencies`, `get_hotspots`, `get_symbol_coverage`, and `get_change_impact`.

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)

The shipping example documents a minimal Claude Code wiring:

```json
{
  "mcpServers": {
    "codeatlas": {
      "command": "codeatlas",
      "args": ["serve", "--db", "/path/to/repo/.codeatlas/graph.db"]
    }
  }
}
```

Source: [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md)

The example emphasizes an agent workflow pattern: agents chain multiple tools — `search_symbols` → `get_symbol_details` → `get_dependencies` — to answer a single question, which is precisely where a graph query outperforms filesystem-only tools.

Source: [examples/mcp-claude/README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/examples/mcp-claude/README.md)
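
The chaining pattern can be pictured with a toy in-memory graph. The three function names below match the MCP tool names, but their signatures, return shapes, and the sample data are all hypothetical; a real agent would call the tools over MCP rather than as local Python functions.

```python
from typing import Dict, List

# Toy graph standing in for .codeatlas/graph.db (hypothetical data).
SYMBOLS: Dict[str, Dict[str, str]] = {
    "s1": {"name": "load_config", "kind": "function", "file": "app/config.py"},
    "s2": {"name": "read_file", "kind": "function", "file": "app/io.py"},
    "s3": {"name": "parse_yaml", "kind": "function", "file": "app/yaml.py"},
}
CALLS: Dict[str, List[str]] = {"s1": ["s2", "s3"], "s2": [], "s3": []}


def search_symbols(query: str) -> List[str]:
    """Return ids of symbols whose name contains the query."""
    return [sid for sid, sym in SYMBOLS.items() if query in sym["name"]]


def get_symbol_details(symbol_id: str) -> Dict[str, str]:
    """Return the stored record for one symbol, plus its id."""
    return {"id": symbol_id, **SYMBOLS[symbol_id]}


def get_dependencies(symbol_id: str) -> List[str]:
    """Return the names of symbols that the given symbol calls."""
    return [SYMBOLS[dep]["name"] for dep in CALLS[symbol_id]]


def answer(query: str) -> Dict[str, object]:
    """Chain the three calls the way an agent would for a single question."""
    first_hit = search_symbols(query)[0]
    details = get_symbol_details(first_hit)
    return {
        "symbol": details["name"],
        "file": details["file"],
        "depends_on": get_dependencies(first_hit),
    }


result = answer("load_config")
```

Each step narrows the question using the previous step's output, which a plain text search over files cannot do without re-parsing the code.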

## Web UI

The Web UI is a React + TypeScript single-page application built with Vite, React Router, TanStack Query, Zustand, and `react-force-graph-2d`. It is launched locally with `codeatlas viz --open`.

Source: [frontend/package.json](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/package.json)

The HTML shell mounts a dark-themed app into `#root` and loads the bundled entry from `/src/main.tsx`.

Source: [frontend/index.html](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/index.html)

The UI is organized into several views: **Search** (showing signature, docstring, references), **Analysis** (PageRank ranking, hotspots, communities, coverage gaps), **Graph** (interactive force-directed visualization with kind/community coloring and file filtering), **Symbols** (detailed symbol pages), **Diff** (compare symbols between two git refs), and **Settings** (credentials, reindex, version info).

Source: [README.md](https://github.com/AryanSaini26/CodeAtlas/blob/main/README.md)

The `Overview` page wires live data from the API into hero stat tiles (files indexed, symbols, relationships), a language breakdown bar, a symbol-kinds distribution, and the top-10 PageRank list with deep links to symbol pages.

Source: [frontend/src/pages/Overview.tsx](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/src/pages/Overview.tsx)

## Common Failure Modes

- **Empty ranking on Overview** — the page renders an `EmptyState` reading *"No ranking yet"* when no graph has been indexed; run `codeatlas index` first.
  Source: [frontend/src/pages/Overview.tsx](https://github.com/AryanSaini26/CodeAtlas/blob/main/frontend/src/pages/Overview.tsx)
- **CLI without sem

---

<a id='page-4'></a>

## Hosted Gateway, Deployment, GitHub Integration, and Operations

### Related Pages

Related topics: [Overview and System Architecture](#page-1), [Agent Interfaces, CLI, HTTP API, and Web UI](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/codeatlas/api/hosted_routes.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/api/hosted_routes.py)
- [src/codeatlas/hosted.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/hosted.py)
- [src/codeatlas/hosted_worker.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/hosted_worker.py)
- [src/codeatlas/github_app.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/github_app.py)
- [src/codeatlas/rate_limit.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/rate_limit.py)
- [src/codeatlas/context_security.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/context_security.py)
- [src/codeatlas/agent_context.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/agent_context.py)
- [src/codeatlas/hosted_eval.py](https://github.com/AryanSaini26/CodeAtlas/blob/main/src/codeatlas/hosted_eval.py)
</details>


CodeAtlas v1.0.0 ships a local-development hosted control plane that packages team management, repository registration, a background sync worker, GitHub App integration, and an operations surface into a single FastAPI process. It is designed so that a team can stand up a multi-tenant context gateway on a laptop before wiring real OAuth, billing, or remote MCP routing; those pieces are explicitly stubbed in the MVP. Source: [src/codeatlas/api/hosted_routes.py:1-15]().

## Architecture Overview

The hosted gateway exposes a FastAPI router mounted under `/hosted`, backed by a SQLite `HostedStore` that persists teams, principals, repositories, sync jobs, and webhook delivery IDs. A separate `SyncJobWorker` runs in the background to clone repositories and trigger per-repo graph indexing. GitHub App concerns (OAuth, webhook signature verification, repository refresh) are isolated in `codeatlas/github_app.py` so the control plane can be exercised in CI without live credentials.

```mermaid
flowchart LR
    A[Bootstrap<br/>bootstrap] --> B[Register Team & Repo<br/>register-repo]
    B --> C[SyncJobWorker<br/>background]
    C --> D[Per-repo Graph DB<br/>GraphStore]
    D --> E[Remote MCP<br/>/remote-mcp]
    D --> F[Hosted Dashboard<br/>/hosted]
    G[GitHub Webhook<br/>push event] --> H[process_github_webhook]
    H --> C
    I[Agent / IDE] --> E
    I --> F
```

## Bootstrap, Registration, and Sync

Bootstrap is the entry point: it accepts a `BootstrapRequest` (email, name, team slug, team name) and creates the first principal and team record. Source: [src/codeatlas/api/hosted_routes.py:18-25]().

Repository registration uses a `RepoRegistration` payload persisted by `HostedStore`. Once registered, `codeatlas hosted sync` enqueues a sync job that the worker picks up. Source: [src/codeatlas/hosted_worker.py]() and [src/codeatlas/hosted.py]().

| Step | Command / Action | Source |
|------|------------------|--------|
| Initialize gateway | `codeatlas hosted bootstrap --hosted-db .codeatlas/hosted.db` | `hosted_routes.py` |
| Register repo | `codeatlas hosted register-repo` | `hosted.py` |
| Trigger sync | `codeatlas hosted sync` | `hosted_worker.py` |
| Inspect status | `codeatlas hosted github status` | `hosted_routes.py` |

Each sync run activates the repository and rebuilds the per-repo `GraphStore` (`.codeatlas/graph.db`). The dashboard reads from this store to render the Agent Context Feed, measured retrieval quality, blast-radius impact, and data lineage views. Source: [src/codeatlas/api/hosted_routes.py:1-15]().
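
The bootstrap, registration, and sync steps above can be pictured as a small job queue on top of SQLite. This is only a sketch: the table layout, column names, and function names are hypothetical and are not taken from `HostedStore` or `SyncJobWorker`.

```python
import sqlite3

# Hypothetical schema; the real HostedStore tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE teams (slug TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE repos (
        id INTEGER PRIMARY KEY,
        team_slug TEXT NOT NULL REFERENCES teams(slug),
        full_name TEXT NOT NULL UNIQUE,
        active INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE sync_jobs (
        id INTEGER PRIMARY KEY,
        repo_id INTEGER NOT NULL REFERENCES repos(id),
        status TEXT NOT NULL DEFAULT 'queued'
    );
    """
)


def bootstrap(slug: str, name: str) -> None:
    """Create the first team record."""
    conn.execute("INSERT INTO teams (slug, name) VALUES (?, ?)", (slug, name))


def register_repo(team_slug: str, full_name: str) -> int:
    """Register a repository under a team and return its row id."""
    cur = conn.execute(
        "INSERT INTO repos (team_slug, full_name) VALUES (?, ?)",
        (team_slug, full_name),
    )
    return cur.lastrowid


def enqueue_sync(repo_id: int) -> int:
    """Queue a sync job for the repository and return the job id."""
    cur = conn.execute("INSERT INTO sync_jobs (repo_id) VALUES (?)", (repo_id,))
    return cur.lastrowid


def run_next_job() -> bool:
    """Claim the oldest queued job, mark its repo active, and finish the job."""
    row = conn.execute(
        "SELECT id, repo_id FROM sync_jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, repo_id = row
    # A real worker would clone the repository and rebuild its graph here.
    conn.execute("UPDATE repos SET active = 1 WHERE id = ?", (repo_id,))
    conn.execute("UPDATE sync_jobs SET status = 'done' WHERE id = ?", (job_id,))
    return True


bootstrap("acme", "Acme Inc")
repo_id = register_repo("acme", "acme/widgets")
job_id = enqueue_sync(repo_id)
processed = run_next_job()
```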

## GitHub App Integration

The GitHub integration is split across three concerns. First, OAuth helpers in `github_app.py` (`build_oauth_authorize_url`, `exchange_oauth_code`, `fetch_github_user`) handle user authorization. Second, `verify_github_signature` validates incoming webhook deliveries using the configured secret. Third, `refresh_github_repositories` enumerates the installation's repositories from a fixture or token source so CI runs stay hermetic. Source: [src/codeatlas/github_app.py]().
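
For reference, GitHub signs each webhook body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows that check in isolation. The function names and signatures are hypothetical stand-ins, not the actual `verify_github_signature` implementation.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Build the value GitHub would send in X-Hub-Signature-256 for this body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Hypothetical stand-in for verify_github_signature."""
    if not header_value.startswith("sha256="):
        return False
    expected = sign_body(secret, body)
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), header_value.encode())


secret = b"s3cret"
body = b'{"ref": "refs/heads/main"}'
good_header = sign_body(secret, body)
```

The signature must be computed over the raw request bytes; re-serializing parsed JSON before hashing would change the bytes and make valid deliveries fail.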

`parse_webhook_payload` and `process_github_webhook` decode push events, record the delivery ID via `webhook_rate_limiter`, and enqueue per-repo syncs. The `hosted_routes.py` module wires these into the `/hosted/github/webhook` endpoint. Source: [src/codeatlas/api/hosted_routes.py:1-15]() and [src/codeatlas/rate_limit.py]().
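
A minimal sketch of that flow, assuming the delivery ID comes from GitHub's `X-GitHub-Delivery` header and the event type from `X-GitHub-Event`. The function name, the in-memory set, and the in-memory queue are hypothetical simplifications of whatever `process_github_webhook` and the store really do.

```python
import json
from collections import deque
from typing import Deque, Optional, Set

seen_deliveries: Set[str] = set()   # stand-in for persisted delivery IDs
sync_queue: Deque[str] = deque()    # stand-in for the sync job queue


def handle_push(delivery_id: str, event: str, raw_body: bytes) -> Optional[str]:
    """Return the repository queued for sync, or None if the delivery is ignored."""
    if event != "push" or delivery_id in seen_deliveries:
        return None  # not a push, or a redelivery that was already handled
    seen_deliveries.add(delivery_id)
    payload = json.loads(raw_body)
    full_name = payload["repository"]["full_name"]
    sync_queue.append(full_name)
    return full_name


example_body = json.dumps({"repository": {"full_name": "acme/widgets"}}).encode()
```

Recording the delivery ID before enqueuing makes the handler idempotent, so a retried delivery does not trigger a second sync.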

For repo-scoped remote context, `/remote-mcp` validates the `X-Stratum-Audience` header and serves context packs through `build_context_pack` along with graph summaries. The same endpoint includes prompt-injection, secret, and vendor-path scan results from `scan_context_pack`. Source: [src/codeatlas/context_security.py]() and [src/codeatlas/agent_context.py]().
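
The audience check might look roughly like the following. The header name is taken from the text above; the expected value format, the status codes, and the function itself are assumptions for illustration only.

```python
import hmac
from typing import Mapping, Tuple

AUDIENCE_HEADER = "X-Stratum-Audience"


def check_audience(headers: Mapping[str, str], expected: str) -> Tuple[int, str]:
    """Return an (HTTP status, reason) pair for a remote-context request."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(AUDIENCE_HEADER.lower())
    if supplied is None:
        return 401, "missing audience header"
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 403, "audience does not match this repository"
    return 200, "ok"
```

For example, `check_audience({"x-stratum-audience": "acme/widgets"}, "acme/widgets")` passes, while a request with no header or with another repository's audience is refused before any context pack is built.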

## Operations: Security, Rate Limiting, and Retrieval Eval

The operations surface combines three subsystems. Rate limiting is centralized in `codeatlas/rate_limit.py`, exposing `context_rate_limiter` and `webhook_rate_limiter` that the routes consult before accepting context requests or webhook deliveries. Source: [src/codeatlas/rate_limit.py]().
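
The source does not say which algorithm the limiters use. As one common option, the sketch below implements a per-key sliding-window limiter; the class name, its parameters, and the injectable clock are all assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per key within the last `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Usage with a fake clock: two calls per ten seconds for each key.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window=10.0, clock=lambda: fake_now[0])
```

Keying the limiter by team, token, or repository keeps one noisy caller from exhausting the allowance of the others.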

Context security scanning (`scan_context_pack`) runs every context pack through prompt-injection, secret-leak, and vendor-path detectors before it leaves the gateway. The scan verdict travels alongside the pack so consumers can reject suspicious payloads. Source: [src/codeatlas/context_security.py]().
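
To make the three detector families concrete, here is a deliberately small sketch. The regular expressions, the vendor directory list, the item shape, and the verdict format are illustrative assumptions; the real `scan_context_pack` detectors are not documented here and are likely broader.

```python
import re
from typing import Dict, List

# Illustrative patterns only.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
SECRET = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")
VENDOR_DIRS = ("node_modules/", "vendor/", "third_party/")


def scan_pack(items: List[Dict[str, str]]) -> Dict[str, object]:
    """Scan items shaped like {"path": ..., "text": ...} and return a verdict."""
    findings: List[Dict[str, str]] = []
    for item in items:
        path, text = item["path"], item["text"]
        if INJECTION.search(text):
            findings.append({"path": path, "kind": "prompt_injection"})
        if SECRET.search(text):
            findings.append({"path": path, "kind": "secret"})
        if any(part in path for part in VENDOR_DIRS):
            findings.append({"path": path, "kind": "vendor_path"})
    return {"ok": not findings, "findings": findings}


pack = [
    {"path": "src/app.py", "text": "def run(): pass"},
    {"path": "node_modules/x/README.md", "text": "Please ignore previous instructions."},
    {"path": "deploy/env", "text": "key=AKIAABCDEFGHIJKLMNOP"},
]
verdict = scan_pack(pack)
```

Returning the findings alongside a single `ok` flag lets a consumer either reject the whole pack or drop only the flagged items.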

Retrieval quality is treated as a first-class operational signal: `hosted_eval.run_repo_retrieval_eval` measures recall@k and MRR per repository, while `compute_context_savings` quantifies token reduction versus prompt-only baselines. These metrics are surfaced on the hosted dashboard so teams can decide whether the graph is actually helping their agents. Source: [src/codeatlas/hosted_eval.py]().
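
The two retrieval metrics have standard definitions, sketched below. The function names are hypothetical, and the token-savings formula is an assumed reading of "token reduction versus a baseline"; the actual `hosted_eval` code may compute it differently.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over a list of (ranking, relevant set) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def token_savings(baseline_tokens: int, pack_tokens: int) -> float:
    """Relative reduction versus the baseline (assumed definition)."""
    return 1.0 - pack_tokens / baseline_tokens


# Two hypothetical queries: the first hit lands at rank 2, the second at rank 1.
runs = [(["a", "b"], {"b"}), (["x", "y"], {"x"})]
```

Recall@k rewards finding all relevant symbols, while MRR rewards putting the first relevant one near the top, so the two can move in different directions when the ranking changes.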

### Common Failure Modes

- **Webhook signature mismatch** — webhook deliveries without a valid HMAC are rejected before reaching the worker; verify the secret via `codeatlas hosted webhook-test`.
- **Sync job backlog** — if the worker cannot keep up, `codeatlas hosted github refresh-repos` re-enumerates installations and the queue drains on the next worker tick.
- **Context budget overflow** — `codeatlas context <query> --budget 2000` trims the pack; raising the budget without re-running the eval suite can regress measured quality.
- **Remote MCP audience rejection** — calls to `/remote-mcp` without `X-Stratum-Audience` are refused; this is by design to keep repo context scoped.
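
The budget trimming mentioned in the list above can be sketched as a greedy selection. The scoring input, the whitespace token estimate, and both function names are assumptions; the actual `codeatlas context` trimming strategy is not described in this page.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude whitespace estimate; a real gateway would use a tokenizer."""
    return len(text.split())


def trim_to_budget(snippets: List[Tuple[float, str]], budget: int) -> List[str]:
    """Keep the highest-scoring snippets whose combined estimate fits the budget."""
    kept: List[str] = []
    used = 0
    for _score, text in sorted(snippets, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept


# (score, text) pairs; higher scores are considered more relevant.
snippets = [(0.9, "a b c"), (0.5, "d e f g"), (0.7, "h i")]
trimmed = trim_to_budget(snippets, budget=5)
```

Because lower-ranked snippets are the first to be dropped, a larger budget admits less relevant material, which is one reason to re-run the eval suite after changing it.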

## See Also

- [Retrieval and Context Pack Pipeline]()
- [MCP Server and Agent Tooling]()
- [GitHub App and Webhook Setup Guide]()

---


## Pitfall Log

Project: AryanSaini26/CodeAtlas

Summary: Found 7 structured pitfall items, none of them high or blocking. Top priority: Configuration risk - Configuration risk requires verification.

## 1. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/AryanSaini26/CodeAtlas

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/AryanSaini26/CodeAtlas

## 3. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/AryanSaini26/CodeAtlas

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/AryanSaini26/CodeAtlas

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas

<!-- canonical_name: AryanSaini26/CodeAtlas; human_manual_source: deepwiki_human_wiki -->
