CodeAtlas Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

CodeAtlas

MCP server that builds a real-time code knowledge graph via Tree-sitter AST parsing, giving AI coding agents instant structural and semantic codebase navigation.

Overview and System Architecture

Related topics: Code Graph Engine, Parsers, and Search, Agent Interfaces, CLI, HTTP API, and Web UI, Hosted Gateway, Deployment, GitHub Integration, and Operations

Section Related Pages

Continue reading this section for the full explanation and source context.

Overview and System Architecture

Purpose and Scope

CodeAtlas is a code knowledge-graph platform that converts source files into a queryable graph of symbols and relationships, then exposes that graph to humans (via a web UI and CLI) and to AI coding agents (via an MCP server, FastAPI HTTP/JSON API, and semantic search). Its central claim is reducing the 60–80% of context window that AI agents typically waste re-orienting themselves inside an unfamiliar codebase. Source: README.md.

The project ships in v1.0.0 with 24 supported languages, a SQLite + FTS5 graph store (zero external infrastructure), optional FAISS-based semantic search, a React web frontend, and a FastMCP server exposing 30 tools. Source: README.md.

Layered Architecture

CodeAtlas follows a four-layer architecture: language parsers feed a graph database, which is wrapped by an API/MCP layer, which in turn drives both a human-facing web UI and an agent-facing tool surface.

flowchart TB
    subgraph Sources
        A[Source files: .py, .ts, .go, .rs, .java, .kt, .rb, .scala, .swift, ...]
    end

    subgraph Parsers["Language Parsers (Tree-sitter AST)"]
        P1[Python]
        P2[TypeScript / TSX]
        P3[Go / Rust]
        P4[Java / Kotlin]
        P5[Ruby / Scala / Swift]
    end

    subgraph Core["Graph Core"]
        DB[(SQLite + FTS5)]
        VEC[(FAISS vector index - optional)]
        ALGO[PageRank, communities, cycles, hotspots]
    end

    subgraph Interfaces["Interfaces"]
        CLI[codeatlas CLI]
        API[FastAPI HTTP/JSON]
        MCP[FastMCP server - 30 tools]
    end

    subgraph Clients["Clients"]
        WEB[React web UI]
        AGENT[Claude Code / Cursor / agents]
        CI[GitHub webhooks / pre-commit]
    end

    A --> Parsers
    Parsers --> DB
    DB <--> VEC
    DB --> ALGO
    DB --> CLI
    DB --> API
    DB --> MCP
    CLI --> WEB
    API --> WEB
    MCP --> AGENT
    CI --> DB

Each layer has a clear responsibility boundary:

Parsers extract Symbol and Relationship records from tree-sitter ASTs and emit them into a uniform intermediate model regardless of the source language. The Python parser, for example, walks function_definition / class_definition / decorated_definition nodes and emits CALLS, INHERITS, and DECORATES relationships with explicit source spans. Source: src/codeatlas/parsers/python_parser.py.
Graph core persists everything into SQLite with FTS5 for keyword search and recursive-CTE traversals for graph queries. Optional FAISS indexes power semantic and hybrid search via reciprocal rank fusion. Source: README.md.
Interfaces expose the graph through three channels: a Click-based CLI, a FastAPI service with Pydantic schemas, and a FastMCP server for AI agents.
Clients consume the interfaces: a React + react-force-graph UI for humans, agent hosts (Claude Code, Cursor) for AI, and CI integrations (watchdog file watcher, GitHub webhooks, pre-commit hooks) for continuous sync. Source: README.md, examples/mcp-claude/README.md.

Parser Subsystem

The parser package follows a uniform contract: each language module produces a list of Symbol objects (id, name, qualified_name, kind, file_path, span, docstring, signature, decorators, language) and a list of Relationship objects (source_id, target_id, kind, file_path, span). The RelationshipKind enum covers at minimum CALLS, INHERITS, DECORATES, and unresolved references of the form <unresolved>::Name.

Examples of language-specific extraction strategies:

Python captures decorators via decorated_definition, extracts calls from function bodies, and recurses into nested definitions under the parent qualified name. Source: src/codeatlas/parsers/python_parser.py.
Java detects final modifier on fields to distinguish constants from variables, and reconstructs method signatures as "returnType name(params)" for richer display. Source: src/codeatlas/parsers/java_parser.py.
Kotlin handles class_declaration, object_declaration, and companion_object, threading an owner parameter so nested functions become qualified methods. Source: src/codeatlas/parsers/kotlin_parser.py.
Ruby walks class, module, and method nodes, capturing the superclass field as an INHERITS edge. Source: src/codeatlas/parsers/ruby_parser.py.
Scala builds signatures including explicit return types (name(params): ReturnType). Source: src/codeatlas/parsers/scala_parser.py.
Swift traverses class_declaration, protocol_declaration, and inheritance_specifier to capture both class and protocol conformance edges. Source: src/codeatlas/parsers/swift_parser.py.

The frontend shell declares itself as a dark-themed SPA titled "CodeAtlas" with the description "Explore your code as a knowledge graph — PageRank, semantic search, diff, coverage gaps." Source: frontend/index.html.

API, MCP, and Web Surfaces

The HTTP/JSON API uses Pydantic schemas for every response. Representative schemas include PageRankResponse, HotspotsResponse, CommunitiesResponse, CoverageGapsResponse, DiffResponse, ReindexResponse, and ErrorResponse, each modeled with explicit field names and types so consumers can rely on stable contracts. Source: src/codeatlas/api/schemas.py.

The MCP server is configured for agent hosts by pointing the mcpServers block at codeatlas serve --db <path>. Once registered, agents can call tools such as search_symbols, get_symbol_details, get_pagerank, find_path, get_dependencies, trace_call_chain, find_dead_code, analyze_complexity, get_hotspots, get_symbol_coverage, and chain them automatically (for example, search_symbols → get_symbol_details → get_dependencies). Source: examples/mcp-claude/README.md.

The web frontend's Overview page surfaces the top symbols by PageRank, links into the Analysis tab for deeper views, and routes each result to a dedicated /symbol/:id page. Empty states explicitly instruct users to run codeatlas index first. Source: frontend/src/pages/Overview.tsx.

Data Flow and Sync

A typical end-to-end flow looks like this:

A user runs codeatlas index [path] (optionally with --incremental, --watch, or --workers N). Source: README.md.
Parsers emit Symbol and Relationship records per file; the graph core persists them into SQLite and, when installed, indexes embeddings into FAISS.
The graph is queried through the CLI, the FastAPI service, or the MCP server. The CLI offers query, show, audit, find-path, coupling, hotspots, hubs, rank, communities, coverage-gaps, report, and agent-eval. Source: README.md.
A background codeatlas index --watch process, a codeatlas pre-commit install hook, or a GitHub webhook keeps the graph fresh as files change.

Failure Modes and Operational Notes

Common failure modes a technical reader should plan for:

Empty results after install: the web UI's Overview page renders an explicit "Run codeatlas index first" hint when no PageRank data exists. Source: frontend/src/pages/Overview.tsx.
Unresolved references: cross-file or cross-module references that cannot be resolved during parsing are stored as <unresolved>::Name placeholders and resolved later by a graph pass. Source: src/codeatlas/parsers/python_parser.py.
Semantic search unavailable: the [search] and [all] extras install sentence-transformers and FAISS; without them, only FTS5 keyword and hybrid (--hybrid) queries are available. Source: README.md.
MCP not registered: until codeatlas serve is added to the host's mcpServers config and the host is restarted, agents cannot reach the graph. Source: examples/mcp-claude/README.md.

Code Graph Engine, Parsers, and Search

Related topics: Overview and System Architecture, Agent Interfaces, CLI, HTTP API, and Web UI

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Internal structure (example: Swift and Ruby)

Continue reading this section for the full explanation and source context.

Section Indexing and parallelism

Continue reading this section for the full explanation and source context.

Code Graph Engine, Parsers, and Search

Architecture and Data Flow

CodeAtlas is positioned as a knowledge-graph engine for source code: tree-sitter parses files into symbols and relationships, a SQLite-backed graph store (with FTS5 and recursive CTEs) persists them, and a search layer exposes them through the CLI, MCP, HTTP API, web UI, and VS Code extension. Source: README.md.

The README's stated pipeline is Source Files → Tree-sitter AST → Symbols + Relationships → SQLite Graph, fanning out to FTS5 keyword search, FAISS vector search, and graph analysis. The same store is read by CLI commands, the 30-tool MCP server, the FastAPI layer, and the React UI — so all surfaces see identical data. Source: README.md.

flowchart LR
    FS[Source Files] --> TS[Tree-sitter AST]
    TS --> SYM[Symbols + Relationships]
    SYM --> SQL[(SQLite + FTS5)]
    SQL --> CLI[CLI]
    SQL --> MCP[MCP Server - 30 tools]
    SQL --> API[FastAPI HTTP API]
    SQL --> UI[React Web UI]
    SQL --> VS[VS Code Extension]
    SYM --> VS2[FAISS Vectors]
    SQL --> ANALY[Graph Analysis: PageRank, cycles, hotspots]

Design choices highlighted in the README: SQLite over Neo4j (zero infrastructure, ships with Python, FTS5 + recursive CTEs), FAISS over pgvector (local, no DB server), and tree-sitter over regex (incremental, cross-language). Source: README.md.

Tree-sitter Parsers

CodeAtlas ships tree-sitter parsers for 24 languages, exposing classes, functions, interfaces, decorators, docstrings, imports, inheritance, generics, and language-specific constructs (JSDoc, Javadoc, KDoc, Scaladoc, XML doc comments, /// Rust doc comments). Source: README.md.

Internal structure (example: Swift and Ruby)

The parsers share a common shape: locate the language-specific declaration node, read its name child, build a stable symbol id from the file path and qualified name, attach the source span and docstring, then walk the body for nested declarations. The Ruby parser handles top-level statements via _walk_toplevel, dispatches to _handle_class (which records the symbol plus an INHERITS relationship from the superclass field), _handle_module, and method handlers, and resolves nested classes/methods inside the class body. Source: src/codeatlas/parsers/ruby_parser.py.

The Swift parser mirrors this for class_declaration, protocol_declaration, and friends: it locates the type_identifier for the name, emits the Symbol, then iterates inheritance_specifier children to emit one INHERITS relationship per parent type. Unresolved parents are written as <unresolved>::ParentType, which the resolver layer can later bind. It then recurses into class_body so nested declarations inherit the qualified scope. Source: src/codeatlas/parsers/swift_parser.py.

Indexing and parallelism

codeatlas index walks the repository, parses files, and writes them into the graph. Two flags shape behavior:

Flag	Effect
`--incremental`	Only re-index files that changed since the last index
`--watch`	After indexing, keep watching for file changes (uses Watchdog)
`--workers N`	Parse files in parallel across N processes

Source: README.md. The companion codeatlas watch command exists as a standalone equivalent, and codeatlas diff [path] reports which files changed since the last index. Source: README.md.

Knowledge Graph Store and Graph Analysis

The graph lives in a SQLite database (default .codeatlas/graph.db) with FTS5 for keyword search and recursive CTEs for traversals like shortest-path and call-chain. Sources cite the design rationale explicitly: zero infrastructure, deterministic, and small enough to ship with Python. Source: README.md.

Graph analysis primitives surfaced through both CLI and MCP include:

PageRank — codeatlas rank ranks symbols weighted by caller importance rather than raw degree; --kind class --json restricts to a kind. Source: README.md.
Communities — codeatlas communities runs label propagation; nodes receive a community_id, and the UI offers a toggle between kind-coloring and community-coloring. Source: README.md.
Cycles, dead code, hotspots, coverage gaps, coupling, shortest path — exposed via codeatlas audit, codeatlas coverage-gaps, codeatlas coupling, codeatlas find-path <src> <tgt>, and codeatlas hotspots [path]. The codeatlas report [path] command composes them into a single health report, optionally as JSON. Source: README.md.
Git-aware change impact — the codeatlas diff and the MCP get_change_impact tool surface which symbols and files are affected by a change between refs; the web UI's Diff tab renders the same data as added/removed/modified columns. Source: README.md.

The graph can be exported for downstream tooling in DOT (Graphviz), JSON (D3.js), Mermaid, GraphML, CSV, and Cypher via codeatlas export --format .... Source: README.md.

Search Subsystem

CodeAtlas provides three search modes that share the same symbol store:

Full-text (FTS5) — codeatlas query <term> runs a keyword search across the indexed corpus. Source: README.md.
Semantic — codeatlas query <term> --semantic runs natural-language vector search using sentence-transformers, requiring the codeatlas[search] extra. Source: README.md.
Hybrid — codeatlas query <term> --hybrid merges keyword and vector results with reciprocal rank fusion, giving the best of both lexical precision and semantic recall. Source: README.md.

All three return the same shape (consumable via --json) and feed the Web UI's Search pane, where results open into a detail view showing signature, docstring, and incoming/outgoing references. Source: README.md.

The MCP server exposes search as search_symbols (FTS5 with kind/file filters and query expansion) and find_similar_code (semantic); agents routinely chain search_symbols → get_symbol_details → get_dependencies to answer a single question — the key advantage over filesystem-only tools is querying a graph rather than a pile of text. Source: examples/mcp-claude/README.md.

Downstream Surfaces

The same graph backs every consumer, which is the project's main architectural claim:

CLI — codeatlas show <symbol>, query, audit, find-path, hotspots, hubs, rank, communities, coverage-gaps, report. Source: README.md.
MCP server — 30 tools (29 cited in the Claude example, 30 in the feature table) for AI agents; example prompts include "Find every caller of parse_file" and "What symbols were added or modified between HEAD and main?". Source: examples/mcp-claude/README.md, README.md.
HTTP API + Web UI — codeatlas ui serves both on localhost:8080; the frontend (codeatlas-web v1.0.0) is a Vite + React + Tailwind SPA built on react-force-graph-2d, @tanstack/react-query, react-router-dom, and zustand. Source: frontend/package.json, README.md. The UI exposes Dashboard, Search, Analysis, Graph, Symbols, Diff, and Settings tabs. Source: README.md.
VS Code extension — codeatlas-vscode v0.1.0 commands: Open Web UI, Search Symbols, Show Symbol at Cursor, Show Impact Radius, Build Agent Context. It targets codeatlas.apiBase (default http://127.0.0.1:8080) and optionally sends X-API-Key for servers started with --api-key. Source: vscode-extension/README.md, vscode-extension/package.json.
Real-time sync — Watchdog-based codeatlas index --watch plus a GitHub webhook handler (codeatlas webhook /path/to/repo --port 9000 --secret YOUR_WEBHOOK_SECRET) for push-driven incremental updates. Source: README.md.

Common Failure Modes and Operational Notes

A few behaviors documented in the README and examples are worth keeping in mind when integrating:

Cross-encoder reranker is opt-in. The README explicitly notes that the optional reranker "did not beat the graph/lexical baseline on the code-symbol suite" and was kept behind an opt-in flag, with results in benchmarks/rerank-report.md. Source: README.md.
First-time semantic search needs the extra. codeatlas[search] installs sentence-transformers; without it, --semantic and --hybrid will fail to load embeddings. Source: README.md.
Unresolved references. Parsers emit targets as <unresolved>::Name when a parent/import can't be statically bound (e.g. Swift inheritance_specifier, Ruby superclass); the resolver layer is expected to bind these later. Source: src/codeatlas/parsers/swift_parser.py, src/codeatlas/parsers/ruby_parser.py.
Web UI depends on the API. The VS Code extension and the React UI both assume codeatlas server (or codeatlas ui) is running at http://127.0.0.1:8080; configure with codeatlas.apiBase / codeatlas.apiKey for VS Code. Source: vscode-extension/README.md.
v1.0.0 release. The v1.0.0 tag is the current stable line; the changelog is at https://github.com/AryanSaini26/CodeAtlas/commits/v1.0.0. Source: README.md.

Agent Interfaces, CLI, HTTP API, and Web UI

Related topics: Code Graph Engine, Parsers, and Search, Hosted Gateway, Deployment, GitHub Integration, and Operations

Section Related Pages

Continue reading this section for the full explanation and source context.

Agent Interfaces, CLI, HTTP API, and Web UI

CodeAtlas exposes the same underlying knowledge graph through four complementary surfaces: a Python CLI, an HTTP/JSON API, an MCP server for AI agents, and a React-based Web UI. This page describes each surface, the data they share, and the practical entry points for human and machine consumers.

Source: README.md

Architecture at a Glance

All four interfaces read from the same SQLite graph database, so a single index operation feeds every downstream consumer.

flowchart LR
    A[Source Files] --> B[Tree-sitter Parsers]
    B --> C[SQLite Graph DB]
    C --> D[CLI]
    C --> E[HTTP/JSON API]
    C --> F[MCP Server]
    C --> G[Web UI]
    D --> H[Humans]
    E --> I[Custom Tooling]
    F --> J[AI Agents]
    G --> K[Browser]

Source: README.md

Command-Line Interface (CLI)

The CLI is the canonical entry point for indexing, querying, and analyzing a repository. It is installed automatically with pip install codeatlas and is implemented as the codeatlas console script.

Source: README.md

Common command categories include:

Command	Purpose
`codeatlas index <path>`	Parse a repository and build the graph
`codeatlas stats`	Show file/symbol/relationship counts
`codeatlas query <term>`	Keyword search (with `--hybrid`, `--semantic`, `--json`)
`codeatlas show <symbol>`	Inspect signature, docs, deps, call chain
`codeatlas audit`	Cycles, dead code, complexity report
`codeatlas find-path <src> <tgt>`	Shortest dependency path
`codeatlas hotspots [path]`	Highest-risk files (churn × in-degree)
`codeatlas rank`	PageRank ranking
`codeatlas communities`	Subsystem detection via label propagation
`codeatlas coverage-gaps`	Public symbols with zero test coverage
`codeatlas viz --open`	Launch the interactive graph in a browser
`codeatlas export --format dot	json	mermaid`	Graph export
`codeatlas serve`	Start the HTTP/JSON API
`codeatlas watch <path>`	Incremental re-index on file change
`codeatlas webhook <path>`	GitHub webhook handler
`codeatlas agent-eval` / `eval` / `perf-report` / `doctor` / `data-lineage`	Reproducible benchmark and diagnostics

Source: README.md

Most query/analysis commands accept a --json flag so the CLI can be embedded in shell pipelines or used as a backing command for custom frontends.

HTTP/JSON API

The HTTP layer wraps the graph in a FastAPI application, giving non-Python clients a stable surface for integration. Pydantic models in schemas.py define the wire contract.

Source: src/codeatlas/api/schemas.py

Key response shapes include:

Model	Fields	Used By
`GraphNode`	`id`, `name`, `qualified_name`, `kind`, `file`, `community_id`	Graph view, exports
`GraphLink`	`source`, `target`, `kind`, `confidence`	Graph view
`GraphResponse`	`nodes`, `links`, `truncated`	`/graph` endpoint
`StatsResponse`	`files`, `symbols`, `relationships`, `languages`, `kinds`	`/stats` endpoint
`SymbolRef`	`id`, `name`, `qualified_name`, `kind`, `file`, `line`	Nested in detail models
`SymbolDetails`	`id`, `name`, `qualified_name`, `kind`, `file`, `start_line`, `end_line`, `signature`, `docstring`, `incoming`, `outgoing`	`/symbol/{id}` endpoint
`ImpactDepthGroup`	`depth`, `count`	Change-impact analysis

Source: src/codeatlas/api/schemas.py

A comment in the file makes the stability contract explicit: *"Keep these stable — the web UI and any third-party consumers key off them. Breaking changes go behind a new /api/vN prefix rather than mutating the existing shape."*

Source: src/codeatlas/api/schemas.py

Beyond the core read API, hosted_routes.py exposes a local-dev hosted control plane with GitHub App OAuth, signed webhook handling, sync-worker dispatch, retrieval-eval endpoints, and security scanning.

Source: src/codeatlas/api/hosted_routes.py

MCP Server (AI Agent Interface)

The Model Context Protocol server exposes the graph to AI coding agents such as Claude Code and Cursor. With pip install codeatlas[mcp], agents can call tools like search_symbols, get_symbol_details, get_dependencies, trace_call_chain, get_impact_analysis, find_similar_code, detect_circular_dependencies, get_hotspots, get_symbol_coverage, and get_change_impact.

Source: README.md

The shipping example documents a minimal Claude Code wiring:

{
  "mcpServers": {
    "codeatlas": {
      "command": "codeatlas",
      "args": ["serve", "--db", "/path/to/repo/.codeatlas/graph.db"]
    }
  }
}

Source: examples/mcp-claude/README.md

The example emphasizes an agent workflow pattern: agents chain multiple tools — search_symbols → get_symbol_details → get_dependencies — to answer a single question, which is precisely where a graph query outperforms filesystem-only tools.

Source: examples/mcp-claude/README.md

Web UI

The Web UI is a React + TypeScript single-page application built with Vite, React Router, TanStack Query, Zustand, and react-force-graph-2d. It is launched locally with codeatlas viz --open.

Source: frontend/package.json

The HTML shell mounts a dark-themed app into #root and loads the bundled entry from /src/main.tsx.

Source: frontend/index.html

The UI is organized into several views: Search (showing signature, docstring, references), Analysis (PageRank ranking, hotspots, communities, coverage gaps), Graph (interactive force-directed visualization with kind/community coloring and file filtering), Symbols (detailed symbol pages), Diff (compare symbols between two git refs), and Settings (credentials, reindex, version info).

Source: README.md

The Overview page wires live data from the API into hero stat tiles (files indexed, symbols, relationships), a language breakdown bar, a symbol-kinds distribution, and the top-10 PageRank list with deep links to symbol pages.

Source: frontend/src/pages/Overview.tsx

Common Failure Modes

Source: frontend/src/pages/Overview.tsx

Empty ranking on Overview — the page renders an EmptyState reading *"No ranking yet"* when no graph has been indexed; run codeatlas index first.
**CLI without sem

Source: https://github.com/AryanSaini26/CodeAtlas / Human Manual

Hosted Gateway, Deployment, GitHub Integration, and Operations

Related topics: Overview and System Architecture, Agent Interfaces, CLI, HTTP API, and Web UI

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Common Failure Modes

Continue reading this section for the full explanation and source context.

Hosted Gateway, Deployment, GitHub Integration, and Operations

CodeAtlas v1.0.0 ships a local-development hosted control plane that packages team management, repository registration, a background sync worker, GitHub App integration, and an operations surface into a single FastAPI process. The system is deliberately designed so teams can stand up a multi-tenant context gateway on a laptop before wiring real OAuth, billing, or remote MCP routing — the latter are explicitly stubbed in the MVP. Source: src/codeatlas/api/hosted_routes.py:1-15.

Architecture Overview

The hosted gateway exposes a FastAPI router mounted under /hosted, backed by a SQLite-backed HostedStore that persists teams, principals, repositories, sync jobs, and webhook delivery IDs. A separate SyncJobWorker runs in the background to clone repositories and trigger per-repo graph indexing. GitHub App concerns (OAuth, webhook signature verification, repository refresh) are isolated in codeatlas/github_app.py so the control plane can be exercised in CI without live credentials.

flowchart LR
    A[Bootstrap<br/>bootstrap] --> B[Register Team & Repo<br/>register-repo]
    B --> C[SyncJobWorker<br/>background]
    C --> D[Per-repo Graph DB<br/>GraphStore]
    D --> E[Remote MCP<br/>/remote-mcp]
    D --> F[Hosted Dashboard<br/>/hosted]
    G[GitHub Webhook<br/>push event] --> H[process_github_webhook]
    H --> C
    I[Agent / IDE] --> E
    I --> F

Bootstrap, Registration, and Sync

Bootstrap is the entry point: it accepts a BootstrapRequest (email, name, team slug, team name) and creates the first principal and team record. Source: src/codeatlas/api/hosted_routes.py:18-25.

Repository registration uses a RepoRegistration payload persisted by HostedStore. Once registered, codeatlas hosted sync enqueues a sync job that the worker picks up. Source: src/codeatlas/hosted_worker.py and src/codeatlas/hosted.py.

Step	Command / Action	Source
Initialize gateway	`codeatlas hosted bootstrap --hosted-db .codeatlas/hosted.db`	`hosted_routes.py`
Register repo	`codeatlas hosted register-repo`	`hosted.py`
Trigger sync	`codeatlas hosted sync`	`hosted_worker.py`
Inspect status	`codeatlas hosted github status`	`hosted_routes.py`

Each sync run activates the repository and rebuilds the per-repo GraphStore (.codeatlas/graph.db). The dashboard reads from this store to render the Agent Context Feed, measured retrieval quality, blast-radius impact, and data lineage views. Source: src/codeatlas/api/hosted_routes.py:1-15.

GitHub App Integration

The GitHub integration is split across three concerns. First, OAuth helpers in github_app.py (build_oauth_authorize_url, exchange_oauth_code, fetch_github_user) handle user authorization. Second, verify_github_signature validates incoming webhook deliveries using the configured secret. Third, refresh_github_repositories enumerates the installation's repositories from a fixture or token source so CI runs stay hermetic. Source: src/codeatlas/github_app.py.

parse_webhook_payload and process_github_webhook decode push events, record the delivery ID via webhook_rate_limiter, and enqueue per-repo syncs. The hosted_routes.py module wires these into the /hosted/github/webhook endpoint. Source: src/codeatlas/api/hosted_routes.py:1-15 and src/codeatlas/rate_limit.py.

For repo-scoped remote context, /remote-mcp validates the X-Stratum-Audience header and serves context packs through build_context_pack along with graph summaries. The same endpoint includes prompt-injection, secret, and vendor-path scan results from scan_context_pack. Source: src/codeatlas/context_security.py and src/codeatlas/agent_context.py.

Operations: Security, Rate Limiting, and Retrieval Eval

The operations surface combines three subsystems. Rate limiting is centralized in codeatlas/rate_limit.py, exposing context_rate_limiter and webhook_rate_limiter that the routes consult before accepting context requests or webhook deliveries. Source: src/codeatlas/rate_limit.py.

Context security scanning (scan_context_pack) runs every context pack through prompt-injection, secret-leak, and vendor-path detectors before it leaves the gateway. The scan verdict travels alongside the pack so consumers can reject suspicious payloads. Source: src/codeatlas/context_security.py.

Retrieval quality is treated as a first-class operational signal: hosted_eval.run_repo_retrieval_eval measures recall@k and MRR per repository, while compute_context_savings quantifies token reduction versus prompt-only baselines. These metrics are surfaced on the hosted dashboard so teams can decide whether the graph is actually helping their agents. Source: src/codeatlas/hosted_eval.py.

Common Failure Modes

Webhook signature mismatch — webhook deliveries without a valid HMAC are rejected before reaching the worker; verify the secret via codeatlas hosted webhook-test.
Sync job backlog — if the worker cannot keep up, codeatlas hosted github refresh-repos re-enumerates installations and the queue drains on the next worker tick.
Context budget overflow — codeatlas context <query> --budget 2000 trims the pack; raising the budget without re-running the eval suite can regress measured quality.
Remote MCP audience rejection — calls to /remote-mcp without X-Stratum-Audience are refused; this is by design to keep repo context scoped.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

medium Configuration risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Capability evidence risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Maintenance risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Security or permission risk requires verification

May increase setup, validation, or first-run risk for the user.

Doramagic Pitfall Log

Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.

1. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.host_targets | https://github.com/AryanSaini26/CodeAtlas

2. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | https://github.com/AryanSaini26/CodeAtlas

3. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas

4. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | https://github.com/AryanSaini26/CodeAtlas

5. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | https://github.com/AryanSaini26/CodeAtlas

6. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown。
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas

7. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown。
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources 2

Count of project-level external discussion links exposed on this manual page.

Use Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using CodeAtlas with real data or production workflows.

CodeAtlas v1.0.0 - github / github_release
Configuration risk requires verification - GitHub / issue

Source: Project Pack community evidence and pitfall evidence

CodeAtlas

Overview and System Architecture

Related Pages

Overview and System Architecture

Purpose and Scope

Layered Architecture

Parser Subsystem

API, MCP, and Web Surfaces

Data Flow and Sync

Failure Modes and Operational Notes

See Also

Code Graph Engine, Parsers, and Search

Related Pages

Code Graph Engine, Parsers, and Search

Architecture and Data Flow

Tree-sitter Parsers

Internal structure (example: Swift and Ruby)

Indexing and parallelism

Knowledge Graph Store and Graph Analysis

Search Subsystem

Downstream Surfaces

Common Failure Modes and Operational Notes

See Also

Agent Interfaces, CLI, HTTP API, and Web UI

Related Pages

Agent Interfaces, CLI, HTTP API, and Web UI

Architecture at a Glance

Command-Line Interface (CLI)

HTTP/JSON API

MCP Server (AI Agent Interface)

Web UI

Common Failure Modes

Hosted Gateway, Deployment, GitHub Integration, and Operations

Related Pages

Hosted Gateway, Deployment, GitHub Integration, and Operations

Architecture Overview

Bootstrap, Registration, and Sync

GitHub App Integration

Operations: Security, Rate Limiting, and Retrieval Eval

Common Failure Modes

See Also

Doramagic Pitfall Log

Doramagic Pitfall Log

1. Configuration risk: Configuration risk requires verification

2. Capability evidence risk: Capability evidence risk requires verification

3. Maintenance risk: Maintenance risk requires verification

4. Security or permission risk: Security or permission risk requires verification

5. Security or permission risk: Security or permission risk requires verification

6. Maintenance risk: Maintenance risk requires verification

7. Maintenance risk: Maintenance risk requires verification

Community Discussion Evidence

Community Discussion Evidence