Doramagic Project Pack · Human Manual
CodeAtlas
MCP server that builds a real-time code knowledge graph via Tree-sitter AST parsing, giving AI coding agents instant structural and semantic codebase navigation.
Overview and System Architecture
Related topics: Code Graph Engine, Parsers, and Search, Agent Interfaces, CLI, HTTP API, and Web UI, Hosted Gateway, Deployment, GitHub Integration, and Operations
Purpose and Scope
CodeAtlas is a code knowledge-graph platform that converts source files into a queryable graph of symbols and relationships, then exposes that graph to humans (via a web UI and CLI) and to AI coding agents (via an MCP server, FastAPI HTTP/JSON API, and semantic search). Its central claim is reducing the 60–80% of context window that AI agents typically waste re-orienting themselves inside an unfamiliar codebase. Source: README.md.
The project ships in v1.0.0 with 24 supported languages, a SQLite + FTS5 graph store (zero external infrastructure), optional FAISS-based semantic search, a React web frontend, and a FastMCP server exposing 30 tools. Source: README.md.
Layered Architecture
CodeAtlas follows a four-layer architecture: language parsers feed a graph database, which is wrapped by an API/MCP layer, which in turn drives both a human-facing web UI and an agent-facing tool surface.
flowchart TB
subgraph Sources
A[Source files: .py, .ts, .go, .rs, .java, .kt, .rb, .scala, .swift, ...]
end
subgraph Parsers["Language Parsers (Tree-sitter AST)"]
P1[Python]
P2[TypeScript / TSX]
P3[Go / Rust]
P4[Java / Kotlin]
P5[Ruby / Scala / Swift]
end
subgraph Core["Graph Core"]
DB[(SQLite + FTS5)]
VEC[(FAISS vector index - optional)]
ALGO[PageRank, communities, cycles, hotspots]
end
subgraph Interfaces["Interfaces"]
CLI[codeatlas CLI]
API[FastAPI HTTP/JSON]
MCP[FastMCP server - 30 tools]
end
subgraph Clients["Clients"]
WEB[React web UI]
AGENT[Claude Code / Cursor / agents]
CI[GitHub webhooks / pre-commit]
end
A --> Parsers
Parsers --> DB
DB <--> VEC
DB --> ALGO
DB --> CLI
DB --> API
DB --> MCP
CLI --> WEB
API --> WEB
MCP --> AGENT
CI --> DB
Each layer has a clear responsibility boundary:
- Parsers extract `Symbol` and `Relationship` records from tree-sitter ASTs and emit them into a uniform intermediate model regardless of the source language. The Python parser, for example, walks `function_definition` / `class_definition` / `decorated_definition` nodes and emits `CALLS`, `INHERITS`, and `DECORATES` relationships with explicit source spans. Source: src/codeatlas/parsers/python_parser.py.
- Graph core persists everything into SQLite with FTS5 for keyword search and recursive-CTE traversals for graph queries. Optional FAISS indexes power semantic and hybrid search via reciprocal rank fusion. Source: README.md.
- Interfaces expose the graph through three channels: a Click-based CLI, a FastAPI service with Pydantic schemas, and a FastMCP server for AI agents.
- Clients consume the interfaces: a React + react-force-graph UI for humans, agent hosts (Claude Code, Cursor) for AI, and CI integrations (watchdog file watcher, GitHub webhooks, pre-commit hooks) for continuous sync. Source: README.md, examples/mcp-claude/README.md.
Parser Subsystem
The parser package follows a uniform contract: each language module produces a list of Symbol objects (id, name, qualified_name, kind, file_path, span, docstring, signature, decorators, language) and a list of Relationship objects (source_id, target_id, kind, file_path, span). The RelationshipKind enum covers at minimum CALLS, INHERITS, and DECORATES. A target that cannot be resolved at parse time is recorded as a placeholder of the form `<unresolved>::Name`.
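The contract above can be pictured as two small record types. This is a minimal sketch: the field names come from the paragraph above, but the field types, the span layout, the enum values, and the `make_symbol_id` and `unresolved` helpers are assumptions for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

# Assumed span layout: (start_line, start_col, end_line, end_col).
Span = Tuple[int, int, int, int]


class RelationshipKind(str, Enum):
    # Only the kinds named in this manual; the real enum may hold more.
    CALLS = "calls"
    INHERITS = "inherits"
    DECORATES = "decorates"


@dataclass(frozen=True)
class Symbol:
    id: str
    name: str
    qualified_name: str
    kind: str
    file_path: str
    span: Span
    language: str
    docstring: Optional[str] = None
    signature: Optional[str] = None
    decorators: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    kind: RelationshipKind
    file_path: str
    span: Span


def make_symbol_id(file_path: str, qualified_name: str) -> str:
    """Hypothetical id scheme: file path plus qualified name."""
    return f"{file_path}::{qualified_name}"


def unresolved(name: str) -> str:
    """Placeholder target for a name that cannot be bound at parse time."""
    return f"<unresolved>::{name}"
```

Frozen dataclasses keep the records hashable, which is convenient when parsers for different languages feed one shared store.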
Examples of language-specific extraction strategies:
- Python captures decorators via `decorated_definition`, extracts calls from function bodies, and recurses into nested definitions under the parent qualified name. Source: src/codeatlas/parsers/python_parser.py.
- Java detects the `final` modifier on fields to distinguish constants from variables, and reconstructs method signatures as `"returnType name(params)"` for richer display. Source: src/codeatlas/parsers/java_parser.py.
- Kotlin handles `class_declaration`, `object_declaration`, and `companion_object`, threading an `owner` parameter so nested functions become qualified methods. Source: src/codeatlas/parsers/kotlin_parser.py.
- Ruby walks `class`, `module`, and method nodes, capturing the `superclass` field as an `INHERITS` edge. Source: src/codeatlas/parsers/ruby_parser.py.
- Scala builds signatures including explicit return types (`name(params): ReturnType`). Source: src/codeatlas/parsers/scala_parser.py.
- Swift traverses `class_declaration`, `protocol_declaration`, and `inheritance_specifier` to capture both class and protocol conformance edges. Source: src/codeatlas/parsers/swift_parser.py.
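The Python strategy in the list above (decorators, calls inside bodies, nested definitions under the parent's qualified name) can be illustrated without tree-sitter. The sketch below deliberately swaps in the standard-library `ast` module, whose node types (`FunctionDef`, `ClassDef`) differ from tree-sitter's `function_definition` and `class_definition`. The tuple edge format, the direction of the `DECORATES` edge, and the choice to skip module-level calls are assumptions.

```python
import ast
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source, kind, target)
DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _name_of(node: ast.AST) -> str:
    """Render a Name, Attribute, or Call expression as a dotted name."""
    if isinstance(node, ast.Call):
        return _name_of(node.func)
    if isinstance(node, ast.Attribute):
        return f"{_name_of(node.value)}.{node.attr}"
    if isinstance(node, ast.Name):
        return node.id
    return "<expr>"


def extract(source: str) -> Tuple[List[str], List[Edge]]:
    """Return qualified symbol names and unresolved relationship edges."""
    symbols: List[str] = []
    edges: List[Edge] = []

    def visit(nodes, scope: str, owner: str) -> None:
        for node in nodes:
            if isinstance(node, DEFS):
                qualified = f"{scope}.{node.name}" if scope else node.name
                symbols.append(qualified)
                for dec in node.decorator_list:
                    edges.append((f"<unresolved>::{_name_of(dec)}", "DECORATES", qualified))
                if isinstance(node, ast.ClassDef):
                    for base in node.bases:
                        edges.append((qualified, "INHERITS", f"<unresolved>::{_name_of(base)}"))
                # Recurse into the body only, so decorators and base-class
                # expressions are not miscounted as calls made by the symbol.
                visit(node.body, qualified, qualified)
            else:
                # Calls outside any definition (owner == "") are skipped here.
                if isinstance(node, ast.Call) and owner:
                    edges.append((owner, "CALLS", f"<unresolved>::{_name_of(node.func)}"))
                visit(list(ast.iter_child_nodes(node)), scope, owner)

    visit(ast.parse(source).body, "", "")
    return symbols, edges
```

Every target is emitted as an `<unresolved>::Name` placeholder, which mirrors the idea of leaving cross-file binding to a later pass.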
The frontend shell declares itself as a dark-themed SPA titled "CodeAtlas" with the description "Explore your code as a knowledge graph — PageRank, semantic search, diff, coverage gaps." Source: frontend/index.html.
API, MCP, and Web Surfaces
The HTTP/JSON API uses Pydantic schemas for every response. Representative schemas include PageRankResponse, HotspotsResponse, CommunitiesResponse, CoverageGapsResponse, DiffResponse, ReindexResponse, and ErrorResponse, each modeled with explicit field names and types so consumers can rely on stable contracts. Source: src/codeatlas/api/schemas.py.
The MCP server is configured for agent hosts by pointing the mcpServers block at codeatlas serve --db <path>. Once registered, agents can call tools such as search_symbols, get_symbol_details, get_pagerank, find_path, get_dependencies, trace_call_chain, find_dead_code, analyze_complexity, get_hotspots, get_symbol_coverage, and chain them automatically (for example, search_symbols → get_symbol_details → get_dependencies). Source: examples/mcp-claude/README.md.
The web frontend's Overview page surfaces the top symbols by PageRank, links into the Analysis tab for deeper views, and routes each result to a dedicated /symbol/:id page. Empty states explicitly instruct users to run codeatlas index first. Source: frontend/src/pages/Overview.tsx.
Data Flow and Sync
A typical end-to-end flow looks like this:
- A user runs `codeatlas index [path]` (optionally with `--incremental`, `--watch`, or `--workers N`). Source: README.md.
- Parsers emit `Symbol` and `Relationship` records per file; the graph core persists them into SQLite and, when installed, indexes embeddings into FAISS.
- The graph is queried through the CLI, the FastAPI service, or the MCP server. The CLI offers `query`, `show`, `audit`, `find-path`, `coupling`, `hotspots`, `hubs`, `rank`, `communities`, `coverage-gaps`, `report`, and `agent-eval`. Source: README.md.
- A background `codeatlas index --watch` process, a `codeatlas pre-commit install` hook, or a GitHub webhook keeps the graph fresh as files change.
Failure Modes and Operational Notes
Common failure modes a technical reader should plan for:
- Empty results after install: the web UI's Overview page renders an explicit "Run `codeatlas index` first" hint when no PageRank data exists. Source: frontend/src/pages/Overview.tsx.
- Unresolved references: cross-file or cross-module references that cannot be resolved during parsing are stored as `<unresolved>::Name` placeholders and resolved later by a graph pass. Source: src/codeatlas/parsers/python_parser.py.
- Semantic search unavailable: the `[search]` and `[all]` extras install sentence-transformers and FAISS; without them, only FTS5 keyword queries are available, because both `--semantic` and `--hybrid` depend on embeddings. Source: README.md.
- MCP not registered: until `codeatlas serve` is added to the host's `mcpServers` config and the host is restarted, agents cannot reach the graph. Source: examples/mcp-claude/README.md.
See Also
- CLI Commands reference (in README.md)
- MCP tool catalog and Claude Code / Cursor setup (in examples/mcp-claude/README.md)
- API response schemas (in src/codeatlas/api/schemas.py)
- Language parser implementations (under `src/codeatlas/parsers/`)
- v1.0.0 release notes (in the GitHub releases page)
Source: https://github.com/AryanSaini26/CodeAtlas / Human Manual
Code Graph Engine, Parsers, and Search
Related topics: Overview and System Architecture, Agent Interfaces, CLI, HTTP API, and Web UI
Architecture and Data Flow
CodeAtlas is positioned as a knowledge-graph engine for source code: tree-sitter parses files into symbols and relationships, a SQLite-backed graph store (with FTS5 and recursive CTEs) persists them, and a search layer exposes them through the CLI, MCP, HTTP API, web UI, and VS Code extension. Source: README.md.
The README's stated pipeline is Source Files → Tree-sitter AST → Symbols + Relationships → SQLite Graph, fanning out to FTS5 keyword search, FAISS vector search, and graph analysis. The same store is read by CLI commands, the 30-tool MCP server, the FastAPI layer, and the React UI — so all surfaces see identical data. Source: README.md.
flowchart LR
FS[Source Files] --> TS[Tree-sitter AST]
TS --> SYM[Symbols + Relationships]
SYM --> SQL[(SQLite + FTS5)]
SQL --> CLI[CLI]
SQL --> MCP[MCP Server - 30 tools]
SQL --> API[FastAPI HTTP API]
SQL --> UI[React Web UI]
SQL --> VS[VS Code Extension]
SYM --> VS2[FAISS Vectors]
SQL --> ANALY[Graph Analysis: PageRank, cycles, hotspots]
Design choices highlighted in the README: SQLite over Neo4j (zero infrastructure, ships with Python, FTS5 + recursive CTEs), FAISS over pgvector (local, no DB server), and tree-sitter over regex (incremental, cross-language). Source: README.md.
Tree-sitter Parsers
CodeAtlas ships tree-sitter parsers for 24 languages, exposing classes, functions, interfaces, decorators, docstrings, imports, inheritance, generics, and language-specific constructs (JSDoc, Javadoc, KDoc, Scaladoc, XML doc comments, /// Rust doc comments). Source: README.md.
Internal structure (example: Swift and Ruby)
The parsers share a common shape: locate the language-specific declaration node, read its name child, build a stable symbol id from the file path and qualified name, attach the source span and docstring, then walk the body for nested declarations. The Ruby parser handles top-level statements via _walk_toplevel, dispatches to _handle_class (which records the symbol plus an INHERITS relationship from the superclass field), _handle_module, and method handlers, and resolves nested classes/methods inside the class body. Source: src/codeatlas/parsers/ruby_parser.py.
The Swift parser mirrors this for class_declaration, protocol_declaration, and friends: it locates the type_identifier for the name, emits the Symbol, then iterates inheritance_specifier children to emit one INHERITS relationship per parent type. Unresolved parents are written as <unresolved>::ParentType, which the resolver layer can later bind. It then recurses into class_body so nested declarations inherit the qualified scope. Source: src/codeatlas/parsers/swift_parser.py.
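The later binding step for `<unresolved>::ParentType` placeholders might look like the sketch below. The resolver's real behavior is not documented in this manual, so the rule used here (bind only when exactly one indexed symbol has that simple name, otherwise keep the placeholder) and the `resolve_edges` name are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

PREFIX = "<unresolved>::"
Edge = Tuple[str, str, str]  # (source_id, kind, target_id)


def resolve_edges(edges: Iterable[Edge], symbols_by_name: Dict[str, List[str]]) -> List[Edge]:
    """Bind placeholder targets when exactly one symbol carries that name."""
    resolved: List[Edge] = []
    for source, kind, target in edges:
        if target.startswith(PREFIX):
            candidates = symbols_by_name.get(target[len(PREFIX):], [])
            if len(candidates) == 1:
                target = candidates[0]
            # Zero or several candidates: leave the placeholder untouched
            # rather than guess.
        resolved.append((source, kind, target))
    return resolved
```

Keeping ambiguous names unresolved trades recall for precision, so an edge in the graph never points at the wrong symbol.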
Indexing and parallelism
`codeatlas index` walks the repository, parses files, and writes them into the graph. Three flags shape behavior:
| Flag | Effect |
|---|---|
| `--incremental` | Only re-index files that changed since the last index |
| `--watch` | After indexing, keep watching for file changes (uses Watchdog) |
| `--workers N` | Parse files in parallel across N processes |
Source: README.md. The companion codeatlas watch command exists as a standalone equivalent, and codeatlas diff [path] reports which files changed since the last index. Source: README.md.
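One common way to decide which files "changed since the last index" is to compare content hashes against a stored snapshot. The manual does not say how CodeAtlas detects changes, so the SHA-256 approach, the snapshot dictionary, and the function names below are assumptions; deleted files are not handled in this sketch.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Sequence


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, used as a change fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, previous: Dict[str, str],
                  suffixes: Sequence[str] = (".py",)) -> List[str]:
    """Return relative paths that are new or whose digest differs from `previous`."""
    changed: List[str] = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            if previous.get(rel) != file_digest(path):
                changed.append(rel)
    return changed
```

Hashing content rather than comparing modification times avoids re-parsing files that were merely touched, at the cost of reading every file once.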
Knowledge Graph Store and Graph Analysis
The graph lives in a SQLite database (default .codeatlas/graph.db) with FTS5 for keyword search and recursive CTEs for traversals like shortest-path and call-chain. Sources cite the design rationale explicitly: zero infrastructure, deterministic, and small enough to ship with Python. Source: README.md.
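A recursive CTE is what makes a call-chain traversal possible in plain SQLite. The following sketch runs against an in-memory database; the `relationships` table, its columns, and the `call_chain` helper are hypothetical and do not reflect the project's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (source_id TEXT, target_id TEXT, kind TEXT);
INSERT INTO relationships VALUES
  ('main', 'parse_file', 'CALLS'),
  ('parse_file', 'walk_tree', 'CALLS'),
  ('walk_tree', 'emit_symbol', 'CALLS'),
  ('main', 'Config', 'INHERITS');
""")

# Walk CALLS edges outward from a start symbol. The depth bound keeps the
# recursion finite even if the call graph contains cycles.
CALL_CHAIN = """
WITH RECURSIVE chain(symbol, depth) AS (
  SELECT ?, 0
  UNION
  SELECT r.target_id, chain.depth + 1
  FROM relationships AS r
  JOIN chain ON r.source_id = chain.symbol
  WHERE r.kind = 'CALLS' AND chain.depth < ?
)
SELECT symbol, MIN(depth) FROM chain GROUP BY symbol ORDER BY MIN(depth), symbol
"""


def call_chain(start: str, max_depth: int = 5):
    """Return (symbol, shortest depth) pairs reachable from `start`."""
    return conn.execute(CALL_CHAIN, (start, max_depth)).fetchall()
```

Calling `call_chain("main")` follows only the `CALLS` edges, so the `INHERITS` edge to `Config` is ignored.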
Graph analysis primitives surfaced through both CLI and MCP include:
- PageRank: `codeatlas rank` ranks symbols weighted by caller importance rather than raw degree; `--kind class --json` restricts the output to one kind. Source: README.md.
- Communities: `codeatlas communities` runs label propagation; nodes receive a `community_id`, and the UI offers a toggle between kind-coloring and community-coloring. Source: README.md.
- Cycles, dead code, hotspots, coverage gaps, coupling, shortest path: exposed via `codeatlas audit`, `codeatlas coverage-gaps`, `codeatlas coupling`, `codeatlas find-path <src> <tgt>`, and `codeatlas hotspots [path]`. The `codeatlas report [path]` command composes them into a single health report, optionally as JSON. Source: README.md.
- Git-aware change impact: the `codeatlas diff` command and the MCP `get_change_impact` tool surface which symbols and files are affected by a change between refs; the web UI's Diff tab renders the same data as added/removed/modified columns. Source: README.md.
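To see why PageRank differs from raw in-degree, a textbook power-iteration version is enough. This is a generic sketch, not CodeAtlas' implementation; the damping factor of 0.85, the iteration count, and the toy symbol names are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def pagerank(edges: Iterable[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over (caller, callee) edges."""
    edge_list = list(edges)
    nodes = sorted({node for edge in edge_list for node in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for source, target in edge_list:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        nxt = {node: (1.0 - damping + damping * dangling) / len(nodes) for node in nodes}
        for source, targets in out_links.items():
            for target in targets:
                nxt[target] += damping * rank[source] / len(targets)
        rank = nxt
    return rank
```

In a graph where `cli`, `api`, and `mcp` each call `store`, and `store` calls `sqlite_util`, the helper `sqlite_util` has a single caller yet ranks highest, because that one caller is itself important.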
The graph can be exported for downstream tooling in DOT (Graphviz), JSON (D3.js), Mermaid, GraphML, CSV, and Cypher via codeatlas export --format .... Source: README.md.
Search Subsystem
CodeAtlas provides three search modes that share the same symbol store:
- Full-text (FTS5): `codeatlas query <term>` runs a keyword search across the indexed corpus. Source: README.md.
- Semantic: `codeatlas query <term> --semantic` runs natural-language vector search using sentence-transformers, requiring the `codeatlas[search]` extra. Source: README.md.
- Hybrid: `codeatlas query <term> --hybrid` merges keyword and vector results with reciprocal rank fusion, giving the best of both lexical precision and semantic recall. Source: README.md.
All three return the same shape (consumable via --json) and feed the Web UI's Search pane, where results open into a detail view showing signature, docstring, and incoming/outgoing references. Source: README.md.
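Reciprocal rank fusion, the merge step named for hybrid mode, is small enough to show in full. The sketch below is the generic formulation, scoring each item as the sum of 1 / (k + rank) across result lists; the constant k = 60 is a widely used default, and the value CodeAtlas uses is not stated in this manual.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of symbol ids into one ordering."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for position, symbol_id in enumerate(ranking, start=1):
            scores[symbol_id] = scores.get(symbol_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda symbol_id: (-scores[symbol_id], symbol_id))
```

Because only ranks are used, keyword scores and vector similarities never need to be normalized against each other.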
The MCP server exposes search as search_symbols (FTS5 with kind/file filters and query expansion) and find_similar_code (semantic); agents routinely chain search_symbols → get_symbol_details → get_dependencies to answer a single question — the key advantage over filesystem-only tools is querying a graph rather than a pile of text. Source: examples/mcp-claude/README.md.
Downstream Surfaces
The same graph backs every consumer, which is the project's main architectural claim:
- CLI: `codeatlas show <symbol>`, `query`, `audit`, `find-path`, `hotspots`, `hubs`, `rank`, `communities`, `coverage-gaps`, `report`. Source: README.md.
- MCP server: 30 tools (29 cited in the Claude example, 30 in the feature table) for AI agents; example prompts include "Find every caller of `parse_file`" and "What symbols were added or modified between `HEAD` and `main`?". Source: examples/mcp-claude/README.md, README.md.
- HTTP API + Web UI: `codeatlas ui` serves both on `localhost:8080`; the frontend (`codeatlas-web` v1.0.0) is a Vite + React + Tailwind SPA built on `react-force-graph-2d`, `@tanstack/react-query`, `react-router-dom`, and `zustand`. Source: frontend/package.json, README.md. The UI exposes Dashboard, Search, Analysis, Graph, Symbols, Diff, and Settings tabs. Source: README.md.
- VS Code extension: `codeatlas-vscode` v0.1.0 provides the commands Open Web UI, Search Symbols, Show Symbol at Cursor, Show Impact Radius, and Build Agent Context. It targets `codeatlas.apiBase` (default `http://127.0.0.1:8080`) and optionally sends `X-API-Key` for servers started with `--api-key`. Source: vscode-extension/README.md, vscode-extension/package.json.
- Real-time sync: Watchdog-based `codeatlas index --watch` plus a GitHub webhook handler (`codeatlas webhook /path/to/repo --port 9000 --secret YOUR_WEBHOOK_SECRET`) for push-driven incremental updates. Source: README.md.
Common Failure Modes and Operational Notes
A few behaviors documented in the README and examples are worth keeping in mind when integrating:
- Cross-encoder reranker is opt-in. The README explicitly notes that the optional reranker "did not beat the graph/lexical baseline on the code-symbol suite" and was kept behind an opt-in flag, with results in `benchmarks/rerank-report.md`. Source: README.md.
- First-time semantic search needs the extra. `codeatlas[search]` installs sentence-transformers; without it, `--semantic` and `--hybrid` will fail to load embeddings. Source: README.md.
- Unresolved references. Parsers emit targets as `<unresolved>::Name` when a parent/import can't be statically bound (e.g. Swift `inheritance_specifier`, Ruby `superclass`); the resolver layer is expected to bind these later. Source: src/codeatlas/parsers/swift_parser.py, src/codeatlas/parsers/ruby_parser.py.
- Web UI depends on the API. The VS Code extension and the React UI both assume `codeatlas server` (or `codeatlas ui`) is running at `http://127.0.0.1:8080`; configure with `codeatlas.apiBase` / `codeatlas.apiKey` for VS Code. Source: vscode-extension/README.md.
- v1.0.0 release. The v1.0.0 tag is the current stable line; the changelog is at `https://github.com/AryanSaini26/CodeAtlas/commits/v1.0.0`. Source: README.md.
See Also
- Web UI and React Frontend: `codeatlas-web` SPA, Dashboard, Graph, Diff tabs
- MCP Server and Agent Integrations: 30 MCP tools, Claude/Cursor configuration, agent outcome A/B eval
- CLI Reference: full command list, flags, and `--json` output conventions
- Performance and Benchmarking: `benchmarks/rerank-report.md`, scale report, retrieval V2 metrics
Agent Interfaces, CLI, HTTP API, and Web UI
Related topics: Code Graph Engine, Parsers, and Search, Hosted Gateway, Deployment, GitHub Integration, and Operations
CodeAtlas exposes the same underlying knowledge graph through four complementary surfaces: a Python CLI, an HTTP/JSON API, an MCP server for AI agents, and a React-based Web UI. This page describes each surface, the data they share, and the practical entry points for human and machine consumers.
Source: README.md
Architecture at a Glance
All four interfaces read from the same SQLite graph database, so a single index operation feeds every downstream consumer.
flowchart LR
A[Source Files] --> B[Tree-sitter Parsers]
B --> C[SQLite Graph DB]
C --> D[CLI]
C --> E[HTTP/JSON API]
C --> F[MCP Server]
C --> G[Web UI]
D --> H[Humans]
E --> I[Custom Tooling]
F --> J[AI Agents]
G --> K[Browser]
Source: README.md
Command-Line Interface (CLI)
The CLI is the canonical entry point for indexing, querying, and analyzing a repository. It is installed automatically with pip install codeatlas and is implemented as the codeatlas console script.
Source: README.md
Common command categories include:
| Command | Purpose |
|---|---|
| `codeatlas index <path>` | Parse a repository and build the graph |
| `codeatlas stats` | Show file/symbol/relationship counts |
| `codeatlas query <term>` | Keyword search (with `--hybrid`, `--semantic`, `--json`) |
| `codeatlas show <symbol>` | Inspect signature, docs, deps, call chain |
| `codeatlas audit` | Cycles, dead code, complexity report |
| `codeatlas find-path <src> <tgt>` | Shortest dependency path |
| `codeatlas hotspots [path]` | Highest-risk files (churn × in-degree) |
| `codeatlas rank` | PageRank ranking |
| `codeatlas communities` | Subsystem detection via label propagation |
| `codeatlas coverage-gaps` | Public symbols with zero test coverage |
| `codeatlas viz --open` | Launch the interactive graph in a browser |
| `codeatlas export --format dot\|json\|mermaid` | Graph export |
| `codeatlas serve` | Start the HTTP/JSON API |
| `codeatlas watch <path>` | Incremental re-index on file change |
| `codeatlas webhook <path>` | GitHub webhook handler |
| `codeatlas agent-eval` / `eval` / `perf-report` / `doctor` / `data-lineage` | Reproducible benchmark and diagnostics |
Source: README.md
Most query/analysis commands accept a --json flag so the CLI can be embedded in shell pipelines or used as a backing command for custom frontends.
HTTP/JSON API
The HTTP layer wraps the graph in a FastAPI application, giving non-Python clients a stable surface for integration. Pydantic models in schemas.py define the wire contract.
Source: src/codeatlas/api/schemas.py
Key response shapes include:
| Model | Fields | Used By |
|---|---|---|
| `GraphNode` | id, name, qualified_name, kind, file, community_id | Graph view, exports |
| `GraphLink` | source, target, kind, confidence | Graph view |
| `GraphResponse` | nodes, links, truncated | `/graph` endpoint |
| `StatsResponse` | files, symbols, relationships, languages, kinds | `/stats` endpoint |
| `SymbolRef` | id, name, qualified_name, kind, file, line | Nested in detail models |
| `SymbolDetails` | id, name, qualified_name, kind, file, start_line, end_line, signature, docstring, incoming, outgoing | `/symbol/{id}` endpoint |
| `ImpactDepthGroup` | depth, count | Change-impact analysis |
Source: src/codeatlas/api/schemas.py
A comment in the file makes the stability contract explicit: *"Keep these stable — the web UI and any third-party consumers key off them. Breaking changes go behind a new /api/vN prefix rather than mutating the existing shape."*
Source: src/codeatlas/api/schemas.py
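The graph response shapes listed in the table can be sketched with standard-library dataclasses. The project itself uses Pydantic; dataclasses are swapped in here so the example runs without extra dependencies. Field names follow the table, while the field types, defaults, and the `to_wire` helper are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class GraphNode:
    id: str
    name: str
    qualified_name: str
    kind: str
    file: str
    community_id: Optional[int] = None  # assumed optional until communities run


@dataclass
class GraphLink:
    source: str
    target: str
    kind: str
    confidence: float = 1.0  # assumed default


@dataclass
class GraphResponse:
    nodes: List[GraphNode] = field(default_factory=list)
    links: List[GraphLink] = field(default_factory=list)
    truncated: bool = False


def to_wire(response: GraphResponse) -> str:
    """Serialize to JSON with stable key order, as a consumer would see it."""
    return json.dumps(asdict(response), sort_keys=True)
```

A consumer that keys off exactly these field names is the kind of client the stability comment is protecting.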
Beyond the core read API, hosted_routes.py exposes a local-dev hosted control plane with GitHub App OAuth, signed webhook handling, sync-worker dispatch, retrieval-eval endpoints, and security scanning.
Source: src/codeatlas/api/hosted_routes.py
MCP Server (AI Agent Interface)
The Model Context Protocol server exposes the graph to AI coding agents such as Claude Code and Cursor. With pip install codeatlas[mcp], agents can call tools like search_symbols, get_symbol_details, get_dependencies, trace_call_chain, get_impact_analysis, find_similar_code, detect_circular_dependencies, get_hotspots, get_symbol_coverage, and get_change_impact.
Source: README.md
The shipping example documents a minimal Claude Code wiring:
{
"mcpServers": {
"codeatlas": {
"command": "codeatlas",
"args": ["serve", "--db", "/path/to/repo/.codeatlas/graph.db"]
}
}
}
Source: examples/mcp-claude/README.md
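Because the configuration needs an absolute database path, it can help to generate the block rather than edit it by hand. The helper below is a hypothetical convenience, not part of CodeAtlas; it only reproduces the structure shown above and assumes the default `.codeatlas/graph.db` location under the repository root.

```python
import json
from pathlib import Path


def mcp_server_entry(repo_root: str) -> dict:
    """Build the mcpServers block for a repository indexed at the default path."""
    db_path = Path(repo_root) / ".codeatlas" / "graph.db"
    return {
        "mcpServers": {
            "codeatlas": {
                "command": "codeatlas",
                "args": ["serve", "--db", db_path.as_posix()],
            }
        }
    }


if __name__ == "__main__":
    # Print the block so it can be pasted into the agent host's config file.
    print(json.dumps(mcp_server_entry("/path/to/repo"), indent=2))
```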
The example emphasizes an agent workflow pattern: agents chain multiple tools — search_symbols → get_symbol_details → get_dependencies — to answer a single question, which is precisely where a graph query outperforms filesystem-only tools.
Source: examples/mcp-claude/README.md
Web UI
The Web UI is a React + TypeScript single-page application built with Vite, React Router, TanStack Query, Zustand, and react-force-graph-2d. It is launched locally with codeatlas viz --open.
Source: frontend/package.json
The HTML shell mounts a dark-themed app into #root and loads the bundled entry from /src/main.tsx.
Source: frontend/index.html
The UI is organized into several views: Search (showing signature, docstring, references), Analysis (PageRank ranking, hotspots, communities, coverage gaps), Graph (interactive force-directed visualization with kind/community coloring and file filtering), Symbols (detailed symbol pages), Diff (compare symbols between two git refs), and Settings (credentials, reindex, version info).
Source: README.md
The Overview page wires live data from the API into hero stat tiles (files indexed, symbols, relationships), a language breakdown bar, a symbol-kinds distribution, and the top-10 PageRank list with deep links to symbol pages.
Source: frontend/src/pages/Overview.tsx
Common Failure Modes
Source: frontend/src/pages/Overview.tsx
- Empty ranking on Overview: the page renders an `EmptyState` reading *"No ranking yet"* when no graph has been indexed; run `codeatlas index` first.
- **CLI without sem
Hosted Gateway, Deployment, GitHub Integration, and Operations
Related topics: Overview and System Architecture, Agent Interfaces, CLI, HTTP API, and Web UI
CodeAtlas v1.0.0 ships a local-development hosted control plane that packages team management, repository registration, a background sync worker, GitHub App integration, and an operations surface into a single FastAPI process. The system is deliberately designed so teams can stand up a multi-tenant context gateway on a laptop before wiring real OAuth, billing, or remote MCP routing — the latter are explicitly stubbed in the MVP. Source: src/codeatlas/api/hosted_routes.py:1-15.
Architecture Overview
The hosted gateway exposes a FastAPI router mounted under /hosted, backed by a SQLite-backed HostedStore that persists teams, principals, repositories, sync jobs, and webhook delivery IDs. A separate SyncJobWorker runs in the background to clone repositories and trigger per-repo graph indexing. GitHub App concerns (OAuth, webhook signature verification, repository refresh) are isolated in codeatlas/github_app.py so the control plane can be exercised in CI without live credentials.
flowchart LR
A[Bootstrap<br/>bootstrap] --> B[Register Team & Repo<br/>register-repo]
B --> C[SyncJobWorker<br/>background]
C --> D[Per-repo Graph DB<br/>GraphStore]
D --> E[Remote MCP<br/>/remote-mcp]
D --> F[Hosted Dashboard<br/>/hosted]
G[GitHub Webhook<br/>push event] --> H[process_github_webhook]
H --> C
I[Agent / IDE] --> E
I --> F
Bootstrap, Registration, and Sync
Bootstrap is the entry point: it accepts a BootstrapRequest (email, name, team slug, team name) and creates the first principal and team record. Source: src/codeatlas/api/hosted_routes.py:18-25.
Repository registration uses a RepoRegistration payload persisted by HostedStore. Once registered, codeatlas hosted sync enqueues a sync job that the worker picks up. Source: src/codeatlas/hosted_worker.py and src/codeatlas/hosted.py.
| Step | Command / Action | Source |
|---|---|---|
| Initialize gateway | codeatlas hosted bootstrap --hosted-db .codeatlas/hosted.db | hosted_routes.py |
| Register repo | codeatlas hosted register-repo | hosted.py |
| Trigger sync | codeatlas hosted sync | hosted_worker.py |
| Inspect status | codeatlas hosted github status | hosted_routes.py |
Each sync run activates the repository and rebuilds the per-repo GraphStore (.codeatlas/graph.db). The dashboard reads from this store to render the Agent Context Feed, measured retrieval quality, blast-radius impact, and data lineage views. Source: src/codeatlas/api/hosted_routes.py:1-15.
GitHub App Integration
The GitHub integration is split across three concerns. First, OAuth helpers in github_app.py (build_oauth_authorize_url, exchange_oauth_code, fetch_github_user) handle user authorization. Second, verify_github_signature validates incoming webhook deliveries using the configured secret. Third, refresh_github_repositories enumerates the installation's repositories from a fixture or token source so CI runs stay hermetic. Source: src/codeatlas/github_app.py.
parse_webhook_payload and process_github_webhook decode push events, record the delivery ID via webhook_rate_limiter, and enqueue per-repo syncs. The hosted_routes.py module wires these into the /hosted/github/webhook endpoint. Source: src/codeatlas/api/hosted_routes.py:1-15 and src/codeatlas/rate_limit.py.
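Webhook signature checks of this kind usually follow GitHub's documented scheme: an HMAC-SHA256 of the raw request body, sent as `sha256=<hex digest>` in the `X-Hub-Signature-256` header. The sketch below shows that scheme in isolation; the function names are hypothetical, and whether `verify_github_signature` works exactly this way is an assumption.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the header value GitHub would send for this body and secret."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign_payload(secret, body).encode("utf-8")
    received = (header_value or "").encode("utf-8")
    return hmac.compare_digest(expected, received)
```

The comparison must run on the raw bytes of the request body; re-serializing parsed JSON can change whitespace and break the digest.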
For repo-scoped remote context, /remote-mcp validates the X-Stratum-Audience header and serves context packs through build_context_pack along with graph summaries. The same endpoint includes prompt-injection, secret, and vendor-path scan results from scan_context_pack. Source: src/codeatlas/context_security.py and src/codeatlas/agent_context.py.
Operations: Security, Rate Limiting, and Retrieval Eval
The operations surface combines three subsystems. Rate limiting is centralized in codeatlas/rate_limit.py, exposing context_rate_limiter and webhook_rate_limiter that the routes consult before accepting context requests or webhook deliveries. Source: src/codeatlas/rate_limit.py.
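The manual does not say which algorithm the limiters use. As one plausible shape, here is a per-key token bucket with an injectable clock; the class name, its parameters, and the choice of token bucket are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Allow up to `capacity` requests per key, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last_seen = self._buckets.get(key, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last_seen) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

A route would call `allow(team_or_repo_key)` before doing any work and reject the request when it returns `False`.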
Context security scanning (scan_context_pack) runs every context pack through prompt-injection, secret-leak, and vendor-path detectors before it leaves the gateway. The scan verdict travels alongside the pack so consumers can reject suspicious payloads. Source: src/codeatlas/context_security.py.
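A scan that returns a verdict alongside the pack could be structured as below. The three detector categories come from the paragraph above, but the specific regular expressions, the vendor directory list, and the `ScanVerdict` and `scan_pack` names are illustrative assumptions, far simpler than a production scanner.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Illustrative patterns only; real detectors would be much broader.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
SECRET = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")
VENDOR_DIRS = {"node_modules", "vendor", "site-packages"}


@dataclass
class ScanVerdict:
    findings: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.findings


def scan_pack(items: Iterable[Tuple[str, str]]) -> ScanVerdict:
    """Scan (path, text) pairs and collect one finding per triggered detector."""
    verdict = ScanVerdict()
    for path, text in items:
        if VENDOR_DIRS & set(PurePosixPath(path).parts):
            verdict.findings.append(f"vendor-path:{path}")
        if INJECTION.search(text):
            verdict.findings.append(f"prompt-injection:{path}")
        if SECRET.search(text):
            verdict.findings.append(f"secret:{path}")
    return verdict
```

Returning findings instead of raising lets the consumer decide whether to drop a single item or reject the whole pack.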
Retrieval quality is treated as a first-class operational signal: hosted_eval.run_repo_retrieval_eval measures recall@k and MRR per repository, while compute_context_savings quantifies token reduction versus prompt-only baselines. These metrics are surfaced on the hosted dashboard so teams can decide whether the graph is actually helping their agents. Source: src/codeatlas/hosted_eval.py.
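Both metrics have standard definitions that fit in a few lines. The sketch below uses those textbook definitions; the function names and input shapes are assumptions and need not match `hosted_eval`.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) query runs."""
    run_list = list(runs)
    if not run_list:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in run_list) / len(run_list)
```

Recall@k rewards finding all relevant symbols, while MRR rewards placing the first relevant one near the top, so the two can move independently.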
Common Failure Modes
- Webhook signature mismatch: webhook deliveries without a valid HMAC are rejected before reaching the worker; verify the secret via `codeatlas hosted webhook-test`.
- Sync job backlog: if the worker cannot keep up, `codeatlas hosted github refresh-repos` re-enumerates installations and the queue drains on the next worker tick.
- Context budget overflow: `codeatlas context <query> --budget 2000` trims the pack; raising the budget without re-running the eval suite can regress measured quality.
- Remote MCP audience rejection: calls to `/remote-mcp` without `X-Stratum-Audience` are refused; this is by design to keep repo context scoped.
See Also
- Retrieval and Context Pack Pipeline
- MCP Server and Agent Tooling
- GitHub App and Webhook Setup Guide
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.
1. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | https://github.com/AryanSaini26/CodeAtlas
2. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/AryanSaini26/CodeAtlas
3. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/AryanSaini26/CodeAtlas
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/AryanSaini26/CodeAtlas
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/AryanSaini26/CodeAtlas
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using CodeAtlas with real data or production workflows.
- CodeAtlas v1.0.0 - github / github_release
- Configuration risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence