LightRAG Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

LightRAG

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

Overview and System Architecture

Related topics: Core API, Query Pipeline, and Multimodal Processing, Storage Backends, Knowledge Graph, and Document Management

Section Related Pages

Continue reading this section for the full explanation and source context.

Section 3.1 Python Core (lightrag/)

Continue reading this section for the full explanation and source context.

Section 3.2 LightRAG Server (lightrag-server)

Continue reading this section for the full explanation and source context.

Section 3.3 React WebUI (lightragwebui/)

Continue reading this section for the full explanation and source context.

Overview and System Architecture

1. Project Purpose and Scope

LightRAG ("Simple and Fast Retrieval-Augmented Generation") is an open-source RAG framework that combines vector retrieval with knowledge-graph construction to overcome the limitations of flat, chunk-based RAG systems. The project's stated goal is to reduce hallucination and improve answer comprehensiveness by extracting entities and relationships from source documents at index time, then using the resulting graph to enrich retrieval at query time.

The repository ships three coordinated artifacts:

A Python core library that can be embedded in any application or used directly from a script.
A FastAPI-based Server (lightrag-server) that exposes REST endpoints, an Ollama-compatible chat interface, and a built-in WebUI for knowledge-graph exploration and a simple RAG query interface.
A React/TypeScript WebUI (lightrag_webui) built with Vite, Tailwind, Radix UI, and sigma/graphology for interactive graph rendering.

The framework is published as the package lightrag-hku on PyPI and is actively maintained by the HKUDS research group, with multimodal document processing having been merged into the mainline as of v1.5.0.

2. High-Level Architecture

LightRAG is organized around a clear separation between the indexing pipeline and the query pipeline, both of which share a common unified token-control subsystem and four pluggable storage backends.

flowchart LR
    subgraph Index["Indexing Pipeline"]
        Docs["Documents<br/>(PDF, DOCX, MD, code, images)"] --> Parser["Multimodal Parser<br/>(MinerU / Docling)"]
        Parser --> Chunker["Chunker"]
        Chunker --> Extractor["LLM Entity/Relation Extractor"]
        Extractor --> KG[("Knowledge Graph<br/>(graph storage)")]
        Chunker --> VDB[("Vector DB<br/>(chunk embeddings)")]
        Extractor --> KV[("KV Storage<br/>(LLM cache, doc status)")]
    end

    subgraph Query["Query Pipeline"]
        Q["User Query"] --> Router{"Query Mode"}
        Router -->|local| LocalRetr["Entity-centric retrieval"]
        Router -->|global| GlobalRetr["Relation-centric retrieval"]
        Router -->|naive/hybrid/mix/bypass| Rerank["Reranker (optional)"]
        LocalRetr --> TokenCtrl["Unified Token Control<br/>(max_entity_tokens,<br/>max_relation_tokens,<br/>max_total_tokens)"]
        GlobalRetr --> TokenCtrl
        Rerank --> TokenCtrl
        TokenCtrl --> LLMResp["LLM Response Generator"]
    end

    KG -.->|entities/relations| Query
    VDB -.->|chunks| Query
    KV -.->|cache lookup| Query

Key architectural properties:

Graph + vectors together. Entity-relationship triples and text-chunk embeddings are produced from the same source documents and stored side-by-side. Retrieval can then fan out across both representations.
Six query modes exposed in the public API: naive, local, global, hybrid, mix, and bypass (which skips retrieval and queries the LLM directly). Source: lightrag_webui/src/api/lightrag.ts.
Unified token budget. The QueryRequest payload carries max_entity_tokens, max_relation_tokens, and max_total_tokens so callers can tune how much of the context window is spent on entities vs. relations vs. chunks vs. system prompt. Source: lightrag_webui/src/api/lightrag.ts.
Optional reranking. When a rerank model is configured, enable_rerank is on by default; a warning is emitted if it is requested but no reranker is wired up. Source: lightrag_webui/src/api/lightrag.ts.

3. System Components

3.1 Python Core (`lightrag/`)

The core package contains the indexing, retrieval, and storage abstractions. It is the layer intended for embedded use and for researchers running the published evaluations. The project recommends that production integrators prefer the REST API instead. Source: README.md.

3.2 LightRAG Server (`lightrag-server`)

The server wraps the core in a FastAPI application and adds:

A WebUI bundled into lightrag/api/webui during the front-end build.
An Ollama-compatible chat endpoint so external tools such as Open WebUI can treat LightRAG as a drop-in chat model.
A setup wizard (added in the 2026.03 release line) that can deploy local embeddings, rerankers, and storage backends via Docker.
Role-specific LLM configuration with four distinct roles — EXTRACT, QUERY, KEYWORDS, and VLM — each with independent settings (added 2026.05). Source: README.md.

3.3 React WebUI (`lightrag_webui/`)

A Vite + React 19 + TypeScript SPA. Build and test tooling are managed with Bun (bun install --frozen-lockfile, bun run build, bun test); Node.js/npm is supported as a fallback for dev/build/preview/lint but not for tests. Source: lightrag_webui/README.md. The UI consumes the same TypeScript types used by the API client, including the QueryRequest/QueryResponse and DocActionResponse shapes. Source: lightrag_webui/src/api/lightrag.ts.

3.4 Storage and Infrastructure

LightRAG supports four swappable storage subsystems (vector, graph, KV, and doc-status). Recent releases have added OpenSearch as a unified storage backend providing all four, and a KubeBlocks-based deployment recipe for running production databases on Kubernetes (PostgreSQL, Neo4j, Redis, etc.). Source: README.md, k8s-deploy/databases/README.md.

3.5 Multimodal Pipeline

External parsers — primarily MinerU and Docling — convert PDFs, Office documents, and images into structured blocks (text, tables, formulas, figures). The resulting content is then fed into the LightRAG indexer so that non-textual evidence participates in entity extraction and retrieval. The supported ingest formats are enumerated in the WebUI's MIME-type map. Source: lightrag_webui/src/lib/constants.ts.

4. Supported File Types and Evaluation

The WebUI's SUPPORTED_DOCUMENT_TYPES constant defines the accepted MIME types and extensions: common text/code formats (.md, .txt, .json, .csv, source files for many languages), PDF (.pdf), and Microsoft Office formats (.docx, .pptx, .xlsx). Source: lightrag_webui/src/lib/constants.ts.

For quality assessment, the repository ships a RAGAS-based harness in lightrag/evaluation/. A bundled sample dataset of five markdown files targets an expected RAGAS score of ~91–100% per question and is designed around the entity-relationship patterns the default extraction prompts expect. Source: lightrag/evaluation/sample_documents/README.md.

5. Community Patterns and Known Limitations

Several recurring community questions are addressed by the architecture described above:

Returning the source file name for a citation. A long-standing request (issue #323) asks how to map an answer back to the originating PDF. This is now feasible through the DocActionResponse tracking and the track_id/metadata returned by the document management API, but requires the caller to persist the document metadata at insertion time. Source: lightrag_webui/src/api/lightrag.ts.
Custom metadata columns for document management (issue #1985) — relevant to the multimodal pipeline where blocks need extra fields such as page number, modality, or table caption.
Framework integration (e.g., N8N node) (issue #328) — answered today by the Ollama-compatible chat endpoint and the documented REST API, both of which can be driven from any external orchestrator.
Production hardening (issue #422) — addressed by the role-specific LLM configuration, the unified token-control knobs, and the KubeBlocks deployment recipes.

Core API, Query Pipeline, and Multimodal Processing

Related topics: Overview and System Architecture, Storage Backends, Knowledge Graph, and Document Management

Section Related Pages

Continue reading this section for the full explanation and source context.

Core API, Query Pipeline, and Multimodal Processing

Overview

LightRAG exposes a dual surface for retrieval-augmented generation: a Python LightRAG core (intended for embedded use and research) and a LightRAG Server that hosts a REST API plus a React-based WebUI. As of v1.5.0, both surfaces share the same query pipeline, and the multimodal document processing capabilities previously shipped in the separate RAG-Anything project have been merged into the main package. Source: README.md:1-50

The Core API is therefore the single entry point for indexing documents, querying the knowledge graph, and managing entities, relations, and document lifecycles. The query pipeline is parameterized through QueryParam (Core) or a QueryRequest (REST), and supports six retrieval modes, token-budget control, reranking, conversation history, and streaming. Source: lightrag_webui/src/api/lightrag.ts:1-90

Query Pipeline and Retrieval Modes

The query pipeline supports six retrieval modes, declared as the QueryMode union in the WebUI client and consumed by the server backend:

Mode	Purpose
`naive`	Vector-only retrieval, no graph traversal
`local`	Entity-centric retrieval from the knowledge graph
`global`	Relationship-centric retrieval from the knowledge graph
`hybrid`	Combination of local and global retrieval
`mix`	Combines knowledge graph retrieval with vector chunks; default when a reranker is configured
`bypass`	Skips retrieval and sends the query directly to the LLM

Source: lightrag_webui/src/api/lightrag.ts:1-15

The README states that "Reranker is now supported, significantly boosting performance for mixed queries (set as default query mode)," which explains why mix is the recommended default. Source: README.md:1-20

The QueryRequest shape used by the REST API includes fine-grained control over retrieval: top_k (entities in local mode, relations in global mode), chunk_top_k (max chunks retained after reranking), max_entity_tokens, max_relation_tokens, and max_total_tokens — a unified token budget across entities, relations, chunks, and system prompt. Additional fields include enable_rerank, history_turns for conversation memory, and user_prompt to override the default prompt template. Source: lightrag_webui/src/api/lightrag.ts:40-90

flowchart LR
    A[User Query] --> B[QueryRequest]
    B --> C{Mode}
    C -->|naive| D[Vector Store]
    C -->|local| E[Entity Graph]
    C -->|global| F[Relation Graph]
    C -->|hybrid| E & F
    C -->|mix| E & F & D
    C -->|bypass| H[LLM]
    D & E & F --> G[Reranker]
    G --> I[Context Assembly]
    I --> J[LLM Response]

Document Lifecycle and Source Attribution

The REST API exposes document lifecycle endpoints that return structured responses. DocActionResponse carries a status field with values success | partial_success | failure, plus chunks_count and file_path for each processed document. The paginated listing endpoint (DocumentsRequest, PaginatedDocsResponse) supports status filtering and sorting by created_at, updated_at, id, or file_path. Source: lightrag_webui/src/api/lightrag.ts:90-140

Community issue #323 highlights a recurring pain point: users want to know which source file produced a given answer. LightRAG addresses this through its document tracking system (TrackStatusResponse, DocsStatusesResponse), and the DocActionResponse type confirms that source attribution flows through file_path rather than a free-form metadata field. Custom metadata columns for richer document management (issue #1985) are an active area of community discussion and are not yet a first-class configuration option in the typed surface. Source: lightrag_webui/src/api/lightrag.ts:90-140, README.md:1-50

The EntityUpdateResponse type also includes an operation_summary describing whether a rename or merge succeeded, which is relevant for users who programmatically edit the knowledge graph and need to confirm the server's resolution. Source: lightrag_webui/src/api/lightrag.ts:60-90

Multimodal Document Processing

Starting with v1.5.0, LightRAG Server includes a multimodal document pipeline for PDFs, Office documents, images, tables, and formulas. Parsing is delegated to external MinerU or Docling services, while multimodal indexing runs inside the LightRAG pipeline. The README states that "All RagAnything's multimodal processing capabilities are merged into LightRAG" and that the separate RAG-Anything repository "will no longer receive core feature updates or maintenance." Source: README.md:1-50

The pipeline requires the role-specific LLM configuration released alongside multimodal support: four distinct roles — EXTRACT, QUERY, KEYWORDS, and VLM — each with independent LLM settings. This separation lets users point the VLM (vision-language model) role at a multimodal-capable model such as GPT-4o or a Qwen-VL variant while keeping extraction on a cheaper text-only LLM. Source: README.md:1-20

WebUI Surface

The React-based WebUI (built with Bun or npm) consumes the same REST API. Its package.json reveals the rendering stack: react-markdown, remark-math, rehype-katex, mermaid, sigma and graphology for knowledge graph exploration, and react-dropzone for document upload. The lightrag_webui/README.md confirms that the build output is bundled into lightrag/api/webui and served by the LightRAG Server. Source: lightrag_webui/package.json:1-90, lightrag_webui/README.md:1-30

The graphColor.ts utility maps extracted entity types (person, organization, location, event, artifact, naturalobject, data, content, creature, method) to display colors used by the sigma graph renderer, giving the WebUI its visual taxonomy for the knowledge graph explorer. Source: lightrag_webui/src/utils/graphColor.ts:1-90

Deployment and Evaluation

For production, the project ships KubeBlocks-based Helm scripts under k8s-deploy/databases/ that provision PostgreSQL, Neo4j, and other backends on Kubernetes. Evaluation against the published paper's claims is supported through lightrag/evaluation/, which bundles a sample dataset and an eval_rag_quality.py driver that targets ~91–100% RAGAS scores on its bundled questions. Source: k8s-deploy/databases/README.md:1-30, lightrag/evaluation/sample_documents/README.md:1-20

Storage Backends, Knowledge Graph, and Document Management

Related topics: Core API, Query Pipeline, and Multimodal Processing, Deployment, WebUI, Evaluation, and Operations

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Factory and Selection

Continue reading this section for the full explanation and source context.

Section Concurrency Control

Continue reading this section for the full explanation and source context.

Section Document Status Store

Continue reading this section for the full explanation and source context.

Storage Backends, Knowledge Graph, and Document Management

Overview

LightRAG organizes persistent state into three cooperating subsystems: a storage backend layer (key-value, vector, and graph stores), a knowledge graph built from extracted entities and relations, and a document management pipeline that tracks ingestion, chunking, embedding, and deletion. The factory pattern in lightrag/kg/factory.py constructs the concrete storage instances at runtime, while a single-process lock manager in lightrag/kg/shared_storage.py coordinates concurrent access across asyncio tasks and worker threads.

The architecture is intentionally pluggable. Built-in JSON implementations are provided for development, while production deployments can swap in PostgreSQL, Neo4j, MongoDB, Milvus, Qdrant, or OpenSearch. As announced in the v1.5.0 release notes (README.md), the system has also absorbed RAG-Anything's multimodal processing, allowing images, tables, and formulas inside documents to be indexed alongside plain text.

Storage Backend Layer

Factory and Selection

The lightrag/kg/factory.py module exposes factory functions (such as get_keyed_storage, get_vector_storage, and get_graph_storage) that return a backend instance matching a configured type string. The keyed storage abstraction is implemented in lightrag/kg/json_kv_impl.py using append-only JSON files; the default vector backend is defined in lightrag/kg/nano_vector_db_impl.py; and the default graph backend lives in lightrag/kg/networkx_impl.py. Each factory call accepts a namespace (e.g., full_docs, text_chunks, entities, relationships, chunks) so the same backend type can host multiple logical stores.

Concurrency Control

lightrag/kg/shared_storage.py provides process-wide Lock and Event primitives. Because some backends (such as the JSON file stores) are not safe under concurrent writes, every mutation funnels through these locks. Reader-writer separation is enforced so that vector and graph queries can proceed in parallel with document ingestion, while writes remain serialized per namespace.

Document Status Store

lightrag/kg/json_doc_status_impl.py persists a per-document state machine (DocStatus) that records file path, content hash, chunk count, error messages, and timestamps. The frontend type DocStatusResponse in lightrag_webui/src/api/lightrag.ts mirrors this state, and the WebUI surfaces paginated views via PaginatedDocsResponse and StatusCountsResponse. This is the structure referenced in community issue #1985, where users request custom metadata columns — additional fields can be propagated through metadata: Record<string, any> on DocStatusResponse and stored in the doc-status JSON record.

Knowledge Graph and Source Attribution

Entity and Relation Storage

Entities and relationships extracted by the LLM are persisted in the entities and relationships namespaces of the keyed storage. The graph view is composed by lightrag/kg/networkx_impl.py (or an external graph database in production) and is queried during local, global, and hybrid retrieval modes. The QueryRequest and QueryResponse types in lightrag_webui/src/api/lightrag.ts expose top_k, chunk_top_k, max_entity_tokens, max_relation_tokens, and max_total_tokens so callers can tune how much of the graph is injected into the LLM context.

Mapping Answers Back to Source Files

Community issue #323 highlights a common gap: callers want to know *which file* produced a given answer. Each chunk stored in text_chunks retains a reference to its parent document, and DocStatusResponse.file_path in lightrag_webui/src/api/lightrag.ts carries the original path. The entity_update_response and chunk-level metadata allow applications to walk from a retrieved entity back to its source chunks and ultimately to the source file path stored in the doc-status record.

flowchart LR
    A[Source Document] --> B[Chunker]
    B --> C[Text Chunks KV]
    B --> D[Vector Store]
    B --> E[LLM Extraction]
    E --> F[Entities KV]
    E --> G[Relationships KV]
    F --> H[Graph Store]
    G --> H
    H --> I[Retrieval Modes]
    D --> I
    C --> I
    I --> J[LLM Answer + Citations]
    J --> K[Source File Path]

Document Management Workflow

Documents move through the states defined by DocStatus: pending, processing, processed, failed, and deleted (after 2025.08's document-deletion feature, which triggers automatic KG regeneration — see README.md). The TrackStatusResponse type in lightrag_webui/src/api/lightrag.ts returns a status_summary aggregating counts across all states, and DocumentsRequest supports paginated browsing with sorting on created_at, updated_at, id, or file_path.

The evaluation harness in lightrag/evaluation/sample_documents/README.md demonstrates the end-to-end flow: index five markdown samples, then run python lightrag/evaluation/eval_rag_quality.py to obtain RAGAS scores. This same pipeline is what production users automate through the LightRAG Server's REST endpoints and the React WebUI built with react, sigma, graphology, and react-markdown (declared in lightrag_webui/package.json).

Configuration and Operational Notes

Embedding model lock-in: As noted in README.md, the embedding model and its output dimension must be fixed before indexing begins. Changing it later requires dropping the vector tables so they can be recreated with the new dimensionality.
Reranker integration: When a reranker (e.g., BAAI/bge-reranker-v2-m3) is configured, enable_rerank on QueryRequest defaults to true and mix is the recommended default mode.
Role-specific LLMs: The 2026.05 release introduced four LLM roles — EXTRACT, QUERY, KEYWORDS, and VLM — each with its own configuration, useful when stronger models should handle extraction or vision tasks.
Production backends: The Helm chart in k8s-deploy/README.md documents switching from the built-in JSON/NanoVectorDB/NetworkX defaults to external PostgreSQL, Neo4j, Milvus, Qdrant, or OpenSearch instances for high availability and horizontal scale.

Deployment, WebUI, Evaluation, and Operations

Related topics: Core API, Query Pipeline, and Multimodal Processing, Storage Backends, Knowledge Graph, and Document Management

Section Related Pages

Continue reading this section for the full explanation and source context.

Deployment, WebUI, Evaluation, and Operations

LightRAG ships with several layers that move the project from a research prototype into a production-ready system: containerized deployment, an interactive React-based WebUI, a reproducible evaluation harness, and operationally relevant features such as tracing, token accounting, and role-specific LLM configuration. This page consolidates those capabilities and links them to the relevant community questions raised by users.

Deployment Topologies

LightRAG supports three primary deployment paths: a PyPI installable server, a Docker Compose stack, and a Helm-based Kubernetes chart. The PyPI route uses the [api] extra and is recommended for local development and small-scale use cases Source: README.md:1-100. After installing the package and building the front-end with Bun, the server can be launched directly with the lightrag-server command, or via make dev for the source tree.

For containerized deployments, the README documents a docker compose up workflow that consumes an env.example template Source: README.md:1-100. The Kubernetes path is provided through a Helm chart that includes two recommended modes: a lightweight deployment using the built-in storage for testing, and a production deployment wired to external databases such as PostgreSQL and Neo4j Source: k8s-deploy/README.md:1-30. The companion k8s-deploy/databases/README.md shows how KubeBlocks can provision the backing data stores inside the cluster, with a configurable 00-config.sh that toggles individual engines on or off Source: k8s-deploy/databases/README.md:1-80.

flowchart LR
  A[Source: uv sync / pip install -e] --> B[Build WebUI: bun run build]
  B --> C{Deployment Target}
  C -->|PyPI| D[lightrag-server]
  C -->|Docker| E[docker compose up]
  C -->|Kubernetes| F[Helm chart + KubeBlocks DBs]
  C -->|Local dev| G[make dev]

The role-specific LLM configuration released in 2026.05 — four distinct roles (EXTRACT, QUERY, KEYWORDS, VLM) with independent LLM settings — is most easily exercised through the Docker or Helm path because the env file or Helm values file can declare each role separately Source: README.md:1-100.

WebUI: Build, Run, and Query Surface

The WebUI is a React + Vite application that bundles into the server's static assets. The repository documents two build flows: Bun (recommended) and Node.js/npm as a fallback for environments where Bun is unavailable Source: lightrag_webui/README.md:1-60. Tests still require Bun even when npm is used for the build, which is called out explicitly so contributors are not surprised Source: lightrag_webui/README.md:1-60.

The TypeScript API surface exposed to the UI declares six retrieval modes — naive, local, global, hybrid, mix, and bypass — where each mode trades off graph locality against global relational context Source: lightrag_webui/src/api/lightrag.ts:1-60. The QueryRequest type carries fine-grained token budgets (max_entity_tokens, max_relation_tokens, max_total_tokens), enabling a unified token control system that the user can tune per request Source: lightrag_webui/src/api/lightrag.ts:1-80. Other operational toggles include only_need_context, only_need_prompt, streaming, reranking (enable_rerank), and history_turns for multi-turn conversations Source: lightrag_webui/src/api/lightrag.ts:1-80.

The WebUI also surfaces graph exploration features; lightrag_webui/src/utils/graphColor.ts ships an entity-type-to-color map covering both English and Chinese taxonomy terms (person, organization, event, location, artifact, technology, etc.), allowing the force-directed sigma graph to render consistently across locales Source: lightrag_webui/src/utils/graphColor.ts:1-80. Reverse-proxy deployments are accommodated by normalizeApiPrefix/normalizeWebuiPrefix, which are unit-tested to avoid protocol-relative URLs and trailing-slash issues in fetch templates Source: lightrag_webui/src/lib/pathPrefix.test.ts:1-50.

The v1.5.0 release merged RagAnything's multimodal pipeline into the main server, so the WebUI can now upload PDFs, Office documents, and images and route them through MinerU or Docling parsing services before they enter the indexing pipeline. This addresses long-standing community questions such as #323, where users wanted to know which source file (for example a.pdf) was used to answer a given query — multimodal-aware document chunks carry file metadata that the response payload can now surface Source: README.md:1-100.

Evaluation with RAGAS

Quality assurance is provided through a sample-driven RAGAS harness. The lightrag/evaluation/sample_documents/README.md documents five markdown fixtures that map one-to-one with questions in sample_dataset.json and yield an expected RAGAS score of ~91–100% per question Source: lightrag/evaluation/sample_documents/README.md:1-30. The fixtures cover LightRAG's overview, RAG architecture, improvements over traditional RAG, supported databases, and evaluation/deployment topics — chosen because they exhibit clear entity-relationship patterns that the default extraction prompts in lightrag/prompt.py can parse reliably Source: lightrag/evaluation/sample_documents/README.md:1-30.

The README's release notes mention that the API has been updated to return retrieved contexts alongside query results, which is precisely the signal RAGAS needs for context-precision and context-recall metrics Source: README.md:1-100. Operators wanting to extend coverage are advised to customize lightrag/prompt.py for their own data shapes before treating the 91–100% score as representative.

Operations, Tracing, and Integrations

For ongoing operations, the 2025.11 release integrated Langfuse tracing and RAGAS evaluation, and the 2025.10 release eliminated processing bottlenecks for large-scale datasets Source: README.md:1-100. Reranker support — enabled by default for mixed queries — improves retrieval quality and is a common point of interest for production users tuning the new mix mode Source: lightrag_webui/src/api/lightrag.ts:1-60.

Community discussions map directly to operational concerns: #422 ("production use-case questions") is a recurring thread where users ask about storage durability and throughput, both of which the Helm chart addresses via the external-database deployment mode Source: k8s-deploy/README.md:1-30; #1985 ("custom metadata columns for document management") aligns with the multimodal document pipeline that now stores richer per-chunk metadata; and #328 ("LightRAG as N8N framework node") is naturally answered by exposing the REST API as the integration surface, since the server already exposes the query, document management, and knowledge graph endpoints that N8N's AI node can consume.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

high Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

high Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

high Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

high Configuration risk requires verification

May increase setup, validation, or first-run risk for the user.

Doramagic Pitfall Log

Found 36 structured pitfall item(s), including 8 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/2642

2. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/2746

3. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/3195

4. Configuration risk: Configuration risk requires verification

Severity: high
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/2502

5. Configuration risk: Configuration risk requires verification

Severity: high
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/2709

6. Capability evidence risk: Capability evidence risk requires verification

Severity: high
Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/2761

7. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/2768

8. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/3234

9. Installation risk: Installation risk requires verification

Severity: medium
Finding: Developers should check this installation risk before relying on the project: [v1.5.0] /query still returns [no-context] due to embedding worker timeout even though embeddings API is reachable and documents exist
User impact: Developers may fail before the first successful local run: [v1.5.0] /query still returns [no-context] due to embedding worker timeout even though embeddings API is reachable and documents exist
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: [v1.5.0] /query still returns [no-context] due to embedding worker timeout even though embeddings API is reachable and documents exist. Context: Observed when using python, docker
Evidence: failure_mode_cluster:github_issue | https://github.com/HKUDS/LightRAG/issues/3195

10. Installation risk: Installation risk requires verification

Severity: medium
Finding: Developers should check this installation risk before relying on the project: v1.4.10
User impact: Upgrade or migration may change expected behavior: v1.4.10
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.4.10. Context: Observed when using windows
Evidence: failure_mode_cluster:github_release | https://github.com/HKUDS/LightRAG/releases/tag/v1.4.10

11. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/HKUDS/LightRAG/issues/3197

12. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: RFC: hybrid BM25 + vector retrieval with graph traversal seeding for jargon-heavy domains
User impact: Developers may misconfigure credentials, environment, or host setup: RFC: hybrid BM25 + vector retrieval with graph traversal seeding for jargon-heavy domains
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: RFC: hybrid BM25 + vector retrieval with graph traversal seeding for jargon-heavy domains. Context: Observed when using python
Evidence: failure_mode_cluster:github_issue | https://github.com/HKUDS/LightRAG/issues/3198

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources 12

Count of project-level external discussion links exposed on this manual page.

Use Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using LightRAG with real data or production workflows.

Community source 1 - github / github_issue
[[Bug]:connection was closed in the middle of operation](https://github.com/HKUDS/LightRAG/issues/2746) - github / github_issue
Community source 3 - github / github_issue
RFC: introduce a BaseExternalParser protocol for pluggable OCR/VLM backe - github / github_issue
[[Feature Request]:can you add workspace。support some type konwledge by o](https://github.com/HKUDS/LightRAG/issues/3236) - github / github_issue
Community source 6 - github / github_issue
Community source 7 - github / github_issue
[[Bug] Entity description accumulated in Milvus dynamic field exceeds 65K](https://github.com/HKUDS/LightRAG/issues/3204) - github / github_issue
RFC: hybrid BM25 + vector retrieval with graph traversal seeding for jar - github / github_issue
Community source 10 - github / github_issue
Guidance on Adding Multimodal Support to LightRAG: Wrap with RAG‑Anythin - github / github_issue
[[Bug]:RagAnything with Ollma(qwen3-vl) image process, Getting error](https://github.com/HKUDS/LightRAG/issues/2502) - github / github_issue

Source: Project Pack community evidence and pitfall evidence

LightRAG

Overview and System Architecture

Related Pages

Overview and System Architecture

1. Project Purpose and Scope

2. High-Level Architecture

3. System Components

3.1 Python Core (`lightrag/`)

3.2 LightRAG Server (`lightrag-server`)

3.3 React WebUI (`lightrag_webui/`)

3.4 Storage and Infrastructure

3.5 Multimodal Pipeline

4. Supported File Types and Evaluation

5. Community Patterns and Known Limitations

See Also

Core API, Query Pipeline, and Multimodal Processing

Related Pages

Core API, Query Pipeline, and Multimodal Processing

Overview

Query Pipeline and Retrieval Modes

Document Lifecycle and Source Attribution

Multimodal Document Processing

WebUI Surface

Deployment and Evaluation

See Also

Storage Backends, Knowledge Graph, and Document Management

Related Pages

Storage Backends, Knowledge Graph, and Document Management

Overview

Storage Backend Layer

Factory and Selection

Concurrency Control

Document Status Store

Knowledge Graph and Source Attribution

Entity and Relation Storage

Mapping Answers Back to Source Files

Document Management Workflow

Configuration and Operational Notes

See Also

Deployment, WebUI, Evaluation, and Operations

Related Pages

Deployment, WebUI, Evaluation, and Operations

Deployment Topologies

WebUI: Build, Run, and Query Surface

Evaluation with RAGAS

Operations, Tracing, and Integrations

See Also

Doramagic Pitfall Log

Doramagic Pitfall Log

1. Installation risk: Installation risk requires verification

2. Installation risk: Installation risk requires verification

3. Installation risk: Installation risk requires verification

4. Configuration risk: Configuration risk requires verification

5. Configuration risk: Configuration risk requires verification

6. Capability evidence risk: Capability evidence risk requires verification

7. Runtime risk: Runtime risk requires verification

8. Runtime risk: Runtime risk requires verification

9. Installation risk: Installation risk requires verification

10. Installation risk: Installation risk requires verification

11. Installation risk: Installation risk requires verification

12. Configuration risk: Configuration risk requires verification

Community Discussion Evidence

Community Discussion Evidence