ragflow Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

ragflow

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

Project Overview and System Architecture

Related topics: Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge, Agent System, Tools, and Workflow Orchestration, Deployment, Configuration, Administration, and Model Integrat...

1. Purpose and Scope

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that fuses RAG with Agent capabilities to create a context layer for large language models. According to the project README, RAGFlow is "powered by a converged context engine and pre-built agent templates" and is intended for "enterprises of any scale" — ranging from individual developers running it locally to large organizations operating multi-tenant deployments. Source: README.md.

The project's stated design goals — paraphrased from the README — include:

Quality in, quality out: Deep document understanding for extracting knowledge from unstructured data, including formats such as Word, slides, Excel, txt, images, scanned copies, structured data, and web pages. Source: README.md.
Template-based chunking that is explainable and configurable per use case.
Grounded citations with visualization of text chunking to support traceable answers and reduce hallucinations.
Heterogeneous data-source compatibility through an extensible connector layer.
Configurable LLMs and embedding models with multiple recall fused with re-ranking.

The community roadmap (tracking issues #4214 and #162) shows the project has progressed through v0.9.0, v0.10.0, v0.21.0, v0.22.0, v0.23.0, v0.24.0, and the v0.25.x line, with active demand for features such as Text2SQL, TTS, Kubernetes deployment (issue #864), Ollama rerank integration (issue #4406), and the Docling parser (issue #3443).

2. Architectural Layers

RAGFlow follows a layered architecture in which the Python API, Go services, and a Vite-based web frontend collaborate through clearly defined interfaces. The diagram below summarizes the high-level data flow from ingestion to retrieval and agent execution.

flowchart LR
    UI["Web Frontend<br/>(Vite + React)"]
    CLI["CLI / Virtual FS<br/>(internal/cli)"]
    MCP["MCP Server<br/>(mcp/server)"]
    PYAPI["Python API<br/>(api/ragflow_server)"]
    GOAPI["Go Service<br/>(cmd/server_main)"]
    ENGINE["Doc Engine<br/>(internal/engine)"]
    DEEPDOC["DeepDoc<br/>(parser + vision)"]
    SANDBOX["Agent Sandbox<br/>(agent/sandbox)"]
    DS["Data Sources<br/>(firecrawl, S3, RSS, etc.)"]
    LLM["LLM / Embedding / Rerank"]

    UI --> PYAPI
    CLI --> PYAPI
    MCP --> GOAPI
    PYAPI --> DEEPDOC
    PYAPI --> ENGINE
    PYAPI --> SANDBOX
    PYAPI --> LLM
    GOAPI --> ENGINE
    DS --> PYAPI
    SANDBOX --> LLM

2.1 Web Frontend

The web UI lives under the web/ directory and is a Vite-based single-page application. Source: web/package.json lists scripts such as dev (Vite dev server), build (production), lint (ESLint), and test (Jest), and depends on @ant-design/icons, @antv/g2, @antv/g6, and form-handling libraries. UI components for building structured JSON schemas — used by the Agent designer — live in web/src/components/jsonjoy-builder/lib/schema-editor.ts, which exports helpers such as createFieldSchema, validateFieldName, and getSchemaProperties. Source: web/src/components/jsonjoy-builder/lib/schema-editor.ts.

2.2 API and Service Tier

The Python Flask API (api/ragflow_server.py) serves as the public HTTP surface for the web UI and external integrations. It hands compute-heavy work (embedding, retrieval, agent execution) to background workers and relies on the Go-side Doc Engine for document storage and search.

The Go service (cmd/server_main.go) initializes the Doc Engine on startup using the engine.Init(&cfg...) pattern documented in internal/engine/README.md. The Go side exposes retrieval-test, search, and admin RPCs that the Python layer consumes.

2.3 Model Context Protocol (MCP) Server

RAGFlow ships an MCP server at mcp/server/server.py that exposes RAGFlow datasets and retrieval operations as Model Context Protocol tools. The RAGFlowConnector class implements _fetch_datasets_page, list_datasets, resolve_dataset_ids, and a call_tool dispatcher that routes the ragflow_retrieval tool. The retrieval tool accepts dataset_ids, document_ids, question, page, page_size, similarity_threshold, vector_similarity_weight, keyword, top_k, rerank_id, and force_refresh parameters, allowing any MCP-compatible client (e.g., Claude Desktop) to query RAGFlow datasets directly.
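As a rough illustration of the parameter surface listed above, the sketch below assembles an arguments dictionary for a ragflow_retrieval tool call. Only the parameter names come from the text; the helper name build_retrieval_args, the parameter types, and every default value are assumptions made for illustration and may not match mcp/server/server.py.

```python
from typing import Any, Optional


def build_retrieval_args(
    question: str,
    dataset_ids: Optional[list[str]] = None,
    document_ids: Optional[list[str]] = None,
    page: int = 1,                          # assumed default
    page_size: int = 10,                    # assumed default
    similarity_threshold: float = 0.2,      # assumed default
    vector_similarity_weight: float = 0.3,  # assumed default
    keyword: bool = False,                  # assumed type and default
    top_k: int = 1024,                      # assumed default
    rerank_id: Optional[str] = None,
    force_refresh: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: validate inputs and build the tool arguments."""
    if not question.strip():
        raise ValueError("question must not be empty")
    for label, value in (
        ("similarity_threshold", similarity_threshold),
        ("vector_similarity_weight", vector_similarity_weight),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be between 0 and 1")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "dataset_ids": list(dataset_ids or []),
        "document_ids": list(document_ids or []),
        "question": question,
        "page": page,
        "page_size": page_size,
        "similarity_threshold": similarity_threshold,
        "vector_similarity_weight": vector_similarity_weight,
        "keyword": keyword,
        "top_k": top_k,
        "rerank_id": rerank_id,
        "force_refresh": force_refresh,
    }
```

An MCP client would pass a dictionary of this shape as the tool arguments; the exact accepted types should be checked against the tool schema the server advertises.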

3. Key Subsystems

3.1 DeepDoc — Document Understanding

deepdoc/ contains the document parsing pipeline and the vision subsystem. According to deepdoc/README.md, DeepDoc provides OCR, layout recognition (with 10 components — text, title, figure, figure caption, table, table caption, header, footer, reference, equation), and Table Structure Recognition (TSR). The CLI test scripts deepdoc/vision/t_ocr.py and deepdoc/vision/t_recognizer.py accept --inputs and --output_dir arguments so developers can verify model behavior on local PDFs and images.

3.2 Doc Engine — Pluggable Storage and Retrieval

The Doc Engine, documented in internal/engine/README.md, abstracts over Elasticsearch and Infinity (an in-house database). The engine is configured via conf/service_conf.yaml under the doc_engine key with sub-keys es (hosts, username, password) and infinity (uri, postgres_port, db_name). The Go package layout separates client.go, search.go, index.go, and document.go per backend so that switching engines only requires changing doc_engine.type.

The schema used by the engine — exposed in tools/es-to-oceanbase-migration/src/es_ob_migration/schema.py — shows the underlying document model with fields such as content_with_weight, content_ltks, content_sm_ltks, important_kwd, question_kwd, tag_kwd, and available_int. This schema documents the fields an embedder/retriever must populate.
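To make the field list above concrete, here is a minimal sketch that checks whether a chunk record carries those fields before indexing. The field names come from the text; treating all of them as required, and the helper name missing_chunk_fields, are assumptions for illustration only.

```python
# Field names taken from the schema described above.
CHUNK_FIELDS = (
    "content_with_weight",
    "content_ltks",
    "content_sm_ltks",
    "important_kwd",
    "question_kwd",
    "tag_kwd",
    "available_int",
)


def missing_chunk_fields(chunk: dict) -> list[str]:
    """Hypothetical check: return the schema fields absent from a chunk record."""
    return [name for name in CHUNK_FIELDS if name not in chunk]
```

A caller could reject or repair a chunk when the returned list is non-empty.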

3.3 CLI and Virtual Filesystem

The CLI, documented in internal/cli/README.md, exposes a unified, path-based interface over RAGFlow REST APIs. Paths include /datasets, /datasets/{name} (documents), and /datasets/{name}/{doc} (document info). The implementation uses a provider pattern, spread across parser/, filesystem/, engine.go, base.go, dataset.go, file.go, and utils.go.
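The three path shapes above can be illustrated with a small resolver. This is a Python sketch of the idea, not the Go implementation; the VirtualPath type, the resolve function, and the kind labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirtualPath:
    kind: str                      # "datasets", "dataset", or "document"
    dataset: Optional[str] = None
    document: Optional[str] = None


def resolve(path: str) -> VirtualPath:
    """Classify a virtual path into one of the three supported shapes."""
    parts = [segment for segment in path.split("/") if segment]
    if not parts or parts[0] != "datasets" or len(parts) > 3:
        raise ValueError(f"unsupported path: {path!r}")
    if len(parts) == 1:
        return VirtualPath("datasets")                      # /datasets
    if len(parts) == 2:
        return VirtualPath("dataset", dataset=parts[1])     # /datasets/{name}
    return VirtualPath("document", dataset=parts[1], document=parts[2])
```

A provider would then map each kind to the corresponding REST call.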

A notable subsystem, documented in internal/cli/filesystem/README.md, is the skill management layer. It supports install-skill <space> <source> from local paths, GitHub repos (github.com/owner/repo/path), ClawHub (clawhub://owner/skill-name), and skills.sh (skill://skill-name). The system enforces a defense-in-depth security model: HTTPS source validation, quarantine of downloaded artifacts, regex-based static analysis across 100+ threat patterns in six categories (exfiltration, injection, destructive operations, persistence, network, obfuscation), trust tiers based on source reputation, mandatory --force for high-risk installs, and audit logging. Skills must be ≤ 50 MB in total and ≤ 5 MB per file, contain text files only, and use lowercase alphanumeric names (hyphens and underscores are allowed).
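The size, name, and text-only limits described above, together with a regex scan, can be sketched as follows. The two threat patterns are illustrative stand-ins and are not taken from the project's pattern set; the name pattern, the 1 MB = 1024 × 1024 bytes interpretation, and the check_skill helper are likewise assumptions.

```python
import re

MAX_TOTAL_BYTES = 50 * 1024 * 1024   # assumes 1 MB = 1024 * 1024 bytes
MAX_FILE_BYTES = 5 * 1024 * 1024
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")  # assumed name rule

# Illustrative patterns only; the real scanner is said to use 100+ patterns.
THREAT_PATTERNS = {
    "destructive": re.compile(r"rm\s+-rf\s+/"),
    "injection": re.compile(r"curl\s+[^\n|]*\|\s*(?:sh|bash)"),
}


def check_skill(name: str, files: dict[str, bytes]) -> list[str]:
    """Return a list of problems; an empty list means the skill passes."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("invalid name")
    total = 0
    for path, data in files.items():
        total += len(data)
        if len(data) > MAX_FILE_BYTES:
            problems.append(f"{path}: file too large")
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{path}: not a text file")
            continue
        for category, pattern in THREAT_PATTERNS.items():
            if pattern.search(text):
                problems.append(f"{path}: {category} pattern")
    if total > MAX_TOTAL_BYTES:
        problems.append("skill too large")
    return problems
```

Returning every problem rather than stopping at the first keeps the result suitable for an audit log entry.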

3.4 Agent Sandbox

The Agent subsystem, documented in agent/sandbox/README.md, runs agent code inside isolated containers sandboxed with gVisor. Sandboxed agents execute Python and Node.js workloads via base images sandbox-base-python and sandbox-base-nodejs, orchestrated by sandbox-executor-manager. The README warns that older executor-manager images shipped Docker CLI 24.x, which cannot talk to newer Docker daemons; rebuilding with Docker CLI 29.1.0+ is required.

3.5 Data-Source Connectors

The tools/ directory hosts pluggable connectors. The Firecrawl integration (tools/firecrawl/README.md) implements single-URL scraping, website crawling, batch processing, multiple output formats, rate limiting, and language detection — surfacing as a selectable data source in the RAGFlow UI.

4. Deployment, Operations, and Community Context

4.1 Self-Hosting

Per the README, RAGFlow is deployed via Docker Compose with minimum requirements of 4 CPU cores and 16 GB RAM (the README line is truncated in this snapshot; the 16 GB figure matches the prerequisites listed under Deployment). The project roadmap and community issue #864 ("How to deploy based on kubernetes?") show that Kubernetes deployment via Helm charts or raw YAML manifests is a long-standing user request; Docker Compose remains the only supported path.

4.2 Release Cadence and Roadmap

Releases follow a numbered scheme from v0.9.0 through v0.25.x, with a rolling nightly build. Recent milestones include:

Release	Notable change	Source
v0.24.0	Memory APIs/SDK for agents; metadata batch management; ToC renamed to PageIndex; Chat-like Agent management	Community release notes
v0.25.0	Seven ingestion-pipeline templates; new data sources (Seafile, RSS, DingTalk AI Sheet); deletion sync	Community release notes
v0.25.4	Generic RESTful API data-source connector; gpt-5.4-mini/nano support	Community release notes
v0.25.5	Local & SSH providers in admin panel; ~50–100% dataset-search latency reduction	Community release notes
v0.25.6	Browser component for autonomous web navigation; Ψ-RAG (AHC) mode for RAPTOR	Community release notes

4.3 Known Type and API Issues

Community issue #15714 reports a Go-side tenant_rerank_id type mismatch (*string vs. *int) in service.RetrievalTestRequest and SearchBotRetrievalTestRequest, illustrating that the Go ↔ Python API contract remains a focus area for engineering work.

Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge

Related topics: Project Overview and System Architecture, Agent System, Tools, and Workflow Orchestration, Deployment, Configuration, Administration, and Model Integration

Overview

The Core RAG Engine is the heart of RAGFlow, an open-source Retrieval-Augmented Generation engine described in README.md. It implements the full pipeline from raw unstructured documents to grounded, citation-backed LLM responses. The engine is split into four collaborating subsystems:

Parsing (DeepDoc) — turns raw bytes (PDF, DOCX, images, slides) into structured layout-aware text plus tables, figures, and equations.
Chunking & Knowledge Structuring — segments parsed content into explainable chunks and indexes them into a multi-field schema.
Retrieval — performs hybrid (vector + keyword) recall with optional rerank across one or more datasets.
Document Engine / Storage — persists chunks, vectors, and metadata in pluggable backends (Elasticsearch or Infinity).

The following diagram illustrates how a document flows from ingestion through retrieval to a grounded answer across these subsystems.

flowchart LR
    A[Unstructured Document] --> B[DeepDoc Parser<br/>OCR / Layout / TSR]
    B --> C[Template-based Chunker]
    C --> D[Doc Engine<br/>Elasticsearch or Infinity]
    D --> E[Hybrid Retrieval<br/>vector + keyword]
    E --> F[Rerank Model]
    F --> G[LLM with Citations]

Source: README.md:1-50, deepdoc/README.md:1-60, internal/engine/README.md:1-50
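The hybrid retrieval step in the diagram can be sketched as a weighted fusion of vector and keyword similarity. The linear formula below, the hybrid_rank name, and the limit parameter are assumptions for illustration; the source only states that recall is hybrid (vector + keyword) with optional rerank, so the actual scoring may differ.

```python
def hybrid_rank(
    candidates: list[tuple[str, float, float]],
    vector_similarity_weight: float = 0.3,   # assumed default
    similarity_threshold: float = 0.2,       # assumed default
    limit: int = 3,                          # hypothetical result cap
) -> list[tuple[str, float]]:
    """Fuse (chunk_id, vector_sim, keyword_sim) triples into ranked scores."""
    w = vector_similarity_weight
    scored = []
    for chunk_id, vector_sim, keyword_sim in candidates:
        # Assumed fusion rule: convex combination of the two similarities.
        score = w * vector_sim + (1.0 - w) * keyword_sim
        if score >= similarity_threshold:
            scored.append((chunk_id, score))
    # Highest fused score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

A rerank model, when configured, would then reorder the surviving candidates before they reach the LLM.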

Source: https://github.com/infiniflow/ragflow / Human Manual

Agent System, Tools, and Workflow Orchestration

Related topics: Project Overview and System Architecture, Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge, Deployment, Configuration, Administration, and Model Integration

Overview and Purpose

RAGFlow's agent system fuses retrieval-augmented generation with agentic capabilities to deliver a configurable context layer for LLM applications. The runtime is assembled from modular components that can be composed into workflows for both personal and enterprise deployments (Source: README.md). Pre-built agent templates and a converged context engine allow developers to transform complex data into production-ready AI systems with high efficiency (Source: README.md).

Recent releases have progressively expanded the agent surface area:

Memory for AI agents (added 2025-12-26)
Agentic workflow and MCP integration (added 2025-08-01)
Python/JavaScript code executor component (added 2025-05-23)
Browser component for autonomous web navigation (added in v0.25.6, May 2026)
Chat-like Agent conversation management (v0.24.0)

Component-Based Architecture

The agent runtime follows a component-based design in which each node in a workflow is implemented as a self-contained class inheriting from a common base. The canvas is responsible for assembling components, routing data between them, and orchestrating execution order (Source: agent/canvas.py). All components share a unified interface defined in the base class, covering parameter validation, execution lifecycle, and canvas serialization (Source: agent/component/base.py).

Core Component Types

Begin — Defines initial input and conversation start parameters; the entry point of every workflow (Source: agent/component/begin.py).
LLM — Performs language model inference with configurable prompts, model parameters, and tool bindings (Source: agent/component/llm.py).
Retrieval — Performs RAG retrieval against datasets and documents; the implementation bridges to the shared tools/retrieval.py logic (Source: agent/component/retrieval).
Code Exec — Executes Python or JavaScript snippets in a sandboxed environment to support computational reasoning (Source: tools/code_exec.py).
Base — Foundation class providing the common contract (input/output schema, invoke lifecycle, canvas representation) that all other components extend (Source: agent/component/base.py).
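The shared contract described above (parameter validation, an invoke lifecycle, and canvas serialization) can be sketched as a small class hierarchy. This is not the API of agent/component/base.py; the class names, method names, and the Echo stand-in for an LLM node are all hypothetical.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Hypothetical base class: validate parameters, invoke, serialize."""

    def __init__(self, name: str, **params):
        self.name = name
        self.params = params
        self.validate()

    def validate(self) -> None:
        """Subclasses override this to reject bad parameters early."""

    @abstractmethod
    def invoke(self, inputs: dict) -> dict:
        """Consume upstream data and return this node's outputs."""

    def to_dict(self) -> dict:
        return {"name": self.name, "type": type(self).__name__, "params": self.params}


class Begin(Component):
    def invoke(self, inputs: dict) -> dict:
        return {"query": inputs.get("query", "")}


class Echo(Component):
    """Stand-in for an LLM node; it only formats the query."""

    def validate(self) -> None:
        if "prefix" not in self.params:
            raise ValueError("Echo requires a 'prefix' parameter")

    def invoke(self, inputs: dict) -> dict:
        return {"answer": self.params["prefix"] + inputs["query"]}


def run_linear(components: list[Component], inputs: dict) -> dict:
    """Run components in order, merging each node's outputs into shared data."""
    data = dict(inputs)
    for component in components:
        data.update(component.invoke(data))
    return data
```

A real canvas would also handle branching and iteration; the linear runner only shows how a uniform interface lets the orchestrator treat every node the same way.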

Workflow Execution Flow

flowchart LR
    A[User Input] --> B[Begin Component]
    B --> C{Route / Branch}
    C -->|Retrieval needed| D[Retrieval Component]
    C -->|Compute needed| E[Code Exec Component]
    C -->|Reasoning needed| F[LLM Component]
    D --> F
    E --> F
    F --> G[Output / Tool Call]
    G -.iterates.-> C

MCP (Model Context Protocol) Integration

RAGFlow exposes its retrieval layer as MCP tools so that external agent clients can invoke retrieval against managed datasets. The MCP server registers a ragflow_retrieval tool that accepts document_ids, dataset_ids, question, similarity_threshold, vector_similarity_weight, keyword, top_k, rerank_id, force_refresh, and pagination parameters (page, page_size) (Source: mcp/server/server.py).

The server runs as a Starlette ASGI application in either HOST mode or standalone mode, gated by an AuthMiddleware that validates the API key on every request (Source: mcp/server/server.py). It fetches accessible datasets via the /datasets REST endpoint and paginates through all results when resolving the full set of dataset IDs for MCP retrieval fallback (Source: mcp/server/server.py).
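Paginating through every dataset, as described above, can be sketched with a generator that keeps requesting pages until a short page signals the end. The fetch_page callable and the stop condition are assumptions for illustration; the real server may rely on a total count or another signal.

```python
from typing import Callable, Iterator


def iter_all_datasets(
    fetch_page: Callable[[int, int], list[dict]],
    page_size: int = 2,
) -> Iterator[dict]:
    """Yield every dataset by walking pages 1, 2, 3, ... in order."""
    page = 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        # Assumed stop rule: a page shorter than page_size is the last one.
        if len(items) < page_size:
            return
        page += 1


def make_fake_fetch(records: list[dict]) -> Callable[[int, int], list[dict]]:
    """Test double that slices an in-memory list like a paged endpoint."""
    def fetch(page: int, size: int) -> list[dict]:
        start = (page - 1) * size
        return records[start:start + size]
    return fetch
```

Injecting the fetch function keeps the paging logic testable without a running server.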

Tools, Skills, and Memory Management

Beyond built-in components, RAGFlow supports a pluggable skills and memory system exposed through the CLI filesystem (Source: internal/cli/filesystem/README.md). The CLI parses commands using a recursive descent parser (parser/parser.go) over a lexer, and routes them to a virtual filesystem backed by providers (dataset.go, file.go) that wrap RAGFlow's RESTful APIs (Source: internal/cli/README.md).

Skill Sources

The install-skill command accepts skills from local paths, GitHub URLs, ClawHub references, or skills.sh identifiers, then validates and stores them in an isolated space (Source: internal/cli/filesystem/README.md).

Security Validation

The skill manager applies defense-in-depth checks before installation:

HTTPS source validation with SSL certificate verification
Quarantine isolation of downloaded skills prior to install
Static analysis scanning 100+ threat patterns across six categories: Exfiltration, Injection, Destructive, Persistence, Network, and Obfuscation
Trust tiers based on source reputation
Explicit --force user confirmation for high-risk installs
Audit logging of every installation with its scan results (Source: internal/cli/filesystem/README.md)

Memory System

Memory is organized hierarchically into category folders (e.g., memory/categories/category1, category2) and per-agent memory files for tool and skill usage patterns, supporting retrieval augmentation across long-lived agent sessions (Source: internal/cli/filesystem/README.md).

Common Failure Modes and Community Notes

TenantRerankID type mismatch in Go SDK: service.RetrievalTestRequest.TenantRerankID and SearchBotRetrievalTestRequest.TenantRerankID are declared as *string but are consumed as *int in some retrieval-test code paths, which can surface as runtime errors when invoking the retrieval benchmark (Source: issue #15714).
Empty memory object on startup: The RAGFlow server previously failed to start when an empty memory object existed; this was fixed in v0.23.1.
Memory extraction stability: When all memory types are selected simultaneously, extraction stability was hardened in v0.23.1.
Browser component: Newly added in v0.25.6, the Browser component enables autonomous web navigation; expect evolving behavior and config knobs (Source: README.md).
Kubernetes deployment: Helm charts and raw Kubernetes manifests are not officially supported; production deployment remains primarily via docker-compose.

Deployment, Configuration, Administration, and Model Integration

Related topics: Project Overview and System Architecture, Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge, Agent System, Tools, and Workflow Orchestration

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that fuses RAG with Agent capabilities. Operating the system at production scale requires mastering four interrelated concerns: deploying the runtime stack, configuring infrastructure and models, administering tenants and resources, and integrating third-party model providers. This page covers all four areas, drawing on the project's official deployment manifests, engine abstractions, and operator interfaces. Source: README.md

Deployment

Prerequisites and Runtime Stack

The official deployment path is Docker Compose. The system requires at minimum 4 CPU cores, 16 GB RAM, 50 GB disk, Docker >= 24.0.0 with Docker Compose >= v2.26.1, Python >= 3.13, and (optionally) gVisor when the Agent's code executor sandbox is enabled. Source: README.md

Before starting, the host kernel parameter vm.max_map_count must be set to at least 262144 (Elasticsearch requirement). The README documents how to check and set it via sysctl -w vm.max_map_count=262144. Source: README.md
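A pre-flight check for the kernel parameter above can be sketched as follows. On Linux the current value is readable from /proc/sys/vm/max_map_count; the function names are hypothetical.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(raw: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the raw file contents meet the required minimum."""
    return int(raw.strip()) >= required


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> str:
    """Read the current value from procfs (Linux only)."""
    return Path(path).read_text()
```

On a Linux host, max_map_count_ok(read_max_map_count()) returning False means the sysctl command from the README should be run before starting the stack.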

Compose Topology

Two compose files are maintained:

docker/docker-compose.yml brings up the RAGFlow application service on top of a dependency stack.
docker/docker-compose-base.yml provides the dependencies: Elasticsearch (or Infinity), MySQL, MinIO, and Redis.

A legacy docker-compose-CN-oc9.yml and a docker-compose-macos.yml exist but are not actively maintained. Source: docker/README.md

The high-level deployment topology is:

flowchart LR
  User[User / MCP Client] --> Web[Web Frontend<br/>Vite + React]
  User --> API[RAGFlow API Server<br/>Python + Go]
  API --> MySQL[(MySQL)]
  API --> Redis[(Redis)]
  API --> MinIO[(MinIO)]
  API --> Engine{Doc Engine}
  Engine -->|type=elasticsearch| ES[(Elasticsearch)]
  Engine -->|type=infinity| INF[(Infinity)]
  Web -.->|serves| User

Kubernetes and Cloud

A community-requested Kubernetes deployment path (Helm charts or raw manifests) is tracked in issue #864. As of the most recent releases, official Helm support is not yet shipped; the supported path remains Docker Compose on a single host, optionally scaled by externalizing the dependency services. Source: docker/README.md

Configuration

Docker Environment Variables

The docker/.env file is the primary configuration surface for the container stack. The following variables are documented:

Variable	Default	Purpose
`STACK_VERSION`	`8.11.3`	Elasticsearch image version
`ES_PORT`	`1200`	Host port exposed for Elasticsearch
`ELASTIC_PASSWORD`	—	Elasticsearch bootstrap password
`KIBANA_PORT`	—	Host port for the Kibana UI

Source: docker/README.md
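To show how the documented defaults interact with overrides, the sketch below parses .env-style text and layers it over the two defaults listed in the table. The parser is a simplified illustration (no quoting or variable expansion) and the parse_env name is hypothetical; Docker Compose itself performs the real parsing.

```python
# Defaults taken from the table above; variables without a documented
# default are left out.
DEFAULTS = {"STACK_VERSION": "8.11.3", "ES_PORT": "1200"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments, over the defaults."""
    values = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```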

Service Configuration

docker/service_conf.yaml.template is rendered at startup and configures the RAGFlow service. The internal Go engine selects a document store via a doc_engine.type key. Two backend values are supported:

elasticsearch — fully implemented, configured with doc_engine.es.hosts, username, password.
infinity — a placeholder backend waiting for the official Infinity Go SDK; configuration keys include uri, postgres_port, db_name.

Source: internal/engine/README.md

Engine selection happens once at process startup. The Go factory in internal/engine/engine_factory.go returns a DocEngine interface implementation that the rest of the service consumes uniformly for indexing, search, and document operations. Source: internal/engine/README.md
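The factory pattern described above can be illustrated in Python. This is a sketch of the idea rather than a translation of internal/engine/engine_factory.go: the class names, the search method, and the stubbed behaviour are assumptions, while the configuration keys follow the text.

```python
from typing import Protocol


class DocEngine(Protocol):
    """Hypothetical interface that every backend implements."""

    def search(self, query: str) -> list[str]: ...


class ElasticsearchEngine:
    def __init__(self, hosts: str, username: str, password: str):
        self.hosts = hosts
        self.username = username
        self.password = password

    def search(self, query: str) -> list[str]:
        return []  # stub: a real backend would query the cluster


def create_engine(config: dict) -> DocEngine:
    """Select a backend once, based on doc_engine.type."""
    doc_engine = config["doc_engine"]
    kind = doc_engine["type"]
    if kind == "elasticsearch":
        es = doc_engine["es"]
        return ElasticsearchEngine(es["hosts"], es["username"], es["password"])
    if kind == "infinity":
        # The text describes this backend as a placeholder.
        raise NotImplementedError("infinity backend is not implemented")
    raise ValueError(f"unknown doc_engine.type: {kind!r}")
```

Callers hold only the DocEngine interface, so swapping backends does not touch indexing or search call sites.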

Parsers and OCR

The deepdoc/README.md introduces DeepDoc, RAGFlow's vision and parser subsystem. The vision pipeline provides OCR, layout recognition (10 base layout components: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation), and Table Structure Recognition (TSR). Operators can smoke-test these on local files using python deepdoc/vision/t_ocr.py and python deepdoc/vision/t_recognizer.py. Source: deepdoc/README.md

Administration

Admin Panel

Release v0.25.5 introduced local and SSH providers in the admin panel (PR #15039), allowing administrators to manage users, datasets, and storage backends directly from the web console. Source: GitHub release v0.25.5

CLI and Virtual Filesystem

The Go CLI under internal/cli exposes a virtual filesystem layered over RAGFlow's RESTful APIs. The design follows three principles: (1) no server-side changes, (2) a provider pattern with a common Provider interface in filesystem/base.go, and (3) unified commands (ls, search, cat, mkdir) over virtual paths. Supported paths include /datasets, /datasets/{name} (lists documents), and /datasets/{name}/{doc} (fetches a single document). Source: internal/cli/README.md

Skill Management

The CLI's install-skill command supports four source types: local paths, github.com/owner/repo/path, clawhub://owner/skill-name (ClawHub), and skill://skill-name (skills.sh). A defense-in-depth security architecture validates sources over HTTPS with SSL verification, quarantines downloads, runs regex-based static analysis against 100+ threat patterns (exfiltration, injection, destructive operations, persistence, network, obfuscation), and applies trust tiers. Limits: total skill size <= 50 MB, individual file <= 5 MB, text files only, lowercase alphanumeric names with hyphens/underscores. Source: internal/cli/filesystem/README.md

Frontend Build

The web UI is a Vite + React project. Build and development entry points live in web/package.json: npm run dev starts the dev server, npm run build produces a production bundle, and npm run type-check validates TypeScript. The schema editor component in web/src/components/jsonjoy-builder/lib/schema-editor.ts enforces JSONSchema field-name validation against the pattern ^[a-zA-Z_$][a-zA-Z0-9_$]*$. Source: web/package.json
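The field-name pattern quoted above can be exercised directly. The sketch below applies the same regular expression in Python; the is_valid_field_name helper is a hypothetical name, and the TypeScript validator may apply additional rules not shown here.

```python
import re

# Pattern quoted from the text above.
FIELD_NAME_RE = re.compile(r"^[a-zA-Z_$][a-zA-Z0-9_$]*$")


def is_valid_field_name(name: str) -> bool:
    """Check a field name against the pattern.

    fullmatch is used so that a trailing newline is rejected; in Python,
    a bare `$` would otherwise match just before it.
    """
    return FIELD_NAME_RE.fullmatch(name) is not None
```

Names such as userName, _id, and $ref pass, while 1abc and a-b are rejected.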

Model Integration

LLM and Embedding Providers

RAGFlow ships with a model registry supporting OpenAI-compatible APIs. Release v0.25.4 added gpt-5.4-mini and gpt-5.4-nano to the OpenAI model list, and release v0.25.6 extended the Agent with a new Browser component that lets models navigate and interact with web pages autonomously (PR #14888). DeepSeek v4 support was added on 2026-04-24, and Gemini 3 Pro support on 2025-11-19. Source: README.md and GitHub release v0.25.4

Rerank and Retrieval

A community feature request to add Ollama rerank integration is tracked in issue #4406. Rerank models are referenced by ID through the rerank_id parameter on retrieval calls. The MCP server's ragflow_retrieval tool accepts rerank_id, similarity_threshold, vector_similarity_weight, keyword, top_k, and force_refresh arguments that all flow into the unified retrieval service. Source: mcp/server/server.py

Release v0.25.5 accelerated the dataset search path, reducing latency by 50–100% by removing an expensive vector fetch and rerank similarity computation from the hot path (PR #14970). Source: GitHub release v0.25.5

MCP Server

The mcp/server/server.py exposes RAGFlow as a Model Context Protocol server. Two tool entry points are registered: list_datasets (paginates /datasets and returns newline-delimited JSON) and ragflow_retrieval (performs cross-dataset retrieval with the parameters above). When MODE == HOST, the server installs an AuthMiddleware to enforce API key authentication. Source: mcp/server/server.py
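Newline-delimited JSON, the format attributed to list_datasets above, puts one JSON object on each line. A minimal sketch of producing it follows; the to_ndjson name and the id and name fields in the usage are assumptions, since the text does not list which dataset fields the server returns.

```python
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

A client can recover the records by splitting on newlines and calling json.loads on each non-empty line.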

Document Parsers

The v0.25.0 release added 7 built-in ingestion pipeline templates aligned with RAGFlow's native parsers, and v0.25.1 added the OpenDataLoader PDF backend. A community request to integrate Docling as an additional parser is tracked in issue #3443. For users migrating from Elasticsearch to OceanBase, the schema mapping in tools/es-to-oceanbase-migration/src/es_ob_migration/schema.py documents how chunk fields (content, tokens, keywords, tags, PageRank) translate to OceanBase column types. Source: GitHub release v0.25.1

Common Failure Modes

vm.max_map_count too low — Elasticsearch container fails to start. Mitigate with sudo sysctl -w vm.max_map_count=262144. Source: README.md
Infinity backend selected without SDK — the Infinity implementation is a placeholder; only Elasticsearch is fully functional. Source: internal/engine/README.md
Type mismatch on rerank IDs — the Go service.RetrievalTestRequest.TenantRerankID field has a known *string vs *int mismatch with retrieval tests (issue #15714). Source: issue #15714
Skill installation blocked — over-size archives, binary files, or suspicious patterns are rejected by the static analyzer. Source: internal/cli/filesystem/README.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 32 structured pitfall item(s), including 8 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/16008

2. Configuration risk: Configuration risk requires verification

Severity: high
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/15714

3. Maintenance risk: Maintenance risk requires verification

Severity: high
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/16170

4. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Developers should check this security_permissions risk before relying on the project: [Question]: What is the purpose of the 'kb_ids' parameter in the API?
User impact: Developers may expose sensitive permissions or credentials: [Question]: What is the purpose of the 'kb_ids' parameter in the API?
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: [Question]: What is the purpose of the 'kb_ids' parameter in the API?. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_issue | https://github.com/infiniflow/ragflow/issues/9099

5. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/16208

6. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/16126

7. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/15525

8. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/9099

9. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/15751

10. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/infiniflow/ragflow/issues/16205

11. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: [Feature Request]:Support Custom Model Headers and Generation Parameters
User impact: Developers may misconfigure credentials, environment, or host setup: [Feature Request]:Support Custom Model Headers and Generation Parameters
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: [Feature Request]:Support Custom Model Headers and Generation Parameters. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_issue | https://github.com/infiniflow/ragflow/issues/15981

12. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Developers should check this configuration risk before relying on the project: [Question]: Title Chunker Failure after upgrading to v0.25.6
User impact: Developers may misconfigure credentials, environment, or host setup: [Question]: Title Chunker Failure after upgrading to v0.25.6
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: [Question]: Title Chunker Failure after upgrading to v0.25.6. Context: Observed when using python, docker
Evidence: failure_mode_cluster:github_issue | https://github.com/infiniflow/ragflow/issues/15525

Source: Doramagic discovery, validation, and Project Pack records
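
Several pitfall entries point at the same GitHub issue, so the review work is smaller than the entry count suggests. The following is a minimal sketch, not part of Doramagic or RAGFlow, of how the evidence links from entries 5 to 12 above could be collapsed into one review queue. The function name `review_queue` and the tuple layout are hypothetical choices made for illustration; the entry numbers, severities, and URLs are copied from the log.

```python
# Pitfall entries 5-12 as listed above: (entry number, severity, evidence URL).
BASE = "https://github.com/infiniflow/ragflow/issues/"
PITFALLS = [
    (5, "high", BASE + "16208"),
    (6, "high", BASE + "16126"),
    (7, "high", BASE + "15525"),
    (8, "high", BASE + "9099"),
    (9, "medium", BASE + "15751"),
    (10, "medium", BASE + "16205"),
    (11, "medium", BASE + "15981"),
    (12, "medium", BASE + "15525"),
]

# Lower rank means more urgent; only the severities seen in the log are ranked.
SEVERITY_RANK = {"high": 0, "medium": 1}


def review_queue(pitfalls):
    """Return one row per unique issue: (url, worst severity, entry numbers)."""
    merged = {}
    for entry, severity, url in pitfalls:
        worst, entries = merged.get(url, (severity, []))
        # Keep the most severe rating seen for this issue.
        if SEVERITY_RANK[severity] < SEVERITY_RANK[worst]:
            worst = severity
        merged[url] = (worst, entries + [entry])
    rows = [(url, worst, entries) for url, (worst, entries) in merged.items()]
    # Most severe first, then in the order the log first mentions the issue.
    rows.sort(key=lambda row: (SEVERITY_RANK[row[1]], row[2][0]))
    return rows


if __name__ == "__main__":
    for url, severity, entries in review_queue(PITFALLS):
        print(f"{severity:<6} entries {entries} {url}")
```

Under these assumptions, issue 15525 is listed once with the higher of its two severities, since it backs both entry 7 and entry 12.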

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12. Count of project-level external discussion links exposed on this manual page.

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using ragflow with real data or production workflows.

[[Question]: Question about the hard limit of 10000 in doc_metadata_servi](https://github.com/infiniflow/ragflow/issues/16170) - github / github_issue
Tables missing after parsing laws rule DOCX document - github / github_issue
[[Bug]: /api/v1/dify/retrieval resets the connection (502 / "Reached max](https://github.com/infiniflow/ragflow/issues/16208) - github / github_issue
[[Feature Request]: Enable folder-level configuration in SharePoint conne](https://github.com/infiniflow/ragflow/issues/16206) - github / github_issue
GraphRAG set_graph() extremely slow due to per-entity/relation embedding - github / github_issue
[[Bug]: vllm instance already exist, while trying to add models with diff](https://github.com/infiniflow/ragflow/issues/16126) - github / github_issue
[[Question]: What is the purpose of the 'kb_ids' parameter in the API?](https://github.com/infiniflow/ragflow/issues/9099) - github / github_issue
[[Feature Request]:Support Custom Model Headers and Generation Parameters](https://github.com/infiniflow/ragflow/issues/15981) - github / github_issue
[[Question]: Why are addresses which resolve to internal IPs considered i](https://github.com/infiniflow/ragflow/issues/8230) - github / github_issue
[[Question]: MaxConnectionsExceeded('Exceeded maximum connections.')](https://github.com/infiniflow/ragflow/issues/410) - github / github_issue
[[Question]: Title Chunker Failure after upgrading to v0.25.6](https://github.com/infiniflow/ragflow/issues/15525) - github / github_issue
Go test files not compiled in CI — missing import undetected - github / github_issue

Source: Project Pack community evidence and pitfall evidence
