Doramagic Project Pack · Human Manual
ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Project Overview and System Architecture
Related topics: Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge, Agent System, Tools, and Workflow Orchestration, Deployment, Configuration, Administration, and Model Integration
1. Purpose and Scope
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that fuses RAG with Agent capabilities to create a context layer for large language models. According to the project README, RAGFlow is "powered by a converged context engine and pre-built agent templates" and is intended for "enterprises of any scale" — ranging from individual developers running it locally to large organizations operating multi-tenant deployments. Source: README.md.
The project's stated design goals — paraphrased from the README — include:
- Quality in, quality out: Deep document understanding for extracting knowledge from unstructured data, including formats such as Word, slides, Excel, txt, images, scanned copies, structured data, and web pages. Source: README.md.
- Template-based chunking that is explainable and configurable per use case.
- Grounded citations with visualization of text chunking to support traceable answers and reduce hallucinations.
- Heterogeneous data-source compatibility through an extensible connector layer.
- Configurable LLMs and embedding models with multiple recall fused with re-ranking.
The community roadmap (tracking issues #4214 and #162) shows the project has progressed through v0.9.0, v0.10.0, v0.21.0, v0.22.0, v0.23.0, v0.24.0, and the v0.25.x line, with active demand for features such as Text2SQL, TTS, Kubernetes deployment (issue #864), Ollama rerank integration (issue #4406), and the Docling parser (issue #3443).
2. Architectural Layers
RAGFlow follows a layered architecture in which the Python API, Go services, and a Vite-based web frontend collaborate through clearly defined interfaces. The diagram below summarizes the high-level data flow from ingestion to retrieval and agent execution.
flowchart LR
UI["Web Frontend<br/>(Vite + React)"]
CLI["CLI / Virtual FS<br/>(internal/cli)"]
MCP["MCP Server<br/>(mcp/server)"]
PYAPI["Python API<br/>(api/ragflow_server)"]
GOAPI["Go Service<br/>(cmd/server_main)"]
ENGINE["Doc Engine<br/>(internal/engine)"]
DEEPDOC["DeepDoc<br/>(parser + vision)"]
SANDBOX["Agent Sandbox<br/>(agent/sandbox)"]
DS["Data Sources<br/>(firecrawl, S3, RSS, etc.)"]
LLM["LLM / Embedding / Rerank"]
UI --> PYAPI
CLI --> PYAPI
MCP --> GOAPI
PYAPI --> DEEPDOC
PYAPI --> ENGINE
PYAPI --> SANDBOX
PYAPI --> LLM
GOAPI --> ENGINE
DS --> PYAPI
SANDBOX --> LLM
2.1 Web Frontend
The web UI lives under the web/ directory and is a Vite-based single-page application. web/package.json lists scripts such as dev (Vite dev server), build (production), lint (ESLint), and test (Jest), and depends on @ant-design/icons, @antv/g2, @antv/g6, and form-handling libraries. UI components for building structured JSON schemas, used by the Agent designer, live in web/src/components/jsonjoy-builder/lib/schema-editor.ts, which exports helpers such as createFieldSchema, validateFieldName, and getSchemaProperties. Source: web/package.json, web/src/components/jsonjoy-builder/lib/schema-editor.ts.
2.2 API and Service Tier
The Python Flask API (api/ragflow_server.py) serves as the public HTTP surface for the web UI and external integrations. It delegates heavy lifting — embedding, retrieval, agent execution — to background workers, while delegating document storage and search to the Go-side Doc Engine.
The Go service (cmd/server_main.go) initializes the Doc Engine on startup using the engine.Init(&cfg...) pattern documented in internal/engine/README.md. The Go side exposes retrieval-test, search, and admin RPCs that the Python layer consumes.
2.3 Model Context Protocol (MCP) Server
RAGFlow ships an MCP server at mcp/server/server.py that exposes RAGFlow datasets and retrieval operations as Model Context Protocol tools. The RAGFlowConnector class implements _fetch_datasets_page, list_datasets, resolve_dataset_ids, and a call_tool dispatcher that routes the ragflow_retrieval tool. The retrieval tool accepts dataset_ids, document_ids, question, page, page_size, similarity_threshold, vector_similarity_weight, keyword, top_k, rerank_id, and force_refresh parameters, allowing any MCP-compatible client (e.g., Claude Desktop) to query RAGFlow datasets directly.
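To make the parameter list concrete, here is a minimal sketch of an arguments payload for the ragflow_retrieval tool. Only the parameter names come from the description above; the helper name build_retrieval_args, the default values, and the sample dataset ID are assumptions for illustration, not RAGFlow defaults.

```python
from typing import Any

# Parameter names taken from the tool description above.
ALLOWED_KEYS = {
    "dataset_ids", "document_ids", "question", "page", "page_size",
    "similarity_threshold", "vector_similarity_weight", "keyword",
    "top_k", "rerank_id", "force_refresh",
}


def build_retrieval_args(question: str, **options: Any) -> dict[str, Any]:
    """Assemble an arguments dict for ragflow_retrieval (hypothetical helper)."""
    unknown = set(options) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    if not question.strip():
        raise ValueError("question must not be empty")
    # Illustrative values only; the server's real defaults may differ.
    args: dict[str, Any] = {"question": question, "page": 1, "page_size": 10}
    args.update(options)
    return args


args = build_retrieval_args(
    "How is vm.max_map_count configured?",
    dataset_ids=["example-dataset-id"],  # placeholder ID, not a real dataset
    top_k=64,
)
```

Rejecting unknown keys early is a design choice of this sketch: it surfaces typos in parameter names before a request is sent.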
3. Key Subsystems
3.1 DeepDoc — Document Understanding
deepdoc/ contains the document parsing pipeline and the vision subsystem. According to deepdoc/README.md, DeepDoc provides OCR, layout recognition (with 10 components — text, title, figure, figure caption, table, table caption, header, footer, reference, equation), and Table Structure Recognition (TSR). The CLI test scripts deepdoc/vision/t_ocr.py and deepdoc/vision/t_recognizer.py accept --inputs and --output_dir arguments so developers can verify model behavior on local PDFs and images.
3.2 Doc Engine — Pluggable Storage and Retrieval
The Doc Engine, documented in internal/engine/README.md, abstracts over Elasticsearch and Infinity (an in-house database). It is configured in conf/service_conf.yaml under the doc_engine key, with sub-keys es (hosts, username, password) and infinity (uri, postgres_port, db_name). The Go package layout separates client.go, search.go, index.go, and document.go per backend, so switching engines only requires changing doc_engine.type.
The schema used by the engine — exposed in tools/es-to-oceanbase-migration/src/es_ob_migration/schema.py — shows the underlying document model with fields such as content_with_weight, content_ltks, content_sm_ltks, important_kwd, question_kwd, tag_kwd, and available_int. This schema documents the fields an embedder/retriever must populate.
3.3 CLI and Virtual Filesystem
The CLI in internal/cli/README.md exposes a unified, path-based interface over RAGFlow REST APIs. Paths include /datasets, /datasets/{name} (documents), and /datasets/{name}/{doc} (document info). The implementation uses a provider pattern (parser/, filesystem/, engine.go, base.go, dataset.go, file.go, utils.go).
A notable subsystem — documented in internal/cli/filesystem/README.md — is the skill management layer. It supports install-skill <space> <source> from local paths, GitHub repos (github.com/owner/repo/path), ClawHub (clawhub://owner/skill-name), and skills.sh (skill://skill-name). The system enforces a defense-in-depth security model: HTTPS source validation, quarantine of downloaded artifacts, regex-based static analysis across 100+ threat patterns in six categories (exfiltration, injection, destructive operations, persistence, network, obfuscation), trust tiers based on source reputation, mandatory --force for high-risk installs, and audit logging. Skills must be ≤ 50 MB total, ≤ 5 MB per file, text-only, with lowercase alphanumeric names.
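The size and naming limits above can be sketched as a small pre-install check. This is an illustration only, not the CLI's implementation: the function name check_skill, the reading of MB as 1024 * 1024 bytes, and the exact name pattern (lowercase letters, digits, hyphens, underscores) are assumptions.

```python
import re

MAX_TOTAL_BYTES = 50 * 1024 * 1024  # 50 MB total (MB size is an assumption)
MAX_FILE_BYTES = 5 * 1024 * 1024    # 5 MB per file
# Assumed name rule: lowercase alphanumerics, optionally with - or _.
NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]*")


def check_skill(name: str, file_sizes: dict[str, int]) -> list[str]:
    """Return a list of violations; an empty list means all checks passed."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    for path, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"file too large: {path}")
    if sum(file_sizes.values()) > MAX_TOTAL_BYTES:
        problems.append("skill exceeds total size limit")
    return problems


ok = check_skill("pdf-helper", {"SKILL.md": 2_000})
bad = check_skill("PDF Helper", {"big.txt": 6 * 1024 * 1024})
```

Returning every violation rather than stopping at the first one mirrors how an audit log would want the full scan result.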
3.4 Agent Sandbox
The Agent sandbox, documented in agent/sandbox/README.md, runs agent code inside containers isolated with gVisor. Sandboxed agents execute Python and Node.js workloads via the base images sandbox-base-python and sandbox-base-nodejs, orchestrated by sandbox-executor-manager. The README warns that older executor-manager images shipped Docker CLI 24.x, which cannot talk to newer Docker daemons; rebuilding with Docker CLI 29.1.0 or later is required.
3.5 Data-Source Connectors
The tools/ directory hosts pluggable connectors. The Firecrawl integration (tools/firecrawl/README.md) implements single-URL scraping, website crawling, batch processing, multiple output formats, rate limiting, and language detection — surfacing as a selectable data source in the RAGFlow UI.
4. Deployment, Operations, and Community Context
4.1 Self-Hosting
Per the README, RAGFlow is deployed via Docker Compose, with minimum requirements of 4 CPU cores and 16 GB RAM. Community issue #864 ("How to deploy based on kubernetes?") shows that Kubernetes deployment through Helm charts or raw manifests is a long-standing user request; Docker Compose remains the supported path.
4.2 Release Cadence and Roadmap
Releases follow a numbered scheme from v0.9.0 through v0.25.x, with a rolling nightly build. Recent milestones include:
| Release | Notable change | Source |
|---|---|---|
| v0.24.0 | Memory APIs/SDK for agents; metadata batch management; ToC renamed to PageIndex; Chat-like Agent management | Community release notes |
| v0.25.0 | Seven ingestion-pipeline templates; new data sources (Seafile, RSS, DingTalk AI Sheet); deletion sync | Community release notes |
| v0.25.4 | Generic RESTful API data-source connector; gpt-5.4-mini/nano support | Community release notes |
| v0.25.5 | Local & SSH providers in admin panel; ~50–100% dataset-search latency reduction | Community release notes |
| v0.25.6 | Browser component for autonomous web navigation; Ψ-RAG (AHC) mode for RAPTOR | Community release notes |
4.3 Known Type and API Issues
Community issue #15714 reports a Go-side tenant_rerank_id type mismatch (*string vs. *int) in service.RetrievalTestRequest and SearchBotRetrievalTestRequest, illustrating that the Go ↔ Python API contract remains a focus area for engineering work.
See Also
- DeepDoc — Document Understanding
- Doc Engine — Storage and Retrieval
- Agent Sandbox — Secure Execution
- MCP Server — Tool Integration
- Data Sources and Connectors
Source: https://github.com/infiniflow/ragflow / Human Manual
Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge
Related topics: Project Overview and System Architecture, Agent System, Tools, and Workflow Orchestration, Deployment, Configuration, Administration, and Model Integration
Overview
The Core RAG Engine is the heart of RAGFlow, an open-source Retrieval-Augmented Generation engine described in README.md. It implements the full pipeline from raw unstructured documents to grounded, citation-backed LLM responses. The engine is split into four collaborating subsystems:
- Parsing (DeepDoc) — turns raw bytes (PDF, DOCX, images, slides) into structured layout-aware text plus tables, figures, and equations.
- Chunking & Knowledge Structuring — segments parsed content into explainable chunks and indexes them into a multi-field schema.
- Retrieval — performs hybrid (vector + keyword) recall with optional rerank across one or more datasets.
- Document Engine / Storage — persists chunks, vectors, and metadata in pluggable backends (Elasticsearch or Infinity).
The following diagram illustrates how a document flows from ingestion to a grounded, cited answer through these subsystems.
flowchart LR
A[Unstructured Document] --> B[DeepDoc Parser<br/>OCR / Layout / TSR]
B --> C[Template-based Chunker]
C --> D[Doc Engine<br/>Elasticsearch or Infinity]
D --> E[Hybrid Retrieval<br/>vector + keyword]
E --> F[Rerank Model]
F --> G[LLM with Citations]
Source: README.md:1-50, deepdoc/README.md:1-60, internal/engine/README.md:1-50
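The hybrid retrieval step can be illustrated with a weighted blend of a vector score and a keyword score. The formula below is a common fusion scheme and an assumption of this sketch, not a statement of how RAGFlow computes scores; hybrid_score, rank_chunks, and the sample numbers are hypothetical.

```python
def hybrid_score(vector_sim: float, keyword_sim: float, vector_weight: float) -> float:
    """Blend the two similarity signals with a single weight in [0, 1]."""
    return vector_weight * vector_sim + (1.0 - vector_weight) * keyword_sim


def rank_chunks(candidates, vector_weight=0.3, similarity_threshold=0.2, limit=3):
    """candidates: iterable of (chunk_id, vector_sim, keyword_sim) tuples."""
    scored = [
        (hybrid_score(vec, kw, vector_weight), chunk_id)
        for chunk_id, vec, kw in candidates
    ]
    # Drop weak matches, then keep the best `limit` chunks.
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(reverse=True)
    return [chunk_id for _, chunk_id in kept[:limit]]


ranked = rank_chunks([
    ("a", 0.9, 0.1),    # strong vector match, weak keyword match
    ("b", 0.2, 0.8),    # weak vector match, strong keyword match
    ("c", 0.05, 0.05),  # weak on both signals, filtered out
])
```

With a vector weight of 0.3, the keyword-heavy chunk "b" outranks "a", which shows how the weight shifts the balance between the two recall paths. A rerank model would then reorder the surviving chunks.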
Agent System, Tools, and Workflow Orchestration
Related topics: Project Overview and System Architecture, Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge, Deployment, Configuration, Administration, and Model Integration
Overview and Purpose
RAGFlow's agent system fuses retrieval-augmented generation with agentic capabilities to deliver a configurable context layer for LLM applications. The runtime is assembled from modular components that can be composed into workflows for both personal and enterprise deployments (Source: README.md). Pre-built agent templates and a converged context engine allow developers to transform complex data into production-ready AI systems with high efficiency (Source: README.md).
Recent releases have progressively expanded the agent surface area:
- Memory for AI agents (added 2025-12-26)
- Agentic workflow and MCP integration (added 2025-08-01)
- Python/JavaScript code executor component (added 2025-05-23)
- Browser component for autonomous web navigation (added in v0.25.6, May 2026)
- Chat-like Agent conversation management (v0.24.0)
Component-Based Architecture
The agent runtime follows a component-based design in which each node in a workflow is implemented as a self-contained class inheriting from a common base. The canvas is responsible for assembling components, routing data between them, and orchestrating execution order (Source: agent/canvas.py). All components share a unified interface defined in the base class, covering parameter validation, execution lifecycle, and canvas serialization (Source: agent/component/base.py).
Core Component Types
- Begin: Defines initial input and conversation start parameters; the entry point of every workflow (Source: agent/component/begin.py).
- LLM: Performs language model inference with configurable prompts, model parameters, and tool bindings (Source: agent/component/llm.py).
- Retrieval: Performs RAG retrieval against datasets and documents; the implementation bridges to the shared tools/retrieval.py logic (Source: agent/component/retrieval).
- Code Exec: Executes Python or JavaScript snippets in a sandboxed environment to support computational reasoning (Source: tools/code_exec.py).
- Base: Foundation class providing the common contract (input/output schema, invoke lifecycle, canvas representation) that all other components extend (Source: agent/component/base.py).
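The component contract described above can be sketched in a few lines. Everything here is a simplified illustration under assumed names (Component, Canvas, the invoke signature, and the toy keyword match); it is not the code in agent/component/base.py or agent/canvas.py.

```python
from abc import ABC, abstractmethod
from typing import Any


class Component(ABC):
    """Hypothetical base class: every node exposes the same invoke() contract."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def invoke(self, state: dict[str, Any]) -> dict[str, Any]:
        """Read the shared state and return new fields to merge into it."""


class Begin(Component):
    def invoke(self, state):
        return {"query": state["user_input"].strip()}


class Retrieval(Component):
    def __init__(self, name, corpus):
        super().__init__(name)
        self.corpus = corpus

    def invoke(self, state):
        # Toy keyword match standing in for real dataset retrieval.
        words = state["query"].lower().split()
        hits = [doc for doc in self.corpus if any(w in doc.lower() for w in words)]
        return {"chunks": hits}


class Canvas:
    """Runs components in order, merging each output into a shared state."""

    def __init__(self, components):
        self.components = components

    def run(self, user_input: str) -> dict[str, Any]:
        state: dict[str, Any] = {"user_input": user_input}
        for component in self.components:
            state.update(component.invoke(state))
        return state


canvas = Canvas([Begin("begin"), Retrieval("retrieval", ["Docker setup guide", "Billing FAQ"])])
result = canvas.run("  docker  ")
```

A linear run loop keeps the sketch short; a real canvas would also handle branching and iteration between components.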
Workflow Execution Flow
flowchart LR
A[User Input] --> B[Begin Component]
B --> C{Route / Branch}
C -->|Retrieval needed| D[Retrieval Component]
C -->|Compute needed| E[Code Exec Component]
C -->|Reasoning needed| F[LLM Component]
D --> F
E --> F
F --> G[Output / Tool Call]
G -.iterates.-> C
MCP (Model Context Protocol) Integration
RAGFlow exposes its retrieval layer as MCP tools so that external agent clients can invoke retrieval against managed datasets. The MCP server registers a ragflow_retrieval tool that accepts document_ids, dataset_ids, question, similarity_threshold, vector_similarity_weight, keyword, top_k, rerank_id, force_refresh, and pagination parameters (page, page_size) (Source: mcp/server/server.py).
The server runs as a Starlette ASGI application in either HOST mode or standalone mode, gated by an AuthMiddleware that validates the API key on every request (Source: mcp/server/server.py). It fetches accessible datasets via the /datasets REST endpoint and paginates through all results when resolving the full set of dataset IDs for MCP retrieval fallback (Source: mcp/server/server.py).
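The "paginate until exhausted" behavior can be sketched as a loop over a page-fetching callable. The names collect_dataset_ids and fake_fetch, the "id" field, and the stop condition (a short page ends the listing) are assumptions of this sketch; the real server calls the /datasets endpoint instead of an in-memory list.

```python
from typing import Callable

Page = list[dict]


def collect_dataset_ids(fetch_page: Callable[[int, int], Page], page_size: int = 2) -> list[str]:
    """Request pages 1, 2, 3, ... until a page comes back short."""
    ids: list[str] = []
    page = 1
    while True:
        items = fetch_page(page, page_size)
        ids.extend(item["id"] for item in items)
        if len(items) < page_size:  # short or empty page: nothing left
            break
        page += 1
    return ids


# In-memory stand-in for the REST endpoint, for illustration only.
FAKE_DATASETS = [{"id": f"ds-{n}"} for n in range(5)]


def fake_fetch(page: int, page_size: int) -> Page:
    start = (page - 1) * page_size
    return FAKE_DATASETS[start:start + page_size]


all_ids = collect_dataset_ids(fake_fetch)
```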
Tools, Skills, and Memory Management
Beyond built-in components, RAGFlow supports a pluggable skills and memory system exposed through the CLI filesystem (Source: internal/cli/filesystem/README.md). The CLI parses commands using a recursive descent parser (parser/parser.go) over a lexer, and routes them to a virtual filesystem backed by providers (dataset.go, file.go) that wrap RAGFlow's RESTful APIs (Source: internal/cli/README.md).
Skill Sources
The install-skill command accepts skills from local paths, GitHub URLs, ClawHub references, or skills.sh identifiers, then validates and stores them in an isolated space (Source: internal/cli/filesystem/README.md).
Security Validation
The skill manager applies defense-in-depth checks before installation:
- HTTPS source validation with SSL certificate verification
- Quarantine isolation of downloaded skills prior to install
- Static analysis scanning 100+ threat patterns across six categories: Exfiltration, Injection, Destructive, Persistence, Network, and Obfuscation
- Trust tiers based on source reputation
- Explicit --force user confirmation for high-risk installs
- Audit logging of every installation with its scan results (Source: internal/cli/filesystem/README.md)
Memory System
Memory is organized hierarchically into category folders (e.g., memory/categories/category1, category2) and per-agent memory files for tool and skill usage patterns, supporting retrieval augmentation across long-lived agent sessions (Source: internal/cli/filesystem/README.md).
Common Failure Modes and Community Notes
- TenantRerankID type mismatch in Go SDK: service.RetrievalTestRequest.TenantRerankID and SearchBotRetrievalTestRequest.TenantRerankID are declared as *string but are consumed as *int in some retrieval-test code paths, which can surface as runtime errors when invoking the retrieval benchmark (Source: issue #15714).
- Empty memory object on startup: The RAGFlow server previously failed to start when an empty memory object existed; this was fixed in v0.23.1.
- Memory extraction stability: When all memory types are selected simultaneously, extraction stability was hardened in v0.23.1.
- Browser component: Newly added in v0.25.6, the Browser component enables autonomous web navigation; expect evolving behavior and config knobs (Source: README.md).
- Kubernetes deployment: Helm charts and raw Kubernetes manifests are not officially supported; production deployment remains primarily via docker-compose.
See Also
- Project README
- MCP Server Source
- Admin CLI Documentation
- Internal CLI Filesystem
- DeepDoc Module
- Firecrawl Integration
Deployment, Configuration, Administration, and Model Integration
Related topics: Project Overview and System Architecture, Core RAG Engine: Parsing, Chunking, Retrieval, and Knowledge, Agent System, Tools, and Workflow Orchestration
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that fuses RAG with Agent capabilities. Operating the system at production scale requires mastering four interrelated concerns: deploying the runtime stack, configuring infrastructure and models, administering tenants and resources, and integrating third-party model providers. This page covers all four areas, drawing on the project's official deployment manifests, engine abstractions, and operator interfaces. Source: README.md
Deployment
Prerequisites and Runtime Stack
The official deployment path is Docker Compose. The system requires at minimum 4 CPU cores, 16 GB RAM, 50 GB disk, Docker >= 24.0.0 with Docker Compose >= v2.26.1, Python >= 3.13, and (optionally) gVisor when the Agent's code executor sandbox is enabled. Source: README.md
Before starting, the host kernel parameter vm.max_map_count must be set to at least 262144 (Elasticsearch requirement). The README documents how to check and set it via sysctl -w vm.max_map_count=262144. Source: README.md
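A preflight check for this kernel parameter can be sketched as follows. The threshold comes from the text above and /proc/sys/vm/max_map_count is the standard Linux location for the value; the function names are hypothetical, and the script only reads the setting, it does not change it.

```python
from pathlib import Path
from typing import Optional

REQUIRED_MAX_MAP_COUNT = 262144  # minimum stated above


def parse_max_map_count(raw: str) -> int:
    """Parse the text read from the proc file, e.g. '65530\\n'."""
    return int(raw.strip())


def is_sufficient(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    return value >= required


def check_host(proc_path: str = "/proc/sys/vm/max_map_count") -> Optional[bool]:
    """Return True/False on Linux, or None when the proc file is absent."""
    path = Path(proc_path)
    if not path.exists():
        return None  # e.g. macOS or Windows hosts
    return is_sufficient(parse_max_map_count(path.read_text()))


if __name__ == "__main__":
    status = check_host()
    if status is False:
        print("vm.max_map_count is too low; raise it with sysctl before starting")
```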
Compose Topology
Two compose files are maintained:
- docker/docker-compose.yml brings up the RAGFlow application service on top of a dependency stack.
- docker/docker-compose-base.yml provides the dependencies: Elasticsearch (or Infinity), MySQL, MinIO, and Redis.
A legacy docker-compose-CN-oc9.yml and a docker-compose-macos.yml exist but are not actively maintained. Source: docker/README.md
The high-level deployment topology is:
flowchart LR
User[User / MCP Client] --> Web[Web Frontend<br/>Vite + React]
User --> API[RAGFlow API Server<br/>Python + Go]
API --> MySQL[(MySQL)]
API --> Redis[(Redis)]
API --> MinIO[(MinIO)]
API --> Engine{Doc Engine}
Engine -->|type=elasticsearch| ES[(Elasticsearch)]
Engine -->|type=infinity| INF[(Infinity)]
Web -.->|serves| User
Kubernetes and Cloud
A community-requested Kubernetes deployment path (Helm charts or raw manifests) is tracked in issue #864. As of the most recent releases, official Helm support is not yet shipped; the supported path remains Docker Compose on a single host, optionally scaled by externalizing the dependency services. Source: docker/README.md
Configuration
Docker Environment Variables
The docker/.env file is the primary configuration surface for the container stack. The following variables are documented:
| Variable | Default | Purpose |
|---|---|---|
| STACK_VERSION | 8.11.3 | Elasticsearch image version |
| ES_PORT | 1200 | Host port exposed for Elasticsearch |
| ELASTIC_PASSWORD | — | Elasticsearch bootstrap password |
| KIBANA_PORT | — | Host port for the Kibana UI |
Source: docker/README.md
Service Configuration
docker/service_conf.yaml.template is rendered at startup and configures the RAGFlow service. The internal Go engine selects a document store via a doc_engine.type key. Two backend values are supported:
- elasticsearch: fully implemented, configured with doc_engine.es.hosts, username, password.
- infinity: a placeholder backend waiting for the official Infinity Go SDK; configuration keys include uri, postgres_port, db_name.
Source: internal/engine/README.md
Engine selection happens once at process startup. The Go factory in internal/engine/engine_factory.go returns a DocEngine interface implementation that the rest of the service consumes uniformly for indexing, search, and document operations. Source: internal/engine/README.md
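The factory pattern described here can be illustrated with a Python analogue. The real implementation is Go code in internal/engine; the class names, the search method, and the sample host below are assumptions made for this sketch, and only the doc_engine.type values and key names come from the text.

```python
from typing import Protocol


class DocEngine(Protocol):
    """Hypothetical common interface that callers depend on."""

    def search(self, query: str) -> list[str]: ...


class ElasticsearchEngine:
    def __init__(self, hosts: str) -> None:
        self.hosts = hosts

    def search(self, query: str) -> list[str]:
        return [f"es:{query}"]  # stand-in for a real query


class InfinityEngine:
    def __init__(self, uri: str) -> None:
        self.uri = uri

    def search(self, query: str) -> list[str]:
        raise NotImplementedError("placeholder backend")


def create_engine(config: dict) -> DocEngine:
    """Pick a backend once, based on doc_engine.type."""
    engine_cfg = config["doc_engine"]
    kind = engine_cfg["type"]
    if kind == "elasticsearch":
        return ElasticsearchEngine(engine_cfg["es"]["hosts"])
    if kind == "infinity":
        return InfinityEngine(engine_cfg["infinity"]["uri"])
    raise ValueError(f"unknown doc_engine.type: {kind!r}")


config = {
    "doc_engine": {
        "type": "elasticsearch",
        "es": {"hosts": "http://localhost:1200"},  # placeholder host
    }
}
engine = create_engine(config)
```

Because callers only see the DocEngine interface, switching backends is a configuration change rather than a code change.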
Parsers and OCR
deepdoc/README.md introduces DeepDoc, RAGFlow's vision and parser subsystem. The vision pipeline provides OCR, layout recognition (10 base layout components: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation), and Table Structure Recognition (TSR). Operators can smoke-test these on local files using python deepdoc/vision/t_ocr.py and python deepdoc/vision/t_recognizer.py. Source: deepdoc/README.md
Administration
Admin Panel
Release v0.25.5 introduced local and SSH providers in the admin panel (PR #15039), allowing administrators to manage users, datasets, and storage backends directly from the web console. Source: GitHub release v0.25.5
CLI and Virtual Filesystem
The Go CLI under internal/cli exposes a virtual filesystem layered over RAGFlow's RESTful APIs. The design follows three principles: (1) no server-side changes, (2) a provider pattern with a common Provider interface in filesystem/base.go, and (3) unified commands (ls, search, cat, mkdir) over virtual paths. Supported paths include /datasets, /datasets/{name} (lists documents), and /datasets/{name}/{doc} (fetches a single document). Source: internal/cli/README.md
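The three supported path shapes can be sketched as a small resolver that maps a virtual path to an operation. The operation names (list_datasets, list_documents, get_document) and the resolve function are hypothetical labels for this illustration; the actual CLI is written in Go.

```python
def resolve(path: str) -> tuple[str, dict[str, str]]:
    """Map a virtual path to an (operation, parameters) pair."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] != "datasets":
        raise ValueError(f"unsupported path: {path!r}")
    if len(parts) == 1:
        return "list_datasets", {}
    if len(parts) == 2:
        return "list_documents", {"dataset": parts[1]}
    if len(parts) == 3:
        return "get_document", {"dataset": parts[1], "document": parts[2]}
    raise ValueError(f"path too deep: {path!r}")


root = resolve("/datasets")
docs = resolve("/datasets/handbook")
one = resolve("/datasets/handbook/intro.pdf")
```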
Skill Management
The CLI's install-skill command supports four source types: local paths, github.com/owner/repo/path, clawhub://owner/skill-name (ClawHub), and skill://skill-name (skills.sh). A defense-in-depth security architecture validates sources over HTTPS with SSL verification, quarantines downloads, runs regex-based static analysis against 100+ threat patterns (exfiltration, injection, destructive operations, persistence, network, obfuscation), and applies trust tiers. Limits: total skill size <= 50 MB, individual file <= 5 MB, text files only, lowercase alphanumeric names with hyphens/underscores. Source: internal/cli/filesystem/README.md
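The four source types can be told apart by their prefixes. The sketch below is an assumption about how such a classifier might look, with a hypothetical function name and return labels; anything that matches no known prefix is treated as a local path.

```python
def classify_source(source: str) -> str:
    """Return a label for the skill source based on its prefix."""
    if source.startswith("clawhub://"):
        return "clawhub"
    if source.startswith("skill://"):
        return "skills.sh"
    if source.startswith("github.com/"):
        return "github"
    return "local"  # fallback: treat everything else as a filesystem path


labels = [
    classify_source("./skills/pdf-helper"),
    classify_source("github.com/owner/repo/path"),
    classify_source("clawhub://owner/skill-name"),
    classify_source("skill://skill-name"),
]
```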
Frontend Build
The web UI is a Vite + React project. Build and development entry points live in web/package.json: npm run dev starts the dev server, npm run build produces a production bundle, and npm run type-check validates TypeScript. The schema editor component in web/src/components/jsonjoy-builder/lib/schema-editor.ts enforces JSONSchema field-name validation against the pattern ^[a-zA-Z_$][a-zA-Z0-9_$]*$. Source: web/package.json
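The field-name rule is easy to reproduce outside the frontend. The sketch below applies the same pattern in Python; the function name is hypothetical, and re.fullmatch is used so that a trailing newline is rejected, which matches how the pattern behaves in JavaScript without the multiline flag.

```python
import re

# Pattern quoted from the schema editor description above.
FIELD_NAME = re.compile(r"^[a-zA-Z_$][a-zA-Z0-9_$]*$")


def is_valid_field_name(name: str) -> bool:
    """True when the name starts with a letter, _ or $ and continues with those or digits."""
    return FIELD_NAME.fullmatch(name) is not None


results = {name: is_valid_field_name(name) for name in ["userName", "$ref", "_x1", "1abc", "has-dash", ""]}
```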
Model Integration
LLM and Embedding Providers
RAGFlow ships with a model registry supporting OpenAI-compatible APIs. Release v0.25.4 added gpt-5.4-mini and gpt-5.4-nano to the OpenAI model list, and release v0.25.6 extended the Agent with a new Browser component that lets models navigate and interact with web pages autonomously (PR #14888). DeepSeek v4 support was added on 2026-04-24, and Gemini 3 Pro support on 2025-11-19. Source: README.md and GitHub release v0.25.4
Rerank and Retrieval
A community feature request to add Ollama rerank integration is tracked in issue #4406. Rerank models are referenced by ID through the rerank_id parameter on retrieval calls. The MCP server's ragflow_retrieval tool accepts rerank_id, similarity_threshold, vector_similarity_weight, keyword, top_k, and force_refresh arguments that all flow into the unified retrieval service. Source: mcp/server/server.py
Release v0.25.5 accelerated the dataset search path, reducing latency by 50–100% by removing an expensive vector fetch and rerank similarity computation from the hot path (PR #14970). Source: GitHub release v0.25.5
MCP Server
The mcp/server/server.py exposes RAGFlow as a Model Context Protocol server. Two tool entry points are registered: list_datasets (paginates /datasets and returns newline-delimited JSON) and ragflow_retrieval (performs cross-dataset retrieval with the parameters above). When MODE == HOST, the server installs an AuthMiddleware to enforce API key authentication. Source: mcp/server/server.py
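Newline-delimited JSON, the output format mentioned for list_datasets, places one JSON object on each line. A minimal sketch of producing and reading it follows; the record fields and function names are placeholders, not the server's actual response shape.

```python
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


def from_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Placeholder records for illustration only.
datasets = [
    {"id": "ds-1", "description": "Product manuals"},
    {"id": "ds-2", "description": "Support tickets"},
]
payload = to_ndjson(datasets)
```

One object per line lets a client process results incrementally instead of parsing a single large array.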
Document Parsers
The v0.25.0 release added 7 built-in ingestion pipeline templates aligned with RAGFlow's native parsers, and v0.25.1 added the OpenDataLoader PDF backend. A community request to integrate Docling as an additional parser is tracked in issue #3443. For users migrating from Elasticsearch to OceanBase, the schema mapping in tools/es-to-oceanbase-migration/src/es_ob_migration/schema.py documents how chunk fields (content, tokens, keywords, tags, PageRank) translate to OceanBase column types. Source: GitHub release v0.25.1
Common Failure Modes
- vm.max_map_count too low: Elasticsearch container fails to start. Mitigate with sudo sysctl -w vm.max_map_count=262144. Source: README.md
- Infinity backend selected without SDK: the Infinity implementation is a placeholder; only Elasticsearch is fully functional. Source: internal/engine/README.md
- Type mismatch on rerank IDs: the Go service.RetrievalTestRequest.TenantRerankID field has a known *string vs *int mismatch with retrieval tests (issue #15714). Source: issue #15714
- Skill installation blocked: over-size archives, binary files, or suspicious patterns are rejected by the static analyzer. Source: internal/cli/filesystem/README.md
See Also
- README.md — Project overview and quickstart
- docker/README.md — Full Docker deployment reference
- deepdoc/README.md — Vision and parser subsystem
- internal/engine/README.md — Doc engine abstraction
- internal/cli/README.md — CLI and virtual filesystem
- mcp/server/server.py — MCP server reference
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 20 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.
1. Configuration risk: Configuration risk requires verification
- Severity: high
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_7154f1df73d9467aa3d747477287e392 | https://github.com/infiniflow/ragflow/issues/15714
2. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_8d8565f17f754fe3a6f7ad1f3b4be33d | https://github.com/infiniflow/ragflow/issues/15525
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_408303dfb4fb43a781b7dc14724082b9 | https://github.com/infiniflow/ragflow/issues/15751
4. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: v0.23.1
- User impact: Upgrade or migration may change expected behavior: v0.23.1
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.23.1. Context: Observed when using docker
- Evidence: failure_mode_cluster:github_release | fmev_38f958bf7c9ad232f6049339e1321be7 | https://github.com/infiniflow/ragflow/releases/tag/v0.23.1
5. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: v0.24.0
- User impact: Upgrade or migration may change expected behavior: v0.24.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.24.0. Context: Observed when using docker
- Evidence: failure_mode_cluster:github_release | fmev_0ca2840fc49d848176cce456864aafa3 | https://github.com/infiniflow/ragflow/releases/tag/v0.24.0
6. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: v0.25.0
- User impact: Upgrade or migration may change expected behavior: v0.25.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.25.0. Context: Observed when using python, docker
- Evidence: failure_mode_cluster:github_release | fmev_7154c897fed0437e0ca58d1f443b8d97 | https://github.com/infiniflow/ragflow/releases/tag/v0.25.0
7. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release v0.25.1 is flagged as a configuration risk; verify it before relying on the project.
- User impact: An upgrade or migration involving v0.25.1 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against v0.25.1. Context: observed during a version upgrade or migration.
- Evidence: failure_mode_cluster:github_release | fmev_12ff69cd8f090474bcc8768ed255e16a | https://github.com/infiniflow/ragflow/releases/tag/v0.25.1
8. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release v0.25.2 is flagged as a configuration risk; verify it before relying on the project.
- User impact: An upgrade or migration involving v0.25.2 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against v0.25.2. Context: observed when using Python.
- Evidence: failure_mode_cluster:github_release | fmev_7f58552889f29288945720d487e8fbb7 | https://github.com/infiniflow/ragflow/releases/tag/v0.25.2
9. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release v0.25.3 is flagged as a configuration risk; verify it before relying on the project.
- User impact: An upgrade or migration involving v0.25.3 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against v0.25.3. Context: observed when using Docker.
- Evidence: failure_mode_cluster:github_release | fmev_14af37b03860695c40160c241d23e5b1 | https://github.com/infiniflow/ragflow/releases/tag/v0.25.3
10. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release v0.25.4 is flagged as a configuration risk; verify it before relying on the project.
- User impact: An upgrade or migration involving v0.25.4 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against v0.25.4. Context: the source discussion did not expose a precise runtime context.
- Evidence: failure_mode_cluster:github_release | fmev_026d052ebdc28ef87ab4152d11b96502 | https://github.com/infiniflow/ragflow/releases/tag/v0.25.4
11. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release v0.25.5 is flagged as a configuration risk; verify it before relying on the project.
- User impact: An upgrade or migration involving v0.25.5 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against v0.25.5. Context: observed when using Python.
- Evidence: failure_mode_cluster:github_release | fmev_57690c932d554b7b2b477b7e4564f3f5 | https://github.com/infiniflow/ragflow/releases/tag/v0.25.5
12. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release v0.25.6 is flagged as a configuration risk; verify it before relying on the project.
- User impact: An upgrade or migration involving v0.25.6 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against v0.25.6. Context: observed when using Python and CUDA.
- Evidence: failure_mode_cluster:github_release | fmev_e1befbd52e751833a5dab041663c4bf0 | https://github.com/infiniflow/ragflow/releases/tag/v0.25.6
Source: Doramagic discovery, validation, and Project Pack records
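The evidence lines in the risk list above share a `kind | id | url` layout, so they can be turned into a checklist of release tags to verify. The following is a minimal sketch under that assumption; `Evidence`, `parse_evidence`, and `release_tag` are hypothetical helper names introduced for illustration and are not part of RAGFlow or Doramagic.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class Evidence(NamedTuple):
    """One evidence record as it appears in the risk list (hypothetical type)."""
    kind: str       # e.g. "failure_mode_cluster:github_release"
    record_id: str  # e.g. "fmev_..." or "cevd_..."
    url: str        # link to the GitHub release or issue


def parse_evidence(line: str) -> Evidence:
    """Split a '- Evidence: kind | id | url' line into its three fields."""
    text = line.strip().lstrip("-").strip()
    if text.startswith("Evidence:"):
        text = text[len("Evidence:"):]
    kind, record_id, url = (part.strip() for part in text.split("|"))
    return Evidence(kind, record_id, url)


def release_tag(ev: Evidence) -> Optional[str]:
    """Return the tag if the URL has the shape <owner>/<repo>/releases/tag/<tag>."""
    parts = urlparse(ev.url).path.strip("/").split("/")
    if len(parts) == 5 and parts[2:4] == ["releases", "tag"]:
        return parts[4]
    return None


if __name__ == "__main__":
    lines = [
        "- Evidence: failure_mode_cluster:github_release | "
        "fmev_38f958bf7c9ad232f6049339e1321be7 | "
        "https://github.com/infiniflow/ragflow/releases/tag/v0.23.1",
        "- Evidence: community_evidence:github | "
        "cevd_408303dfb4fb43a781b7dc14724082b9 | "
        "https://github.com/infiniflow/ragflow/issues/15751",
    ]
    # Keep only the records that point at a release; issues yield None.
    tags = [t for t in (release_tag(parse_evidence(l)) for l in lines) if t]
    for tag in tags:
        print(f"verify install/config/quickstart against {tag}")
```

Records that link to issues rather than releases return `None` from `release_tag`, so they drop out of the release checklist and can be reviewed separately.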
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using ragflow with real data or production workflows.
- [[Question]: Title Chunker Failure after upgrading to v0.25.6](https://github.com/infiniflow/ragflow/issues/15525) - github / github_issue
- Go test files not compiled in CI — missing import undetected - github / github_issue
- [[Go] tenant_rerank_id type mismatch: *string should be *int — retrieval_](https://github.com/infiniflow/ragflow/issues/15714) - github / github_issue
- nightly - github / github_release
- v0.25.6 - github / github_release
- v0.25.5 - github / github_release
- v0.25.4 - github / github_release
- v0.25.3 - github / github_release
- v0.25.2 - github / github_release
- v0.25.1 - github / github_release
- v0.25.0 - github / github_release
- v0.24.0 - github / github_release
Source: Project Pack community evidence and pitfall evidence