# https://github.com/OpenBMB/UltraRAG Project Manual

Generated at: 2026-06-17 05:24:36 UTC

## Table of Contents

- [Overview and Core Architecture](#page-1)
- [MCP Servers and Core Components](#page-2)
- [Pipelines, Workflows and Examples](#page-3)
- [UI, Memory System and API Deployment](#page-4)

<a id='page-1'></a>

## Overview and Core Architecture

### Related Pages

Related topics: [MCP Servers and Core Components](#page-2), [Pipelines, Workflows and Examples](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)
- [servers/corpus/src/corpus.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/corpus/src/corpus.py)
- [servers/retriever/src/websearch_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/__init__.py)
- [servers/retriever/src/websearch_backends/base.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/base.py)
- [servers/retriever/src/index_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/__init__.py)
- [servers/retriever/src/index_backends/faiss_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/faiss_backend.py)
- [servers/retriever/src/index_backends/milvus_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/milvus_backend.py)
- [servers/retriever/src/websearch_backends/exa_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/exa_backend.py)
- [servers/retriever/src/websearch_backends/tavily_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/tavily_backend.py)
- [servers/retriever/src/websearch_backends/zhipuai_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/zhipuai_backend.py)
- [servers/evaluation/src/evaluation.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/evaluation/src/evaluation.py)
- [servers/custom/src/custom.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/custom/src/custom.py)
- [ui/frontend/package.json](https://github.com/OpenBMB/UltraRAG/blob/main/ui/frontend/package.json)
- [ui/frontend/src/shared/lib/chatMarkdown.ts](https://github.com/OpenBMB/UltraRAG/blob/main/ui/frontend/src/shared/lib/chatMarkdown.ts)
</details>

# Overview and Core Architecture

## 1. Purpose and Scope

UltraRAG is a lightweight RAG (Retrieval-Augmented Generation) development framework built on the **Model Context Protocol (MCP)** architecture. It is jointly developed by THUNLP at Tsinghua University, NEUIR at Northeastern University, OpenBMB, and AI9stars, and is positioned for both research exploration and industrial prototyping. Source: [README.md:43-49]().

The framework standardizes core RAG components — such as retrievers, generators, corpus processors, and evaluators — as independent **MCP Servers**, while a centralized **MCP Client** handles workflow orchestration. Developers express control flow (sequential, loop, and conditional branches) declaratively in YAML, letting them implement complex iterative RAG logic in a few dozen lines of configuration. Source: [README.md:9-15]().

The project targets two distinct user audiences:

- **Researchers** who need standardized evaluation workflows, ready-to-use benchmarks, and reproducible metric management.
- **Developers / end users** who need a fast path from a pipeline definition to a working conversational Web UI.

This dual-purpose design is reflected in the repository layout, which separates the orchestration layer (YAML pipelines, MCP client) from the component layer (atomic MCP servers such as `corpus`, `retriever`, `generation`, `evaluation`, and `custom`). Source: [README.md:9-19](), [README.md:75-83]().

## 2. Core Architecture

At a high level, UltraRAG is organized as a thin orchestration shell that talks to many small, pluggable MCP servers. Each server is a `fastmcp` application that registers its functionality through the `@app.tool` decorator, exposing one or more typed functions to the client.

```mermaid
flowchart LR
    YAML[YAML Pipeline Config] --> Client[MCP Client / Orchestrator]
    UI[UltraRAG UI / Canvas] --> Client
    Client -->|tool call| S1[corpus Server]
    Client -->|tool call| S2[retriever Server]
    Client -->|tool call| S3[generation Server]
    Client -->|tool call| S4[evaluation Server]
    Client -->|tool call| S5[custom Server]
    S2 -->|backend| W1[FAISS / Milvus]
    S2 -->|backend| W2[Exa / Tavily / ZhipuAI]
    S1 --> Corpus[JSONL Chunks]
    S4 --> Results[Metrics + Reports]
```

The MCP architecture is the defining feature: every functional unit (chunking, embedding, retrieval, web search, generation, evaluation, prompt transformation) is decoupled into an independent server. A new feature only needs to be registered as a function-level tool to become available to existing workflows, so components can be reused across pipelines without touching the orchestration layer. Source: [README.md:13-19]().

The UI is a separate React/TypeScript application that consumes the same pipelines. It uses `@xyflow/react` for the visual canvas, `@tanstack/react-query` for server state, and renders chat output through a custom Markdown pipeline that supports KaTeX math, tables, and citation link rewriting. Source: [ui/frontend/package.json:11-29](), [ui/frontend/src/shared/lib/chatMarkdown.ts:6-22]().

## 3. Pluggable Backend Pattern

A defining implementation detail of the framework is the **backend registry pattern** used by the retriever server. Each category of capability (index storage, web search provider) is encapsulated behind an abstract base class, and concrete implementations are registered in a dictionary that maps a short name to a `(module, class)` pair. The factory function dynamically imports the module and instantiates the class.

For example, the index backend registry maps `"faiss"` and `"milvus"` to their respective backend classes, and a `create_index_backend()` factory is the single entry point used by callers. Source: [servers/retriever/src/index_backends/__init__.py:5-26](). The same pattern is used for web search, where the registry maps `"exa"`, `"tavily"`, and `"zhipuai"` to their backend classes, all inheriting from a common `BaseWebSearchBackend`. Source: [servers/retriever/src/websearch_backends/__init__.py:8-32]().

This pattern delivers three concrete benefits:

1. **Uniform configuration surface** — callers only need to know the backend name and a config dict, not the underlying SDK.
2. **Optional dependency isolation** — if a backend's SDK (e.g., `exa_py`, `tavily`, `pymilvus`) is missing, only that backend fails to load; the rest of the framework still runs. Source: [servers/retriever/src/index_backends/milvus_backend.py:18-25](), [servers/retriever/src/websearch_backends/exa_backend.py:17-23]().
3. **Swappable providers** — switching from local FAISS to a managed Milvus cluster, or from Tavily to ZhipuAI web search, is purely a configuration change.
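
The registry-plus-factory pattern described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the registry name `_BACKENDS`, the function `create_backend`, and the entries (standard-library classes standing in for real backend classes, plus a deliberately missing module) are all assumptions made for the example.

```python
import importlib
from typing import Any, Dict, Tuple

# Hypothetical registry: short name -> (module path, class name).
# Standard-library classes stand in for real backend classes here.
_BACKENDS: Dict[str, Tuple[str, str]] = {
    "ordered": ("collections", "OrderedDict"),
    "counter": ("collections", "Counter"),
    "missing": ("not_an_installed_sdk", "SomeBackend"),  # simulates an absent SDK
}


def create_backend(name: str, **config: Any) -> Any:
    """Resolve a backend by name, import its module lazily, and instantiate it."""
    try:
        module_path, class_name = _BACKENDS[name]
    except KeyError:
        known = ", ".join(sorted(_BACKENDS))
        raise ValueError(f"Unknown backend {name!r}; expected one of: {known}") from None
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # Only the requested backend fails; other registry entries stay usable.
        raise ImportError(f"Backend {name!r} needs the {module_path!r} package") from exc
    return getattr(module, class_name)(**config)
```

Because the import happens inside the factory, asking for `"counter"` works even though the `"missing"` entry points at a module that is not installed, which is the isolation property listed as the second benefit.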

The base web search class also implements `_parallel_search`, a generic concurrency-controlled async dispatcher with a configurable `retrieve_thread_num`, so every concrete backend automatically gets rate-limited parallel execution. Source: [servers/retriever/src/websearch_backends/base.py:24-46]().

## 4. Component Inventory

The repository ships the following atomic servers, each addressing a single RAG concern:

| Server | Module Path | Responsibility |
| --- | --- | --- |
| `corpus` | `servers/corpus/src/corpus.py` | Token/sentence/recursive chunking via chonkie + tiktoken into JSONL. Source: [servers/corpus/src/corpus.py:80-90]() |
| `retriever` | `servers/retriever/src/index_backends/` | Vector indexing + similarity search over FAISS or Milvus. Source: [servers/retriever/src/index_backends/faiss_backend.py:18-26](), [servers/retriever/src/index_backends/milvus_backend.py:34-50]() |
| `retriever` (web) | `servers/retriever/src/websearch_backends/` | Pluggable web search across Exa, Tavily, ZhipuAI. Source: [servers/retriever/src/websearch_backends/exa_backend.py:11-30](), [servers/retriever/src/websearch_backends/tavily_backend.py:12-30](), [servers/retriever/src/websearch_backends/zhipuai_backend.py:14-32]() |
| `evaluation` | `servers/evaluation/src/evaluation.py` | Standardized metric collection, JSON + Markdown reporting with timestamped output. Source: [servers/evaluation/src/evaluation.py:6-29]() |
| `custom` | `servers/custom/src/custom.py` | RAG-specific prompt transforms: Search-o1 information extraction, IterRetGen query building, `\boxed{}` answer extraction. Source: [servers/custom/src/custom.py:6-9](), [servers/custom/src/custom.py:80-95]() |

Each tool is registered with an explicit `output="a,b->c"` mapping that tells the orchestrator how to feed the tool's return value into downstream step inputs. This is the contract that makes the YAML pipeline declarative. Source: [servers/custom/src/custom.py:6-9](), [servers/custom/src/custom.py:80-95]().
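
To make the `output="a,b->c"` contract concrete, the sketch below parses such a string and uses it to move values through a shared state dictionary. It assumes that names left of the arrow are tool inputs and names right of it are keys in the tool's returned dict; `parse_output_spec`, `run_tool`, and `build_prompt` are hypothetical helpers written for this example and are not part of UltraRAG.

```python
from typing import Any, Callable, Dict, List, Tuple


def parse_output_spec(spec: str) -> Tuple[List[str], List[str]]:
    """Split a spec such as 'a,b->c' into (['a', 'b'], ['c'])."""
    left, arrow, right = spec.partition("->")
    if not arrow:
        raise ValueError(f"Expected 'inputs->outputs', got {spec!r}")

    def names(part: str) -> List[str]:
        return [n.strip() for n in part.split(",") if n.strip()]

    return names(left), names(right)


def run_tool(tool: Callable[..., Dict[str, Any]], spec: str, state: Dict[str, Any]) -> Dict[str, Any]:
    """Call a tool with inputs read from state, then store its declared outputs."""
    inputs, outputs = parse_output_spec(spec)
    result = tool(**{name: state[name] for name in inputs})
    missing = [name for name in outputs if name not in result]
    if missing:
        raise KeyError(f"Tool did not return declared outputs: {missing}")
    state.update({name: result[name] for name in outputs})
    return state


def build_prompt(q_ls: List[str], ret_psg: List[List[str]]) -> Dict[str, List[str]]:
    """Example tool: join retrieved passages and append the question."""
    prompts = ["\n".join(psgs) + "\n\nQuestion: " + q for q, psgs in zip(q_ls, ret_psg)]
    return {"prompt_ls": prompts}
```

With this shape, a later step that declares `prompt_ls` as an input can read it from the same state without any step-specific glue code.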

## 5. Deployment and the "Pipeline-as-API" Question

The framework deliberately separates authoring from serving. A pipeline is first authored as a YAML file that the orchestrator runs locally, and the same YAML can be loaded by the UI's Pipeline Builder (canvas + code, with bidirectional sync) for visual debugging. Source: [README.md:51-60](), [README.md:9-15]().

A common community question (GitHub issue #95) is whether a finished pipeline can be exposed as a callable API, similar to Dify. The architecture already supports this direction: every server runs as a standalone `fastmcp` process over stdio, so only the MCP client, which runs the YAML execution loop, needs to be wrapped in an HTTP service. In practice, the typical patterns are:

- Wrap the `MCP Client` runner behind a FastAPI/Flask handler that accepts a query, dispatches to the registered MCP servers, and returns the final answer.
- For production, deploy each MCP server as a separate container and have the client connect over the network transport instead of stdio.
- Use the UI as a frontend that already calls the same pipeline over HTTP, so the same backend can serve both the canvas and external API consumers.
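
The first pattern in the list above can be sketched with the Python standard library alone. The text suggests FastAPI or Flask; `http.server` is swapped in here only to keep the example dependency-free. `run_pipeline` is a hypothetical placeholder for whatever call executes the YAML pipeline through the MCP client, and the `query`/`answer` JSON fields are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Tuple


def run_pipeline(query: str) -> str:
    """Hypothetical stand-in for the MCP client executing a YAML pipeline."""
    return f"(answer for: {query})"


def handle_query(body: bytes) -> Tuple[int, Dict[str, Any]]:
    """Validate a JSON request body and return (HTTP status, JSON payload)."""
    try:
        payload = json.loads(body or b"{}")
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    query = payload.get("query") if isinstance(payload, dict) else None
    if not isinstance(query, str) or not query.strip():
        return 400, {"error": "'query' must be a non-empty string"}
    return 200, {"answer": run_pipeline(query.strip())}


class PipelineHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and delegates to handle_query."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_query(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build (but do not start) the server; call .serve_forever() to run it."""
    return HTTPServer((host, port), PipelineHandler)
```

Keeping validation in `handle_query` separates request handling from transport, so the same function could sit behind a FastAPI or Flask route without changes.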

The release notes for **v0.3.0.2** (2026-04-09) further strengthen this serving story by adding SQLite-backed authentication, persistent chat sessions, nickname and model settings, and a memory-aware RAG demo — all of which assume a stateful HTTP service in front of the orchestrator. Source: [README.md:107-117]().

## See Also

- Research Experiments — datasets, evaluation workflows, and case-study debugging
- UI Quick Start — launching the Pipeline Builder and admin mode
- Deployment Guide — production setup for retrievers, LLMs, and Milvus
- Code Integration — calling UltraRAG components directly from Python

---

<a id='page-2'></a>

## MCP Servers and Core Components

### Related Pages

Related topics: [Overview and Core Architecture](#page-1), [Pipelines, Workflows and Examples](#page-3), [UI, Memory System and API Deployment](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)
- [servers/corpus/src/corpus.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/corpus/src/corpus.py)
- [servers/retriever/src/retriever.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/retriever.py)
- [servers/retriever/src/index_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/__init__.py)
- [servers/retriever/src/index_backends/faiss_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/faiss_backend.py)
- [servers/retriever/src/index_backends/milvus_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/milvus_backend.py)
- [servers/retriever/src/websearch_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/__init__.py)
- [servers/retriever/src/websearch_backends/base.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/base.py)
- [servers/retriever/src/websearch_backends/exa_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/exa_backend.py)
- [servers/retriever/src/websearch_backends/tavily_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/tavily_backend.py)
- [servers/retriever/src/websearch_backends/zhipuai_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/zhipuai_backend.py)
- [servers/generation/src/generation.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/generation/src/generation.py)
- [servers/evaluation/src/evaluation.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/evaluation/src/evaluation.py)
- [servers/memory/src/memory.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/memory/src/memory.py)
- [servers/custom/src/custom.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/custom/src/custom.py)
</details>

# MCP Servers and Core Components

## Overview

UltraRAG is a lightweight RAG (Retrieval-Augmented Generation) development framework built on top of the **Model Context Protocol (MCP)**. Its core design philosophy is to decouple every RAG capability into a standalone **MCP Server** that exposes fine-grained **Tools** over a standardized interface. A separate **MCP Client** orchestrates these servers through YAML pipelines, supporting sequential execution, loops, and conditional branches without writing glue code.

Source: [README.md:14-18]()

The framework is jointly maintained by THUNLP at Tsinghua University, NEUIR at Northeastern University, OpenBMB, and AI9stars, and targets both research exploration and industrial prototyping. Because each server is a normal MCP process, the same tool can be reused across pipelines, swapped in benchmarks, or wrapped behind custom UIs.

## Core MCP Server Inventory

UltraRAG ships a curated set of servers, each packaged as a separate module that runs as its own process and communicates over stdio. The following table summarizes the canonical servers found in the repository tree:

| Server | Module Path | Primary Responsibility |
|---|---|---|
| `corpus` | `servers/corpus/src/corpus.py` | Document loading and chunking (`token`, `sentence`, `recursive` strategies via `chonkie`) |
| `retriever` | `servers/retriever/src/retriever.py` | Embedding-based and web-search-based retrieval with pluggable backends |
| `generation` | `servers/generation/src/generation.py` | LLM inference, including vLLM, multimodal, and multi-turn generation |
| `evaluation` | `servers/evaluation/src/evaluation.py` | Metric computation and result persistence (JSON + Markdown) |
| `memory` | `servers/memory/src/memory.py` | Persistent per-user and per-project memory with filesystem isolation |
| `custom` | `servers/custom/src/custom.py` | Project-specific utility tools (e.g., Search-o1 reason/final information extraction) |

Source: [servers/corpus/src/corpus.py:1-50](), [servers/retriever/src/retriever.py:1-80](), [servers/memory/src/memory.py:1-30]()

Each server is instantiated through a shared helper, `UltraRAG_MCP_Server`, which is imported from the `ultrarag.server` package. For example, the memory server starts with `app = UltraRAG_MCP_Server("memory")` and then registers tools via decorators, while the generation server binds methods through `mcp_inst.tool(...)` with explicit `output` signatures that double as contract definitions for the client. Source: [servers/memory/src/memory.py:14-19](), [servers/generation/src/generation.py:30-60]()

## Retriever Internals: Pluggable Backends

The retriever is the most backend-rich server in the framework. It separates concerns into two sub-packages, each registered through its own factory:

### Index Backends

The index layer is responsible for vector storage and nearest-neighbor search. Backends are dynamically loaded by name:

```python
_INDEX_BACKENDS = {
    "faiss": ".faiss_backend.FaissIndexBackend",
    "milvus": ".milvus_backend.MilvusIndexBackend",
}
```

Source: [servers/retriever/src/index_backends/__init__.py:10-14]()

- **FAISS** supports both CPU and GPU modes. The constructor reads `index_use_gpu` and a configurable `device_num`, then resolves a persistent `index_path` for serialization. It requires `faiss-cpu` or `faiss-gpu-cu12`. Source: [servers/retriever/src/index_backends/faiss_backend.py:19-50]()
- **Milvus** connects to an external `MilvusClient` over `uri`/`token`, declares a `collection_name` and the names of the `id_field`/`vector_field` columns, and is the only supported index in the built-in demo mode. Source: [servers/retriever/src/index_backends/milvus_backend.py:20-60](), [servers/retriever/src/retriever.py:40-58]()

When `is_demo=True` is set on the retriever, the embedding backend is forced to `openai` and the index backend to `milvus`, and the server raises a `ValidationError` if those keys are missing from configuration. Source: [servers/retriever/src/retriever.py:42-58]()
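
The demo-mode rule can be expressed as a small configuration check. The sketch below is an assumption-heavy illustration: the config keys `backend`, `index_backend`, `backend_configs`, and `index_backend_configs`, the function `apply_demo_mode`, and the local `ValidationError` class are all hypothetical names chosen for the example, not the retriever's actual schema.

```python
from typing import Any, Dict


class ValidationError(ValueError):
    """Local stand-in for the validation error type the text mentions."""


def apply_demo_mode(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with demo-mode backend overrides applied."""
    cfg = dict(config)
    if not cfg.get("is_demo"):
        return cfg  # normal mode: leave the user's choices untouched

    # Demo mode pins both backends regardless of what was configured.
    cfg["backend"] = "openai"
    cfg["index_backend"] = "milvus"

    # Each pinned backend must have its own settings block (assumed layout).
    required = (("backend_configs", "openai"), ("index_backend_configs", "milvus"))
    for section, key in required:
        if key not in (cfg.get(section) or {}):
            raise ValidationError(f"Demo mode requires '{section}.{key}' in the config")
    return cfg
```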

### Web-Search Backends

For open-domain retrieval, UltraRAG wraps three commercial search providers behind a common async interface:

```python
_WEBSEARCH_BACKENDS = {
    "exa":     ".exa_backend.ExaWebSearchBackend",
    "tavily":  ".tavily_backend.TavilyWebSearchBackend",
    "zhipuai": ".zhipuai_backend.ZhipuaiWebSearchBackend",
}
```

Source: [servers/retriever/src/websearch_backends/__init__.py:12-16]()

The abstract base class `BaseWebSearchBackend` implements a concurrency-controlled `asyncio.Semaphore` worker pool that processes queries in parallel, with a `tqdm` progress bar that integrates with the server's logging. Source: [servers/retriever/src/websearch_backends/base.py:18-46]()
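
A semaphore-bounded dispatcher of this kind can be sketched as follows. This is not the project's `_parallel_search`; the function name `parallel_search`, the `search_one` callback, and the `max_concurrency` parameter (playing the role of the configurable thread count) are assumptions, and the progress bar is omitted to keep the example standard-library only.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def parallel_search(
    queries: Sequence[str],
    search_one: Callable[[str], Awaitable[T]],
    max_concurrency: int = 4,
) -> List[T]:
    """Run search_one over all queries with at most max_concurrency in flight."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str) -> T:
        # Each task waits here until one of the concurrency slots is free.
        async with semaphore:
            return await search_one(query)

    # gather preserves input order, so results line up with queries.
    return list(await asyncio.gather(*(bounded(q) for q in queries)))


async def fake_search(query: str) -> str:
    """Placeholder for a real provider call (network I/O would happen here)."""
    await asyncio.sleep(0.01)
    return f"result:{query}"
```

A caller would run it with `asyncio.run(parallel_search(["a", "b"], fake_search, max_concurrency=2))`. Putting the limit in the shared helper means a concrete backend only has to supply the per-query coroutine.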

Each concrete backend reads its API key from configuration or environment variables (`EXA_API_KEY`, `TAVILY_API_KEY`, `ZHIPUAI_API_KEY`), and raises explicit `ToolError`/`ImportError` exceptions when dependencies or credentials are missing. This is the project's idiomatic way of surfacing misconfiguration to the orchestration layer. Source: [servers/retriever/src/websearch_backends/exa_backend.py:18-30](), [servers/retriever/src/websearch_backends/tavily_backend.py:18-50](), [servers/retriever/src/websearch_backends/zhipuai_backend.py:15-30]()
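
The credential lookup described above follows a common "config first, environment second" order. The sketch below assumes a config key named `api_key` and defines a local `ToolError` class as a stand-in for the error type the text mentions; `resolve_api_key` is a hypothetical helper, not a function from the repository.

```python
import os
from typing import Any, Mapping, Optional


class ToolError(RuntimeError):
    """Local stand-in for the tool error type the text mentions."""


def resolve_api_key(
    config: Mapping[str, Any],
    env_var: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicit config value, fall back to the environment, else fail loudly."""
    env = os.environ if environ is None else environ
    key = config.get("api_key") or env.get(env_var)
    if not key:
        raise ToolError(
            f"Missing API key: set 'api_key' in the config or export {env_var}"
        )
    return key
```

Passing `environ` explicitly makes the lookup easy to test without mutating the real process environment.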

## Memory, Evaluation, and the Web UI

The `memory` server, introduced prominently in v0.3.0.2, stores persistent state under `<ui>/storage/memory/<user_id>/`, with a `MEMORY.md` file and a per-project subdirectory. The `user_id` is validated against `^[A-Za-z0-9_-]+$` to prevent path traversal, and the storage root can be relocated via the `ULTRARAG_UI_STORAGE_ROOT` environment variable. Source: [servers/memory/src/memory.py:14-40]()
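
The validation and path layout described above can be sketched as follows. The regular expression and the `ULTRARAG_UI_STORAGE_ROOT` variable come from the text; the default root `ui/storage` and the helper names `memory_dir` and `memory_file` are assumptions made for illustration.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional

USER_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")
DEFAULT_STORAGE_ROOT = Path("ui") / "storage"  # assumed default location


def memory_dir(user_id: str, environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the per-user memory directory, rejecting unsafe user IDs."""
    env = os.environ if environ is None else environ
    # fullmatch rejects separators, dots, empty strings, and trailing newlines,
    # so a user ID can never climb out of the memory directory.
    if not USER_ID_PATTERN.fullmatch(user_id):
        raise ValueError(f"Invalid user_id: {user_id!r}")
    root = Path(env.get("ULTRARAG_UI_STORAGE_ROOT") or DEFAULT_STORAGE_ROOT)
    return root / "memory" / user_id


def memory_file(user_id: str, environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the path of the user's MEMORY.md file."""
    return memory_dir(user_id, environ) / "MEMORY.md"
```

Validating the identifier before building the path is what blocks inputs such as `../etc`: the check fails on the characters themselves, so no path normalization is needed afterwards.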

The `evaluation` server is intentionally lightweight: it accepts a metric dictionary, writes a timestamped JSON file under the configured `save_path`, and optionally renders the result as a Markdown table for human inspection. Source: [servers/evaluation/src/evaluation.py:30-60]()
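
A minimal sketch of that behaviour, assuming a flat dictionary of metric names to floats: the file-name format, the four-decimal rounding, and the helper names `save_results` and `to_markdown` are illustrative choices, not the evaluation server's actual output format.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict


def save_results(metrics: Dict[str, float], save_dir: str) -> Path:
    """Write metrics to a timestamped JSON file and return its path."""
    out_dir = Path(save_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_path = out_dir / f"results_{stamp}.json"
    out_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return out_path


def to_markdown(metrics: Dict[str, float]) -> str:
    """Render metrics as a two-column Markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value:.4f} |" for name, value in metrics.items()]
    return "\n".join(lines)
```

The timestamp in the file name keeps earlier runs from being overwritten, which is what lets results from repeated experiments be compared later.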

The companion Web IDE is built on React 19, Vite, TanStack Query, and `@xyflow/react` for the pipeline canvas, with `highlight.js`, `marked`, `katex`, and `dompurify` powering the rich-text rendering of prompts and responses. Source: [ui/frontend/package.json:6-28]()

## Community Context

A recurring community question is whether a finished pipeline can be deployed as a callable API, similar to Dify. Because each server speaks MCP natively, the same YAML pipeline that runs locally can be driven by any MCP client; the practical pattern is to keep the orchestration running as a service and expose the client over HTTP, rather than wrapping the YAML itself. The v0.3.0.2 release also adds SQLite-backed authentication and persistent chat sessions to the Web UI, which already exposes the pipelines as interactive endpoints that can be reverse-proxied behind a public API gateway.

## See Also

- UltraRAG Pipeline Authoring (YAML control structures)
- Retriever Index Backend Configuration
- Memory Server and UI Storage Layout
- UltraRAG Web UI Overview

---

<a id='page-3'></a>

## Pipelines, Workflows and Examples

### Related Pages

Related topics: [Overview and Core Architecture](#page-1), [MCP Servers and Core Components](#page-2), [UI, Memory System and API Deployment](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/experiments/sayhello.yaml](https://github.com/OpenBMB/UltraRAG/blob/main/examples/experiments/sayhello.yaml)
- [examples/experiments/rag_full.yaml](https://github.com/OpenBMB/UltraRAG/blob/main/examples/experiments/rag_full.yaml)
- [examples/experiments/rag_loop.yaml](https://github.com/OpenBMB/UltraRAG/blob/main/examples/experiments/rag_loop.yaml)
- [examples/experiments/rag_branch.yaml](https://github.com/OpenBMB/UltraRAG/blob/main/examples/experiments/rag_branch.yaml)
- [examples/experiments/rag_deploy.yaml](https://github.com/OpenBMB/UltraRAG/blob/main/examples/experiments/rag_deploy.yaml)
- [examples/experiments/ircot.yaml](https://github.com/OpenBMB/UltraRAG/blob/main/examples/experiments/ircot.yaml)
- [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)
- [servers/retriever/src/websearch_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/__init__.py)
- [servers/retriever/src/index_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/__init__.py)
- [servers/custom/src/custom.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/custom/src/custom.py)
- [servers/evaluation/src/evaluation.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/evaluation/src/evaluation.py)
</details>

# Pipelines, Workflows and Examples

## Overview

A **pipeline** in UltraRAG is a YAML-declared workflow that orchestrates one or more MCP Servers into an end-to-end RAG or agent procedure. Instead of writing Python glue code, developers describe a graph of node invocations, control structures, and parameter bindings in a single configuration file. The MCP Client (the framework's orchestrator) consumes this YAML and dispatches calls to atomic MCP Servers such as `corpus`, `retriever`, `generation`, `evaluation`, and `custom` (Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)).

UltraRAG natively supports the three control structures that RAG research and prototyping actually need: **sequential** execution, **loop** (e.g., iterative retrieval-generation), and **conditional branch** (Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)). Each node in a pipeline is a function-level **Tool** exposed by an MCP Server, and new tools can be added by registering them — the pipeline layer remains unchanged. The example workflows under `examples/experiments/` illustrate every pattern.

```mermaid
flowchart LR
    A[Query] --> B[Retriever Server]
    B --> C{Loop or Branch?}
    C -->|loop| B
    C -->|sequential| D[Generation Server]
    D --> E[Evaluation Server]
    E --> F[Result]
```
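
How an orchestrator might interpret the three control structures can be sketched with a small recursive interpreter. The step schema used here (a tool name as a string, a `loop` dict with `times` and `steps`, and a `branch` dict with `router` and `branches`) is a hypothetical stand-in that mirrors what a parsed YAML file could look like; it is not claimed to match UltraRAG's actual pipeline syntax, and the tools are toy functions.

```python
from typing import Any, Callable, Dict, List, Union

State = Dict[str, Any]
Tool = Callable[[State], Any]
Step = Union[str, Dict[str, Any]]


def run_steps(steps: List[Step], tools: Dict[str, Tool], state: State) -> State:
    """Execute steps in order, recursing into loop and branch blocks."""
    for step in steps:
        if isinstance(step, str):
            tools[step](state)  # sequential: call one tool
        elif "loop" in step:
            spec = step["loop"]
            for _ in range(spec["times"]):  # loop: repeat a sub-pipeline
                run_steps(spec["steps"], tools, state)
        elif "branch" in step:
            spec = step["branch"]
            label = tools[spec["router"]](state)  # router tool picks a branch label
            run_steps(spec["branches"].get(label, []), tools, state)
        else:
            raise ValueError(f"Unknown step: {step!r}")
    return state


def retrieve(state: State) -> None:
    state["passages"].append(f"passage for {state['query']!r}")


def generate(state: State) -> None:
    state["answer"] = f"answer from {len(state['passages'])} passage(s)"


def needs_more(state: State) -> str:
    return "more" if len(state["passages"]) < 3 else "enough"


TOOLS: Dict[str, Tool] = {"retrieve": retrieve, "generate": generate, "needs_more": needs_more}

PIPELINE: List[Step] = [
    {"loop": {"times": 2, "steps": ["retrieve"]}},
    {"branch": {"router": "needs_more", "branches": {"more": ["retrieve"], "enough": []}}},
    "generate",
]

final_state = run_steps(PIPELINE, TOOLS, {"query": "example question", "passages": []})
print(final_state["answer"])
```

Because loops and branches are only data interpreted by `run_steps`, adding a new tool means adding an entry to `TOOLS`; the interpreter itself does not change.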

## Bundled Example Pipelines

The `examples/experiments/` directory ships a small, ordered curriculum of pipeline YAMLs that serves as both documentation and test fixtures.

| Pipeline | Purpose | Key Servers Touched |
|---|---|---|
| `sayhello.yaml` | Minimal smoke test that wires one server call end-to-end | `custom` |
| `rag_full.yaml` | Canonical RAG: retrieve → generate → evaluate | `retriever`, `generation`, `evaluation` |
| `rag_loop.yaml` | Iterative retrieval/generation (e.g., IRCoT-style refinement) | `retriever`, `generation`, `custom` |
| `rag_branch.yaml` | Conditional routing (e.g., skip retrieval when confidence is high) | `retriever`, `generation` |
| `rag_deploy.yaml` | Production-shaped pipeline ready to be served as a demo | `retriever`, `generation`, `custom` |
| `ircot.yaml` | Interleaved Retrieval + Chain-of-Thought research recipe | `retriever`, `generation`, `custom` |

### Sequential flow (`rag_full.yaml`)

The default RAG path chains a retriever call, a generation call, and an evaluation step, optionally preceded by corpus indexing. Retriever backends are pluggable: `create_index_backend` resolves names like `faiss` and `milvus` from the `index_backends` registry (Source: [servers/retriever/src/index_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/__init__.py)), and `create_websearch_backend` resolves `exa`, `tavily`, and `zhipuai` (Source: [servers/retriever/src/websearch_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/__init__.py)). This lets `rag_full.yaml` remain backend-agnostic, so switching from FAISS to Milvus or from Tavily to Exa is a one-line config change.

### Loop flow (`rag_loop.yaml` and `ircot.yaml`)

Loop pipelines repeatedly invoke a sub-graph until a stop condition is met. The `custom` server supplies the glue: `iterretgen_nextquery` concatenates the previous query and answer to produce the next retrieval query (Source: [servers/custom/src/custom.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/custom/src/custom.py)), and `search_o1_extract_query` pulls `<|begin_of_query|>`-tagged queries out of LLM output for the next iteration. `ircot.yaml` builds on the same primitives to interleave chain-of-thought reasoning with retrieval steps.
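
The two glue steps can be approximated in plain Python. The functions below are illustrative re-implementations with hypothetical names (`next_queries`, `extract_query`), not the code in `custom.py`; in particular, the closing tag `<|end_of_query|>` and the choice to return the last tagged query are assumptions.

```python
import re
from typing import List

# Assumed tag pair; only the opening tag is named in the text above.
QUERY_TAG = re.compile(r"<\|begin_of_query\|>(.*?)<\|end_of_query\|>", re.DOTALL)


def next_queries(queries: List[str], answers: List[str]) -> List[str]:
    """Build the next retrieval query by appending each answer to its query."""
    return [f"{q} {a}".strip() for q, a in zip(queries, answers)]


def extract_query(llm_output: str) -> str:
    """Return the last tagged query in the model output, or '' if none is present."""
    matches = QUERY_TAG.findall(llm_output)
    return matches[-1].strip() if matches else ""
```

An empty string from `extract_query` gives the loop a natural stop signal: when the model stops emitting tagged queries, there is nothing left to retrieve.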

### Branch flow (`rag_branch.yaml`)

Conditional branches route execution based on a runtime predicate evaluated against the current state. Typical predicates include "retrieval confidence above threshold" or "answer already contains citation." The branch node is declared in YAML; the underlying evaluation logic is supplied by an MCP Server tool, keeping the orchestration declarative.

## Evaluation and Debugging Hooks

Every research pipeline can attach the `evaluation` server as a terminal node. The `save_eval_results` tool writes timestamped JSON and optionally prints a Markdown table of averaged metrics, which makes benchmarking reproducible (Source: [servers/evaluation/src/evaluation.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/evaluation/src/evaluation.py)). The repo also provides a **Structured Debugging Guide** covering four layers — input & retrieval, reasoning & planning, state & context, and deployment & runtime — to attribute failures when answers look suspicious (Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)).

## From Pipeline to Service

A recurring community question is how to expose a finished pipeline as a callable HTTP API (Dify-style). The `rag_deploy.yaml` example and the **One-Click Delivery** workflow address this: a pipeline is converted into an interactive conversational Web UI with a single command (Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)). For developers, the recommended path is to start from `rag_deploy.yaml`, then consult the [Deployment Guide](https://ultrarag.openbmb.cn/pages/en/ui/prepare) for production environment setup including Retriever, LLM, and Milvus configuration. The Deep Research demo (powered by the `AgentCPM-Report` model) demonstrates this end-to-end: a pipeline runs multi-step retrieval and integration to produce a long-form report (Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)).

## See Also

- Quick Start (Research Experiments): [ultrarag.openbmb.cn/pages/en/getting_started/quick_start](https://ultrarag.openbmb.cn/pages/en/getting_started/quick_start)
- Case Analysis & Visual Debugging: [ultrarag.openbmb.cn/pages/en/develop_guide/case_study](https://ultrarag.openbmb.cn/pages/en/develop_guide/case_study)
- Code Integration (calling components from Python): [ultrarag.openbmb.cn/pages/en/develop_guide/code_integration](https://ultrarag.openbmb.cn/pages/en/develop_guide/code_integration)
- UI Quick Start: [ultrarag.openbmb.cn/pages/en/ui/start](https://ultrarag.openbmb.cn/pages/en/ui/start)
- Deep Research demo: [ultrarag.openbmb.cn/pages/en/demo/deepresearch](https://ultrarag.openbmb.cn/pages/en/demo/deepresearch)

---

<a id='page-4'></a>

## UI, Memory System and API Deployment

### Related Pages

Related topics: [Overview and Core Architecture](#page-1), [MCP Servers and Core Components](#page-2), [Pipelines, Workflows and Examples](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [ui/backend/app.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/app.py)
- [ui/backend/auth.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/auth.py)
- [ui/backend/chat_store.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/chat_store.py)
- [ui/backend/kb_visibility_store.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/kb_visibility_store.py)
- [ui/backend/pipeline_manager.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/pipeline_manager.py)
- [ui/backend/storage_paths.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/storage_paths.py)
- [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)
</details>

# UI, Memory System and API Deployment

## 1. Overview and Scope

UltraRAG ships a first-class **visual RAG Integrated Development Environment (IDE)** that goes beyond a conventional chat interface. The UI combines pipeline orchestration, debugging, and demonstration in a single surface, allowing users to design, run, and inspect MCP-based RAG pipelines without writing code by hand. According to the project README, "UltraRAG UI transcends the boundaries of traditional chat interfaces, evolving into a visual RAG Integrated Development Environment (IDE) that combines orchestration, debugging, and demonstration." Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)

Six concerns are tightly coupled in the `ui/backend` module:

| Concern | Source File | Role |
| --- | --- | --- |
| HTTP entry point | [ui/backend/app.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/app.py) | Hosts REST endpoints consumed by the React/Vite frontend (`ui/frontend/package.json`) |
| Identity | [ui/backend/auth.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/auth.py) | SQLite-backed authentication, nickname and model settings |
| State | [ui/backend/chat_store.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/chat_store.py) | Persistent chat sessions |
| Knowledge base ACL | [ui/backend/kb_visibility_store.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/kb_visibility_store.py) | Per-user knowledge base visibility |
| Pipeline execution | [ui/backend/pipeline_manager.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/pipeline_manager.py) | Bridges UI actions to MCP servers and pipelines |
| Path resolution | [ui/backend/storage_paths.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/storage_paths.py) | Centralizes where artefacts are written on disk |

The v0.3.0.2 release (2026-04-09) explicitly introduced a **memory upgrade** (persistent user memory, project memory retrieval, and a memory-aware RAG demo) alongside SQLite-backed authentication, persistent chat sessions, and management of nickname and model settings. Source: [GitHub Release v0.3.0.2](https://github.com/OpenBMB/UltraRAG/releases)

## 2. The UI: A Visual RAG IDE

The UI is implemented as a React 19 + Vite single-page application that talks to the FastAPI/Flask-style backend in `ui/backend/app.py`. The frontend stack (Radix UI primitives, `@xyflow/react` for the canvas, `@tanstack/react-query` for data fetching, `marked` + KaTeX for rendering, and `js-yaml` for editing) indicates a canvas-based pipeline builder with bidirectional YAML synchronization. Source: [ui/frontend/package.json](https://github.com/OpenBMB/UltraRAG/blob/main/ui/frontend/package.json)

Key user-facing capabilities, as documented in the README, include:

- **Pipeline Builder** with bidirectional real-time synchronization between "Canvas Construction" and "Code Editing," allowing granular online adjustments of pipeline parameters and prompts.
- **Intelligent AI Assistant** that assists the full development lifecycle.
- **One-Click Delivery** — a Pipeline defined in YAML can be converted into an interactive conversational Web UI. Source: [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md)

The flow between a user's click and a pipeline execution is mediated by `pipeline_manager.py`, which wraps the YAML-driven MCP client and the atomic MCP servers (Retriever, Generation, Corpus, Evaluation, Custom).

```mermaid
flowchart LR
    User[Browser UI] -->|HTTP| App[app.py]
    App --> Auth[auth.py]
    App --> PM[pipeline_manager.py]
    PM -->|YAML| MCPClient[MCP Client]
    MCPClient --> Srv1[Retriever Server]
    MCPClient --> Srv2[Generation Server]
    MCPClient --> Srv3[Corpus / Eval / Custom]
    PM --> Chat[chat_store.py]
    Chat --> SQLite[(SQLite)]
    Auth --> SQLite
    KBV[kb_visibility_store.py] --> SQLite
    SP[storage_paths.py] --> FS[(File System)]
```
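
The flow in the diagram can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `PipelineManagerSketch`, `fake_retriever`, and `fake_generation` are hypothetical names, not the API of `pipeline_manager.py`, and plain functions stand in for calls to MCP servers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A step takes the running state and returns an updated copy.
# In the real system each step would be a tool call on an MCP server (assumption).
Step = Callable[[dict], dict]


@dataclass
class PipelineManagerSketch:
    """Hypothetical stand-in: maps a pipeline name to an ordered list of steps."""

    pipelines: Dict[str, List[Step]] = field(default_factory=dict)

    def register(self, name: str, steps: List[Step]) -> None:
        self.pipelines[name] = steps

    def run(self, name: str, question: str) -> dict:
        state = {"question": question}
        for step in self.pipelines[name]:
            state = step(state)  # each step sees the output of the previous one
        return state


def fake_retriever(state: dict) -> dict:
    # Placeholder for a Retriever Server call.
    return {**state, "passages": [f"passage about {state['question']}"]}


def fake_generation(state: dict) -> dict:
    # Placeholder for a Generation Server call.
    return {**state, "answer": f"Answer based on {len(state['passages'])} passage(s)."}


manager = PipelineManagerSketch()
manager.register("demo_rag", [fake_retriever, fake_generation])
result = manager.run("demo_rag", "vector indexes")
print(result["answer"])
```

Keeping the step list separate from the transport is what lets the same manager serve both a UI click and any other caller.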

## 3. Memory System (v0.3.0.2)

The v0.3.0.2 release introduced three layered memory capabilities, all routed through the `ui/backend` layer:

### 3.1 Persistent User Memory
`auth.py` and `chat_store.py` together provide SQLite-backed authentication, nicknames, model preferences, and persistent chat sessions. This means a returning user sees their prior conversations, selected model, and identity without reconfiguration. Source: [GitHub Release v0.3.0.2](https://github.com/OpenBMB/UltraRAG/releases)
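
To make the idea of SQLite-backed chat persistence concrete, here is a minimal sketch using only the standard-library `sqlite3` module. The table layout and the function names (`open_store`, `append_message`, `load_session`) are assumptions for illustration and are not taken from `chat_store.py`.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a chat store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id TEXT NOT NULL,"
        " session_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def append_message(conn: sqlite3.Connection, user_id: str, session_id: str,
                   role: str, content: str) -> None:
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO messages (user_id, session_id, role, content)"
            " VALUES (?, ?, ?, ?)",
            (user_id, session_id, role, content),
        )


def load_session(conn: sqlite3.Connection, user_id: str,
                 session_id: str) -> List[Tuple[str, str]]:
    # Messages come back in insertion order, scoped to one user and session.
    rows = conn.execute(
        "SELECT role, content FROM messages"
        " WHERE user_id = ? AND session_id = ? ORDER BY id",
        (user_id, session_id),
    ).fetchall()
    return [(role, content) for role, content in rows]


conn = open_store()
append_message(conn, "alice", "s1", "user", "What is RAG?")
append_message(conn, "alice", "s1", "assistant", "Retrieval-augmented generation.")
append_message(conn, "bob", "s1", "user", "Unrelated question")
history = load_session(conn, "alice", "s1")
```

Passing a file path instead of `":memory:"` is what would make the history survive a restart.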

### 3.2 Project Memory Retrieval
Project memory is shared, retrievable state that augments the RAG pipeline itself. The release notes describe "persistent user memory, project memory retrieval, and a dedicated memory-aware RAG demo." Project memory is exposed as additional context that the pipeline orchestrator (`pipeline_manager.py`) fetches before generation. Source: [GitHub Release v0.3.0.2](https://github.com/OpenBMB/UltraRAG/releases)
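
A rough sketch of "fetch memory before generation" follows. It swaps in a deliberately simple word-overlap ranking so the example stays dependency-free; the actual retrieval method used by UltraRAG is not described in the source, and `score`, `retrieve_memory`, and `build_prompt` are hypothetical names.

```python
from typing import List


def score(query: str, note: str) -> int:
    """Count distinct lowercase words shared by the query and a memory note."""
    return len(set(query.lower().split()) & set(note.lower().split()))


def retrieve_memory(query: str, notes: List[str], top_k: int = 2) -> List[str]:
    # Rank notes by overlap, keep the best top_k, and drop notes with no overlap.
    ranked = sorted(notes, key=lambda note: score(query, note), reverse=True)
    return [note for note in ranked[:top_k] if score(query, note) > 0]


def build_prompt(query: str, notes: List[str]) -> str:
    # Memory is prepended as extra context ahead of the user question.
    memory = retrieve_memory(query, notes)
    memory_block = "\n".join(f"- {m}" for m in memory) or "- (none)"
    return f"Project memory:\n{memory_block}\n\nQuestion: {query}"


project_notes = [
    "The project uses Milvus for the index",
    "Team prefers concise answers",
    "Deadline is Friday",
]
prompt = build_prompt("Which index does the project use", project_notes)
print(prompt)
```

A production system would more plausibly rank with embeddings, but the shape is the same: retrieve, filter, then prepend to the generation input.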

### 3.3 Memory-Aware RAG Demo
A dedicated demo showcases how the memory layer plugs into an existing pipeline. The demo is delivered as a configured YAML pipeline plus a UI mode, leveraging `storage_paths.py` to keep memory artefacts isolated per project. Source: [ui/backend/storage_paths.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/storage_paths.py)
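
Per-project isolation of memory artefacts can be sketched with `pathlib`. The directory layout (`<root>/memory/<project_id>`) and the helper `project_memory_dir` are assumptions for illustration; the real layout in `storage_paths.py` may differ.

```python
import tempfile
from pathlib import Path


def project_memory_dir(root: Path, project_id: str) -> Path:
    """Return (and create) a memory directory dedicated to one project."""
    # Reject identifiers that could escape the root directory.
    if project_id in ("", ".", "..") or "/" in project_id or "\\" in project_id:
        raise ValueError(f"unsafe project id: {project_id!r}")
    target = root / "memory" / project_id
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    alpha = project_memory_dir(root, "alpha")
    beta = project_memory_dir(root, "beta")
    (alpha / "notes.txt").write_text("alpha-only note", encoding="utf-8")
    # Each project sees only its own files.
    alpha_files = sorted(p.name for p in alpha.iterdir())
    beta_files = sorted(p.name for p in beta.iterdir())
```

Resolving every path through one helper also makes it easier to relocate the storage root later.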

Together, the three layers make the demo experience "significantly more stateful and personalized": every chat turn is grounded in the user's identity, project context, and historical interactions. Source: [GitHub Release v0.3.0.2](https://github.com/OpenBMB/UltraRAG/releases)

## 4. Pipeline Deployment as an API

A recurring community question is whether a tested pipeline can be exposed as a callable HTTP API, comparable to Dify. The top community issue (#95) asks exactly this: "Does it support wrapping a pipeline as an API that can be called, similar to Dify?" The question signals clear demand for productionization. Source: [GitHub Issue #95](https://github.com/OpenBMB/UltraRAG/issues/95)

UltraRAG answers this through the same `ui/backend/app.py` layer used by the web IDE. The `pipeline_manager.py` module loads a YAML pipeline, instantiates the MCP client, and invokes the configured MCP servers. Because this invocation path is decoupled from the WebSocket/HTTP transport used by the SPA, the same manager can be exposed behind any HTTP route — effectively turning the pipeline into a callable service.

The typical deployment pattern is:

1. Author the pipeline as YAML (using the visual builder or by hand).
2. Configure a backend route in `app.py` that accepts a request body, hands it to `pipeline_manager.py`, and returns the pipeline's final output.
3. Reuse the existing MCP servers under `servers/*` (Retriever, Generation, Corpus, Evaluation, Custom) — for example, the `servers/retriever/src/retriever.py` server already exposes FAISS and Milvus index backends plus Exa / Tavily / ZhipuAI web search backends, all addressable from the same API surface. Source: [servers/retriever/src/retriever.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/retriever.py)
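
The deployment pattern above can be sketched as a small WSGI application built only on the standard library. This is not how `app.py` is written: the framework choice, the request shape (`pipeline` and `question` keys), and the `run_pipeline` placeholder are all assumptions made to keep the example self-contained.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def run_pipeline(name: str, question: str) -> dict:
    """Hypothetical placeholder for handing the request to the pipeline manager."""
    return {"pipeline": name, "answer": f"echo: {question}"}


def app(environ, start_response):
    """Accept a JSON POST body, run the pipeline, and return its output as JSON."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [b'{"error": "use POST"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        status, body = "200 OK", run_pipeline(payload["pipeline"], payload["question"])
    except (KeyError, TypeError, ValueError) as exc:
        # Missing keys, a non-object body, or invalid JSON all map to 400.
        status, body = "400 Bad Request", {"error": str(exc)}
    start_response(status, headers)
    return [json.dumps(body).encode("utf-8")]


def call(wsgi_app, body: dict):
    """Invoke the app in-process, without opening a socket."""
    raw = json.dumps(body).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)
    environ.update(
        REQUEST_METHOD="POST",
        CONTENT_LENGTH=str(len(raw)),
    )
    environ["wsgi.input"] = io.BytesIO(raw)
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status

    chunks = wsgi_app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


status, body = call(app, {"pipeline": "demo_rag", "question": "hi"})
```

To serve this over a real socket during local experiments, `wsgiref.simple_server.make_server` can wrap `app`; a production deployment would normally sit behind a proper application server.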

For knowledge base isolation between API consumers, `kb_visibility_store.py` provides per-user access control so that the same deployed API can serve multiple tenants without leaking corpora. Source: [ui/backend/kb_visibility_store.py](https://github.com/OpenBMB/UltraRAG/blob/main/ui/backend/kb_visibility_store.py)
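
Per-user visibility can be pictured as a small allow-list table. The sketch below is an assumption-laden illustration: the `kb_visibility` schema and the helpers `grant`, `visible_kbs`, and `can_query` are hypothetical and do not reflect the actual contents of `kb_visibility_store.py`.

```python
import sqlite3
from typing import List


def open_acl(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a visibility store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kb_visibility ("
        " user_id TEXT NOT NULL,"
        " kb_name TEXT NOT NULL,"
        " PRIMARY KEY (user_id, kb_name))"
    )
    return conn


def grant(conn: sqlite3.Connection, user_id: str, kb_name: str) -> None:
    # INSERT OR IGNORE makes repeated grants harmless.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO kb_visibility (user_id, kb_name) VALUES (?, ?)",
            (user_id, kb_name),
        )


def visible_kbs(conn: sqlite3.Connection, user_id: str) -> List[str]:
    rows = conn.execute(
        "SELECT kb_name FROM kb_visibility WHERE user_id = ? ORDER BY kb_name",
        (user_id,),
    )
    return [row[0] for row in rows]


def can_query(conn: sqlite3.Connection, user_id: str, kb_name: str) -> bool:
    # Deny by default: a knowledge base is visible only if explicitly granted.
    return kb_name in visible_kbs(conn, user_id)


acl = open_acl()
grant(acl, "alice", "finance_docs")
grant(acl, "alice", "hr_docs")
grant(acl, "bob", "hr_docs")
```

Checking `can_query` before the retriever runs is the point at which a shared API would stop one tenant from reading another tenant's corpus.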

## 5. Common Failure Modes

- **Authentication required**: With SQLite-backed auth enabled in v0.3.0.2, API callers must provide valid credentials; anonymous calls return 401 from `auth.py`.
- **Missing index backend**: If the pipeline selects Milvus but `pymilvus` is not installed, the retriever raises `ImportError`. Source: [servers/retriever/src/index_backends/milvus_backend.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/index_backends/milvus_backend.py)
- **Missing web search dependency**: Each web search backend (Exa, Tavily, ZhipuAI) lazily imports its client and raises `ImportError` if the optional dependency is missing. Source: [servers/retriever/src/websearch_backends/__init__.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/retriever/src/websearch_backends/__init__.py)
- **Chunking backend unavailable**: The corpus server requires `chonkie` and `tiktoken`; absence raises `ToolError`. Source: [servers/corpus/src/corpus.py](https://github.com/OpenBMB/UltraRAG/blob/main/servers/corpus/src/corpus.py)
- **Stale project paths**: Moving the project directory invalidates the resolved paths from `storage_paths.py` and causes write failures.
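
Several of the failures above come from optional dependencies that are only imported when a backend is selected. A preflight check, sketched below, can surface them before a pipeline starts. The helper `missing_modules` is hypothetical; the module names `pymilvus`, `chonkie`, and `tiktoken` come from the list above, while the grouping into a `REQUIRED` mapping is an assumption.

```python
import importlib.util
from typing import Dict, List


def missing_modules(required: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# Hypothetical mapping from a feature to the optional modules it relies on.
REQUIRED: Dict[str, List[str]] = {
    "milvus index backend": ["pymilvus"],
    "corpus chunking": ["chonkie", "tiktoken"],
}

for feature, modules in REQUIRED.items():
    missing = missing_modules(modules)
    if missing:
        print(f"{feature}: missing {', '.join(missing)}")
    else:
        print(f"{feature}: ok")
```

`importlib.util.find_spec` only locates a module without importing it, so the check is cheap and has no side effects from the packages themselves.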

## See Also

- [README.md](https://github.com/OpenBMB/UltraRAG/blob/main/README.md) — project overview, installation, and feature highlights.
- [GitHub Issue #95](https://github.com/OpenBMB/UltraRAG/issues/95) — community discussion on pipeline-as-API deployment.
- [GitHub Release v0.3.0.2](https://github.com/OpenBMB/UltraRAG/releases) — official notes for the memory upgrade.
- [UltraRAG Documentation](https://ultrarag.openbmb.cn/pages/en/ui/start) — UI quick start and deployment guide.
- [UltraRAG Deep Research Demo](https://ultrarag.openbmb.cn/pages/en/demo/deepresearch) — flagship end-to-end pipeline that exercises the same UI and deployment path.

---


## Pitfall Log

Project: OpenBMB/UltraRAG

Summary: Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.

## 1. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/OpenBMB/UltraRAG

## 2. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/OpenBMB/UltraRAG

## 3. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/OpenBMB/UltraRAG

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/OpenBMB/UltraRAG

## 5. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/OpenBMB/UltraRAG

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/OpenBMB/UltraRAG

<!-- canonical_name: OpenBMB/UltraRAG; human_manual_source: deepwiki_human_wiki -->