# https://github.com/LazyAGI/LazyLLM Project Manual

Generated at: 2026-06-20 06:05:28 UTC

## Table of Contents

- [LazyLLM Overview and System Architecture](#page-1)
- [Components, Modules, and Flows](#page-2)
- [RAG Pipeline, Document Processing, and Stores](#page-3)
- [Agents, Tools, Memory, and Online Model Integration](#page-4)

<a id='page-1'></a>

## LazyLLM Overview and System Architecture

### Related Pages

Related topics: [Components, Modules, and Flows](#page-2), [RAG Pipeline, Document Processing, and Stores](#page-3), [Agents, Tools, Memory, and Online Model Integration](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md)
- [lazyllm/cli/main.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/main.py)
- [lazyllm/cli/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md)
- [lazyllm/cli/skills.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/skills.py)
- [lazyllm/cli/review.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/review.py)
- [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md)
- [lazyllm/tools/agent/base.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/base.py)
- [lazyllm/tools/agent/reactAgent.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/reactAgent.py)
- [lazyllm/tools/agent/toolsManager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/toolsManager.py)
- [lazyllm/tools/agent/skill_manager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/skill_manager.py)
- [lazyllm/prompt_templates/prompts_actor/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/prompt_templates/prompts_actor/README.md)
</details>

# LazyLLM Overview and System Architecture

## Purpose and Scope

LazyLLM is a low-code development tool for building **multi-agent** large language model (LLM) applications. It enables developers to assemble complex AI applications from reusable modules, flows, and components without deep knowledge of LLM infrastructure, prompt engineering, or deployment plumbing. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

The project positions itself around four design pillars documented in the README:

- **Convenient AI Application Assembly** — Lego-like composition of agents, data flows, and functional modules.
- **One-Click Deployment** — Lightweight gateway during POC, and one-click image packaging for production.
- **Cross-Platform Compatibility** — Single code path across bare-metal, dev machines, Slurm clusters, and public clouds.
- **Unified User Experience** — A single API surface for online (OpenAI, SenseNova, Kimi, ChatGLM, etc.) and locally deployed models. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

The application development lifecycle follows a **prototype → data feedback → iterative optimization** loop, where LazyLLM aims to support each stage — from rapid prototyping through fine-tuning and production deployment. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

## High-Level System Architecture

LazyLLM is organized as a layered system. At the top, the **CLI** (`lazyllm` command) provides entry points for installation, deployment, running, skills, and code review. Source: [lazyllm/cli/main.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/main.py). Beneath the CLI, the **Module system** and **Flow system** form the core abstractions. **Agents**, **RAG infrastructure**, and **Deployment infrastructure** are higher-level building blocks built on that core, and they in turn call out to local inference engines, online model providers, and storage backends.

```mermaid
graph TB
    subgraph CLI["CLI Layer (lazyllm cli/main.py)"]
        Install["install"]
        Deploy["deploy"]
        Run["run"]
        Skills["skills"]
        Review["review / review-local"]
    end

    subgraph Core["Core Abstractions"]
        Modules["Module System<br/>(ModuleBase, TrainableModule,<br/>OnlineChatModule, etc.)"]
        Flows["Flow System<br/>(Pipeline, Parallel,<br/>Loop, IFS, Warp)"]
    end

    subgraph HighLevel["High-Level Components"]
        Agents["Agent System<br/>(ReactAgent, PlanAndSolveAgent,<br/>ReWOOAgent, FunctionCall)"]
        RAG["RAG Infrastructure<br/>(Document, Retriever,<br/>Reranker, Splitter)"]
        DeployInfra["Deployment Infrastructure<br/>(ServerModule, WebModule,<br/>TrainableModule)"]
    end

    subgraph Backends["Backend Integrations"]
        LocalModels["Local Inference<br/>(lightllm, vllm)"]
        OnlineModels["Online Providers<br/>(OpenAI, SiliconFlow,<br/>MiniMax, SenseNova, ...)"]
        Storage["Storage<br/>(Elasticsearch, OceanBase,<br/>Milvus, ChromaDB)"]
    end

    CLI --> Core
    Core --> HighLevel
    HighLevel --> Backends
```

## Core Abstractions: Modules and Flows

### Module System

The Module system is the foundational abstraction. LazyLLM provides a structured taxonomy of module types, each offering a different mix of training/fine-tuning, deployment, and inference capabilities. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

| Module Type | Purpose | Training/Fine-tuning | Deployment | Inference |
|---|---|---|---|---|
| `ModuleBase` | Wrap any callable into a Module | — | — | — |
| `ActionModule` | Wrap functions, modules, flows, etc. into a Module | ✅ (of its submodules) | ✅ (of its submodules) | ✅ |
| `UrlModule` | Wrap external URLs as Modules | ❌ | ❌ | ✅ |
| `ServerModule` | Wrap any callable as an API service | ❌ | ✅ | ✅ |
| `TrainableModule` | Base for all supported models | ✅ | ✅ | ✅ |
| `WebModule` | Multi-round dialogue interface | ❌ | ✅ | ❌ |
| `OnlineChatModule` | Online model fine-tuning and inference | ✅ | ✅ | ✅ |
| `OnlineEmbeddingModule` | Online embedding inference | ❌ | ✅ | ✅ |

Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

### Flow System

Flows describe how data is passed between callable objects. LazyLLM ships with predefined flow primitives: **Pipeline**, **Parallel**, **Diverter**, **Warp**, **IFS**, and **Loop**. These can be composed recursively with Modules, Components, or any Python callable. The flow abstraction makes it simple to add, replace, and reorganize components without rewriting application code. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).
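To make the data-passing semantics concrete, here is a plain-Python sketch of three of those primitives. It does not use LazyLLM: `run_pipeline`, `run_parallel_sum`, and `run_loop` are hypothetical helpers. The "sum" merge is assumed to add the branch outputs with `+`, which mirrors how the RAG example merges two retrievers with `parallel().sum`.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]

# Hypothetical helpers for illustration only; this is not LazyLLM's flow API.


def run_pipeline(steps: Sequence[Step], value: Any) -> Any:
    """Pipeline: feed each step's output into the next step."""
    for step in steps:
        value = step(value)
    return value


def run_parallel_sum(branches: Sequence[Step], value: Any) -> Any:
    """Parallel with a sum-style merge: same input to every branch, outputs added with `+`."""
    outputs = [branch(value) for branch in branches]
    return reduce(lambda a, b: a + b, outputs)


def run_loop(step: Step, value: Any, stop: Callable[[Any], bool], max_iter: int = 10) -> Any:
    """Loop: apply `step` until `stop(value)` holds or `max_iter` rounds have run."""
    for _ in range(max_iter):
        value = step(value)
        if stop(value):
            break
    return value


def retrieve_a(query: str) -> list:
    return [f"{query}:a1", f"{query}:a2"]


def retrieve_b(query: str) -> list:
    return [f"{query}:b1"]


# Flows nest: the parallel stage is just one step of the outer pipeline.
app = [str.strip, str.lower,
       lambda q: run_parallel_sum([retrieve_a, retrieve_b], q),
       len]
print(run_pipeline(app, "  Hello "))                         # 3
print(run_loop(lambda x: x * 2, 1, stop=lambda x: x > 50))  # 64
```

Because every primitive accepts plain callables, swapping a retriever or adding a step only changes the list passed in. This is the property the Flow abstraction is described as providing.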

## Agent Subsystem

The `lazyllm/tools/agent/` directory implements LazyLLM's Agent system. Source: [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md).

### Core Agent Types

| File | Agent |
|---|---|
| `base.py` | `LazyLLMAgentBase` — common base for all agents |
| `functionCall.py` | `FunctionCall` / `FunctionCallAgent` — single-turn tool-call execution |
| `reactAgent.py` | `ReactAgent` — ReAct loop agent |
| `planAndSolveAgent.py` | `PlanAndSolveAgent` — plan-then-execute agent |
| `rewooAgent.py` | `ReWOOAgent` — blueprint + evidence + answer agent |
| `toolsManager.py` | `ToolManager`, `ModuleTool`, `register` — tool registration |
| `skill_manager.py` | `SkillManager` — workflow-style skill management |

Source: [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md).

### ReAct Loop Flow

`ReactAgent` wraps `FunctionCall` in a `Loop` with a stop condition. On each iteration the LLM is asked to reason and either emit tool calls or a final string answer:

- The agent builds history messages and injects them into `locals['_lazyllm_agent']['workspace']`. Source: [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md).
- LLM output is parsed in `_post_action`. If `tool_calls` are present, the `ToolManager` executes them and returns a `dict`, continuing the loop. Otherwise a `str` is returned, triggering the loop's stop condition (`isinstance(x, str)`). Source: [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md).
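The control flow described above reduces to "loop until the step returns a string". The sketch below models that in plain Python. `fake_llm`, `TOOLS`, `function_call_step`, and `react_loop` are hypothetical stand-ins for the LLM, the `ToolManager`, `FunctionCall`, and `ReactAgent`. The message format is likewise an assumption and not LazyLLM's internal one.

```python
from typing import Any, Callable, Dict, List, Union

# Hypothetical tool registry standing in for ToolManager.
TOOLS: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}


def fake_llm(history: List[dict]) -> dict:
    """Scripted model: request a tool once, then answer using the observation."""
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        return {"tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]}
    return {"content": f"The sum is {observations[-1]['content']}."}


def function_call_step(history: List[dict]) -> Union[dict, str]:
    """One round: return a dict (tool results) to continue, or a str to stop."""
    reply = fake_llm(history)
    if reply.get("tool_calls"):
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**call["arguments"])
            history.append({"role": "tool", "name": call["name"], "content": result})
        return {"history": history}
    return reply["content"]


def react_loop(query: str, max_rounds: int = 5) -> str:
    history: List[dict] = [{"role": "user", "content": query}]
    for _ in range(max_rounds):
        out = function_call_step(history)
        if isinstance(out, str):  # stop condition: a final string answer
            return out
    raise RuntimeError("no final answer within max_rounds")


print(react_loop("What is 2 + 3?"))  # The sum is 5.
```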

The base agent `LazyLLMAgentBase` accepts parameters for LLM, tools, max retries, streaming, return-trace, skills, memory, sandbox, and file-system access. Source: [lazyllm/tools/agent/base.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/base.py).

### Tool Registration

Tools can be registered by inheriting `ModuleTool` (recommended for complex tools) — the class reads the `apply` method's docstring, type hints, and signature to construct an LLM-callable schema. Source: [lazyllm/tools/agent/toolsManager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/toolsManager.py).
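The following is a rough sketch of how an LLM-callable schema can be derived from a method's signature, type hints, and docstring. It is not `ModuleTool`'s implementation. It uses only the standard library (`inspect`, `typing.get_type_hints`), emits a JSON-Schema-like dict, and assumes a Google-style `Args:` docstring. `build_tool_schema` and `WeatherTool` are hypothetical names.

```python
import inspect
from typing import get_type_hints

_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def _arg_docs(doc: str) -> dict:
    """Parse 'name (type): description' lines under an 'Args:' header."""
    docs, in_args = {}, False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            if not stripped or stripped.endswith(":"):
                break  # a blank line or the next section ends the Args block
            name, _, desc = stripped.partition(":")
            docs[name.split(" ")[0]] = desc.strip()
    return docs


def build_tool_schema(func) -> dict:
    """Hypothetical helper: build a JSON-Schema-like tool description."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    arg_docs = _arg_docs(doc)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        if name == "self":
            continue
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if name in arg_docs:
            prop["description"] = arg_docs[name]
        properties[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the LLM must supply it
    return {
        "name": func.__name__,
        "description": doc.split("\n\n")[0].strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


class WeatherTool:  # hypothetical tool for illustration
    def apply(self, city: str, days: int = 1) -> str:
        """Look up a weather forecast.

        Args:
            city (str): Name of the city.
            days (int): Number of days to forecast.
        """
        return f"{city}: sunny for {days} day(s)"


schema = build_tool_schema(WeatherTool.apply)
print(schema["parameters"]["required"])  # ['city']
```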

### Skill Management

`SkillManager` manages reusable workflows ("skills") with the `get_skill` / `read_reference` / `run_script` tool trio. The skill system enforces strict rules: agents must call `get_skill` first to retrieve `SKILL.md`, and reference/script paths must be copied verbatim from the skill's documentation — fabricated paths are forbidden. Source: [lazyllm/tools/agent/skill_manager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/skill_manager.py).
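One way to enforce those two rules mechanically is to record which skills have been loaded and to accept only paths that appear verbatim in the loaded `SKILL.md`. The `SkillGate` class below is a hypothetical sketch and not `SkillManager`'s code. It assumes that reference and script paths are written inside backticks in `SKILL.md`.

```python
import re
from typing import Dict, Set


class SkillGate:
    """Hypothetical guard: get_skill must precede read_reference/run_script."""

    def __init__(self, skills: Dict[str, str]):
        self._skills = skills                  # skill name -> SKILL.md text
        self._allowed: Dict[str, Set[str]] = {}  # loaded skill -> paths quoted in SKILL.md

    def get_skill(self, name: str) -> str:
        text = self._skills[name]
        # Assumption: paths appear in backticks, e.g. `scripts/extract.py`.
        self._allowed[name] = set(re.findall(r"`([^`]+)`", text))
        return text

    def check_path(self, name: str, rel_path: str) -> None:
        """Raise unless the skill was loaded and rel_path is copied verbatim."""
        if name not in self._allowed:
            raise PermissionError(f"call get_skill({name!r}) first")
        if rel_path not in self._allowed[name]:
            raise ValueError(f"{rel_path!r} is not listed verbatim in SKILL.md")


gate = SkillGate({"pdf": "Use `scripts/extract.py` and see `refs/layout.md`."})
gate.get_skill("pdf")
gate.check_path("pdf", "scripts/extract.py")  # passes silently
```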

## Command-Line Interface

The `lazyllm` CLI routes the following subcommands in `lazyllm/cli/main.py`:

- `lazyllm install [...]` — install dependencies for a model/project. Source: [lazyllm/cli/main.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/main.py).
- `lazyllm deploy <model> [...]` — deploy an LLM service. vLLM deployments accept only a restricted set of parameters; set `LAZYLLM_VLLM_SKIP_CHECK_KW=True` to skip that check. Source: [lazyllm/cli/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md).
- `lazyllm run [...]` — run a project.
- `lazyllm skills <list|info|delete|add|import|install> [...]` — manage skills, including installing them into a project or agent. Source: [lazyllm/cli/skills.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/skills.py).
- `lazyllm review --pr <number> [...]` and `lazyllm review-local [...]` — multi-round AI code review for GitHub PRs or local git branches; the local variant diffs against a base branch via `git merge-base` and writes a JSON report. Source: [lazyllm/cli/review.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/review.py).
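A top-level dispatcher of this shape is commonly built on `argparse` subparsers. The sketch below only illustrates that routing pattern and is not the contents of `lazyllm/cli/main.py`. The handler bodies are placeholders, and apart from the `--pr` option, the argument layout is an assumption based on the list above.

```python
import argparse
from typing import List, Optional

SKILL_ACTIONS = ["list", "info", "delete", "add", "import", "install"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the subcommands listed above."""
    parser = argparse.ArgumentParser(prog="lazyllm")
    sub = parser.add_subparsers(dest="command", required=True)

    deploy = sub.add_parser("deploy", help="deploy a model service")
    deploy.add_argument("model")
    skills = sub.add_parser("skills", help="manage skills")
    skills.add_argument("action", choices=SKILL_ACTIONS)
    review = sub.add_parser("review", help="AI review of a GitHub PR")
    review.add_argument("--pr", type=int, required=True)
    for name in ("install", "run", "review-local"):
        sub.add_parser(name)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    # parse_known_args keeps unrecognized options (e.g. --tp=2) so a real
    # handler could forward them to the deployment backend.
    args, extra = build_parser().parse_known_args(argv)
    if args.command == "deploy":
        return f"deploy {args.model} {extra}"
    if args.command == "skills":
        return f"skills {args.action} {extra}"
    if args.command == "review":
        return f"review PR #{args.pr}"
    return f"{args.command} {extra}"


print(main(["deploy", "llama2", "--tp=2"]))  # deploy llama2 ['--tp=2']
```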

## RAG and Data Subsystems

LazyLLM integrates a complete RAG stack that includes:

- **Engineering**: Horizontal scaling of RAG modules, multi-knowledge-base Q&A, and LazyRAG integration (V0.7). Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).
- **Data Capabilities**: Table parsing, CAD image parsing, and pretrain data processing. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).
- **Algorithm Capabilities**: Structured-text processing (CSV), multi-hop retrieval, information-conflict handling, and agentic-RL problem solving. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

A typical RAG pipeline uses `Document` with `Retriever`, `Reranker`, and `SentenceSplitter` components wired together through `pipeline` and `parallel` flows. The README demonstrates an online deployment that combines `OnlineEmbeddingModule` with cosine and BM25 retrievers and a `ModuleReranker`. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

## Prompt Templates and Data Lineage

LazyLLM ships a curated set of prompt templates in `lazyllm/prompt_templates/prompts_actor/`. The project tracks data lineage and licensing for these resources:

- **awesome-chatgpt-prompts-zh.json** (124 Chinese prompts) — MIT licensed, sourced from [PlexPt/awesome-chatgpt-prompts-zh](https://github.com/PlexPt/awesome-chatgpt-prompts-zh), lightly modified. Source: [lazyllm/prompt_templates/prompts_actor/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/prompt_templates/prompts_actor/README.md).
- **prompts.chat.json** (1192 English prompts) — CC0-1.0 licensed, sourced from [f/prompts.chat](https://github.com/f/prompts.chat), with normalization and duplicate removal. Source: [lazyllm/prompt_templates/prompts_actor/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/prompt_templates/prompts_actor/README.md).
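As an illustration of what normalization, whitespace fixes, and duplicate removal can look like for prompt records, here is a small sketch. The raw key names (`act`, `title`, `prompt`) and the normalized ones (`name`, `content`) are assumptions made for the example and are not the actual schema of either JSON file. The case-insensitive fingerprint is also an assumed deduplication rule.

```python
import re
from typing import Dict, List

KEY_MAP = {"act": "name", "title": "name", "prompt": "content"}  # assumed mapping


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Lower-case and rename keys, collapse runs of whitespace in values."""
    out = {}
    for key, value in record.items():
        key = key.strip().lower()
        out[KEY_MAP.get(key, key)] = re.sub(r"\s+", " ", value).strip()
    return out


def dedupe(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record for each case-insensitive prompt text."""
    seen, unique = set(), []
    for rec in map(normalize, records):
        fingerprint = rec.get("content", "").lower()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


raw = [{"act": "Linux Terminal", "prompt": "Act as  a terminal.\n"},
       {"Act": " Linux Terminal", "prompt": "act as a terminal."}]
print(dedupe(raw))  # [{'name': 'Linux Terminal', 'content': 'Act as a terminal.'}]
```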

"Lightly modified" in this context means key-name normalization, whitespace fixes, and minor wording adjustments — no wholesale rewriting of original content. Source: [lazyllm/prompt_templates/prompts_actor/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/prompt_templates/prompts_actor/README.md).

## Roadmap and Recent Milestones

Per the v0.7.1 release notes, recent milestones include:

- **Agent Module Refactor** — major rewrite for maintainability.
- **New storage providers** — Elasticsearch, OceanBase.
- **New online model providers** — SiliconFlow, MiniMax.
- **Comprehensive caching system** for performance gains.
- **Document parsing service** and **startup system** refactors.

Source: [Community release notes](https://github.com/LazyAGI/LazyLLM/releases/tag/v0.7.1).

Open community feature requests (e.g., interleaved text+image content for `OnlineModule(type='image_editing')` in [issue #1035](https://github.com/LazyAGI/LazyLLM/issues/1035)) indicate ongoing evolution of online module capabilities, while documentation build issues (e.g., [issue #655](https://github.com/LazyAGI/LazyLLM/issues/655)) reflect active investment in tutorial and learning material quality.

## See Also

- [Agent System and Tool Registration](#page-4)
- [CLI Reference](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md)
- [RAG Pipeline Guide](#page-3)
- [Module and Flow Reference](#page-2)

---

<a id='page-2'></a>

## Components, Modules, and Flows

### Related Pages

Related topics: [LazyLLM Overview and System Architecture](#page-1), [RAG Pipeline, Document Processing, and Stores](#page-3), [Agents, Tools, Memory, and Online Model Integration](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md)
- [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md)
- [lazyllm/tools/agent/reactAgent.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/reactAgent.py)
- [lazyllm/tools/agent/toolsManager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/toolsManager.py)
- [lazyllm/tools/agent/file_tool.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/file_tool.py)
- [lazyllm/tools/agent/skill_manager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/skill_manager.py)
- [lazyllm/tools/agent/skill_hub.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/skill_hub.py)
- [lazyllm/components/utils/downloader/model_mapping.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/components/utils/downloader/model_mapping.py)
- [lazyllm/cli/main.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/main.py)
- [lazyllm/cli/review.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/review.py)
- [lazyllm/cli/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md)
</details>

# Components, Modules, and Flows

## Overview

LazyLLM is a framework for building AI applications by composing reusable units. The project organizes its building blocks into three primary abstractions: **Components** (low-level utilities such as model downloaders and prompt templates), **Modules** (high-level wrappers that encapsulate models, services, and callable logic), and **Flows** (data-stream primitives that connect Modules and Components into executable graphs). Together they let developers "wrap functions, modules, flows, etc., into a Module" and assemble multi-agent applications with a Lego-like experience ([README.md](README.md)).

The framework emphasizes four goals that shape its design ([README.md](README.md)):

- **Convenient assembly** — pipelines can be expressed declaratively with Flows.
- **One-click deployment** — Modules can be promoted to services without rewriting.
- **Cross-platform compatibility** — the same code runs on bare-metal, Slurm, and public clouds.
- **Unified experience** — online and local model providers share a single interface.

## Module Hierarchy

Modules in LazyLLM are typed wrappers. The README documents the canonical set and the capabilities each one offers ([README.md](README.md)):

| Module | Purpose | Training/Fine-tuning | Deployment | Inference |
|--------|---------|----------|-----------|--------|
| UrlModule | Wraps any URL into a Module to access external services | ❌ | ❌ | ✅ |
| ServerModule | Wraps any function, flow, or Module into an API service | ❌ | ✅ | ✅ |
| TrainableModule | Trainable Module; all supported models are TrainableModules | ✅ | ✅ | ✅ |
| WebModule | Launches a multi-round dialogue interface service | ❌ | ✅ | ❌ |
| OnlineChatModule | Integrates online model fine-tuning and inference services | ✅ | ✅ | ✅ |
| OnlineEmbeddingModule | Integrates online Embedding model inference services | ❌ | ✅ | ✅ |

These Modules are composed of lower-level **Components**, such as `model_mapping.py`, which maps model identifiers to Hugging Face / ModelScope namespaces and to model-specific prompt keys (`sos`, `soh`, `soa`, `stop_words`, etc.) for chat-template construction ([lazyllm/components/utils/downloader/model_mapping.py](lazyllm/components/utils/downloader/model_mapping.py)). For example, the `deepseek` entry defines `sos: '<｜begin▁of▁sentence｜>'` and `stop_words: ['<｜end▁of▁sentence｜>']` so that prompts and stop tokens are produced automatically ([lazyllm/components/utils/downloader/model_mapping.py](lazyllm/components/utils/downloader/model_mapping.py)).
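The sketch below shows how such a mapping entry could drive prompt assembly and stop-word truncation. It assumes that `sos`, `soh`, and `soa` mark the start of the system, user (human), and assistant segments. Only the `deepseek` `sos` and `stop_words` values come from the text above. The `soh`/`soa` strings and the function names `render_prompt` and `truncate_at_stop` are hypothetical.

```python
from typing import List

# Partial entry: sos and stop_words as quoted above; soh and soa are placeholders.
DEEPSEEK = {
    "sos": "<｜begin▁of▁sentence｜>",
    "soh": "User: ",
    "soa": "Assistant: ",
    "stop_words": ["<｜end▁of▁sentence｜>"],
}


def render_prompt(entry: dict, system: str, user: str) -> str:
    """Concatenate segment markers and text; the model continues after `soa`."""
    return f"{entry['sos']}{system}\n\n{entry['soh']}{user}\n\n{entry['soa']}"


def truncate_at_stop(text: str, stop_words: List[str]) -> str:
    """Cut generated text at the earliest stop word, if any occurs."""
    positions = [text.find(w) for w in stop_words]
    cut = min((p for p in positions if p != -1), default=len(text))
    return text[:cut]


prompt = render_prompt(DEEPSEEK, "You are helpful.", "Hi")
print(truncate_at_stop("Hello!<｜end▁of▁sentence｜>extra", DEEPSEEK["stop_words"]))  # Hello!
```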

## Flow System

A Flow is a data-stream primitive: it describes how a value is passed from one callable object to another. According to the project README, LazyLLM ships with **Pipeline, Parallel, Diverter, Warp, IFS, and Loop** flows, which together "can cover almost all application scenarios" ([README.md](README.md)). Flows are the mechanism by which complex graphs are assembled from Modules and Components without manual plumbing.

The `Loop` primitive is also the workhorse of the agent system: `ReactAgent` wraps `FunctionCall` inside a `Loop`, with a stop condition that fires when `FunctionCall` returns a `str` (final answer) instead of a `dict` (tool calls) ([lazyllm/tools/agent/AGENTS.md](lazyllm/tools/agent/AGENTS.md)). This means the same Loop abstraction is reused for both data-flow graphs and agent reasoning loops.

## Agent Subsystem

Agents are first-class Modules that combine an LLM with a tool registry. The framework ships four agent implementations, each suited to a different reasoning style ([lazyllm/tools/agent/AGENTS.md](lazyllm/tools/agent/AGENTS.md)):

- `ReactAgent` — Reason→Act→Observe loop; the default for general multi-step tool use.
- `PlanAndSolveAgent` — Planner decomposes a task; Solver executes the plan.
- `ReWOOAgent` — Planner emits a blueprint; Workers collect evidence in parallel; Solver returns the answer.
- `FunctionCallAgent` — Deprecated single-shot tool caller; superseded by `ReactAgent`.

All four share `FunctionCall` as their inner execution unit. A single round follows this pattern ([lazyllm/tools/agent/AGENTS.md](lazyllm/tools/agent/AGENTS.md)):

```mermaid
flowchart TD
    A[Input] --> B[_build_history]
    B --> C[LLM reasoning]
    C --> D{tool_calls?}
    D -- yes --> E[ToolManager.execute]
    E --> F[dict: continue Loop]
    D -- no --> G[str: stop Loop]
```

The `ReactAgent` prompt template encodes the same loop explicitly: "Reason → Act → Observe → Reflect", with a hard rule of "at most one tool per action step" and a final-answer rule that breaks out of the loop ([lazyllm/tools/agent/reactAgent.py](lazyllm/tools/agent/reactAgent.py)). A `_FORCE_SUMMARIZE_MSG` is injected when the agent exhausts `max_retries`, telling the LLM to "Stop calling tools now and provide your final answer immediately" ([lazyllm/tools/agent/reactAgent.py](lazyllm/tools/agent/reactAgent.py)).
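You can picture the retry cap as a bounded loop that appends a final instruction once the budget is spent and asks for a text answer one last time. This is a simplified sketch. `run_with_budget`, `scripted_llm`, and the message dicts are hypothetical. Only the instruction wording comes from the `reactAgent.py` quote above, and whether LazyLLM injects it as a system message is an assumption.

```python
from typing import Callable, List, Union

FORCE_SUMMARIZE_MSG = "Stop calling tools now and provide your final answer immediately"

Reply = Union[dict, str]  # dict = tool request, str = final answer


def run_with_budget(llm: Callable[[List[dict]], Reply], query: str, max_retries: int) -> str:
    """Hypothetical agent loop with a hard cap on tool-calling rounds."""
    history = [{"role": "user", "content": query}]
    for _ in range(max_retries):
        reply = llm(history)
        if isinstance(reply, str):
            return reply
        history.append({"role": "tool", "content": f"observed {reply['tool']}"})
    # Budget exhausted: inject the summarize instruction and ask once more.
    history.append({"role": "system", "content": FORCE_SUMMARIZE_MSG})
    reply = llm(history)
    return reply if isinstance(reply, str) else "No final answer produced."


def scripted_llm(history: List[dict]) -> Reply:
    """Keeps requesting a tool until it sees the summarize instruction."""
    if history[-1]["content"] == FORCE_SUMMARIZE_MSG:
        return f"Summary after {sum(m['role'] == 'tool' for m in history)} tool calls."
    return {"tool": "search"}


print(run_with_budget(scripted_llm, "research topic", max_retries=3))
# Summary after 3 tool calls.
```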

Tools are registered through `ModuleTool` and `ToolManager`. `ModuleTool` parses the function's docstring and type hints to build a Pydantic schema for the LLM, raising an error if the docstring return type and the Python return annotation disagree ([lazyllm/tools/agent/toolsManager.py](lazyllm/tools/agent/toolsManager.py)). When variable-argument functions are used, the schema falls back to the docstring types rather than the runtime signature ([lazyllm/tools/agent/toolsManager.py](lazyllm/tools/agent/toolsManager.py)). Built-in tools such as `write_file` are registered through the `@register('builtin_tools', ...)` decorator ([lazyllm/tools/agent/file_tool.py](lazyllm/tools/agent/file_tool.py)).

Beyond built-in tools, users can install external **Skills** from GitHub with `install_skill` ([lazyllm/tools/agent/skill_hub.py](lazyllm/tools/agent/skill_hub.py)). The skill hub fetches the repository file tree via the Git Trees API, locates a `SKILL.md`, and exposes the skill's workflow to the agent. The skill manager's prompt enforces a strict prerequisite: `read_reference` and `run_script` may only be called after the agent has fetched the skill's `SKILL.md`, and `rel_path` values must be copied verbatim from that file ([lazyllm/tools/agent/skill_manager.py](lazyllm/tools/agent/skill_manager.py)).
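For reference, GitHub's Git Trees endpoint (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1`) returns a flat `tree` list whose entries carry `path` and `type` fields, plus a `truncated` flag. The sketch below locates skill directories in such a response offline. `find_skill_dirs` and the sample payload are hypothetical, and the sketch omits the network request.

```python
from posixpath import dirname
from typing import List


def find_skill_dirs(tree_response: dict) -> List[str]:
    """Hypothetical helper: directories containing a SKILL.md blob ('' = repo root)."""
    if tree_response.get("truncated"):
        raise RuntimeError("tree listing truncated; fetch subtrees individually")
    return sorted(
        dirname(entry["path"])
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob" and entry["path"].split("/")[-1] == "SKILL.md"
    )


sample = {
    "truncated": False,
    "tree": [
        {"path": "README.md", "type": "blob"},
        {"path": "skills/pdf", "type": "tree"},
        {"path": "skills/pdf/SKILL.md", "type": "blob"},
        {"path": "skills/pdf/scripts/extract.py", "type": "blob"},
        {"path": "SKILL.md", "type": "blob"},
    ],
}
print(find_skill_dirs(sample))  # ['', 'skills/pdf']
```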

## CLI Surface

The framework exposes a unified CLI for the full lifecycle. `lazyllm deploy` starts a model service (e.g. `lazyllm deploy llama2 --tp=2`) ([lazyllm/cli/README.md](lazyllm/cli/README.md)), and the top-level dispatcher routes the `install`, `deploy`, `run`, `skills`, `review`, and `review-local` subcommands ([lazyllm/cli/main.py](lazyllm/cli/main.py)). `review` runs a multi-round AI code review of a GitHub pull request. `review-local` reviews a local repository instead: it diffs the current branch against a base branch located with `git merge-base` and writes the result to JSON ([lazyllm/cli/review.py](lazyllm/cli/review.py)).

## Community Notes

- **Feature parity for image editing** — Issue #1035 reports that `OnlineModule(type='image_editing')` lacks interleaved text+image content support. This is a sign that the online-module capabilities summarized in the table above are still evolving.
- **Documentation rendering bugs** — Issue #655 notes that several tutorial page headings fail to compile, which has a direct impact on discoverability of the Flow and Module APIs documented here.
- **Release v0.7.1** — The release notes flag a major Agent-module refactor — a relevant heads-up for anyone tracking the `ReactAgent` / `FunctionCall` code paths cited above.

## See Also

- [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md) — top-level project overview.
- [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md) — agent internals.

---

<a id='page-3'></a>

## RAG Pipeline, Document Processing, and Stores

### Related Pages

Related topics: [LazyLLM Overview and System Architecture](#page-1), [Components, Modules, and Flows](#page-2), [Agents, Tools, Memory, and Online Model Integration](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [lazyllm/tools/rag/document.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/document.py)
- [lazyllm/tools/rag/retriever.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/retriever.py)
- [lazyllm/tools/rag/rerank.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/rerank.py)
- [lazyllm/tools/rag/doc_node.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/doc_node.py)
- [lazyllm/tools/rag/doc_impl.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/doc_impl.py)
- [lazyllm/tools/rag/default_index.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/default_index.py)
- [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md)
- [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md)
</details>

# RAG Pipeline, Document Processing, and Stores

## Overview

LazyLLM provides a first-class **Retrieval-Augmented Generation (RAG)** stack that combines a `Document` index, pluggable **node groups** (splits), **Retrievers**, and **Rerankers** into a `Flow`-compatible pipeline. The RAG subsystem targets three goals: (1) support 20+ splitting strategies and many document types, (2) horizontally scale across multiple knowledge bases and machines, and (3) integrate at least one open-source knowledge-graph framework. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

The v0.7.1 release expanded the storage ecosystem with **Elasticsearch** and **OceanBase** backends, added **SiliconFlow** and additional online providers, and refactored the **document parsing service** and **launcher** systems for better maintainability. The release also introduced a comprehensive caching layer that accelerates repeated RAG queries. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

## Architecture and Data Flow

A RAG application in LazyLLM is composed of four cooperating layers:

```mermaid
flowchart LR
    A[Raw files / URL] --> B[Document Parser]
    B --> C[Node Groups<br/>Sentences / CoarseChunk / KB]
    C --> D[Embedding / BM25 Index]
    D --> E[Retriever]
    E --> F[Reranker]
    F --> G[LLM Prompt + Answer]
```

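- **Document** owns the dataset, parsers, and one or more **node groups**. Source: [lazyllm/tools/rag/document.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/document.py).
- **Node groups** are transformed views of the document (e.g. `Sentences`, `CoarseChunk`, knowledge-graph triples). Source: [lazyllm/tools/rag/doc_node.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/doc_node.py).
- **Retriever** is a callable that queries a node group with a similarity function. Source: [lazyllm/tools/rag/retriever.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/retriever.py).
- **Reranker** reorders retrieved nodes before they are passed to the LLM. Source: [lazyllm/tools/rag/rerank.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/rerank.py).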

A canonical end-to-end pipeline (from the project README) wires these layers with `pipeline` and `parallel` Flows:

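```python
import lazyllm
from lazyllm import pipeline, parallel, bind, SentenceSplitter, Document, Retriever, Reranker

# System prompt for the answering LLM; the formatter below supplies `context_str`.
prompt = ("You will play the role of an AI Q&A assistant and complete a dialogue task. "
          "Provide your answer based on the given context and question.")

# Index the dataset with an online embedding model; manager=False skips the DocServer/UI.
documents = Document(
    dataset_path="your data path",
    embed=lazyllm.OnlineEmbeddingModule(),
    manager=False,
)
# Custom node group: sentence-level chunks of size 1024 with an overlap of 100.
documents.create_node_group(
    name="sentences",
    transform=SentenceSplitter,
    chunk_size=1024,
    chunk_overlap=100,
)

with pipeline() as ppl:
    # Run a dense and a BM25 retriever on the same query and merge (sum) their hits.
    with parallel().sum as ppl.prl:
        prl.retriever1 = Retriever(documents, group_name="sentences",
                                   similarity="cosine", topk=3)
        prl.retriever2 = Retriever(documents, "CoarseChunk",
                                   "bm25_chinese", 0.003, topk=3)
    # Re-score the merged nodes against the original query and keep the best one.
    ppl.reranker = Reranker("ModuleReranker", model="bge-reranker-large", topk=1) \
                   | bind(query=ppl.input)
    # Join node texts into the prompt's `context_str` slot.
    ppl.formatter = (lambda nodes, query: dict(
        context_str="".join([node.get_content() for node in nodes]), query=query)) \
        | bind(query=ppl.input)
    ppl.llm = lazyllm.OnlineChatModule(stream=False).prompt(
        lazyllm.ChatPrompter(prompt, extra_keys=["context_str"]))

# Serve the pipeline through a multi-round dialogue web UI.
lazyllm.WebModule(ppl, port=23466).start().wait()
```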

Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

## Document Processing

`Document` is the central entry point. It accepts a `dataset_path` (a local directory or a URL when used in **client mode**), an `embed` module, and an optional `manager` flag. The manager flag controls whether a built-in `DocServer` and UI are spawned. Source: [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md).

### Node groups and splitting strategies

LazyLLM exposes splitting strategies through `transform` callables. The default `SentenceSplitter` accepts `chunk_size` and `chunk_overlap`. Beyond sentence-level splits the system supports structured strategies such as `CoarseChunk` (used for BM25 retrieval in the demo) and a knowledge-graph extractor, with the stated goal of supporting "no less than 20 types" of splitters across the v0.6–v0.8 roadmap. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).
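To make `chunk_size` and `chunk_overlap` concrete, here is a sliding-window sketch. It counts whitespace-separated words as a stand-in for tokens and ignores sentence boundaries, so it is not `SentenceSplitter`'s algorithm. `split_with_overlap` is a hypothetical name.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Hypothetical splitter: return windows of `chunk_size` words where
    consecutive windows share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(split_with_overlap("a b c d e f g", chunk_size=4, chunk_overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

Overlap helps retrieval because a fact that straddles a window boundary still appears intact in at least one chunk. The cost is that each extra overlapping word is stored and embedded twice.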

### Standalone parsing service

For high-throughput or multi-process deployments, the parser can be detached into a service. `DocumentProcessor(url=...)` points a `Document` at a remote parser, disables local file-change monitoring, and requires a persistent `store_conf` (a pure in-memory map store cannot be shared across processes, so use OpenSearch, Milvus, Elasticsearch, OceanBase, etc.). The example ships four scripts:

| Script | Purpose |
| --- | --- |
| `server_with_worker.py` | Run parser server + worker in one process |
| `server_and_separate_workers.py` | Run parser server; start workers separately via `DocumentProcessorWorker` |
| `document.py` | Register a `Document` with the parsing service |
| `retriever_using_url.py` | Query the document remotely via its URL |

Source: [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md).

### Embedding and online modules

`Document` accepts any callable that conforms to the embedding contract. `OnlineEmbeddingModule` is the zero-setup choice, and additional online providers (SiliconFlow, etc.) were added in v0.7.1. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

## Stores and Indexes

Stores hold both **raw segments** and **indexed vectors**. The default indexer is implemented in `default_index.py` and supports vector similarity, BM25 keyword search, and knowledge-graph lookups. Source: [lazyllm/tools/rag/default_index.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/default_index.py).

| Backend | Type | Use case |
| --- | --- | --- |
| Map (in-memory) | Vector / segment | Single-process demos; not for shared deployments |
| Milvus | Vector | Production vector search |
| OpenSearch | Vector + keyword | Hybrid search in distributed setups |
| Elasticsearch | Vector + keyword | Added in v0.7.1; horizontal scaling |
| OceanBase | Vector + keyword | Added in v0.7.1; SQL-compatible hybrid store |

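Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md), [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md).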
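Keyword search in indexes like these typically uses BM25. The sketch below is a textbook Okapi BM25 scorer with the common defaults k1 = 1.5 and b = 0.75 and a smoothed IDF. It only illustrates what a `"bm25"` similarity computes and is not `default_index.py`'s code. Tokenization is naive whitespace splitting; a Chinese variant such as `"bm25_chinese"` would need word segmentation first.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for `query`, using naive whitespace tokens."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # average document length
    df = Counter(term for tokens in tokenized for term in set(tokens))  # document frequency
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["milvus stores dense vectors",
        "bm25 ranks keyword matches",
        "opensearch supports keyword search"]
print([round(s, 3) for s in bm25_scores("bm25 keyword", docs)])
```

Rare terms such as `bm25` receive a larger IDF than common ones such as `keyword`. As a result, the second document, which matches both, ranks above the third.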

A common pitfall: when `manager=False` is combined with a remote parser, `store_conf` **must not** be a pure map store, because map stores have no persistence and cannot be shared across processes. Source: [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md).

## Retrievers and Rerankers

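`Retriever(documents, group_name, similarity, ..., topk=...)` queries a single node group. The `similarity` argument selects the scoring algorithm: `"cosine"` for dense vectors, or `"bm25_chinese"` (or `"bm25"`) for keyword search. In the README's BM25 retriever, the fourth positional argument (`0.003`) is a similarity threshold, while `topk` is passed by keyword. Multiple retrievers can be combined in `parallel().sum` to merge their hits. Source: [lazyllm/tools/rag/retriever.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/retriever.py), [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).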
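Here is a plain-Python sketch of what a cosine retriever with a score cut-off and `topk` does, and of how a sum-style merge combines two retrievers' hits. The embeddings are toy vectors, and `cosine_topk` is a hypothetical helper rather than LazyLLM's `Retriever`. The sketch assumes the sum merge adds list outputs, which for lists means concatenation; the order in which LazyLLM applies the threshold and `topk` is not stated here.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_topk(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
                topk: int, cut_off: float = float("-inf")) -> List[str]:
    """Hypothetical retriever: rank node ids by cosine, drop scores below cut_off, keep topk."""
    scored = [(cosine(query_vec, vec), node_id) for node_id, vec in index.items()]
    kept = [item for item in scored if item[0] >= cut_off]
    kept.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in kept[:topk]]


index = {"n1": [1.0, 0.0], "n2": [0.7, 0.7], "n3": [0.0, 1.0]}
dense_hits = cosine_topk([1.0, 0.1], index, topk=2)
keyword_hits = ["n3", "n1"]         # pretend output of a BM25 retriever
merged = dense_hits + keyword_hits  # sum-style merge concatenates hit lists
print(dense_hits, merged)  # ['n1', 'n2'] ['n1', 'n2', 'n3', 'n1']
```

The duplicate `n1` in `merged` shows why a merged candidate list usually goes through a reranker with its own `topk` before reaching the prompt.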

`Reranker(name, model, topk)` wraps a model-based reranker. `ModuleReranker` uses a Hugging Face-compatible model such as `bge-reranker-large`; other registered backends plug in custom scorers. Because Rerankers accept and return node lists, they slot directly into a `pipeline` and can be bound to the user query with `bind`. Source: [lazyllm/tools/rag/rerank.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/rerank.py).
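You can mimic the rerank-then-format tail of the README pipeline with `functools.partial` standing in for `bind(query=...)`. The word-overlap scorer below is a deliberately crude, hypothetical substitute for a cross-encoder such as `bge-reranker-large`. `Node` is a minimal stand-in that only provides `get_content()`.

```python
from dataclasses import dataclass
from functools import partial
from typing import List


@dataclass
class Node:  # hypothetical stand-in for a retrieved node
    text: str

    def get_content(self) -> str:
        return self.text


def overlap_score(query: str, text: str) -> float:
    """Toy relevance: fraction of query words present in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def rerank(nodes: List[Node], query: str, topk: int = 1) -> List[Node]:
    """Sort nodes by relevance to the query and keep the best `topk`."""
    return sorted(nodes, key=lambda n: overlap_score(query, n.get_content()), reverse=True)[:topk]


def formatter(nodes: List[Node], query: str) -> dict:
    # Same output shape as the README's formatter lambda.
    return dict(context_str="".join(node.get_content() for node in nodes), query=query)


query = "what is lazyllm"
reranker = partial(rerank, query=query, topk=1)  # plays the role of `| bind(query=ppl.input)`
fmt = partial(formatter, query=query)
hits = [Node("Milvus is a vector DB."), Node("LazyLLM is a low-code LLM framework.")]
print(fmt(reranker(hits)))
# {'context_str': 'LazyLLM is a low-code LLM framework.', 'query': 'what is lazyllm'}
```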

The v0.7.1 release also extended the RAG module with multi-hop retrieval (following links and references inside documents), information-conflict handling, AI Writer, and AI Review capabilities. These are exposed as additional retriever/reasoning components on top of the core pipeline. Source: [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md).

## Common Failure Modes and Gotchas

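- **Map store in distributed mode** — an in-memory map store is neither persisted nor shared, so workers cannot see each other's data; switch to Milvus, OpenSearch, Elasticsearch, or OceanBase. Source: [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md).
- **Parser URL unreachable** — if `DocumentProcessor(url=...)` cannot reach the parsing service, documents cannot be registered. Verify the URL and confirm that the worker has started. Also note that local `dataset_path` monitoring is disabled whenever a remote parser is used. Source: [examples/rag_with_parsing_service/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/examples/rag_with_parsing_service/README.md).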
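- **Splitter mismatch** — calling `Retriever` with a `group_name` that does not exist on the `Document` raises an error. Create custom groups with `create_node_group` first; built-in groups such as `CoarseChunk`, which the README example uses without creating it, do not need this step. Source: [lazyllm/tools/rag/document.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/rag/document.py).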
- **Top-K tuning** — dense and BM25 retrievers typically return overlapping but non-identical hits; merging via `parallel().sum` improves recall but inflates tokens, so set `topk` on the reranker to keep the prompt bounded. Source: [README.md]()

## See Also

- [Agents, Tools, Memory, and Online Model Integration](#page-4): the agent layer that often consumes RAG retrievers as tools.
- [CLI README](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md): `lazyllm deploy` and the `install` / `run` / `skills` commands for packaging RAG services.
- [Components, Modules, and Flows](#page-2): `pipeline`, `parallel`, `bind`, and the `Module` table that defines `TrainableModule`, `OnlineChatModule`, etc.

---

<a id='page-4'></a>

## Agents, Tools, Memory, and Online Model Integration

### Related Pages

Related topics: [LazyLLM Overview and System Architecture](#page-1), [Components, Modules, and Flows](#page-2), [RAG Pipeline, Document Processing, and Stores](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [lazyllm/tools/agent/AGENTS.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/AGENTS.md)
- [lazyllm/tools/agent/reactAgent.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/reactAgent.py)
- [lazyllm/tools/agent/functionCall.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/functionCall.py)
- [lazyllm/tools/agent/toolsManager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/toolsManager.py)
- [lazyllm/tools/agent/skill_manager.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/tools/agent/skill_manager.py)
- [lazyllm/cli/main.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/main.py)
- [lazyllm/cli/skills.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/skills.py)
- [lazyllm/cli/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md)
- [lazyllm/components/utils/downloader/model_mapping.py](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/components/utils/downloader/model_mapping.py)
- [README.md](https://github.com/LazyAGI/LazyLLM/blob/main/README.md)
- [lazyllm/prompt_templates/prompts_actor/README.md](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/prompt_templates/prompts_actor/README.md)
</details>

# Agents, Tools, Memory, and Online Model Integration

## Overview

LazyLLM exposes a unified Agent surface that combines a set of reusable reasoning loops, a registry-based tool system, persistent memory and skills, and a pluggable online model layer. According to the project README, the framework targets "convenient AI application assembly" with one-click deployment and a consistent user experience across locally deployed and online models. Source: [README.md:8-22]().

The v0.7.1 release notes highlight a major change, the Agent module refactor, along with new online model providers such as SiliconFlow and MiniMax and a comprehensive caching system. Source: [v0.7.1 release notes](https://github.com/LazyAGI/LazyLLM/releases/tag/v0.7.1). Community issue #1035 reports that `OnlineModule(type='image_editing')` does not yet support interleaved text+image content, a known limitation of the online model integration layer.

## Agent System

The Agent subsystem is implemented under `lazyllm/tools/agent/`. Per the directory's AGENTS guide, every concrete Agent inherits from `LazyLLMAgentBase` and delegates a single "reason + tool call" round to `FunctionCall`. Source: [lazyllm/tools/agent/AGENTS.md:30-40]().

Four Agent classes ship out of the box:

| Agent | Working method | Typical use case |
|-------|----------------|------------------|
| `ReactAgent` | Reason → Act → Observe loop until final answer | Multi-step tasks with tool use |
| `PlanAndSolveAgent` | Planner decomposes subtasks; Solver executes | Tasks needing upfront planning |
| `ReWOOAgent` | Planner generates a blueprint; Worker gathers evidence; Solver answers | Parallelizable evidence collection |
| `FunctionCallAgent` | Direct tool selection (deprecated, prefer `ReactAgent`) | Simple tool calls |

Source: [lazyllm/tools/agent/AGENTS.md:50-62]().
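The ReWOO row is the least self-explanatory. Below is a minimal, generic sketch of the pattern, not LazyLLM's `ReWOOAgent`. The planner writes the whole plan up front with placeholders such as `#E1`. The worker executes each step and substitutes earlier evidence. A solver then answers from the collected evidence. The plan, the `lookup` tool, and its data are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Each plan step: (evidence id, tool name, argument that may reference earlier evidence).
Plan = List[Tuple[str, str, str]]


def run_worker(plan: Plan, tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    evidence: Dict[str, str] = {}
    for eid, tool, arg in plan:
        # Replace placeholders such as "#E1" with evidence gathered earlier.
        filled = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], arg)
        evidence[eid] = tools[tool](filled)
    return evidence


def solve(question: str, evidence: Dict[str, str]) -> str:
    # A real solver would prompt an LLM with the question and all evidence.
    return f"{question} -> {list(evidence.values())[-1]}"


# The planner's output is written once, before any tool runs.
plan: Plan = [
    ("#E1", "lookup", "maintainer of toy-repo"),
    ("#E2", "lookup", "email of #E1"),
]
facts = {"maintainer of toy-repo": "Ada", "email of Ada": "ada@example.com"}
evidence = run_worker(plan, {"lookup": facts.__getitem__})
answer = solve("Who maintains toy-repo and how do I reach them?", evidence)
```

Because the plan does not depend on intermediate observations, steps whose arguments reference no earlier `#E` placeholder could be dispatched in parallel. This is the property the table refers to.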

`ReactAgent` wraps `FunctionCall` in a `Loop`, stopping when the output becomes a `str` (the final answer) and continuing while it remains a `dict` containing `tool_calls`. Source: [lazyllm/tools/agent/AGENTS.md:18-30](). The class prompt explicitly enforces "use at most one tool per action step" and "do not call any tools after you already have enough information to answer." Source: [lazyllm/tools/agent/reactAgent.py:1-60]().

The execution flow for one round is:

```mermaid
flowchart TD
    A[input] --> B[_build_history<br/>injects workspace locals]
    B --> C[LLM reasoning]
    C --> D{has tool_calls?}
    D -- yes --> E[ToolManager._execute_tool]
    E --> F[returns dict<br/>continue Loop]
    D -- no --> G[returns str<br/>stop Loop]
```
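The stop/continue rule can be illustrated with a framework-free toy loop. `react_loop` and the scripted `fake_llm` are hypothetical stand-ins for `ReactAgent`, `FunctionCall`, and `Loop`. A `str` result ends the loop. A `dict` with `tool_calls` triggers one tool call whose result is appended to the history. `max_turns` is an assumed safety cap.

```python
from typing import Callable, Dict, List, Union

# A step returns either the final answer (str) or pending tool calls (dict).
StepOutput = Union[str, Dict]


def react_loop(llm_step: Callable[[List[dict]], StepOutput],
               tools: Dict[str, Callable[..., object]],
               query: str, max_turns: int = 5) -> str:
    history: List[dict] = [{"role": "user", "content": query}]
    for _ in range(max_turns):
        out = llm_step(history)
        if isinstance(out, str):
            return out  # final answer: stop the loop
        call = out["tool_calls"][0]  # at most one tool per action step
        result = tools[call["name"]](**call["arguments"])
        # Observation: feed the tool result back for the next reasoning round.
        history.append({"role": "tool", "name": call["name"], "content": str(result)})
    raise RuntimeError(f"no final answer after {max_turns} turns")


def fake_llm(history: List[dict]) -> StepOutput:
    """Scripted stand-in for the model: call `add` once, then answer."""
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        return {"tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]}
    return f"The sum is {observations[-1]['content']}."


answer = react_loop(fake_llm, {"add": lambda a, b: a + b}, "What is 2 + 3?")
```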

Conversation history is stored in `locals['_lazyllm_agent']['workspace']` rather than instance attributes so that concurrent requests do not leak history across users. Source: [lazyllm/tools/agent/AGENTS.md:30-46]().
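The reason for per-request storage can be shown with the standard library. In this sketch, `contextvars` stands in for LazyLLM's `locals`. Each thread sets its own workspace, so two concurrent requests keep separate histories. `SharedHistoryAgent` is a deliberately bad, hypothetical class that leaks one user's messages into another's.

```python
import contextvars
import threading
from typing import Dict, List

# Each thread (or asyncio task) sees its own value for this variable.
_workspace = contextvars.ContextVar("agent_workspace")


def handle_request(user: str, messages: List[str], results: Dict[str, List[str]]) -> None:
    _workspace.set({"history": []})  # fresh workspace for this request only
    for msg in messages:
        _workspace.get()["history"].append(f"{user}: {msg}")
    results[user] = list(_workspace.get()["history"])


class SharedHistoryAgent:
    """Anti-pattern: history on the instance is shared by every caller."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def handle(self, user: str, msg: str) -> List[str]:
        self.history.append(f"{user}: {msg}")
        return list(self.history)


results: Dict[str, List[str]] = {}
threads = [
    threading.Thread(target=handle_request, args=("alice", ["hi", "weather?"], results)),
    threading.Thread(target=handle_request, args=("bob", ["hello"], results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

leaky = SharedHistoryAgent()
leaky.handle("alice", "hi")
bob_view = leaky.handle("bob", "hello")  # also contains alice's message
```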

## Tools and Skills

The `ToolManager` owns tool registration, schema generation, and execution. It wraps user tools in `ModuleTool` and generates an OpenAI function-calling `tools_description` from each tool's docstring. Source: [lazyllm/tools/agent/AGENTS.md:96-118]().

A tool's docstring must follow a strict format (a one-line summary, an `Args:` block describing each parameter, and a `Returns:` block), and every parameter needs a type annotation; otherwise no valid schema can be generated for the LLM. Source: [lazyllm/tools/agent/AGENTS.md:78-94](). Tools can be registered either by inheriting `ModuleTool` or by passing plain callables; they live in a temporary group (`tmp_tool`) that is discarded after the call.
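The docstring contract above can be approximated in plain Python. This is a hypothetical re-implementation, not `ToolManager`'s code. It reads the summary line and the `Args:` block, takes JSON types from the annotations, and emits an OpenAI-style function-calling entry. A missing annotation or an undocumented parameter raises an error instead of producing a broken schema.

```python
import inspect
import re
import typing
from typing import Any, Callable, Dict

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def parse_args_block(doc: str) -> Dict[str, str]:
    """Map each parameter listed under 'Args:' to its description."""
    descriptions: Dict[str, str] = {}
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line[:1].isspace():
            m = re.match(r"\s+(\w+)(?:\s*\([^)]*\))?\s*:\s*(.*)", line)
            if m:
                descriptions[m.group(1)] = m.group(2).strip()
        elif in_args and line.strip():
            in_args = False  # an unindented header such as 'Returns:' ends the block
    return descriptions


def tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    doc = inspect.getdoc(fn) or ""
    if not doc:
        raise ValueError(f"{fn.__name__} has no docstring")
    arg_docs = parse_args_block(doc)
    hints = typing.get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        if name not in hints:
            raise TypeError(f"{name!r} needs a type annotation")
        if name not in arg_docs:
            raise ValueError(f"{name!r} is missing from the Args: block")
        properties[name] = {"type": _JSON_TYPES.get(hints[name], "string"),
                            "description": arg_docs[name]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": doc.splitlines()[0],
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def get_weather(city: str, days: int = 1) -> str:
    """Query the weather forecast for a city.

    Args:
        city (str): Name of the city.
        days (int): Number of days to forecast.

    Returns:
        str: A short forecast.
    """
    return f"{city}: sunny for {days} day(s)"


schema = tool_schema(get_weather)
```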

Complementing transient tools, `SkillManager` provides persistent, named skills that an Agent can recall mid-conversation. Source: [lazyllm/tools/agent/skill_manager.py:1-30](). The `skill_manager` prompt mandates a strict prerequisite: an Agent must call `get_skill` to load `SKILL.md` *before* using `read_reference` or `run_script`, and the `rel_path` argument must be copied verbatim from that document — fabrication is explicitly forbidden. Source: [lazyllm/tools/agent/skill_manager.py:14-42]().
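The skill protocol can be enforced mechanically. `SkillSession` is a hypothetical class whose method names echo the prompt's `get_skill`, `read_reference`, and `run_script`; the real signatures may differ. It refuses file access before the skill is loaded. It also rejects any `rel_path` that does not appear verbatim in the loaded `SKILL.md` text, using a simple substring check that is an assumption of this sketch.

```python
from typing import Dict


class SkillSession:
    """Hypothetical enforcement of the get_skill -> read_reference/run_script protocol."""

    def __init__(self, skill_docs: Dict[str, str]):
        self._skill_docs = skill_docs        # skill name -> SKILL.md text
        self._loaded: Dict[str, str] = {}    # skills loaded in this conversation

    def get_skill(self, name: str) -> str:
        self._loaded[name] = self._skill_docs[name]
        return self._loaded[name]

    def read_reference(self, name: str, rel_path: str) -> str:
        self._check(name, rel_path)
        return f"<contents of {rel_path}>"

    def run_script(self, name: str, rel_path: str) -> str:
        self._check(name, rel_path)
        return f"<output of {rel_path}>"

    def _check(self, name: str, rel_path: str) -> None:
        if name not in self._loaded:
            raise PermissionError(f"call get_skill({name!r}) before touching its files")
        # Verbatim rule: the path must appear exactly as written in SKILL.md.
        if rel_path not in self._loaded[name]:
            raise ValueError(f"{rel_path!r} is not listed in SKILL.md; paths must not be invented")
```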

The CLI exposes skill operations through `lazyllm skills ...`, supporting `init`, `list`, `info`, `add`, `delete`, `import`, and `install --agent`. Source: [lazyllm/cli/skills.py:1-40](). Top-level commands such as `install`, `deploy`, `run`, `skills`, `review`, and `review-local` are dispatched in `lazyllm/cli/main.py:18-32`. The `deploy` subcommand can launch local model servers (for example, vLLM with tensor parallelism); extra vLLM arguments are checked against an allow-list unless `LAZYLLM_VLLM_SKIP_CHECK_KW` is set. Source: [lazyllm/cli/README.md:1-40]().
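The allow-list gate might look like the following sketch. `ALLOWED_VLLM_KW` and `check_vllm_kwargs` are hypothetical. The listed keys are real vLLM engine arguments chosen only as examples. Treating `"1"` or `"true"` (any case) as enabling the bypass is an assumption about how the variable is parsed.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical allow-list; the keys are real vLLM engine arguments used as examples.
ALLOWED_VLLM_KW = {"tensor_parallel_size", "max_model_len", "gpu_memory_utilization"}


def check_vllm_kwargs(kwargs: Dict[str, object],
                      env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    env = os.environ if env is None else env
    # Assumed parsing: "1" or "true" (any case) enables the bypass.
    if env.get("LAZYLLM_VLLM_SKIP_CHECK_KW", "").strip().lower() in {"1", "true"}:
        return kwargs
    unknown = sorted(set(kwargs) - ALLOWED_VLLM_KW)
    if unknown:
        raise ValueError(
            f"unsupported vLLM arguments {unknown}; "
            "export LAZYLLM_VLLM_SKIP_CHECK_KW=True to pass them through"
        )
    return kwargs
```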

## Online Model Integration

LazyLLM unifies locally trained and hosted models behind the same `Module` API. The README documents `OnlineChatModule` (online model fine-tuning and inference) and `OnlineEmbeddingModule` (online embedding inference); both plug into the same `Module` interface as their local counterparts. Fine-tuning applies to `OnlineChatModule`, while `OnlineEmbeddingModule` covers embedding inference only. Source: [README.md:58-72]().

Per-model prompt tokens are stored in `model_mapping.py`, which defines `prompt_keys` (such as `sos`, `soh`, `soa`, `eoa`, `stop_words`, and `system`) for families including `internlm`, `internlm2`, `chatglm3`, `glm-4`, `baichuan2`, `deepseek`, and Llama-3. Source: [lazyllm/components/utils/downloader/model_mapping.py:1-40]().
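How such keys combine into a chat prompt can be sketched as below. The token strings in `TOY_KEYS` are placeholders, not values from `model_mapping.py`. The sketch assumes that `sos`, `soh`, and `soa` open the system, user, and assistant segments, that `eoa` closes an assistant turn, that `system` holds a default system prompt, and that `stop_words` truncate generated text.

```python
from typing import List, Sequence, Tuple

# Placeholder tokens only; real values for each family live in model_mapping.py.
TOY_KEYS = {
    "sos": "<sys>",
    "system": "You are a helpful assistant.",
    "soh": "<user>",
    "soa": "<bot>",
    "eoa": "</bot>",
    "stop_words": ["</bot>", "<user>"],
}


def build_prompt(keys: dict, query: str, history: Sequence[Tuple[str, str]] = ()) -> str:
    parts = [keys["sos"], keys["system"]]
    for user_turn, bot_turn in history:
        parts += [keys["soh"], user_turn, keys["soa"], bot_turn, keys["eoa"]]
    parts += [keys["soh"], query, keys["soa"]]  # leave the assistant segment open
    return "".join(parts)


def cut_at_stop_words(text: str, stop_words: List[str]) -> str:
    # Truncate generation at the earliest stop word, if any appears.
    positions = [i for i in (text.find(w) for w in stop_words) if i != -1]
    return text[:min(positions)] if positions else text
```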

Online provider configuration is read from `~/.lazyllm/config.json` or from environment variables such as `LAZYLLM_OPENAI_API_KEY`, as shown in the chatbot example in the README. Source: [README.md:30-50](). Separately, memory is provided as a built-in functional module; the README's Feature Modules list describes it as supporting "memory capabilities." Source: [README.md:96-108]().
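A lookup over the two configuration sources might be sketched as follows. The function names, the lower-case key names inside `config.json`, and the rule that an environment variable wins over the file are all assumptions for illustration, not documented LazyLLM behaviour.

```python
import json
from pathlib import Path
from typing import Any, Dict, Mapping, Optional

DEFAULT_CONFIG = Path.home() / ".lazyllm" / "config.json"


def load_config_file(path: Path = DEFAULT_CONFIG) -> Dict[str, Any]:
    """Return the JSON settings, or an empty dict if the file does not exist."""
    return json.loads(path.read_text()) if path.exists() else {}


def resolve_setting(name: str, env: Mapping[str, str],
                    file_settings: Dict[str, Any]) -> Optional[Any]:
    # Environment variables follow the LAZYLLM_<NAME> pattern, e.g. LAZYLLM_OPENAI_API_KEY.
    env_key = "LAZYLLM_" + name.upper()
    if env_key in env:
        return env[env_key]  # assumption: the environment overrides the file
    return file_settings.get(name)
```

In practice the two pieces would be combined as `resolve_setting("openai_api_key", os.environ, load_config_file())`.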

The prompt library shipped at `lazyllm/prompt_templates/prompts_actor/` aggregates 124 Chinese prompts from `awesome-chatgpt-prompts-zh` (MIT) and 1192 English prompts from `prompts.chat` (CC0-1.0), lightly normalized to fit the project's schema. Source: [lazyllm/prompt_templates/prompts_actor/README.md:1-26]().

## Common Pitfalls

- **Bad tool docstrings.** Tools without a properly structured `Args:` block cannot be invoked correctly by the LLM. Source: [lazyllm/tools/agent/AGENTS.md:90-94]().
- **Fabricated skill paths.** `read_reference` and `run_script` must use paths copied verbatim from `SKILL.md`; any fabricated path violates the skill protocol. Source: [lazyllm/tools/agent/skill_manager.py:18-30]().
- **Online module limitations.** `OnlineModule(type='image_editing')` does not yet accept interleaved text+image content — see community issue #1035.
- **vLLM parameter gating.** Custom vLLM flags are rejected unless `LAZYLLM_VLLM_SKIP_CHECK_KW=True` is exported. Source: [lazyllm/cli/README.md:14-30]().

## See Also

- [Components, Modules, and Flows](#page-2)
- [RAG Pipeline, Document Processing, and Stores](#page-3)
- [CLI Reference](https://github.com/LazyAGI/LazyLLM/blob/main/lazyllm/cli/README.md)

---


## Pitfall Log

Project: LazyAGI/LazyLLM

Summary: 6 structured pitfall items were found, none of them high-severity or blocking. Top priority: the capability evidence risk, which requires verification.

## 1. Capability evidence risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/LazyAGI/LazyLLM

## 2. Maintenance risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/LazyAGI/LazyLLM

## 3. Security or permission risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo was found (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/LazyAGI/LazyLLM

## 4. Security or permission risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo was found (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/LazyAGI/LazyLLM

## 5. Maintenance risk (requires verification)

- Severity: low
- Evidence strength: source_linked
- Finding: Issue and PR quality could not be determined (`issue_or_pr_quality=unknown`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/LazyAGI/LazyLLM

## 6. Maintenance risk (requires verification)

- Severity: low
- Evidence strength: source_linked
- Finding: Release recency could not be determined (`release_recency=unknown`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/LazyAGI/LazyLLM

