# https://github.com/run-llama/llama_index Project Documentation

Generated: 2026-05-15 21:38:00 UTC

## Table of Contents

- [Introduction to LlamaIndex](#introduction)
- [Quick Start Guide](#quickstart)
- [Core Architecture](#core-architecture)
- [Integration Architecture](#integration-architecture)
- [Documents and Nodes](#documents-nodes)
- [Storage Systems](#storage-systems)
- [Query Engines](#query-engines)
- [Retrieval and Reranking](#retrieval-reranking)
- [Agent Framework](#agent-framework)
- [Memory Systems](#memory-systems)

<a id='introduction'></a>

## Introduction to LlamaIndex

### Related Pages

Related topics: [Core Architecture](#core-architecture), [Quick Start Guide](#quickstart)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/run-llama/llama_index/blob/main/README.md)
- [llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md)
- [llama-index-integrations/llms/llama-index-llms-contextual/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)
- [llama-index-integrations/readers/llama-index-readers-docling/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-docling/README.md)
- [llama-index-integrations/readers/llama-index-readers-wikipedia/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-wikipedia/README.md)
- [llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md)
</details>

# Introduction to LlamaIndex

LlamaIndex is a comprehensive data framework designed for building LLM (Large Language Model) applications. It provides the essential tools, abstractions, and integrations needed to connect custom data sources to LLMs for retrieval-augmented generation (RAG), question-answering systems, and other AI-powered applications.

## Overview

LlamaIndex serves as the foundational layer for building AI applications that require sophisticated data ingestion, indexing, and querying capabilities. The framework enables developers to:

- Ingest data from various sources (PDFs, documents, websites, databases)
- Process and chunk data into optimal segments for LLM consumption
- Create vector indices for efficient semantic search
- Build query engines and retrieval pipelines
- Integrate with hundreds of external services and model providers

Sources: [README.md:1-20](https://github.com/run-llama/llama_index/blob/main/README.md)

## Core Architecture

The LlamaIndex framework follows a modular architecture with distinct components that work together to provide end-to-end data pipeline capabilities.

### Package Structure

LlamaIndex offers two primary installation methods to accommodate different use cases:

| Package | Description | Use Case |
|---------|-------------|----------|
| `llama-index` | Starter package with core + selected integrations | Quick start, common setups |
| `llama-index-core` | Core package only | Custom, minimal deployments |

Sources: [README.md:45-55](https://github.com/run-llama/llama_index/blob/main/README.md)

### Import Patterns

The framework uses a namespaced import system that distinguishes between core modules and integration packages:

```python
# Core modules (included in llama-index-core)
from llama_index.core.xxx import ClassABC

# Integration modules (from separate packages)
from llama_index.xxx.yyy import SubclassABC

# Concrete examples
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI
```

Sources: [README.md:56-68](https://github.com/run-llama/llama_index/blob/main/README.md)

## Data Flow Architecture

The following diagram illustrates the typical data flow in a LlamaIndex application:

```mermaid
graph TD
    A[Data Sources] --> B[Readers/Loaders]
    B --> C[Documents]
    C --> D[Node Parsers]
    D --> E[Nodes/Chunks]
    E --> F[Vector Index]
    F --> G[Retriever]
    G --> H[Query Engine]
    H --> I[LLM Response]
    
    A1[Web Pages] --> B
    A2[PDFs] --> B
    A3[Databases] --> B
    A4[APIs] --> B
```

## Key Components

### 1. Document Loaders

Document loaders (Readers) are responsible for ingesting data from external sources. LlamaIndex provides a vast ecosystem of readers:

| Reader | Purpose | Source |
|--------|---------|--------|
| `WikipediaReader` | Load Wikipedia pages | [llama-index-readers-wikipedia](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-wikipedia/README.md) |
| `WholeSiteReader` | Scrape entire websites | [llama-index-readers-web](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md) |
| `DoclingReader` | Parse PDFs, DOCX, HTML | [llama-index-readers-docling](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-docling/README.md) |
| `RemoteDepthReader` | Extract from URLs recursively | [llama-index-readers-remote-depth](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-remote-depth/README.md) |

#### Wikipedia Reader Example

```python
from llama_index.readers.wikipedia import WikipediaReader

reader = WikipediaReader()
documents = reader.load_data(pages=["Page Title 1", "Page Title 2"])
```

Sources: [llama-index-readers-wikipedia/README.md:1-25](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-wikipedia/README.md)

#### Docling Reader Example

```python
from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
docs = reader.load_data(file_path="https://arxiv.org/pdf/2408.09869")
```

Sources: [llama-index-readers-docling/README.md:1-30](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-docling/README.md)

### 2. Indices

Indices organize documents for efficient retrieval. LlamaIndex supports both managed indices and customizable self-hosted options.

#### Managed Indices

Managed indices like `VectaraIndex` provide fully hosted solutions:

```python
from llama_index.indices.managed.vectara import VectaraIndex
from llama_index.core.schema import Document, MediaResource

docs = [
    Document(
        id_="doc1",
        text_resource=MediaResource(
            text="This is test text for Vectara integration.",
        ),
    ),
]
index = VectaraIndex.from_documents(docs)
```

Sources: [llama-index-indices-managed-vectara/README.md:30-50](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md)

### 3. LLM Integrations

LlamaIndex provides integrations with numerous LLM providers through a standardized interface:

```python
# Example: Contextual LLM Integration
from llama_index.llms.contextual import Contextual

llm = Contextual(model="contextual-clm", api_key="your_api_key")
response = llm.complete("Explain the importance of Grounded Language Models.")
```

Sources: [llama-index-llms-contextual/README.md:1-20](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)

## Usage Patterns

### Building a Simple RAG Pipeline

The most common pattern involves loading documents, creating an index, and querying it:

```python
from llama_index.core import VectorStoreIndex
from llama_index.readers.docling import DoclingReader

# Step 1: Load documents
reader = DoclingReader()
documents = reader.load_data(file_path="document.pdf")

# Step 2: Create index
index = VectorStoreIndex.from_documents(documents)

# Step 3: Query
query_engine = index.as_query_engine()
response = query_engine.query("Summarize this document")
```

### Retrieval-Only Pattern

For applications requiring only retrieval without generation:

```python
retriever = index.as_retriever(similarity_top_k=2)
results = retriever.retrieve("How will users feel about this new tool?")
```

Sources: [llama-index-indices-managed-vectara/README.md:50-65](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md)
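
Each retrieved item is a `NodeWithScore`, which pairs a source node with its similarity score. A minimal sketch of inspecting the results (attribute names per `llama_index.core.schema`):

```python
# Inspect the retrieved nodes and their scores
for node_with_score in results:
    # score can be None for retrievers that do not assign one
    print(node_with_score.score, node_with_score.node.get_content()[:80])
```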

### LangChain Integration

LlamaIndex components can be used as tools within LangChain agents:

```python
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import WholeSiteReader

# Initialize scraper
scraper = WholeSiteReader(prefix="https://docs.llamaindex.ai/en/stable/", max_depth=10)
documents = scraper.load_data(base_url="https://docs.llamaindex.ai/en/stable/")

# Create index
index = VectorStoreIndex.from_documents(documents)

# Define tools
tools = [
    Tool(
        name="Website Index",
        func=lambda q: str(index.as_query_engine().query(q)),
        description="Useful for answering questions about text on websites.",
    ),
]
```

Sources: [llama-index-readers-web/llama_index/readers/web/whole_site/README.md:1-40](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md)

## LlamaParse Platform

**LlamaParse** is a complementary platform (separate from the open-source LlamaIndex framework) focused on document agents and agentic OCR:

| Component | Function |
|-----------|----------|
| **Parse** | Agentic OCR and document parsing (130+ formats) |
| **Extract** | Structured data extraction from documents |
| **Index** | Ingest, index, and RAG pipelines |
| **Split** | Split large documents into subcategories |

Sources: [README.md:75-85](https://github.com/run-llama/llama_index/blob/main/README.md)

## Ecosystem Overview

LlamaIndex maintains an extensive ecosystem with over 300 integration packages available through LlamaHub:

```mermaid
graph LR
    subgraph "Data Sources"
        Web[Web]
        PDFs[PDFs]
        DB[Databases]
        APIs[APIs]
    end
    
    subgraph "LlamaIndex Core"
        Docs[Documents]
        Nodes[Nodes]
        Indices[Indices]
    end
    
    subgraph "LLM Providers"
        OpenAI[OpenAI]
        HuggingFace[HF]
        Local[Local Models]
    end
    
    Web --> Docs
    PDFs --> Docs
    DB --> Docs
    APIs --> Docs
    Docs --> Indices
    Indices --> OpenAI
    Indices --> HuggingFace
    Indices --> Local
```

## Configuration Options

### Common Reader Configuration Parameters

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `file_path` | str | Path to input file/URL | `"document.pdf"` |
| `prefix` | str | URL prefix for filtering | `"https://example.com/"` |
| `max_depth` | int | Maximum recursion depth | `10` |
| `where` | dict | Metadata filter condition | `{"category": "AI"}` |
| `query` | list | Search query text | `["search term"]` |

Sources: [llama-index-readers-chroma/README.md:1-20](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-chroma/README.md)

## Installation

### Quick Start (Recommended)

```bash
pip install llama-index
```

### Minimal Installation

```bash
pip install llama-index-core
```

### Individual Integrations

```bash
pip install llama-index-readers-wikipedia
pip install llama-index-readers-docling
pip install llama-index-llms-openai
```

## Citation

If you use LlamaIndex in academic work, cite as:

```
@software{Liu_LlamaIndex_2022,
  author = {Liu, Jerry},
  doi = {10.5281/zenodo.1234},
  month = {11},
  title = {{LlamaIndex}},
  url = {https://github.com/jerryjliu/llama_index},
  year = {2022}
```

Sources: [README.md:95-105](https://github.com/run-llama/llama_index/blob/main/README.md)

## Next Steps

To continue learning LlamaIndex:

1. **Getting Started** - Follow the [starter example](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html)
2. **Concepts** - Understand core concepts like Documents, Nodes, and Indices
3. **LlamaHub** - Browse [300+ integrations](https://llamahub.ai/) for various data sources and LLM providers
4. **Examples** - Explore [Jupyter notebooks](https://github.com/run-llama/llama_index/blob/main/docs/examples/) for detailed use cases

---

<a id='quickstart'></a>

## Quick Start Guide

### Related Pages

Related topics: [Introduction to LlamaIndex](#introduction), [Documents and Nodes](#documents-nodes)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [llama-dev/README.md](https://github.com/run-llama/llama_index/blob/main/llama-dev/README.md)
- [llama-index-integrations/llms/llama-index-llms-ollama/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-ollama/README.md)
- [llama-index-integrations/llms/llama-index-llms-mistralai/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-mistralai/README.md)
- [llama-index-integrations/llms/llama-index-llms-konko/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-konko/README.md)
- [llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md)
- [llama-index-integrations/llms/llama-index-llms-modelscope/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-modelscope/README.md)
</details>

# Quick Start Guide

This guide provides a comprehensive introduction to getting started with LlamaIndex, covering environment setup, core installation methods, and essential development workflows.

## Prerequisites

Before beginning, ensure your environment meets the following requirements:

| Requirement | Version/Details |
|-------------|-----------------|
| Python | 3.8 or higher |
| Package Manager | `uv` (recommended) or `pip` |
| Operating System | Unix-like (Linux, macOS), Windows with WSL |
| Git | Latest stable version |

## Environment Setup

### Creating a Virtual Environment

LlamaIndex recommends using `uv` for dependency management. Create a virtual environment as follows:

```bash
uv venv
source .venv/bin/activate
```

Sources: llama-dev/README.md:11

### Installing the Development CLI

The `llama-dev` CLI tool is the official command-line interface for development, testing, and automation in the LlamaIndex monorepo.

Install it in editable mode:

```bash
uv pip install -e .
```

After installation, verify the CLI is available:

```bash
llama-dev --help
```

Sources: llama-dev/README.md:12-18

## Core Concepts

```mermaid
graph TD
    A[LlamaIndex Project] --> B[Core Package: llama-index-core]
    A --> C[LLM Integrations]
    A --> D[Reader Integrations]
    A --> E[Callback Integrations]
    B --> F[VectorStoreIndex]
    B --> G[ServiceContext]
    B --> H[Document Loading]
```

The LlamaIndex framework consists of several key components:

| Component | Purpose |
|-----------|---------|
| `llama-index-core` | Core framework functionality including indexing and querying |
| LLM Integrations | Connectors for various language model providers |
| Reader Integrations | Data loaders for different document sources |
| Callback Integrations | Monitoring and logging capabilities |

## Package Management

### Querying Package Information

View information about specific packages in the monorepo:

```bash
# Get info for a specific package
llama-dev pkg info llama-index-core

# Get info for all packages
llama-dev pkg info --all
```

### Executing Commands in Package Directories

Run commands within the context of specific packages:

```bash
# Run a command in a specific package
llama-dev pkg exec --cmd "uv sync" llama-index-core

# Run a command in all packages
llama-dev pkg exec --cmd "uv sync" --all

# Exit at first error
llama-dev pkg exec --cmd "uv" --all --fail-fast
```

Sources: llama-dev/README.md:26-41

## Testing

### Running Tests Across the Monorepo

Execute tests for specific packages or across all packages:

```bash
# Run tests for a specific package
llama-dev pkg test llama-index-core

# Run tests for all packages
llama-dev pkg test --all
```

### Quick Test Verification

After making changes, verify core functionality:

```bash
llama-dev pkg exec --cmd "python -m pytest" llama-index-core
```

## Basic LLM Integration Usage

### Initializing an LLM

Different LLM providers follow similar initialization patterns:

```python
from llama_index.llms.ollama import Ollama

# Initialize Ollama LLM
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
```

```python
from llama_index.llms.mistralai import MistralAI

llm = MistralAI(api_key="<your-api-key>")
```

Sources: llama-index-integrations/llms/llama-index-llms-ollama/README.md:30-35
Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:16-18

### Generating Completions

```python
# Simple completion
resp = llm.complete("Who is Paul Graham?")
print(resp)
```

```python
# Chat completion with messages
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content="You are a helpful assistant."
    ),
    ChatMessage(role=MessageRole.USER, content="How to make cake?"),
]
resp = llm.chat(messages)
print(resp)
```

Sources: llama-index-integrations/llms/llama-index-llms-modelscope/README.md:24-37

### Streaming Responses

```python
# Stream completions
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")
```

```python
# Stream chat responses (reusing the `messages` list from above)
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
```

Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:40-48

## Building an Index from Documents

### Basic Index Creation

```python
from llama_index.core import VectorStoreIndex

# Create index from documents loaded earlier via a reader
index = VectorStoreIndex.from_documents(documents)
```

```python
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)
```

### Loading Data from URLs

```python
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import WholeSiteReader

# Initialize the scraper
scraper = WholeSiteReader(
    prefix="https://docs.llamaindex.ai/en/stable/",
    max_depth=10,
)

# Start scraping from a base URL
documents = scraper.load_data(
    base_url="https://docs.llamaindex.ai/en/stable/"
)

# Create index and query it
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What language is on this website?")
```

Sources: llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md:14-34

## Configuration Options

### Key Parameters

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `model` | LLM model identifier | Required |
| `api_key` | API key for the provider | Required for cloud providers |
| `request_timeout` | Request timeout in seconds | 30.0 |
| `temperature` | Sampling temperature | 0.7 |
| `max_tokens` | Maximum tokens to generate | Provider-specific |
| `context_window` | Maximum context length | Provider-specific |
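
As a sketch (parameter names follow the table above; exact defaults vary by provider, and this example assumes a local Ollama server is running):

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="llama3.1:latest",  # model identifier (required)
    request_timeout=120.0,    # raised from the default for slow local inference
    temperature=0.2,          # lower values give more deterministic output
)
print(llm.complete("Hello"))
```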

### Environment Variables

Set API keys as environment variables before initialization:

```bash
export KONKO_API_KEY=<your-api-key>
export OPENAI_API_KEY=<your-api-key>
```

```python
import os
os.environ["KONKO_API_KEY"] = "<your-api-key>"
```

Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md:15-20

## Common Workflows

```mermaid
graph LR
    A[Setup Environment] --> B[Install llama-dev]
    B --> C[Explore Packages]
    C --> D{Development Goal}
    D -->|Testing| E[Run Tests]
    D -->|Integration| F[Configure LLM]
    D -->|Data Loading| G[Set up Readers]
    E --> H[Modify Code]
    F --> H
    G --> H
    H --> I[Verify Changes]
    I --> E
```

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| CLI not found | Ensure virtual environment is activated |
| API key errors | Verify environment variables are set |
| Package import errors | Run `uv sync` in the package directory |
| Timeout errors | Increase `request_timeout` parameter |

### Verification Commands

```bash
# Check installation
llama-dev --version

# Verify package structure
llama-dev pkg info --all

# Test core imports
python -c "import llama_index; print(llama_index.__version__)"
```

## Next Steps

After completing this quick start guide:

1. Explore specific [LLM integrations](#core-llm-integrations) for your preferred provider
2. Review [reader integrations](#data-loading) for your data sources
3. Study the [core API documentation](#key-concepts) for advanced indexing strategies
4. Join the community for support and updates

---

<a id='core-architecture'></a>

## Core Architecture

### Related Pages

Related topics: [Introduction to LlamaIndex](#introduction), [Integration Architecture](#integration-architecture)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/base/llms/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/base/llms/base.py)
- [llama-index-core/llama_index/core/base/embeddings/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/base/embeddings/base.py)
- [llama-index-core/llama_index/core/base/response/schema.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/base/response/schema.py)
- [llama-index-core/llama_index/core/types.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/types.py)
- [llama-index-core/llama_index/core/indices/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/base.py)
- [llama-index-core/llama_index/core/storage/storage_context.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/storage_context.py)
- [llama-index-core/llama_index/core/node_parser/node.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/node_parser/node.py)
- [llama-index-core/llama_index/core/response/__init__.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/response/__init__.py)
</details>

# Core Architecture

## Overview

LlamaIndex is a data framework for building LLM-powered applications. The Core Architecture establishes the fundamental building blocks that enable developers to connect large language models with their custom data sources. This architectural foundation provides a layered, modular approach where each component—from language model interfaces to response handling—follows consistent patterns and abstractions.

The core architecture serves as the abstraction layer between raw data ingestion and sophisticated LLM-powered querying. It separates concerns by defining clear interfaces for language models (LLMs), embedding services, document processing, indexing, and response generation. This design allows developers to swap implementations, extend functionality, and maintain clean separation between components.

## System Components

### High-Level Architecture Diagram

```mermaid
graph TD
    subgraph "Data Layer"
        Documents[Documents]
        Nodes[Nodes]
        Index[Index]
    end
    
    subgraph "Core Abstractions"
        LLMs[LLM Base]
        Embeddings[Embedding Base]
        Response[Response Schema]
    end
    
    subgraph "Service Layer"
        VectorStore[Vector Store]
        StorageContext[Storage Context]
    end
    
    subgraph "Application Layer"
        Query[Query Engine]
        Chat[Chat Engine]
        Agent[Agent]
    end
    
    Documents --> NodeParser
    NodeParser --> Nodes
    Nodes --> Index
    Index --> Query
    Query --> Response
    LLMs --> Query
    Embeddings --> Index
```

## Language Model (LLM) Abstraction

### Purpose and Role

The LLM base abstraction (`llama_index.core.base.llms.base`) defines the contract that all language model implementations must follow. This abstraction enables LlamaIndex to support multiple LLM providers—including OpenAI, Anthropic, local models, and custom implementations—through a unified interface.

Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50

### Base LLM Interface

The `LLM` base class provides the following core methods:

| Method | Purpose | Parameters |
|--------|---------|------------|
| `complete()` | Synchronous text completion | `prompt: str`, `formatted: bool = False`, `**kwargs` |
| `stream_complete()` | Streaming text completion | `prompt: str`, `formatted: bool = False`, `**kwargs` |
| `chat()` | Synchronous chat completion | `messages: List[ChatMessage]`, `**kwargs` |
| `stream_chat()` | Streaming chat completion | `messages: List[ChatMessage]`, `**kwargs` |
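
Because every provider implements this same contract, code written against the base interface runs unchanged across backends. A minimal sketch using the built-in `MockLLM` as a stand-in for a real provider:

```python
from llama_index.core.llms import ChatMessage, MessageRole, MockLLM

llm = MockLLM(max_tokens=16)  # any concrete LLM implementation works here

# Text completion through the unified interface
print(llm.complete("What is LlamaIndex?"))

# Chat completion through the unified interface
messages = [ChatMessage(role=MessageRole.USER, content="Hello")]
print(llm.chat(messages))
```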

### LLM Class Hierarchy

```mermaid
classDiagram
    class LLM {
        <<abstract>>
        +complete()
        +stream_complete()
        +chat()
        +stream_chat()
        +metadata: LLMMetadata
    }
    
    class LLMMetadata {
        +model: str
        +temperature: float
        +top_p: float
        +max_tokens: Optional[int]
        +context_window: int
        +is_chat_model: bool
        +is_function_calling_model: bool
    }
    
    class ChatMessage {
        +role: MessageRole
        +content: str
        +additional_kwargs: Dict
    }
    
    LLM --> LLMMetadata
    LLM --> ChatMessage
```

Sources: llama-index-core/llama_index/core/base/llms/base.py:50-120

### Message Roles

The `MessageRole` enum defines valid roles for chat messages:

| Role | Description |
|------|-------------|
| `SYSTEM` | System-level instructions |
| `USER` | User-generated content |
| `ASSISTANT` | Model-generated responses |
| `FUNCTION` | Function call results |

## Embedding Abstraction

### Purpose and Role

The embedding base (`llama_index.core.base.embeddings.base`) provides the interface for text vectorization. Embeddings transform textual content into numerical vectors that enable semantic similarity searches. This abstraction supports various embedding providers while maintaining a consistent API.

Sources: llama-index-core/llama_index/core/base/embeddings/base.py:1-60

### Embedding Interface Methods

| Method | Purpose | Return Type |
|--------|---------|-------------|
| `get_query_embedding()` | Embed a single query string | `List[float]` |
| `get_text_embedding()` | Embed a single text string | `List[float]` |
| `get_text_embedding_batch()` | Embed multiple texts in batch | `List[List[float]]` |
| `get_query_embedding_batch()` | Embed multiple queries in batch | `List[List[float]]` |
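
A minimal sketch of the interface using the built-in `MockEmbedding` as a stand-in for a real provider:

```python
from llama_index.core.embeddings import MockEmbedding

embed_model = MockEmbedding(embed_dim=8)  # any BaseEmbedding implementation works here

query_vec = embed_model.get_query_embedding("hello world")
text_vecs = embed_model.get_text_embedding_batch(["doc one", "doc two"])
print(len(query_vec), len(text_vecs))  # vector dimension, number of texts
```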

### Embedding Configuration

```mermaid
graph LR
    A[Text Input] --> B[Embedding Model]
    B --> C[Dimension: 384-1536]
    C --> D[Normalized Vector]
```

Sources: llama-index-core/llama_index/core/base/embeddings/base.py:60-100

## Response Schema

### Purpose and Role

The response schema (`llama_index.core.base.response.schema`) defines the data structures used throughout LlamaIndex for returning query results, streaming responses, and structured outputs. This ensures consistent response handling across different query types and engines.

Sources: llama-index-core/llama_index/core/base/response/schema.py:1-80

### Core Response Models

| Class | Purpose |
|-------|---------|
| `Response` | Wraps text responses with sources |
| `StreamingResponse` | Handles streaming token outputs |
| `ResponseMode` | Enum for response generation modes |
| `Sources` | Container for source nodes and metadata |
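
Query engines return these objects from `query()`. A sketch constructing a `Response` directly, only to show its shape:

```python
from llama_index.core.base.response.schema import Response

# Query engines build these for you; constructed here to illustrate the fields
resp = Response(response="The synthesized answer.", source_nodes=[], metadata={})
print(str(resp))          # the answer text
print(resp.source_nodes)  # NodeWithScore objects backing the answer
```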

### Response Mode Enumeration

```mermaid
graph TD
    A[Query] --> B{Response Mode}
    B --> C[default]
    B --> D[refine]
    B --> E[compact]
    B --> F[accumulate]
    B --> G[compact_accumulate]
    
    C --> H[Single pass response]
    D --> I[Iterative refinement]
    E --> J[Compact and respond]
    F --> K[Aggregate node responses]
    G --> L[Compact then accumulate]
```

Sources: llama-index-core/llama_index/core/base/response/schema.py:30-50

## Core Types System

### Type Definitions

The types module (`llama_index.core.types`) defines foundational enumerations and type aliases used throughout the framework:

| Type | Description |
|------|-------------|
| `ModelType` | Defines model categories (e.g., `LLM`, `EMBEDDING`) |
| `PromptType` | Categorizes prompts (e.g., `SUMMARY`, `QUERY`) |
| `NodeType` | Defines node kinds (e.g., `TEXT`, `DOCUMENT`) |

Sources: llama-index-core/llama_index/core/types.py:1-60

### Node Parser Types

```mermaid
classDiagram
    class Node {
        <<abstract>>
        +id_: str
        +embedding: Optional[List[float]]
        +metadata: Dict[str, Any]
        +relationships: Dict[NodeRelationship, Node]
        +excluded_embed_metadata_keys: List[str]
        +excluded_llm_metadata_keys: List[str]
    }
    
    class TextNode {
        +text: str
        +start_char_idx: Optional[int]
        +end_char_idx: Optional[int]
    }
    
    class Document {
        +text: str
        +doc_id: str
        +embedding: Optional[List[float]]
    }
    
    Node <|-- TextNode
    Node <|-- Document
```

## Document and Node Model

### Document Structure

Documents represent the top-level container for source data. Each document contains metadata and can be broken down into smaller nodes for indexing:

| Field | Type | Description |
|-------|------|-------------|
| `doc_id` | `str` | Unique document identifier |
| `text` | `str` | Full text content |
| `metadata` | `Dict[str, Any]` | Associated metadata |
| `embedding` | `Optional[List[float]]` | Pre-computed embedding |

### Node Relationships

Nodes maintain relationships with other nodes through the `NodeRelationship` enum:

| Relationship | Description |
|--------------|-------------|
| `SOURCE` | Parent document relationship |
| `PREVIOUS` | Previous sibling node |
| `NEXT` | Next sibling node |
| `PARENT` | Parent node in hierarchy |
| `CHILD` | Child node in hierarchy |

Sources: llama-index-core/llama_index/core/node_parser/node.py:30-80

## Storage Architecture

### Storage Context

The `StorageContext` manages persistence layers for various data components:

```mermaid
graph TD
    StorageContext --> VectorStore
    StorageContext --> DocStore
    StorageContext --> IndexStore
    StorageContext --> GraphStore
    
    VectorStore --> Milvus[Milvus]
    VectorStore --> Chroma[Chroma]
    VectorStore --> Pinecone[Pinecone]
    
    DocStore --> MongoDB[MongoDB]
    DocStore --> Redis[Redis]
    DocStore --> Simple[SimpleKVStore]
```

Sources: llama-index-core/llama_index/core/storage/storage_context.py:1-50

### Storage Components

| Component | Purpose |
|-----------|---------|
| `vector_store` | Stores embedding vectors for similarity search |
| `doc_store` | Stores serialized nodes and documents |
| `index_store` | Stores index metadata and configurations |
| `graph_store` | Stores knowledge graph relationships |
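
A minimal sketch of creating these components with in-memory defaults and persisting them to disk:

```python
from llama_index.core import StorageContext

# In-memory defaults for the vector, document, and index stores
storage_context = StorageContext.from_defaults()

# Write all component stores under a single directory
storage_context.persist(persist_dir="./storage")
```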

## Index Architecture

### Base Index Structure

Indexes provide the mechanism for organizing and querying documents. The base index class establishes the contract for all index implementations:

```mermaid
graph LR
    A[Documents] --> B[Index Construction]
    B --> C[Node Parsing]
    C --> D[Embedding Generation]
    D --> E[Vector Storage]
    E --> F[Queryable Index]
```

### Index Types

| Index Type | Use Case |
|------------|----------|
| `VectorStoreIndex` | Semantic search over embeddings |
| `SummaryIndex` | Document summarization |
| `KeywordTableIndex` | Keyword-based retrieval |
| `KnowledgeGraphIndex` | Graph-based knowledge representation |

Sources: llama-index-core/llama_index/core/indices/base.py:1-80

## Query Engine Architecture

### Query Flow

```mermaid
sequenceDiagram
    participant User
    participant QueryEngine
    participant Retriever
    participant LLM
    participant Response
    
    User->>QueryEngine: Query Request
    QueryEngine->>Retriever: Retrieve Nodes
    Retriever-->>QueryEngine: Source Nodes
    QueryEngine->>LLM: Synthesize Response
    LLM-->>QueryEngine: Response
    QueryEngine->>Response: Format Output
    Response-->>User: Formatted Answer
```

### Retriever Types

| Retriever | Description |
|-----------|-------------|
| `VectorRetriever` | Embedding-based similarity search |
| `KeywordRetriever` | BM25 or keyword matching |
| `HybridRetriever` | Combined vector and keyword search |
| `SentenceWindowRetriever` | Contextual window retrieval |

## Configuration and Extensibility

### Service Context

The `ServiceContext` bundles together the core service components:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `llm` | `LLM` | `OpenAI()` | Language model instance |
| `embed_model` | `Embedding` | `OpenAIEmbedding()` | Embedding model instance |
| `node_parser` | `NodeParser` | `SentenceSplitter()` | Text chunking strategy |
| `prompt_helper` | `PromptHelper` | Auto-calculated | Prompt size optimization |

### Customization Patterns

```mermaid
graph TD
    subgraph "Extension Points"
        CustomLLM[Custom LLM Implementation]
        CustomEmbed[Custom Embedding Model]
        CustomParser[Custom Node Parser]
        CustomStore[Custom Storage Backend]
    end
    
    CustomLLM -->|inherits| LLMBase[LLM Base]
    CustomEmbed -->|inherits| EmbedBase[Embedding Base]
    CustomParser -->|inherits| NodeParserBase[NodeParser Base]
    CustomStore -->|inherits| StorageContextBase[StorageContext Base]
```

## Summary

The Core Architecture of LlamaIndex establishes a modular, extensible framework built on well-defined abstractions. The layered architecture—from base interfaces like `LLM` and `Embedding` through storage and indexing components to application-layer query engines—enables developers to:

1. **Swap implementations** without changing application code
2. **Extend functionality** through inheritance and composition
3. **Maintain clean separation** between concerns
4. **Support multiple providers** through unified interfaces

The architecture follows consistent patterns across components, making the framework predictable and learnable while supporting the diverse requirements of production LLM applications.

---

## See Also

- [LLM Integrations](../llms/README.md)
- [Embedding Integrations](../embeddings/README.md)
- [Index Examples](../examples/index.md)
- [Storage Integrations](../storage/README.md)

---

<a id='integration-architecture'></a>

## Integration Architecture

### Related Pages

Related topics: [Core Architecture](#core-architecture), [Retrieval and Reranking](#retrieval-reranking)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [llama-index-integrations/readers/llama-index-readers-preprocess/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-preprocess/README.md)
- [llama-index-integrations/readers/llama-index-readers-remote-depth/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-remote-depth/README.md)
- [llama-index-integrations/llms/llama-index-llms-contextual/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)
- [llama-index-integrations/llms/llama-index-llms-konko/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-konko/README.md)
- [llama-index-integrations/llms/llama-index-llms-lmstudio/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-lmstudio/README.md)
- [llama-index-integrations/readers/llama-index-readers-wikipedia/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-wikipedia/README.md)
- [llama-index-integrations/llms/llama-index-llms-langchain/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-langchain/README.md)
</details>

# Integration Architecture

## Overview

LlamaIndex employs a modular integration architecture that extends the core framework's capabilities through a comprehensive ecosystem of pluggable components. The integration system allows developers to connect LlamaIndex with external services, APIs, local models, and specialized tools without modifying the core library. This architecture follows a provider-based pattern where each integration package implements standardized interfaces to ensure compatibility and consistent behavior across different external systems.

The integration architecture serves as the bridge between LlamaIndex's core data structures and the diverse landscape of LLM providers, embedding services, document loaders, and auxiliary tools. By maintaining well-defined contracts between components, the system enables seamless swapping of implementations while preserving the overall workflow of building retrieval-augmented generation (RAG) pipelines and query engines.

## Integration Categories

LlamaIndex organizes its integrations into distinct categories, each addressing a specific aspect of the LLM application development workflow. The categorization ensures logical separation of concerns and simplifies dependency management for end users.

### LLM Integrations

LLM (Large Language Model) integrations provide adapters for connecting to various language model providers. These integrations implement the unified LLM interface defined in `llama_index.core.llms`, allowing developers to switch between providers without changing application code. Each LLM integration handles provider-specific authentication, request formatting, response parsing, and streaming behavior.

| Integration Package | Provider | Key Features |
|---------------------|----------|--------------|
| `llama-index-llms-contextual` | Contextual | Contextual LLM wrapper |
| `llama-index-llms-konko` | Konko | Supports both Konko and OpenAI models |
| `llama-index-llms-lmstudio` | LM Studio | Local server integration |
| `llama-index-llms-monsterapi` | MonsterAPI | Private deployments and GA models |
| `llama-index-llms-modelscope` | ModelScope | Qwen and other ModelScope models |
| `llama-index-llms-langchain` | LangChain | LangChain LLM wrapper |
| `llama-index-llms-optimum-intel` | Intel Optimum | CPU-optimized inference |

Sources: [llama-index-integrations/llms/llama-index-llms-contextual/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)

### Reader Integrations

Reader integrations enable data ingestion from various document sources and web content. These loaders transform external data formats into LlamaIndex's internal `Document` schema, providing a unified representation regardless of the source type.

| Reader Type | Source Format | Package |
|-------------|---------------|---------|
| Document Readers | PDF, DOCX, HTML | `llama-index-readers-docling` |
| Web Readers | URLs, Articles | `llama-index-readers-web` |
| Wikipedia | Wikipedia pages | `llama-index-readers-wikipedia` |
| Remote Content | Deep link crawling | `llama-index-readers-remote-depth` |
| Cloud Storage | Box files | `llama-index-readers-box` |
| Preprocessed | Chunks from Preprocess API | `llama-index-readers-preprocess` |

Sources: [llama-index-integrations/readers/llama-index-readers-wikipedia/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-wikipedia/README.md)

### Embedding Integrations

Embedding integrations provide vectorization capabilities through external embedding models. These components convert text into dense vector representations suitable for semantic search and similarity operations.

| Provider | Model Examples | Package |
|----------|---------------|---------|
| Ollama | `nomic-embed-text`, `embeddinggemma`, `mxbai-embed-large` | `llama-index-embeddings-ollama` |

Sources: [llama-index-integrations/embeddings/llama-index-embeddings-ollama/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-ollama/README.md)

### Index Integrations

Index integrations connect to managed vector search services, providing fully-hosted indexing and retrieval capabilities. These integrations abstract the complexity of distributed vector databases behind LlamaIndex's retriever interface.

| Managed Service | Package | Features |
|-----------------|---------|----------|
| Vectara | `llama-index-indices-managed-vectara` | RAG pipeline, retriever, query engine |

Sources: [llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md)

### Tool Integrations

Tool integrations extend LlamaIndex's agent capabilities by providing access to external services that can be invoked during agent execution.

| Tool Provider | Features | Package |
|---------------|----------|---------|
| Moss | Hybrid search (keyword + semantic) | `llama-index-tools-moss` |

### Callback Integrations

Callback integrations enable observability and feedback collection by integrating with external monitoring and evaluation platforms.

| Platform | Purpose | Package |
|----------|---------|---------|
| Argilla | Feedback loop, LLM monitoring | `llama-index-callbacks-argilla` |

Sources: [llama-index-integrations/callbacks/llama-index-callbacks-argilla/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/callbacks/llama-index-callbacks-argilla/README.md)

## System Architecture

The integration architecture follows a layered approach where core abstractions define the contracts, and integration packages provide concrete implementations. This design enables horizontal scalability of integrations while maintaining vertical consistency with the core framework.

```mermaid
graph TD
    A[Application Layer] --> B[Core LlamaIndex]
    B --> C[Interface Abstractions]
    C --> D[LLM Abstraction]
    C --> E[Reader Abstraction]
    C --> F[Embedding Abstraction]
    C --> G[Retriever Abstraction]
    D --> H[LLM Integrations]
    E --> I[Reader Integrations]
    F --> J[Embedding Integrations]
    G --> K[Index Integrations]
    H --> L[Konko, LMStudio, MonsterAPI, etc.]
    I --> M[Docling, Wikipedia, Web, Box, etc.]
    J --> N[Ollama Embeddings]
    K --> O[Vectara]
```

## Common Integration Patterns

### LLM Integration Pattern

LLM integrations follow a consistent initialization pattern that accepts provider-specific configuration parameters. The typical constructor accepts a model identifier, base URL for API endpoints, and optional generation parameters such as temperature and maximum tokens.

```python
from llama_index.llms.provider_name import ProviderLLM

llm = ProviderLLM(
    model="model-identifier",
    api_key="your-api-key",
    temperature=0.7,
    max_tokens=256
)
```

Sources: [llama-index-integrations/llms/llama-index-llms-konko/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-konko/README.md)

### Reader Integration Pattern

Reader integrations follow a loader pattern where initialization may require credentials, and the `load_data` method accepts source-specific parameters such as URLs, file paths, or query filters.

```python
from llama_index.readers.source_type import SourceReader

reader = SourceReader(credentials="your-credentials")
documents = reader.load_data(source="document-source")
```

Sources: [llama-index-integrations/readers/llama-index-readers-remote-depth/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-remote-depth/README.md)

## Data Flow Architecture

The integration architecture enables a complete RAG pipeline where each component plays a specific role in transforming input data into actionable insights.

```mermaid
graph LR
    A[Document Sources] --> B[Readers]
    B --> C[Documents]
    C --> D[Node Parsers]
    D --> E[Nodes]
    E --> F[Vector Index]
    E --> G[Storage Context]
    F --> H[Retriever]
    G --> H
    H --> I[Query Engine]
    I --> J[LLM]
    J --> K[Response]
```

## Installation and Dependency Management

Each integration package follows the naming convention `llama-index-{category}-{provider}` and can be installed independently via pip. This modular approach minimizes dependency overhead by allowing users to install only the packages required for their specific use case.

| Category | Package Naming Pattern | Installation Command |
|----------|----------------------|---------------------|
| LLM | `llama-index-llms-{provider}` | `pip install llama-index-llms-{provider}` |
| Reader | `llama-index-readers-{source}` | `pip install llama-index-readers-{source}` |
| Embedding | `llama-index-embeddings-{provider}` | `pip install llama-index-embeddings-{provider}` |
| Index | `llama-index-indices-{type}-{provider}` | `pip install llama-index-indices-{type}-{provider}` |
| Tool | `llama-index-tools-{provider}` | `pip install llama-index-tools-{provider}` |
| Callback | `llama-index-callbacks-{platform}` | `pip install llama-index-callbacks-{platform}` |

## Configuration Management

Integrations typically support configuration through both constructor parameters and environment variables. This dual approach accommodates both explicit configuration in code and secret management through environment-based configuration.

### Environment Variable Pattern

Many integrations follow a pattern where API keys can be set as environment variables for security and convenience:

```bash
export PROVIDER_API_KEY="your-api-key"
export OPENAI_API_KEY="your-openai-key"
```

### Constructor Parameter Pattern

Alternatively, credentials can be passed directly to the integration constructor:

```python
llm = ProviderLLM(
    model="model-name",
    api_key="explicit-api-key",
    base_url="https://api.provider.com"
)
```

Sources: [llama-index-integrations/llms/llama-index-llms-lmstudio/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-lmstudio/README.md)

## Extending the Architecture

The integration architecture is designed for extensibility. New integrations can be created by implementing the appropriate abstract base classes defined in `llama_index.core`. Each integration category has its own interface specification that ensures consistency across implementations.

### Creating a New LLM Integration

To create a new LLM integration, implement the following interface contract (a minimal sketch follows the list):

1. Inherit from the base LLM class
2. Implement `complete()`, `chat()`, and streaming methods
3. Handle provider-specific authentication and error handling
4. Follow the naming convention for the package
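
A minimal sketch of such an integration, subclassing `CustomLLM` from `llama_index.core.llms` (the `EchoLLM` class and its trivial echo behavior are illustrative only):

```python
from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class EchoLLM(CustomLLM):
    """Illustrative provider that echoes the prompt back."""

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(context_window=4096, num_output=256, model_name="echo")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=prompt)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        def gen() -> CompletionResponseGen:
            text = ""
            for ch in prompt:
                text += ch
                yield CompletionResponse(text=text, delta=ch)

        return gen()
```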

### Creating a New Reader Integration

To create a new reader integration (a sketch follows the list):

1. Implement a loader class with `load_data()` method
2. Transform source data into `Document` objects
3. Handle pagination, filtering, and error cases appropriately
4. Document supported source formats and parameters
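
A corresponding sketch for a reader, subclassing `BaseReader` (the `TextDirReader` class and its `.txt`-only behavior are hypothetical):

```python
from pathlib import Path
from typing import List

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class TextDirReader(BaseReader):
    """Illustrative reader: one Document per .txt file in a directory."""

    def load_data(self, input_dir: str) -> List[Document]:
        docs = []
        for path in sorted(Path(input_dir).glob("*.txt")):
            docs.append(
                Document(
                    text=path.read_text(encoding="utf-8"),
                    metadata={"file_path": str(path)},
                )
            )
        return docs
```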

## Integration Testing Considerations

Each integration package maintains its own test suite to verify compatibility with the external service. Integration tests typically require actual API credentials and network access, distinguishing them from unit tests that mock external dependencies.

## Best Practices

When working with LlamaIndex integrations, consider the following best practices:

1. **Dependency Isolation**: Install only required integration packages to minimize potential conflicts
2. **Credential Management**: Use environment variables for sensitive credentials in production
3. **Error Handling**: Implement appropriate retry logic and fallback strategies for external service calls
4. **Resource Management**: Close connections and release resources properly when using streaming responses
5. **Version Compatibility**: Check integration package versions against the core LlamaIndex version for compatibility

## Deprecated Integrations

Some integration packages may be discontinued over time as external services evolve or change their offerings. When an integration is deprecated, it will receive no further updates or support. Users should migrate to alternative solutions before removing deprecated packages from their projects.

Sources: [llama-index-integrations/readers/llama-index-readers-preprocess/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-preprocess/README.md)

## Conclusion

The integration architecture provides a flexible, extensible framework for connecting LlamaIndex with the broader ecosystem of LLM providers, data sources, and tools. By maintaining standardized interfaces while allowing provider-specific implementations, the architecture enables developers to build sophisticated RAG applications without being locked into a single vendor or service. The modular design supports incremental adoption, allowing teams to integrate new capabilities as their requirements evolve.

---

<a id='documents-nodes'></a>

## Documents and Nodes

### Related Pages

Related topics: [Storage Systems](#storage-systems), [Query Engines](#query-engines)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/schema.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/schema.py)
- [llama-index-core/llama_index/core/node_parser/interface.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/node_parser/interface.py)
- [llama-index-core/llama_index/core/readers/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/readers/base.py)
- [llama-index-core/llama_index/core/node_parser/text/sentence.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/node_parser/text/sentence.py)
- [llama-index-integrations/readers/llama-index-readers-docling/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-docling/README.md)
- [llama-index-integrations/node_parser/llama-index-node-parser-docling/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/node_parser/llama-index-node-parser-docling/README.md)
</details>

# Documents and Nodes

## Overview

In LlamaIndex, **Documents** and **Nodes** are the fundamental data structures that represent information to be indexed, searched, and retrieved. Documents serve as the primary unit of input data, while Nodes are the granular chunks created during document processing for optimal embedding and retrieval.

## Document Model

### Purpose and Scope

A Document in LlamaIndex represents a single unit of data to be indexed. It encapsulates the content along with associated metadata that provides context about the source, type, and additional information useful for retrieval and processing.

### Core Document Schema

The Document model is defined in `llama-index-core/llama_index/core/schema.py` and includes the following key attributes:

| Attribute | Type | Description |
|-----------|------|-------------|
| `text` | str | The main text content of the document |
| `id_` | str | Unique identifier for the document |
| `metadata` | Dict[str, Any] | Additional metadata about the document |
| `mimetype` | str | MIME type of the document content |
| `relationships` | Dict[NodeRelationship, RelatedNodeType] | Relationships to other nodes/documents |

### Document Construction

Documents can be created with varying levels of detail:

```python
from llama_index.core import Document

# Basic document
doc = Document(text="Your content here")

# Document with metadata
doc = Document(
    text="Your content here",
    metadata={
        "source": "review.txt",
        "author": "John Doe",
        "date": "2024-01-15"
    }
)
```

## Node Model

### Purpose and Scope

Nodes are the result of parsing and chunking Documents into smaller, semantically coherent pieces. Each Node inherits document-like properties but adds relationship information linking back to its parent Document and sibling Nodes.

### Node Structure

Nodes extend the Document schema with additional attributes defined in `llama-index-core/llama_index/core/schema.py`:

| Attribute | Type | Description |
|-----------|------|-------------|
| `node_id` | str | Unique identifier for the node |
| `start_char_idx` | int | Starting character index in parent document |
| `end_char_idx` | int | Ending character index in parent document |
| `text_template` | str | Template for rendering the node text |
| `relationships` | Dict[NodeRelationship, RelatedNodeType] | Relationships including PARENT, PREVIOUS, NEXT |
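
A short sketch creating nodes and linking them with typed relationships (class names per `llama_index.core.schema`):

```python
from llama_index.core.schema import NodeRelationship, RelatedNodeInfo, TextNode

first = TextNode(text="chunk one", id_="n1")
second = TextNode(text="chunk two", id_="n2")

# Link the chunks as siblings so retrieval can walk the original sequence
first.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=second.node_id)
second.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=first.node_id)
```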

## Architecture Diagram

```mermaid
graph TD
    A[Raw Input Data] --> B[Document]
    B --> C[Node Parser]
    C --> D[Nodes]
    D --> E[Embedding Model]
    E --> F[Vector Index]
    
    G[Metadata] --> B
    H[Relationships] --> D
    
    B -->|PARENT| D
    D -->|CHILD| B
```

## Readers and Loading

### Base Reader Interface

Readers are responsible for loading data from various sources and converting them into Documents. The base reader interface is defined in `llama-index-core/llama_index/core/readers/base.py`.

| Method | Description |
|--------|-------------|
| `load_data()` | Load documents from a data source |
| `lazy_load_data()` | Lazily load documents for memory efficiency |

### Supported Reader Types

LlamaIndex provides numerous reader integrations for different data sources:

| Category | Reader | Description |
|----------|--------|-------------|
| Document | Docling Reader | PDF, DOCX, HTML extraction to Markdown or JSON |
| Document | MarkItDown Reader | Converts various formats to Markdown |
| Document | Docugami Loader | XML knowledge graph from PDF/DOCX |
| Web | NewsArticleReader | Parses news article URLs |
| Web | UnstructuredURLLoader | URL text extraction via Unstructured.io |
| Web | TrafilaturaWebReader | Web scraping with trafilatura |
| Web | MainContentExtractorReader | Main content extraction from websites |
| Web | ReadabilityWebPageReader | Readability-based web extraction |
| Web | RemoteDepthReader | Recursive URL loading with depth control |
| Web | WholeSiteReader | Full site scraping with prefix/depth |
| Academic | SemanticScholarReader | Scholarly articles and papers |
| Database | Chroma Reader | Loading from Chroma vector store |

### Usage Example

```python
from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
docs = reader.load_data(file_path="document.pdf")
```

## Node Parsers

### Purpose and Scope

Node Parsers transform Documents into Nodes by splitting content based on semantic boundaries. The interface is defined in `llama-index-core/llama_index/core/node_parser/interface.py`.

### Core Interface Methods

| Method | Description |
|--------|-------------|
| `get_nodes_from_documents()` | Parse documents into nodes |
| `get_batch_nodes()` | Process documents in batches |

### Sentence Splitter Parser

The sentence-based node parser in `llama-index-core/llama_index/core/node_parser/text/sentence.py` provides configurable text chunking:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `separator` | str | "\n\n" | Chunk separator |
| `chunk_size` | int | 1024 | Maximum tokens per chunk |
| `chunk_overlap` | int | 200 | Token overlap between consecutive chunks |
| `chunking_tokenizer` | callable | None | Custom tokenizer function |
| `callback_manager` | CallbackManager | None | Event callbacks |
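
A minimal sketch of the splitter in use (the chunking values are chosen for illustration):

```python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(
    [Document(text="First sentence. Second sentence. Third sentence.")]
)
print(len(nodes), nodes[0].text)
```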

### Docling Node Parser

The Docling Node Parser (`llama-index-integrations/node_parser/llama-index-node-parser-docling/README.md`) parses Docling JSON output into LlamaIndex nodes with rich metadata:

```python
from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
```

## Document-Node Relationships

### Relationship Types

Nodes maintain typed relationships to other components:

| Relationship | Description |
|--------------|-------------|
| `PARENT` | Link to parent Document |
| `CHILD` | Link to child elements |
| `PREVIOUS` | Previous sibling Node |
| `NEXT` | Next sibling Node |
| `SOURCE` | Source Document reference |

### Metadata Preservation

Nodes automatically inherit and extend document metadata:

```python
# Node metadata includes provenance information
{
    'doc_items': [{'self_ref': '#/main-text/21'}],
    'prov': [{'page_no': 2, 'bbox': {...}}],
    'headings': ['2 Getting Started']
}
```

## Workflow

```mermaid
graph LR
    A[Load Data] --> B[Create Document]
    B --> C[Parse Document]
    C --> D[Generate Nodes]
    D --> E[Create Embeddings]
    E --> F[Build Index]
    
    A1[Readers] --> A
    C1[Node Parsers] --> C
```

## Best Practices

### Document Creation

1. Always assign unique `id_` attributes for tracking
2. Include comprehensive metadata for filtering
3. Specify `mimetype` when content type matters (see the sketch below)
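
A minimal sketch applying these practices; the ID and metadata values are illustrative:

```python
from llama_index.core import Document

doc = Document(
    id_="report-2024-q1",                          # unique, stable ID for tracking
    text="Quarterly revenue grew 12%...",
    metadata={"source": "finance", "year": 2024},  # enables metadata filtering
    mimetype="text/plain",                         # set when content type matters
)
```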

### Node Parsing

1. Choose appropriate `chunk_size` for your embedding model
2. Configure `chunk_overlap` for context continuity
3. Use semantic-aware parsers (Docling) for complex documents

### Memory Management

1. Use `lazy_load_data()` for large document collections
2. Consider batch processing for node parsing
3. Leverage streaming for very large files

## Related Integrations

| Integration | Use Case |
|-------------|----------|
| VectaraIndex | Managed semantic search |
| ChromaReader | Vector database loading |
| AlibabaCloud AISearch | Cloud-based document parsing |
| Ollama Embeddings | Local embedding generation |

## Summary

Documents serve as the primary data ingestion point in LlamaIndex, encapsulating raw content and metadata from various sources. Nodes are the processed, chunked representations optimized for embedding generation and retrieval. Together with Readers and Node Parsers, they form the foundation of the LlamaIndex data pipeline.

---

<a id='storage-systems'></a>

## Storage Systems

### Related Pages

Related topics: [Documents and Nodes](#documents-nodes), [Retrieval and Reranking](#retrieval-reranking)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/storage/storage_context.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/storage_context.py)
- [llama-index-core/llama_index/core/storage/docstore/simple_docstore.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/docstore/simple_docstore.py)
- [llama-index-core/llama_index/core/storage/index_store/simple_index_store.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/index_store/simple_index_store.py)
- [llama-index-core/llama_index/core/storage/chat_store/simple_chat_store.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/chat_store/simple_chat_store.py)
</details>

# Storage Systems

## Overview

LlamaIndex provides a comprehensive storage system that allows users to persist indexes, documents, and chat histories to disk for later retrieval and reuse. The storage architecture is built around the `StorageContext` class, which serves as the central coordinator for managing various storage backends including document stores, index stores, and chat stores.

The storage system enables:

- **Persistence**: Save index data to disk for long-term storage
- **Retrieval**: Reload previously persisted indexes without recomputation
- **In-memory fallback**: Default in-memory storage when persistence is not configured
- **Customizable backends**: Pluggable storage implementations for different use cases

## Architecture

```mermaid
graph TD
    A[StorageContext] --> B[VectorStore]
    A --> C[DocStore]
    A --> D[IndexStore]
    A --> E[ChatStore]
    A --> F[ImageStore]
    A --> G[GraphStore]
    
    C --> H[SimpleDocStore]
    C --> I[MongoDocStore]
    C --> J[KVDocStore]
    
    D --> K[SimpleIndexStore]
    D --> L[MongoIndexStore]
    D --> M[KVIndexStore]
    
    E --> N[SimpleChatStore]
    E --> O[MongoChatStore]
```

## StorageContext

The `StorageContext` class is the main entry point for configuring storage in LlamaIndex. It aggregates all storage components and provides methods for persistence and retrieval.

### Initialization

```python
from llama_index.core import StorageContext, load_index_from_storage

# Create with default in-memory stores
storage_context = StorageContext.from_defaults()

# Rebuild stores from a previously persisted directory
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load the persisted index using the rebuilt storage context
index = load_index_from_storage(storage_context=storage_context)
```

### Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `persist_dir` | `str` | `None` | Directory path for persistence |
| `vector_store` | `BaseVectorStore` | `InMemoryVectorStore` | Vector storage backend |
| `docstore` | `BaseDocstore` | `SimpleDocumentStore` | Document storage backend |
| `index_store` | `BaseIndexStore` | `SimpleIndexStore` | Index metadata storage |
| `graph_store` | `BaseGraphStore` | `None` | Knowledge graph storage |
| `chat_store` | `BaseChatStore` | `SimpleChatStore` | Chat history storage |
| `image_store` | `BaseImageStore` | `None` | Image storage backend |

### Persistence Methods

| Method | Description |
|--------|-------------|
| `persist(persist_dir, ...)` | Save all storage components to disk |
| `from_defaults(**kwargs)` | Create context with default or specified settings |
| `load_index_from_storage()` | Module-level helper that loads an index from persisted storage |

## Document Store

The document store manages the storage and retrieval of `BaseDocument` objects. LlamaIndex provides several document store implementations.

### SimpleDocumentStore

The default in-memory document store, persisted as a JSON file.

```python
from llama_index.core.storage.docstore import SimpleDocumentStore

# In-memory by default; persist explicitly when needed
docstore = SimpleDocumentStore()
docstore.persist(persist_path="./docstore.json")

# Reload a previously persisted store
docstore = SimpleDocumentStore.from_persist_path("./docstore.json")
```

### Document Store API

| Method | Description |
|--------|-------------|
| `add_documents(documents, batch_size)` | Add documents to the store |
| `get_document(doc_id)` | Retrieve a document by ID |
| `delete_document(doc_id)` | Remove a document by ID |
| `get_nodes(node_ids)` | Retrieve nodes by their IDs |
| `docs` (property) | Access all stored documents and nodes |
| `persist(persist_path)` | Persist the document store to disk |
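
A brief end-to-end sketch of the API above (the ID and text are placeholders):

```python
from llama_index.core import Document
from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
docstore.add_documents([Document(id_="doc-1", text="hello world")])

doc = docstore.get_document("doc-1")              # retrieve by ID
docstore.persist(persist_path="./docstore.json")  # save to disk
```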

### Data Model

Documents are stored with the following structure:

```python
# Simplified view of the stored document schema
class Document:
    id_: str                                  # Unique identifier
    embedding: Optional[List[float]]          # Vector embedding, if computed
    metadata: Dict[str, Any]                  # User-defined metadata
    text: str                                 # Document text content
    excluded_embed_metadata_keys: List[str]   # Metadata hidden from embeddings
    excluded_llm_metadata_keys: List[str]     # Metadata hidden from the LLM
    relationships: Dict[NodeRelationship, RelatedNodeInfo]
    hash: str                                 # Content hash used for caching
```

## Index Store

The index store manages index metadata and structure, enabling efficient retrieval of index components.

### SimpleIndexStore

The default index store implementation using JSON file storage.

```python
from llama_index.core.storage.index_store import SimpleIndexStore

# In-memory by default; reload a persisted store via the classmethod
index_store = SimpleIndexStore.from_persist_path("./index_store.json")
```

### Index Store API

| Method | Description |
|--------|-------------|
| `add_index_struct(index_struct)` | Store an index structure |
| `get_index_struct(struct_id)` | Retrieve index structure by ID |
| `index_structs()` | List all stored index structures |
| `delete_index_struct(struct_id)` | Remove an index structure |

### Supported Index Types

| Index Type | Description |
|------------|-------------|
| `VectorStoreIndex` | Dense vector-based retrieval |
| `SummaryIndex` | Summary-based indexing |
| `KeywordTableIndex` | Keyword-based retrieval |
| `KnowledgeGraphIndex` | Graph-based knowledge indexing |

## Chat Store

The chat store manages conversation history for multi-turn interactions with language models.

### SimpleChatStore

A persistent chat store implementation for storing and retrieving chat messages.

```python
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
chat_store.persist(persist_path="./chat_store.json")

# Reload a previously persisted chat store
chat_store = SimpleChatStore.from_persist_path("./chat_store.json")
```

### Chat Store API

| Method | Description |
|--------|-------------|
| `add_message(key, message)` | Append a message to a chat session |
| `get_messages(key)` | Retrieve all messages for a session key |
| `get_keys()` | List all chat session keys |
| `delete_messages(key)` | Remove all messages for a session |
| `persist(persist_path)` | Save chat history to disk |
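
A short usage sketch (the session key `"user-1"` is arbitrary):

```python
from llama_index.core.llms import ChatMessage
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
chat_store.add_message("user-1", ChatMessage(role="user", content="Hello!"))

messages = chat_store.get_messages("user-1")
chat_store.persist(persist_path="./chat_store.json")
```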

### Message Structure

| Field | Type | Description |
|-------|------|-------------|
| `role` | `str` | Message role (user/assistant/system) |
| `content` | `str` | Message text content |
| `additional_kwargs` | `Dict` | Extra metadata for the message |

## Storage Workflow

```mermaid
graph LR
    A[Create Documents] --> B[Initialize StorageContext]
    B --> C{Configure Backends}
    C --> D[In-Memory]
    C --> E[Persistent]
    D --> F[Build Index]
    E --> F
    F --> G[Index Created]
    G --> H[Persist to Disk]
    H --> I[StorageContext.persist]
    
    J[Load Index] --> K[load_index_from_storage]
    K --> L[Index Ready]
```

## Usage Examples

### Basic Persistence

```python
from llama_index.core import VectorStoreIndex, StorageContext

# Create documents
documents = [...]

# Build the index with a fresh (in-memory) storage context
storage_context = StorageContext.from_defaults()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

# Explicitly persist to disk (persistence does not happen automatically)
index.storage_context.persist(persist_dir="./storage")
```

### Loading Persisted Index

```python
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild storage context from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load existing index
index = load_index_from_storage(storage_context=storage_context)

# Query the loaded index
query_engine = index.as_query_engine()
response = query_engine.query("Your question here")
```

### Custom Storage Configuration

```python
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore

# Create custom stores
docstore = SimpleDocumentStore()
index_store = SimpleIndexStore()

# Configure storage context with custom stores
storage_context = StorageContext.from_defaults(
    docstore=docstore,
    index_store=index_store
)

# Use with index, then persist everything to a custom directory
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)
storage_context.persist(persist_dir="./custom_storage")
```

## Storage Backend Comparison

| Backend | Persistence | Performance | Scalability | Use Case |
|---------|-------------|-------------|-------------|----------|
| `SimpleDocumentStore` | JSON file | Medium | Low-Medium | Development, small datasets |
| `RedisDocumentStore` | Redis | High | High | Production, distributed systems |
| `MongoDocumentStore` | MongoDB | High | Very High | Large-scale deployments |
| `KVDocumentStore` | Key-Value | High | Medium-High | General purpose |

## Best Practices

1. **Always specify unique document IDs**: Prevents duplicate entries and enables predictable retrieval
   ```python
   Document(id_="unique_doc_1", text="content")
   ```

2. **Configure persistence early**: Set up storage context before building indexes to avoid data loss

3. **Use appropriate batch sizes**: When adding many documents, use batch operations for better performance

4. **Handle persistence errors**: Wrap persistence calls in try-except blocks for robustness

5. **Backup important data**: Regularly backup persisted storage directories

## Related Components

- **Vector Stores**: Manage embedding vectors for semantic search
- **Graph Stores**: Handle knowledge graph data structures
- **Image Stores**: Store image data for multimodal applications
- **Query Engines**: Use storage to retrieve relevant documents for queries
- **Retrievers**: Access stored data for retrieval-augmented generation

---

<a id='query-engines'></a>

## Query Engines

### Related Pages

Related topics: [Retrieval and Reranking](#retrieval-reranking), [Documents and Nodes](#documents-nodes)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/query_engine/retriever_query_engine.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/retriever_query_engine.py)
- [llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py)
- [llama-index-core/llama_index/core/response_synthesizers/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/response_synthesizers/base.py)
- [llama-index-core/llama_index/core/indices/vector_store/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/vector_store/base.py)
- [llama-index-core/llama_index/core/query_engine/__init__.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/__init__.py)
</details>

# Query Engines

Query Engines are the core components in LlamaIndex responsible for processing user queries and returning relevant responses by retrieving, synthesizing, and formatting information from indexed data.

## Overview

Query Engines serve as the primary interface for querying indexed documents in LlamaIndex. They coordinate the retrieval of relevant context from the index and synthesize this information into coherent, helpful responses using Large Language Models (LLMs).

**Key Responsibilities:**

- Receive user queries and transform them into retrieval operations
- Coordinate with retrievers to fetch relevant documents or data chunks
- Route queries to appropriate response synthesizers
- Handle query-time configuration such as similarity thresholds and response modes

Source: `llama-index-core/llama_index/core/query_engine/__init__.py`

## Architecture

The query engine architecture follows a modular pipeline pattern where different components handle specific stages of query processing.

```mermaid
graph TD
    A[User Query] --> B[Query Engine]
    B --> C[Retriever]
    C --> D[Node Postprocessor]
    D --> E[Response Synthesizer]
    E --> F[LLM]
    F --> G[Response]
    
    H[Vector Store Index] --> C
    I[Summary Index] --> C
    J[Knowledge Graph Index] --> C
```

### Core Components

| Component | Purpose | Location |
|-----------|---------|----------|
| BaseQueryEngine | Abstract base class defining the query interface | `llama_index.core.query_engine` |
| RetrieverQueryEngine | Default query engine using retrievers | `retriever_query_engine.py` |
| SubQuestionQueryEngine | Decomposes complex queries into sub-questions | `sub_question_query_engine.py` |
| ResponseSynthesizer | Generates responses from retrieved context | `llama_index.core.response_synthesizers` |

Source: `llama-index-core/llama_index/core/query_engine/retriever_query_engine.py`

## RetrieverQueryEngine

The `RetrieverQueryEngine` is the default query engine implementation that combines retrieval with response synthesis.

### Initialization

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
```

### Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| retriever | BaseRetriever | Required | The retriever used to fetch relevant nodes |
| response_synthesizer | BaseSynthesizer | None | Synthesizer for generating responses |
| node_postprocessors | List[BaseNodePostprocessor] | [] | Post-processors applied after retrieval |
| callback_manager | CallbackManager | None | Manages callbacks for query events |

Source: `llama-index-core/llama_index/core/query_engine/retriever_query_engine.py:40-60`
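
For finer control, the engine can also be constructed explicitly. A sketch using the public factory helpers; the `similarity_top_k` and `response_mode` values are illustrative:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = index.as_retriever(similarity_top_k=5)
synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)
```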

### Query Flow

```mermaid
sequenceDiagram
    participant User
    participant QueryEngine
    participant Retriever
    participant Postprocessor
    participant Synthesizer
    participant LLM
    
    User->>QueryEngine: query(question)
    QueryEngine->>Retriever: retrieve(query_str)
    Retriever-->>QueryEngine: nodes[]
    QueryEngine->>Postprocessor: postprocess(nodes)
    Postprocessor-->>QueryEngine: filtered_nodes[]
    QueryEngine->>Synthesizer: synthesize(query_str, nodes)
    Synthesizer->>LLM: generate(prompt)
    LLM-->>Synthesizer: response
    Synthesizer-->>QueryEngine: Response
    QueryEngine-->>User: Response
```

## SubQuestionQueryEngine

The `SubQuestionQueryEngine` handles complex queries by decomposing them into simpler sub-questions that can be answered independently.

### Use Cases

- Queries requiring information from multiple data sources
- Complex questions that benefit from step-by-step reasoning
- Multi-hop questions requiring logical deduction

### Configuration

```python
from llama_index.core.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    verbose=True
)
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query_engine_tools | List[QueryEngineTool] | Required | List of query engines and their descriptions |
| response_synthesizer | BaseSynthesizer | None | Response synthesizer to use |
| sub_question_name | str | "sub_question" | Name for sub-question events |
| parent_name | str | "parent_question" | Name for parent question events |
| callback_manager | CallbackManager | None | Callback manager for events |
| verbose | bool | False | Enable verbose output |

Source: `llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:50-80`

## Response Synthesizers

Response Synthesizers transform retrieved context into natural language responses.

### Available Synthesizer Types

| Synthesizer | Description | Use Case |
|-------------|-------------|----------|
| CompactAndRefine | Compacts retrieved context before generating | Large retrieval results |
| TreeSummarize | Hierarchically summarizes retrieved nodes | Comprehensive responses |
| SimpleSummarize | Direct concatenation and summarization | Quick, simple responses |
| Refine | Iteratively improves response quality | High-quality refinement |
| Accumulate | Combines responses from multiple sources | Multi-source queries |
| Generation | Direct LLM generation from context | Simple generation tasks |

### Base Interface

```python
from abc import ABC, abstractmethod
from typing import Any, List

from llama_index.core.base.response.schema import Response
from llama_index.core.schema import NodeWithScore, QueryBundle

# Simplified view of the synthesizer interface
class BaseSynthesizer(ABC):
    @abstractmethod
    async def synthesize(
        self,
        query: QueryBundle,
        nodes: List[NodeWithScore],
        **kwargs: Any
    ) -> Response:
        ...
```

Source: `llama-index-core/llama_index/core/response_synthesizers/base.py:30-50`

## Vector Store Index Query Engine

The `VectorStoreIndex` provides built-in query engine creation through the `as_query_engine()` method.

### Factory Method Parameters

```python
index.as_query_engine(
    similarity_top_k: int = 2,
    vector_store_query_mode: str = "default",
    alpha: Optional[float] = None,
    **kwargs: Any
) -> BaseQueryEngine
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| similarity_top_k | int | 2 | Number of top results to retrieve |
| vector_store_query_mode | str | "default" | Vector store specific query mode |
| alpha | float | None | Hybrid search weight between sparse (0) and dense (1) retrieval |

Source: `llama-index-core/llama_index/core/indices/vector_store/base.py`

### Query Modes

| Mode | Description |
|------|-------------|
| `default` | Standard retrieval based on similarity |
| `mmr` | Maximum Marginal Relevance for diverse results |
| `hybrid` | Combines sparse and dense retrieval |
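
For example, switching the same index to MMR retrieval is a one-line change (the `similarity_top_k` value here is arbitrary):

```python
# Request diverse rather than purely similar results via MMR
query_engine = index.as_query_engine(
    vector_store_query_mode="mmr",
    similarity_top_k=5,
)
```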

## Query Engine Tool

For agent-based workflows, query engines can be wrapped as tools using the `QueryEngineTool` class.

```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata

tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="website_index",
        description="Useful for answering questions about text on websites",
    )
)
```

Source: `llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:100-120`

## Advanced Configuration

### Node Post-processors

Post-processors filter and enhance retrieved nodes before synthesis.

```python
from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = index.as_query_engine(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)
```

### Custom Query Engines

Create custom query engines by extending the base class:

```python
from llama_index.core.query_engine import BaseQueryEngine
from llama_index.core.schema import QueryBundle

class CustomQueryEngine(BaseQueryEngine):
    def __init__(self, retriever, synthesizer):
        self._retriever = retriever
        self._synthesizer = synthesizer
        super().__init__(callback_manager=None)

    def _get_prompt_modules(self):
        # Required by the prompt mixin base class
        return {}

    def _query(self, query_bundle: QueryBundle):
        nodes = self._retriever.retrieve(query_bundle)
        return self._synthesizer.synthesize(query_bundle, nodes)

    async def _aquery(self, query_bundle: QueryBundle):
        nodes = await self._retriever.aretrieve(query_bundle)
        return await self._synthesizer.asynthesize(query_bundle, nodes)
```

## Async Query Execution

Query engines support both sync and async execution patterns:

```python
# Synchronous
response = query_engine.query("What is LlamaIndex?")

# Asynchronous
response = await query_engine.aquery("What is LlamaIndex?")
```

## Integration with Vector Indices

Query engines integrate with various index types:

| Index Type | Default Query Engine | Features |
|------------|---------------------|----------|
| VectorStoreIndex | RetrieverQueryEngine | Semantic similarity search |
| SummaryIndex | RetrieverQueryEngine | Full document retrieval |
| KnowledgeGraphIndex | RetrieverQueryEngine | Graph-based traversal |
| ComposableGraph | SubQuestionQueryEngine | Multi-index queries |

## Best Practices

1. **Choose appropriate top_k**: Balance between response quality and speed (typically 3-10 for most use cases)

2. **Use sub-question engine for complex queries**: When queries require reasoning across multiple sources

3. **Configure similarity thresholds**: Filter low-quality matches using post-processors

4. **Enable callbacks for debugging**: Monitor query execution flow and performance

5. **Select appropriate synthesizers**: Match the synthesizer type to your response quality requirements

## Summary

Query Engines in LlamaIndex provide a flexible, extensible framework for retrieving and synthesizing information from indexed data. The modular architecture allows for customization at every stage of the query pipeline, from retrieval configuration to response generation.

**Key Takeaways:**

- Query engines orchestrate the retrieval-synthesis pipeline
- `RetrieverQueryEngine` handles standard query flows
- `SubQuestionQueryEngine` decomposes complex queries
- Response synthesizers generate final output from context
- Extensive configuration options enable fine-tuned control

Source: `llama-index-core/llama_index/core/query_engine/retriever_query_engine.py`
Source: `llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py`
Source: `llama-index-core/llama_index/core/response_synthesizers/base.py`
Source: `llama-index-core/llama_index/core/indices/vector_store/base.py`

---

<a id='retrieval-reranking'></a>

## Retrieval and Reranking

### Related Pages

Related topics: [Query Engines](#query-engines), [Storage Systems](#storage-systems)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/retrievers/recursive_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/retrievers/recursive_retriever.py)
- [llama-index-core/llama_index/core/postprocessor/llm_rerank.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/postprocessor/llm_rerank.py)
- [llama-index-core/llama_index/core/postprocessor/node.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/postprocessor/node.py)
- [llama-index-core/llama_index/core/indices/property_graph/retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/property_graph/retriever.py)
- [llama-index-integrations/readers/llama-index-readers-docling/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-docling/README.md)
- [llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/simple_web/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/simple_web/README.md)
</details>

# Retrieval and Reranking

## Overview

Retrieval and Reranking are fundamental components in LlamaIndex's architecture for building effective Retrieval-Augmented Generation (RAG) systems. The retrieval system identifies relevant context from various data sources, while the reranking system reorders retrieved results to optimize relevance using advanced techniques like LLM-based scoring.

In LlamaIndex, retrieval is handled through a flexible retriever abstraction that supports multiple retrieval strategies including vector-based search, keyword search, and hybrid approaches. Reranking serves as a post-processing step that improves result quality by reordering retrieved nodes based on more sophisticated relevance criteria.

## Architecture Overview

```mermaid
graph TD
    A[Query Input] --> B[Retrieval Phase]
    B --> C[Vector/Knowledge Graph Retrieval]
    C --> D[Initial Node Set]
    D --> E[Reranking Phase]
    E --> F[LLM Reranker]
    F --> G[Reordered Results]
    G --> H[Response Generation]
    
    I[Document Sources] --> J[Indexing]
    J --> K[Vector Store / Graph Store]
    K --> C
```

## Retrieval Components

### Retriever Abstraction

LlamaIndex provides a base `BaseRetriever` class that defines the interface for all retrieval implementations. Retrievers work in conjunction with indices to fetch relevant nodes from vector stores or knowledge graphs.

**Core Retriever Classes:**

| Component | File Path | Purpose |
|-----------|-----------|---------|
| `BaseRetriever` | `llama-index-core/llama_index/core/retrievers/` | Abstract base for all retrievers |
| `RecursiveRetriever` | `llama-index-core/llama_index/core/retrievers/recursive_retriever.py` | Multi-level recursive retrieval |
| `PropertyGraphRetriever` | `llama-index-core/llama_index/core/indices/property_graph/retriever.py` | Graph-based retrieval |

### Recursive Retriever

The `RecursiveRetriever` enables multi-level, hierarchical retrieval across different data sources and node types. It supports recursive traversal of indices and can fetch related nodes across different retrieval strategies.

**Key Features:**
- Recursive node resolution across index hierarchies
- Support for multiple retriever types in a chain
- Handling of nested document structures

**Source:** `llama-index-core/llama_index/core/retrievers/recursive_retriever.py`

### Property Graph Retriever

The Property Graph Retriever leverages knowledge graphs for retrieval, enabling structured queries over entity-relationship data. This retriever is particularly effective for complex queries requiring relationship-aware context.

**Capabilities:**
- Graph traversal-based retrieval
- Entity filtering and relationship queries
- Support for hybrid graph + vector search

**Source:** `llama-index-core/llama_index/core/indices/property_graph/retriever.py:1-100`

## Reranking System

### Purpose and Role

Reranking improves retrieval quality by reordering initially retrieved candidates using more sophisticated relevance models. After an initial retrieval pass identifies candidate nodes, rerankers evaluate and reorder these results to maximize relevance to the query.

### LLM Reranker

The `LLMRerank` post-processor uses a Language Model to score and reorder retrieved nodes based on semantic relevance. This approach provides higher quality ranking compared to simple vector similarity.

**Key Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `top_n` | `int` | `10` | Number of top results to return after reranking |
| `choice_batch_size` | `int` | `10` | Batch size for LLM ranking choices |
| `llm` | `BaseLLM` | `None` | Language model for scoring |
| `verbose` | `bool` | `False` | Enable verbose output |

**Source:** `llama-index-core/llama_index/core/postprocessor/llm_rerank.py`

### Node Post-Processors

The `NodePostprocessor` class provides additional filtering and transformation capabilities for retrieved nodes. These processors operate on the node level and can apply various transformations before final output.

**Common Post-Processing Operations:**
- Duplicate removal
- Similarity threshold filtering
- Metadata-based filtering

**Source:** `llama-index-core/llama_index/core/postprocessor/node.py`

## Data Flow

```mermaid
graph LR
    A[User Query] --> B[Vector Search]
    B --> C[Top-K Nodes]
    C --> D[Post-Processors]
    D --> E[LLM Reranker]
    E --> F[Reranked Nodes]
    F --> G[Context for LLM]
    
    H[Documents] --> I[Indexing Pipeline]
    I --> J[Embedding Model]
    J --> K[Vector Store]
    K --> B
```

## Integration with Data Loaders

LlamaIndex's retrieval system integrates seamlessly with various data loaders that prepare documents for indexing and retrieval.

### Supported Data Sources

| Reader | Use Case | Integration |
|--------|----------|-------------|
| `DoclingReader` | PDF, DOCX, HTML | `llama-index-readers-docling` |
| `SimpleWebPageReader` | Static websites | `llama-index-readers-web` |
| `RemoteDepthReader` | Multi-level URL crawling | `llama-index-readers-remote-depth` |
| `WikipediaReader` | Wikipedia articles | `llama-index-readers-wikipedia` |
| `SemanticScholarReader` | Academic papers | `llama-index-readers-semanticscholar` |

**Source:** `llama-index-integrations/readers/llama-index-readers-docling/README.md`

### Document Processing Pipeline

Documents loaded through readers undergo the following processing:

1. **Parsing** - Extract text content from various formats (PDF, DOCX, HTML)
2. **Node Parsing** - Split documents into semantic chunks (nodes)
3. **Embedding** - Generate vector embeddings for each node
4. **Indexing** - Store nodes and embeddings in appropriate stores
5. **Retrieval** - Fetch relevant nodes based on queries

**Source:** `llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/simple_web/README.md`

## Usage Patterns

### Basic Retrieval with Reranking

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import LLMRerank

# Load documents and create index
index = VectorStoreIndex.from_documents(documents)

# Configure reranking
reranker = LLMRerank(
    top_n=5,
    choice_batch_size=10
)

# Query with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker]
)

response = query_engine.query("Your question here")
```

### Recursive Retrieval

```python
from llama_index.core.retrievers import RecursiveRetriever

# Configure recursive retrieval across multiple levels;
# root_id names the retriever where traversal starts
recursive_retriever = RecursiveRetriever(
    root_id="root",
    retriever_dict={
        "root": vector_retriever,
        "documents": document_retriever
    }
)
```

## Configuration Options

### Retrieval Configuration

| Option | Description | Applies To |
|--------|-------------|------------|
| `similarity_top_k` | Number of initial candidates | Vector retrieval |
| `retrieval_mode` | Vector, keyword, or hybrid | Hybrid search |
| `node_postprocessors` | List of post-processing steps | All retrievers |

### Reranking Configuration

| Option | Description | Default |
|--------|-------------|---------|
| `top_n` | Final number of results | 10 |
| `choice_batch_size` | Nodes scored per LLM call | 10 |
| `llm` | LLM used for scoring (falls back to `Settings.llm`) | None |

## Advanced Topics

### Hybrid Retrieval with Reranking

Combining vector and keyword search with LLM reranking provides robust retrieval across diverse query types:

1. **Vector Search** - Captures semantic similarity
2. **Keyword Search** - Captures exact term matching
3. **LLM Reranking** - Optimizes final ordering (see the sketch below)
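
A hedged sketch combining these phases. Hybrid mode requires a vector store that supports it, and the parameter values are illustrative:

```python
from llama_index.core.postprocessor import LLMRerank

# Hybrid retrieval (sparse + dense) followed by LLM reranking
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    alpha=0.5,                      # balance sparse vs. dense scores
    similarity_top_k=10,            # wide initial candidate set
    node_postprocessors=[LLMRerank(top_n=3)],
)
```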

### Custom Retrievers

Developers can create custom retrievers by extending `BaseRetriever`:

```python
from typing import List

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

class CustomRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Custom retrieval logic; must return scored nodes
        return []
```

## Summary

Retrieval and Reranking in LlamaIndex form a two-phase system where initial retrieval identifies candidate nodes and reranking optimizes their ordering. The architecture supports multiple retrieval strategies (vector, graph, recursive) and leverages LLM-based reranking for improved result quality. Integration with various data loaders enables seamless indexing from diverse sources, while the post-processor abstraction allows flexible pipeline customization.

---

<a id='agent-framework'></a>

## Agent Framework

### Related Pages

Related topics: [Memory Systems](#memory-systems)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/agent/react/formatter.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/agent/react/formatter.py)
- [llama-index-core/llama_index/core/agent/workflow/base_agent.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/agent/workflow/base_agent.py)
- [llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py)
- [llama-index-core/llama_index/core/tools/function_tool.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/tools/function_tool.py)
</details>

# Agent Framework

## Overview

The LlamaIndex Agent Framework provides a flexible, extensible system for building AI agents that can reason, plan, and execute actions using tools. The framework enables the creation of both single-agent and multi-agent systems capable of interacting with external data sources, performing complex reasoning tasks, and orchestrating workflows.

Agents in LlamaIndex are designed to combine large language model (LLM) capabilities with structured tool usage, memory management, and workflow orchestration. The framework supports various agent types including ReAct (Reasoning + Acting) agents and workflow-based agents.

Source: `llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50`

## Architecture Overview

```mermaid
graph TD
    A[User Query] --> B[Agent]
    B --> C[Reasoning Engine]
    C --> D[Tool System]
    D --> E[External Tools]
    C --> F[Memory]
    B --> G[Workflow Orchestrator]
    G --> H[Sub-Agents]
    H --> D
```

The framework is built on several key components that work together to enable sophisticated agent behaviors:

| Component | Purpose |
|-----------|---------|
| **Agent** | Core entity that processes queries and generates responses |
| **Reasoning Engine** | Handles thought processes and decision making |
| **Tool System** | Provides access to external functions and APIs |
| **Memory** | Stores conversation history and intermediate results |
| **Workflow Orchestrator** | Manages complex multi-step tasks |

Source: `llama-index-core/llama_index/core/agent/workflow/base_agent.py:50-100`

## ReAct Agent

The ReAct (Synergizing Reasoning and Acting) agent implements a reasoning loop that combines thought processes with tool actions. This agent type is particularly effective for tasks requiring logical deduction and external information retrieval.

### ReAct Formatter

The ReAct formatter is responsible for constructing prompts that guide the agent through the reasoning-action-observation cycle. It defines the structure of thoughts, actions, and observations in the agent's prompt.

```mermaid
graph LR
    A[Thought] --> B[Action]
    B --> C[Observation]
    C --> A
```

#### Key Components

| Component | Description |
|-----------|-------------|
| `system_prompt` | Instructions for the agent's role and behavior |
| `tool_prompt` | Description of available tools |
| `formatter` | Defines the format for thoughts, actions, observations |
| `examples` | Few-shot examples for better performance |

Source: `llama-index-core/llama_index/core/agent/react/formatter.py:1-80`

### ReAct Output Parsing

The ReAct agent uses specialized output parsers to extract structured information from LLM responses:

```python
# Simplified sketch of the parser interface
class ReActOutputParser:
    def parse(self, output: str) -> BaseReasoningStep:
        # Extract the thought, action, and action input from the raw output
        ...
```

This parsing enables the agent to:
1. Extract the reasoning thought process
2. Identify the tool to invoke
3. Extract the tool's input parameters
4. Process the tool's output as an observation

Source: `llama-index-core/llama_index/core/agent/react/formatter.py:80-150`
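
For intuition, a ReAct-style completion that such a parser must handle looks roughly like this (the exact template is defined by the formatter; this string is illustrative):

```python
llm_output = """Thought: I need to look up current pricing data.
Action: search_database
Action Input: {"query": "pricing"}"""
```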

## Workflow-Based Agents

Workflow-based agents provide a more structured approach to agent execution, using state machines and defined steps to process queries.

### Base Agent

The `BaseAgent` class provides the foundation for all agent implementations in the workflow system:

```mermaid
graph TD
    A[Input] --> B[State Machine]
    B --> C{Step Execution}
    C -->|Step 1| D[Process Step]
    D --> E[Update State]
    E --> C
    C -->|Complete| F[Generate Response]
```

#### Base Agent API

| Method | Description |
|--------|-------------|
| `run()` | Execute the agent with input |
| `reset()` | Reset agent state |
| `get_state()` | Retrieve current agent state |
| `set_state()` | Set agent state |

Source: `llama-index-core/llama_index/core/agent/workflow/base_agent.py:100-200`

### Agent State Management

Agents maintain state throughout their execution, which includes:

| State Component | Type | Purpose |
|-----------------|------|---------|
| `input` | str | Original user input |
| `current_step` | int | Current execution step |
| `memory` | Memory | Conversation history |
| `context` | dict | Additional context data |
| `steps` | List[Step] | Executed steps |
| `output` | Any | Final output |

Source: `llama-index-core/llama_index/core/agent/workflow/base_agent.py:200-300`

## Tool System

The Tool System enables agents to interact with external resources and perform actions beyond text generation.

### Function Tool

`FunctionTool` wraps plain Python functions as tools; by default the tool's name and description are derived from the function's name and docstring:

```python
from llama_index.core.tools import FunctionTool

def search_database(query: str) -> str:
    """Search the knowledge base for relevant information."""
    # Implementation here (placeholder result)
    return f"results for {query!r}"

search_tool = FunctionTool.from_defaults(fn=search_database)
```

#### FunctionTool Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `fn` | Callable | Required | The function to wrap |
| `name` | str | Function name | Tool identifier |
| `description` | str | Function docstring | Tool description for LLM |
| `fn_schema` | BaseModel | Auto-generated | Input schema |
| `return_direct` | bool | False | Return raw output |

Source: `llama-index-core/llama_index/core/tools/function_tool.py:1-100`

### Tool Execution Flow

```mermaid
sequenceDiagram
    participant Agent
    participant ToolRegistry
    participant FunctionTool
    participant External

    Agent->>ToolRegistry: Request tool by name
    ToolRegistry->>FunctionTool: Get tool instance
    FunctionTool->>External: Execute function
    External-->>FunctionTool: Return result
    FunctionTool-->>Agent: Format response
```

### Creating Custom Tools

Tools can be created using the `@FunctionTool.from_defaults` decorator:

```python
@FunctionTool.from_defaults(name="calculator", description="Perform mathematical calculations")
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    return str(eval(expression))
```

Or with an explicit name and description:

```python
from llama_index.core.tools import FunctionTool

def my_function(arg1: str, arg2: int) -> str:
    return f"{arg1} repeated {arg2} times"

tool = FunctionTool.from_defaults(
    fn=my_function,
    name="my_tool",
    description="Custom tool description"
)
```

Source: `llama-index-core/llama_index/core/tools/function_tool.py:100-200`

## Multi-Agent Workflows

Multi-agent systems enable complex task decomposition where different specialized agents collaborate to solve problems.

### Multi-Agent Workflow Architecture

```mermaid
graph TD
    A[Coordinator Agent] --> B[Specialist Agent 1]
    A --> C[Specialist Agent 2]
    A --> D[Specialist Agent N]
    B --> E[Tool 1]
    C --> F[Tool 2]
    D --> G[Tool N]
    B --> A
    C --> A
    D --> A
```

### Workflow Communication

Agents communicate through a shared state and message-passing mechanism:

| Message Type | Direction | Purpose |
|--------------|-----------|---------|
| `task` | Coordinator → Specialist | Assign task |
| `result` | Specialist → Coordinator | Return results |
| `query` | Any → Any | Request information |
| `response` | Any → Any | Provide information |

Source: `llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:1-100`

### Creating Multi-Agent Systems

```python
from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent

# Create specialized agents
research_agent = ReActAgent(
    name="researcher", description="Finds relevant information",
    tools=[search_tool], llm=llm,
)
analysis_agent = ReActAgent(
    name="analyst", description="Analyzes findings",
    tools=[analysis_tool], llm=llm,
)

# Create the multi-agent workflow; the root agent receives user input first
workflow = AgentWorkflow(
    agents=[research_agent, analysis_agent],
    root_agent="researcher",
)

# Execute the workflow (run() is async)
result = await workflow.run(user_msg="Analyze the latest research on AI")
```

Source: `llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:100-200`

## Tool Integration with LlamaIndex Readers

The Agent Framework integrates seamlessly with LlamaIndex's document readers, enabling agents to query and reason over loaded documents:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Load documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap the query engine as a tool
query_tool = QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(
        name="document_index",
        description="Answers questions about the loaded documents",
    ),
)

# Create agent with query tool
agent = ReActAgent.from_tools(tools=[query_tool])
response = agent.chat("What is the main topic of these documents?")
```

This integration allows agents to:
- Query vector databases
- Retrieve relevant context
- Synthesize information from multiple sources
- Perform RAG (Retrieval-Augmented Generation)

## Best Practices

### Designing Effective Tools

| Guideline | Rationale |
|-----------|-----------|
| Clear descriptions | Helps LLM understand when to use the tool |
| Structured outputs | Easier for agent to parse and use results |
| Error handling | Prevents agent crashes from tool failures |
| Idempotent operations | Enables safe retries |

### Agent Configuration

| Parameter | Recommendation |
|-----------|----------------|
| `max_iterations` | Set based on task complexity (default: 10) |
| `timeout` | Allow sufficient time for tool execution |
| `memory_type` | Use conversation memory for multi-turn interactions |
| `tool_retriever` | Implement for large tool collections |

### Debugging Agents

1. **Enable verbose mode** to see agent's reasoning traces
2. **Log tool inputs/outputs** to verify correct tool usage
3. **Test tools independently** before combining with agent
4. **Monitor token usage** to prevent excessive spending

## See Also

- [ReAct Agent Documentation](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/)
- [Workflow Documentation](https://docs.llamaindex.ai/en/stable/module_guides/deploying/workflow/)
- [Tool System](https://docs.llamaindex.ai/en/stable/module_guides/deploying/tools/)
- [LlamaIndex Core](https://github.com/run-llama/llama_index/tree/main/llama-index-core)

---

<a id='memory-systems'></a>

## Memory Systems

### Related Pages

Related topics: [Agent Framework](#agent-framework), [Storage Systems](#storage-systems)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [llama-index-core/llama_index/core/memory/chat_memory_buffer.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/chat_memory_buffer.py)
- [llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py)
- [llama-index-core/llama_index/core/memory/vector_memory.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/vector_memory.py)
- [llama-index-core/llama_index/core/memory/simple_composable_memory.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/simple_composable_memory.py)
- [llama-index-integrations/memory/llama-index-memory-mem0/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/memory/llama-index-memory-mem0/README.md)
</details>

# Memory Systems

Memory Systems in LlamaIndex provide persistent conversation history management for chat engines and agents. They enable AI applications to maintain context across multiple interactions, store user preferences, and retrieve relevant historical information during conversations.

## Architecture Overview

Memory Systems follow a modular architecture that allows different memory implementations to be composed and used interchangeably. The core memory system supports multiple storage strategies including buffer-based, summary-based, and vector-based retrieval.

```mermaid
graph TD
    A[Chat Engine / Agent] --> B[Memory System]
    B --> C[ChatMemoryBuffer]
    B --> D[ChatSummaryMemoryBuffer]
    B --> E[VectorMemory]
    B --> F[Mem0Memory]
    C --> G[SimpleComposableMemory]
    D --> G
    E --> G
    F --> H[External Memory Services]
    
    G --> I[Storage Backend]
    H --> J[Mem0 Platform API]
```

## Core Memory Components

### ChatMemoryBuffer

`ChatMemoryBuffer` is the foundational memory component that stores conversation history in a simple buffer structure. It maintains a list of chat messages and provides methods for adding, retrieving, and managing conversation context.

| Parameter | Type | Description |
|-----------|------|-------------|
| `chat_history` | `List[ChatMessage]` | Initial conversation messages |
| `token_limit` | `int` | Maximum number of tokens retained in the buffer |
| `tokenizer_fn` | `Callable` | Function used to count tokens |

**Source:** [llama-index-core/llama_index/core/memory/chat_memory_buffer.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/chat_memory_buffer.py)
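
A short sketch wiring the buffer into a chat engine; the token limit is arbitrary, and `index` is assumed to exist:

```python
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
```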

### ChatSummaryMemoryBuffer

`ChatSummaryMemoryBuffer` extends the basic buffer with summarization capabilities. When the conversation exceeds the configured size, older messages are condensed into a summary rather than being discarded entirely.

| Parameter | Type | Description |
|-----------|------|-------------|
| `llm` | `LLM` | LLM instance for generating summaries |
| `chat_history` | `List[ChatMessage]` | Initial conversation history |
| `token_limit` | `int` | Token budget before older messages are summarized |
| `tokenizer_fn` | `Callable` | Function used to count tokens |

**Source:** [llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py)

### VectorMemory

`VectorMemory` uses vector embeddings to store and retrieve conversation history. This enables semantic search within the conversation history, allowing the system to find relevant past messages based on meaning rather than exact matches.

| Parameter | Type | Description |
|-----------|------|-------------|
| `vector_store` | `VectorStore` | Storage backend for embeddings |
| `embed_model` | `EmbeddingModel` | Model for generating embeddings |
| `index` | `VectorStoreIndex` | Index for efficient retrieval |
| `retriever` | `BaseRetriever` | Retrieval mechanism |

**Source:** [llama-index-core/llama_index/core/memory/vector_memory.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/vector_memory.py)

### SimpleComposableMemory

`SimpleComposableMemory` provides a framework for combining multiple memory types into a unified interface. This allows different memory strategies to work together, leveraging the strengths of each approach.

| Feature | Description |
|---------|-------------|
| Memory Composition | Combine buffer, summary, and vector memories |
| Unified Interface | Single API for all memory operations |
| Flexible Retrieval | Query multiple memory sources simultaneously |

**Source:** [llama-index-core/llama_index/core/memory/simple_composable_memory.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/memory/simple_composable_memory.py)
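
A sketch of composing memories, following the documented `from_defaults` pattern; the embedding model and retriever settings are placeholders:

```python
from llama_index.core.memory import (
    ChatMemoryBuffer,
    SimpleComposableMemory,
    VectorMemory,
)

# Semantic recall over past messages (vector_store=None uses an in-memory store)
vector_memory = VectorMemory.from_defaults(
    vector_store=None,
    embed_model=embed_model,        # any configured embedding model
    retriever_kwargs={"similarity_top_k": 2},
)

# Primary buffer for recent turns, vector memory as a secondary source
memory = SimpleComposableMemory.from_defaults(
    primary_memory=ChatMemoryBuffer.from_defaults(),
    secondary_memory_sources=[vector_memory],
)
```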

## Mem0 Memory Integration

The `Mem0Memory` integration provides access to the Mem0 Platform for advanced memory management. Mem0 offers enhanced capabilities for semantic memory storage, user preference tracking, and cross-session persistence.

### Configuration Options

#### Client-Based Initialization

```python
from llama_index.memory.mem0 import Mem0Memory

context = {"user_id": "user_1"}
memory = Mem0Memory.from_client(
    context=context,
    api_key="<your-mem0-api-key>",
    search_msg_limit=4,
)
```

#### Config Dictionary Initialization

```python
memory = Mem0Memory.from_config(
    context=context,
    config={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
        "version": "v1.1",
    },
    search_msg_limit=4,
)
```

### Context Parameters

The Mem0 context identifies the entity for which memory is stored:

| Parameter | Description |
|-----------|-------------|
| `user_id` | Unique identifier for the user |
| `agent_id` | Unique identifier for the agent |
| `run_id` | Unique identifier for the conversation run |

**Source:** [llama-index-integrations/memory/llama-index-memory-mem0/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/memory/llama-index-memory-mem0/README.md)

## Usage Patterns

### Integration with SimpleChatEngine

```python
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.memory.mem0 import Mem0Memory

memory = Mem0Memory.from_client(
    context={"user_id": "user_1"},
    api_key="<your-api-key>",
)

chat_engine = SimpleChatEngine.from_defaults(
    llm=llm,
    memory=memory
)

response = chat_engine.chat("Hi, My name is Mayank")
```

### Integration with FunctionAgent

```python
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.memory.mem0 import Mem0Memory

memory = Mem0Memory.from_client(
    context={"user_id": "user_1"},
    api_key="<your-api-key>",
)

# Use memory with an agent for persistent context
# (call_tool and email_tool are defined elsewhere)
agent = FunctionAgent(
    llm=llm,
    tools=[call_tool, email_tool],
    memory=memory
)
```

**Source:** [llama-index-integrations/memory/llama-index-memory-mem0/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/memory/llama-index-memory-mem0/README.md)

## Memory Workflow

```mermaid
sequenceDiagram
    participant User
    participant ChatEngine
    participant Memory
    participant Storage
    
    User->>ChatEngine: Send message
    ChatEngine->>Memory: Get context (search_msg_limit messages)
    Memory->>Storage: Query recent messages
    Storage-->>Memory: Return relevant messages
    Memory-->>ChatEngine: Context messages
    ChatEngine->>ChatEngine: Generate response
    ChatEngine->>Memory: Store new message
    Memory->>Storage: Persist message
    ChatEngine-->>User: Return response
```

## Comparison of Memory Types

| Memory Type | Storage Method | Use Case | Scalability |
|-------------|----------------|----------|-------------|
| ChatMemoryBuffer | List/Buffer | Short conversations | Limited by token size |
| ChatSummaryMemoryBuffer | Condensed summaries | Long conversations | Better for extended chats |
| VectorMemory | Embeddings | Semantic search | Scales with vector store |
| Mem0Memory | External API | Production applications | Cloud-native scaling |

## Environment Configuration

For Mem0 integration, set the API key as an environment variable:

```bash
export MEM0_API_KEY="<your-mem0-api-key>"
```

For LLM integration within memory operations:

```bash
export OPENAI_API_KEY="<your-openai-api-key>"
```

**Source:** [llama-index-integrations/memory/llama-index-memory-mem0/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/memory/llama-index-memory-mem0/README.md)

## Best Practices

1. **Choose Appropriate Memory Type**: Select based on conversation length and retrieval needs
2. **Configure Limits**: Set buffer `token_limit` values and Mem0's `search_msg_limit` to balance context and performance
3. **Use Context Parameters**: Always provide user_id, agent_id, or run_id for proper memory isolation
4. **Consider Composability**: Use `SimpleComposableMemory` for complex memory requirements
5. **Monitor API Costs**: When using Mem0, track API usage for cost optimization

---

## Doramagic Pitfall Log

Project: run-llama/llama_index

Summary: 6 potential pitfalls found, 0 of them high/blocking; top priority: capability pitfall - capability claims rest on assumptions.

## 1. Capability pitfall · Capability claims rest on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- Impact on users: if the assumption fails, users do not get the promised capability.
- Suggested check: convert the assumption into a downstream verification checklist.
- Guard action: assumptions must become verification items; they cannot be stated as fact before verification results exist.
- Evidence: capability.assumptions | github_repo:560704231 | https://github.com/run-llama/llama_index | README/documentation is current enough for a first validation pass.

## 2. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed not recorded.
- Impact on users: new, abandoned, and active projects get mixed together, lowering trust in recommendations.
- Suggested check: add recent GitHub commit, release, and issue/PR response signals.
- Guard action: while maintenance activity is unknown, the recommendation must not be marked high-trust.
- Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | last_activity_observed missing

## 3. Security/permissions pitfall · Downstream validation flagged a risk item

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: downstream review has already been requested; the page must not downplay it.
- Suggested check: enter the security/permissions governance review queue.
- Guard action: while a downstream risk exists, the review/recommendation downgrade must be kept in place.
- Evidence: downstream_validation.risk_items | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium

## 4. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: the risk affects whether the project is suitable for ordinary users to install.
- Suggested check: record the risk on the boundary card and confirm whether manual review is needed.
- Guard action: scoring risks must go onto the boundary card, not remain internal scores only.
- Evidence: risks.scoring_risks | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium

## 5. Maintenance pitfall · Issue/PR responsiveness unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- Impact on users: users cannot tell whether anyone will respond when they hit problems.
- Suggested check: sample recent issues/PRs to judge whether they go unhandled for long periods.
- Guard action: while issue/PR responsiveness is unknown, the maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | issue_or_pr_quality=unknown

## 6. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- Impact on users: install commands and docs may lag behind the code, raising the chance of user pitfalls.
- Suggested check: confirm the latest release/tag matches the README install commands.
- Guard action: while the release cadence is unknown or stale, install instructions must be flagged as possibly drifted.
- Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | release_recency=unknown

