Doramagic Project Pack · Human Manual

Introduction to LlamaIndex

Related topics: Core Architecture, Quick Start Guide

LlamaIndex is a comprehensive data framework designed for building LLM (Large Language Model) applications. It provides the essential tools, abstractions, and integrations needed to connect custom data sources to LLMs for retrieval-augmented generation (RAG), question-answering systems, and other AI-powered applications.

Overview

LlamaIndex serves as the foundational layer for building AI applications that require sophisticated data ingestion, indexing, and querying capabilities. The framework enables developers to:

  • Ingest data from various sources (PDFs, documents, websites, databases)
  • Process and chunk data into optimal segments for LLM consumption
  • Create vector indices for efficient semantic search
  • Build query engines and retrieval pipelines
  • Integrate with hundreds of external services and model providers

Sources: README.md:1-20

Core Architecture

The LlamaIndex framework follows a modular architecture with distinct components that work together to provide end-to-end data pipeline capabilities.

Package Structure

LlamaIndex offers two primary installation methods to accommodate different use cases:

| Package | Description | Use Case |
|---|---|---|
| llama-index | Starter package with core + selected integrations | Quick start, common setups |
| llama-index-core | Core package only | Custom, minimal deployments |

Sources: README.md:45-55

Import Patterns

The framework uses a namespaced import system that distinguishes between core modules and integration packages:

# Core modules (included in llama-index-core)
from llama_index.core.xxx import ClassABC

# Integration modules (from separate packages)
from llama_index.xxx.yyy import SubclassABC

# Concrete examples
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI

Sources: README.md:56-68

Data Flow Architecture

The following diagram illustrates the typical data flow in a LlamaIndex application:

graph TD
    A[Data Sources] --> B[Readers/Loaders]
    B --> C[Documents]
    C --> D[Node Parsers]
    D --> E[Nodes/Chunks]
    E --> F[Vector Index]
    F --> G[Retriever]
    G --> H[Query Engine]
    H --> I[LLM Response]
    
    A1[Web Pages] --> B
    A2[PDFs] --> B
    A3[Databases] --> B
    A4[APIs] --> B

Key Components

1. Document Loaders

Document loaders (Readers) are responsible for ingesting data from external sources. LlamaIndex provides a vast ecosystem of readers:

| Reader | Purpose | Source |
|---|---|---|
| WikipediaReader | Load Wikipedia pages | llama-index-readers-wikipedia |
| WholeSiteReader | Scrape entire websites | llama-index-readers-web |
| DoclingReader | Parse PDFs, DOCX, HTML | llama-index-readers-docling |
| RemoteDepthReader | Extract from URLs recursively | llama-index-readers-remote-depth |

#### Wikipedia Reader Example

from llama_index.readers.wikipedia import WikipediaReader

reader = WikipediaReader()
documents = reader.load_data(pages=["Page Title 1", "Page Title 2"])

Sources: llama-index-readers-wikipedia/README.md:1-25

#### Docling Reader Example

from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
docs = reader.load_data(file_path="https://arxiv.org/pdf/2408.09869")

Sources: llama-index-readers-docling/README.md:1-30

2. Indices

Indices organize documents for efficient retrieval. LlamaIndex supports both managed indices and customizable self-hosted options.

#### Managed Indices

Managed indices like VectaraIndex provide fully hosted solutions:

from llama_index.indices.managed.vectara import VectaraIndex
from llama_index.core.schema import Document, MediaResource

docs = [
    Document(
        id_="doc1",
        text_resource=MediaResource(
            text="This is test text for Vectara integration.",
        ),
    ),
]
index = VectaraIndex.from_documents(docs)

Sources: llama-index-indices-managed-vectara/README.md:30-50

3. LLM Integrations

LlamaIndex provides integrations with numerous LLM providers through a standardized interface:

# Example: Contextual LLM Integration
from llama_index.llms.contextual import Contextual

llm = Contextual(model="contextual-clm", api_key="your_api_key")
response = llm.complete("Explain the importance of Grounded Language Models.")

Sources: llama-index-llms-contextual/README.md:1-20

Usage Patterns

Building a Simple RAG Pipeline

The most common pattern involves loading documents, creating an index, and querying it:

from llama_index.core import VectorStoreIndex
from llama_index.readers.docling import DoclingReader

# Step 1: Load documents
reader = DoclingReader()
documents = reader.load_data(file_path="document.pdf")

# Step 2: Create index
index = VectorStoreIndex.from_documents(documents)

# Step 3: Query
query_engine = index.as_query_engine()
response = query_engine.query("Summarize this document")

Retrieval-Only Pattern

For applications requiring only retrieval without generation:

retriever = index.as_retriever(similarity_top_k=2)
results = retriever.retrieve("How will users feel about this new tool?")

Sources: llama-index-indices-managed-vectara/README.md:50-65

LangChain Integration

LlamaIndex components can be used as tools within LangChain agents:

from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import WholeSiteReader

# Initialize scraper
scraper = WholeSiteReader(prefix="https://docs.llamaindex.ai/en/stable/", max_depth=10)
documents = scraper.load_data(base_url="https://docs.llamaindex.ai/en/stable/")

# Create index
index = VectorStoreIndex.from_documents(documents)

# Define tools
tools = [
    Tool(
        name="Website Index",
        func=lambda q: index.query(q),
        description="Useful for answering questions about text on websites.",
    ),
]

Sources: llama-index-readers-web/llama_index/readers/web/whole_site/README.md:1-40

LlamaParse Platform

LlamaParse is a complementary platform (separate from the open-source LlamaIndex framework) focused on document agents and agentic OCR:

| Component | Function |
|---|---|
| Parse | Agentic OCR and document parsing (130+ formats) |
| Extract | Structured data extraction from documents |
| Index | Ingest, index, and RAG pipelines |
| Split | Split large documents into subcategories |

Sources: README.md:75-85

Ecosystem Overview

LlamaIndex maintains an extensive ecosystem with over 300 integration packages available through LlamaHub:

graph LR
    subgraph "Data Sources"
        Web[Web]
        PDFs[PDFs]
        DB[Databases]
        APIs[APIs]
    end
    
    subgraph "LlamaIndex Core"
        Docs[Documents]
        Nodes[Nodes]
        Indices[Indices]
    end
    
    subgraph "LLM Providers"
        OpenAI[OpenAI]
        HuggingFace[HF]
        Local[Local Models]
    end
    
    Web --> Docs
    PDFs --> Docs
    DB --> Docs
    APIs --> Docs
    Docs --> Indices
    Indices --> OpenAI
    Indices --> HuggingFace
    Indices --> Local

Configuration Options

Common Reader Configuration Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| file_path | str | Path to input file/URL | "document.pdf" |
| prefix | str | URL prefix for filtering | "https://example.com/" |
| max_depth | int | Maximum recursion depth | 10 |
| where | dict | Metadata filter condition | {"category": "AI"} |
| query | list | Search query text | ["search term"] |

Sources: llama-index-readers-chroma/README.md:1-20

Installation

pip install llama-index

Minimal Installation

pip install llama-index-core

Individual Integrations

pip install llama-index-readers-wikipedia
pip install llama-index-readers-docling
pip install llama-index-llms-openai

Citation

If you use LlamaIndex in academic work, cite as:

@software{Liu_LlamaIndex_2022,
    author = {Liu, Jerry},
    doi = {10.5281/zenodo.1234},
    month = {11},
    title = {{LlamaIndex}},
    url = {https://github.com/jerryjliu/llama_index},
    year = {2022}
}

Sources: README.md:95-105

Next Steps

To continue learning LlamaIndex:

  1. Getting Started - Follow the starter example
  2. Concepts - Understand core concepts like Documents, Nodes, and Indices
  3. LlamaHub - Browse 300+ integrations for various data sources and LLM providers
  4. Examples - Explore Jupyter notebooks for detailed use cases

Sources: [README.md:1-20](https://github.com/run-llama/llama_index/blob/main/README.md)

Quick Start Guide

Related topics: Introduction to LlamaIndex, Documents and Nodes

This guide provides a comprehensive introduction to getting started with LlamaIndex, covering environment setup, core installation methods, and essential development workflows.

Prerequisites

Before beginning, ensure your environment meets the following requirements:

| Requirement | Version/Details |
|---|---|
| Python | 3.8 or higher |
| Package Manager | uv (recommended) or pip |
| Operating System | Unix-like (Linux, macOS), Windows with WSL |
| Git | Latest stable version |

Environment Setup

Creating a Virtual Environment

LlamaIndex recommends using uv for dependency management. Create a virtual environment as follows:

uv venv
source .venv/bin/activate

Sources: llama-dev/README.md:11

Installing the Development CLI

The llama-dev CLI tool is the official command-line interface for development, testing, and automation in the LlamaIndex monorepo.

Install it in editable mode:

uv pip install -e .

After installation, verify the CLI is available:

llama-dev --help

Sources: llama-dev/README.md:12-18

Core Concepts

graph TD
    A[LlamaIndex Project] --> B[Core Package: llama-index-core]
    A --> C[LLM Integrations]
    A --> D[Reader Integrations]
    A --> E[Callback Integrations]
    B --> F[VectorStoreIndex]
    B --> G[ServiceContext]
    B --> H[Document Loading]

The LlamaIndex framework consists of several key components:

| Component | Purpose |
|---|---|
| llama-index-core | Core framework functionality including indexing and querying |
| LLM Integrations | Connectors for various language model providers |
| Reader Integrations | Data loaders for different document sources |
| Callback Integrations | Monitoring and logging capabilities |

Package Management

Querying Package Information

View information about specific packages in the monorepo:

# Get info for a specific package
llama-dev pkg info llama-index-core

# Get info for all packages
llama-dev pkg info --all

Executing Commands in Package Directories

Run commands within the context of specific packages:

# Run a command in a specific package
llama-dev pkg exec --cmd "uv sync" llama-index-core

# Run a command in all packages
llama-dev pkg exec --cmd "uv sync" --all

# Exit at first error
llama-dev pkg exec --cmd "uv" --all --fail-fast

Sources: llama-dev/README.md:26-41

Testing

Running Tests Across the Monorepo

Execute tests for specific packages or across all packages:

# Run tests for a specific package
llama-dev pkg test llama-index-core

# Run tests for all packages
llama-dev pkg test --all

Quick Test Verification

After making changes, verify core functionality:

llama-dev pkg exec --cmd "python -m pytest" llama-index-core

Basic LLM Integration Usage

Initializing an LLM

Different LLM providers follow similar initialization patterns:

from llama_index.llms.ollama import Ollama

# Initialize Ollama LLM
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)

from llama_index.llms.mistralai import MistralAI

llm = MistralAI(api_key="<your-api-key>")

Sources: llama-index-integrations/llms/llama-index-llms-ollama/README.md:30-35 Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:16-18

Generating Completions

# Simple completion
resp = llm.complete("Who is Paul Graham?")
print(resp)

# Chat completion with messages
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content="You are a helpful assistant."
    ),
    ChatMessage(role=MessageRole.USER, content="How to make cake?"),
]
resp = llm.chat(messages)
print(resp)

Sources: llama-index-integrations/llms/llama-index-llms-modelscope/README.md:24-37

Streaming Responses

# Stream completions
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")

# Stream chat responses
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")

Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:40-48

Building an Index from Documents

Basic Index Creation

from llama_index.core import VectorStoreIndex

# Create index from documents
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)

Loading Data from URLs

from llama_index.readers.web import WholeSiteReader

# Initialize the scraper
scraper = WholeSiteReader(
    prefix="https://docs.llamaindex.ai/en/stable/",
    max_depth=10,
)

# Start scraping from a base URL
documents = scraper.load_data(
    base_url="https://docs.llamaindex.ai/en/stable/"
)

# Create index
index = VectorStoreIndex.from_documents(documents)
index.query("What language is on this website?")

Sources: llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md:14-34

Configuration Options

Key Parameters

| Parameter | Description | Default Value |
|---|---|---|
| model | LLM model identifier | Required |
| api_key | API key for the provider | Required for cloud providers |
| request_timeout | Request timeout in seconds | 30.0 |
| temperature | Sampling temperature | 0.7 |
| max_tokens | Maximum tokens to generate | Provider-specific |
| context_window | Maximum context length | Provider-specific |
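
For illustration, the same knobs map onto most provider constructors; a sketch assuming the llama-index-llms-openai package is installed (model name and values are examples only):

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key="<your-api-key>",
    temperature=0.2,
    max_tokens=512,
)
print(llm.complete("Say hello in one word."))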

Environment Variables

Set API keys as environment variables before initialization:

export KONKO_API_KEY=<your-api-key>
export OPENAI_API_KEY=<your-api-key>

Or set them programmatically in Python:

import os

os.environ["KONKO_API_KEY"] = "<your-api-key>"

Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md:15-20

Common Workflows

graph LR
    A[Setup Environment] --> B[Install llama-dev]
    B --> C[Explore Packages]
    C --> D{Development Goal}
    D -->|Testing| E[Run Tests]
    D -->|Integration| F[Configure LLM]
    D -->|Data Loading| G[Set up Readers]
    E --> H[Modify Code]
    F --> H
    G --> H
    H --> I[Verify Changes]
    I --> E

Troubleshooting

Common Issues

| Issue | Solution |
|---|---|
| CLI not found | Ensure virtual environment is activated |
| API key errors | Verify environment variables are set |
| Package import errors | Run uv sync in the package directory |
| Timeout errors | Increase request_timeout parameter |

Verification Commands

# Check installation
llama-dev --version

# Verify package structure
llama-dev pkg info --all

# Test core imports
python -c "import llama_index; print(llama_index.__version__)"

Next Steps

After completing this quick start guide:

  1. Explore specific LLM integrations for your preferred provider
  2. Review reader integrations for your data sources
  3. Study the core API documentation for advanced indexing strategies
  4. Join the community for support and updates

Sources: llama-dev/README.md:11

Core Architecture

Related topics: Introduction to LlamaIndex, Integration Architecture

Overview

LlamaIndex is a data framework for building LLM-powered applications. The Core Architecture establishes the fundamental building blocks that enable developers to connect large language models with their custom data sources. This architectural foundation provides a layered, modular approach where each component—from language model interfaces to response handling—follows consistent patterns and abstractions.

The core architecture serves as the abstraction layer between raw data ingestion and sophisticated LLM-powered querying. It separates concerns by defining clear interfaces for language models (LLMs), embedding services, document processing, indexing, and response generation. This design allows developers to swap implementations, extend functionality, and maintain clean separation between components.

System Components

High-Level Architecture Diagram

graph TD
    subgraph "Data Layer"
        Documents[Documents]
        Nodes[Nodes]
        Index[Index]
    end
    
    subgraph "Core Abstractions"
        LLMs[LLM Base]
        Embeddings[Embedding Base]
        Response[Response Schema]
    end
    
    subgraph "Service Layer"
        VectorStore[Vector Store]
        StorageContext[Storage Context]
    end
    
    subgraph "Application Layer"
        Query[Query Engine]
        Chat[Chat Engine]
        Agent[Agent]
    end
    
    Documents --> NodeParser
    NodeParser --> Nodes
    Nodes --> Index
    Index --> Query
    Query --> Response
    LLMs --> Query
    Embeddings --> Index

Language Model (LLM) Abstraction

Purpose and Role

The LLM base abstraction (llama_index.core.base.llms.base) defines the contract that all language model implementations must follow. This abstraction enables LlamaIndex to support multiple LLM providers—including OpenAI, Anthropic, local models, and custom implementations—through a unified interface.

Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50

Base LLM Interface

The LLM base class provides the following core methods:

| Method | Purpose | Parameters |
|---|---|---|
| complete() | Synchronous text completion | prompt: str, formatted: bool = False, **kwargs |
| stream_complete() | Streaming text completion | prompt: str, formatted: bool = False, **kwargs |
| chat() | Synchronous chat completion | messages: List[ChatMessage], **kwargs |
| stream_chat() | Streaming chat completion | messages: List[ChatMessage], **kwargs |

LLM Class Hierarchy

classDiagram
    class LLM {
        <<abstract>>
        +complete()
        +stream_complete()
        +chat()
        +stream_chat()
        +metadata: LLMMetadata
    }
    
    class LLMMetadata {
        +model: str
        +temperature: float
        +top_p: float
        +max_tokens: Optional[int]
        +context_window: int
        +is_chat_model: bool
        +is_function_calling_model: bool
    }
    
    class ChatMessage {
        +role: MessageRole
        +content: str
        +additional_kwargs: Dict
    }
    
    LLM --> LLMMetadata
    LLM --> ChatMessage

Sources: llama-index-core/llama_index/core/base/llms/base.py:50-120

Message Roles

The MessageRole enum defines valid roles for chat messages:

| Role | Description |
|---|---|
| SYSTEM | System-level instructions |
| USER | User-generated content |
| ASSISTANT | Model-generated responses |
| FUNCTION | Function call results |
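
A brief sketch of constructing chat messages with explicit roles (the llm object is assumed to be any chat-capable LLM implementation):

from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a concise assistant."),
    ChatMessage(role=MessageRole.USER, content="Summarize LlamaIndex in one sentence."),
]
# Any LLM implementation can consume these messages via chat()
# response = llm.chat(messages)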

Embedding Abstraction

Purpose and Role

The embedding base (llama_index.core.base.embeddings.base) provides the interface for text vectorization. Embeddings transform textual content into numerical vectors that enable semantic similarity searches. This abstraction supports various embedding providers while maintaining a consistent API.

Sources: llama-index-core/llama_index/core/base/embeddings/base.py:1-60

Embedding Interface Methods

| Method | Purpose | Return Type |
|---|---|---|
| get_query_embedding() | Embed a single query string | List[float] |
| get_text_embedding() | Embed a single text string | List[float] |
| get_text_embedding_batch() | Embed multiple texts in batch | List[List[float]] |
| get_query_embedding_batch() | Embed multiple queries in batch | List[List[float]] |
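
For example, a sketch assuming the llama-index-embeddings-openai package and an OPENAI_API_KEY are available (any BaseEmbedding subclass exposes the same methods):

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()
query_vector = embed_model.get_query_embedding("What is LlamaIndex?")
text_vectors = embed_model.get_text_embedding_batch(["first passage", "second passage"])
print(len(query_vector))  # embedding dimension depends on the model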

Embedding Configuration

graph LR
    A[Text Input] --> B[Embedding Model]
    B --> C[Dimension: 384-1536]
    C --> D[Normalized Vector]

Sources: llama-index-core/llama_index/core/base/embeddings/base.py:60-100

Response Schema

Purpose and Role

The response schema (llama_index.core.base.response.schema) defines the data structures used throughout LlamaIndex for returning query results, streaming responses, and structured outputs. This ensures consistent response handling across different query types and engines.

Sources: llama-index-core/llama_index/core/base/response/schema.py:1-80

Core Response Models

| Class | Purpose |
|---|---|
| Response | Wraps text responses with sources |
| StreamingResponse | Handles streaming token outputs |
| ResponseMode | Enum for response generation modes |
| Sources | Container for source nodes and metadata |
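
In practice, a query engine returns a Response whose text and source nodes can be inspected directly (a sketch assuming an existing query_engine):

response = query_engine.query("What does the document cover?")

print(str(response))  # synthesized answer text
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])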

Response Mode Enumeration

graph TD
    A[Query] --> B{Response Mode}
    B --> C[default]
    B --> D[refine]
    B --> E[compact]
    B --> F[accumulate]
    B --> G[compact_accumulate]
    
    C --> H[Single pass response]
    D --> I[Iterative refinement]
    E --> J[Compact and respond]
    F --> K[Aggregate node responses]
    G --> L[Compact then accumulate]

Sources: llama-index-core/llama_index/core/base/response/schema.py:30-50

Core Types System

Type Definitions

The types module (llama_index.core.types) defines foundational enumerations and type aliases used throughout the framework:

| Type | Description |
|---|---|
| ModelType | Defines model categories (e.g., LLM, EMBEDDING) |
| PromptType | Categorizes prompts (e.g., SUMMARY, QUERY) |
| NodeType | Defines node kinds (e.g., TEXT, DOCUMENT) |

Sources: llama-index-core/llama_index/core/types.py:1-60

Node Parser Types

classDiagram
    class Node {
        <<abstract>>
        +id_: str
        +embedding: Optional[List[float]]
        +metadata: Dict[str, Any]
        +relationships: Dict[NodeRelationship, Node]
        +excluded_embed_metadata_keys: List[str]
        +excluded_llm_metadata_keys: List[str]
    }
    
    class TextNode {
        +text: str
        +start_char_idx: Optional[int]
        +end_char_idx: Optional[int]
    }
    
    class Document {
        +text: str
        +doc_id: str
        +embedding: Optional[List[float]]
    }
    
    Node <|-- TextNode
    Node <|-- Document

Document and Node Model

Document Structure

Documents represent the top-level container for source data. Each document contains metadata and can be broken down into smaller nodes for indexing:

| Field | Type | Description |
|---|---|---|
| doc_id | str | Unique document identifier |
| text | str | Full text content |
| metadata | Dict[str, Any] | Associated metadata |
| embedding | Optional[List[float]] | Pre-computed embedding |

Node Relationships

Nodes maintain relationships with other nodes through the NodeRelationship enum:

| Relationship | Description |
|---|---|
| SOURCE | Parent document relationship |
| PREVIOUS | Previous sibling node |
| NEXT | Next sibling node |
| PARENT | Parent node in hierarchy |
| CHILD | Child node in hierarchy |

Sources: llama-index-core/llama_index/core/node_parser/node.py:30-80
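
Node parsers populate these relationships automatically; the following is only a small hand-wired sketch for illustration:

from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

first = TextNode(text="First chunk of a document.", id_="node-1")
second = TextNode(text="Second chunk of a document.", id_="node-2")

# Link the two chunks as siblings
first.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=second.node_id)
second.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=first.node_id)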

Storage Architecture

Storage Context

The StorageContext manages persistence layers for various data components:

graph TD
    StorageContext --> VectorStore
    StorageContext --> DocStore
    StorageContext --> IndexStore
    StorageContext --> GraphStore
    
    VectorStore --> Milvus[Milvus]
    VectorStore --> Chroma[Chroma]
    VectorStore --> Pinecone[Pinecone]
    
    DocStore --> MongoDB[MongoDB]
    DocStore --> Redis[Redis]
    DocStore --> Simple[SimpleKVStore]

Sources: llama-index-core/llama_index/core/storage/storage_context.py:1-50

Storage Components

| Component | Purpose |
|---|---|
| vector_store | Stores embedding vectors for similarity search |
| doc_store | Stores serialized nodes and documents |
| index_store | Stores index metadata and configurations |
| graph_store | Stores knowledge graph relationships |

Index Architecture

Base Index Structure

Indexes provide the mechanism for organizing and querying documents. The base index class establishes the contract for all index implementations:

graph LR
    A[Documents] --> B[Index Construction]
    B --> C[Node Parsing]
    C --> D[Embedding Generation]
    D --> E[Vector Storage]
    E --> F[Queryable Index]

Index Types

| Index Type | Use Case |
|---|---|
| VectorStoreIndex | Semantic search over embeddings |
| SummaryIndex | Document summarization |
| KeywordTableIndex | Keyword-based retrieval |
| KnowledgeGraphIndex | Graph-based knowledge representation |

Sources: llama-index-core/llama_index/core/indices/base.py:1-80
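
As a sketch of how index types share the same construction and query surface (assuming documents are already loaded):

from llama_index.core import SummaryIndex, VectorStoreIndex

# Same documents, different retrieval strategies
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

response = summary_index.as_query_engine().query("Give a high-level summary.")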

Query Engine Architecture

Query Flow

sequenceDiagram
    participant User
    participant QueryEngine
    participant Retriever
    participant LLM
    participant Response
    
    User->>QueryEngine: Query Request
    QueryEngine->>Retriever: Retrieve Nodes
    Retriever-->>QueryEngine: Source Nodes
    QueryEngine->>LLM: Synthesize Response
    LLM-->>QueryEngine: Response
    QueryEngine->>Response: Format Output
    Response-->>User: Formatted Answer

Retriever Types

| Retriever | Description |
|---|---|
| VectorRetriever | Embedding-based similarity search |
| KeywordRetriever | BM25 or keyword matching |
| HybridRetriever | Combined vector and keyword search |
| SentenceWindowRetriever | Contextual window retrieval |

Configuration and Extensibility

Service Context

The ServiceContext bundles together the core service components:

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | LLM | OpenAI() | Language model instance |
| embed_model | Embedding | OpenAIEmbedding() | Embedding model instance |
| node_parser | NodeParser | SentenceSplitter() | Text chunking strategy |
| prompt_helper | PromptHelper | Auto-calculated | Prompt size optimization |

Customization Patterns

graph TD
    subgraph "Extension Points"
        CustomLLM[Custom LLM Implementation]
        CustomEmbed[Custom Embedding Model]
        CustomParser[Custom Node Parser]
        CustomStore[Custom Storage Backend]
    end
    
    CustomLLM -->|inherits| LLMBase[LLM Base]
    CustomEmbed -->|inherits| EmbedBase[Embedding Base]
    CustomParser -->|inherits| NodeParserBase[NodeParser Base]
    CustomStore -->|inherits| StorageContextBase[StorageContext Base]

Summary

The Core Architecture of LlamaIndex establishes a modular, extensible framework built on well-defined abstractions. The layered architecture—from base interfaces like LLM and Embedding through storage and indexing components to application-layer query engines—enables developers to:

  1. Swap implementations without changing application code
  2. Extend functionality through inheritance and composition
  3. Maintain clean separation between concerns
  4. Support multiple providers through unified interfaces

The architecture follows consistent patterns across components, making the framework predictable and learnable while supporting the diverse requirements of production LLM applications.

Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50

Integration Architecture

Related topics: Core Architecture, Retrieval and Reranking

Overview

LlamaIndex employs a modular integration architecture that extends the core framework's capabilities through a comprehensive ecosystem of pluggable components. The integration system allows developers to connect LlamaIndex with external services, APIs, local models, and specialized tools without modifying the core library. This architecture follows a provider-based pattern where each integration package implements standardized interfaces to ensure compatibility and consistent behavior across different external systems.

The integration architecture serves as the bridge between LlamaIndex's core data structures and the diverse landscape of LLM providers, embedding services, document loaders, and auxiliary tools. By maintaining well-defined contracts between components, the system enables seamless swapping of implementations while preserving the overall workflow of building retrieval-augmented generation (RAG) pipelines and query engines.

Integration Categories

LlamaIndex organizes its integrations into distinct categories, each addressing a specific aspect of the LLM application development workflow. The categorization ensures logical separation of concerns and simplifies dependency management for end users.

LLM Integrations

LLM (Large Language Model) integrations provide adapters for connecting to various language model providers. These integrations implement the unified LLM interface defined in llama_index.core.llms, allowing developers to switch between providers without changing application code. Each LLM integration handles provider-specific authentication, request formatting, response parsing, and streaming behavior.

| Integration Package | Provider | Key Features |
|---|---|---|
| llama-index-llms-contextual | Contextual | Contextual LLM wrapper |
| llama-index-llms-konko | Konko | Supports both Konko and OpenAI models |
| llama-index-llms-lmstudio | LM Studio | Local server integration |
| llama-index-llms-monsterapi | MonsterAPI | Private deployments and GA models |
| llama-index-llms-modelscope | ModelScope | Qwen and other ModelScope models |
| llama-index-llms-langchain | LangChain | LangChain LLM wrapper |
| llama-index-llms-optimum-intel | Intel Optimum | CPU-optimized inference |

Sources: llama-index-integrations/llms/llama-index-llms-contextual/README.md

Reader Integrations

Reader integrations enable data ingestion from various document sources and web content. These loaders transform external data formats into LlamaIndex's internal Document schema, providing a unified representation regardless of the source type.

| Reader Type | Source Format | Package |
|---|---|---|
| Document Readers | PDF, DOCX, HTML | llama-index-readers-docling |
| Web Readers | URLs, Articles | llama-index-readers-web |
| Wikipedia | Wikipedia pages | llama-index-readers-wikipedia |
| Remote Content | Deep link crawling | llama-index-readers-remote-depth |
| Cloud Storage | Box files | llama-index-readers-box |
| Preprocessed | Chunks from Preprocess API | llama-index-readers-preprocess |

Sources: llama-index-integrations/readers/llama-index-readers-wikipedia/README.md

Embedding Integrations

Embedding integrations provide vectorization capabilities through external embedding models. These components convert text into dense vector representations suitable for semantic search and similarity operations.

| Provider | Model Examples | Package |
|---|---|---|
| Ollama | nomic-embed-text, embeddinggemma, mxbai-embed-large | llama-index-embeddings-ollama |

Sources: llama-index-integrations/embeddings/llama-index-embeddings-ollama/README.md

Index Integrations

Index integrations connect to managed vector search services, providing fully-hosted indexing and retrieval capabilities. These integrations abstract the complexity of distributed vector databases behind LlamaIndex's retriever interface.

| Managed Service | Package | Features |
|---|---|---|
| Vectara | llama-index-indices-managed-vectara | RAG pipeline, retriever, query engine |

Sources: llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md

Tool Integrations

Tool integrations extend LlamaIndex's agent capabilities by providing access to external services that can be invoked during agent execution.

| Tool Provider | Features | Package |
|---|---|---|
| Moss | Hybrid search (keyword + semantic) | llama-index-tools-moss |

Callback Integrations

Callback integrations enable observability and feedback collection by integrating with external monitoring and evaluation platforms.

| Platform | Purpose | Package |
|---|---|---|
| Argilla | Feedback loop, LLM monitoring | llama-index-callbacks-argilla |

Sources: llama-index-integrations/callbacks/llama-index-callbacks-argilla/README.md

System Architecture

The integration architecture follows a layered approach where core abstractions define the contracts, and integration packages provide concrete implementations. This design enables horizontal scalability of integrations while maintaining vertical consistency with the core framework.

graph TD
    A[Application Layer] --> B[Core LlamaIndex]
    B --> C[Interface Abstractions]
    C --> D[LLM Abstraction]
    C --> E[Reader Abstraction]
    C --> F[Embedding Abstraction]
    C --> G[Retriever Abstraction]
    D --> H[LLM Integrations]
    E --> I[Reader Integrations]
    F --> J[Embedding Integrations]
    G --> K[Index Integrations]
    H --> L[Konko, LMStudio, MonsterAPI, etc.]
    I --> M[Docling, Wikipedia, Web, Box, etc.]
    J --> N[Ollama Embeddings]
    K --> O[Vectara]

Common Integration Patterns

LLM Integration Pattern

LLM integrations follow a consistent initialization pattern that accepts provider-specific configuration parameters. The typical constructor accepts a model identifier, base URL for API endpoints, and optional generation parameters such as temperature and maximum tokens.

from llama_index.llms.provider_name import ProviderLLM

llm = ProviderLLM(
    model="model-identifier",
    api_key="your-api-key",
    temperature=0.7,
    max_tokens=256
)

Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md

Reader Integration Pattern

Reader integrations follow a loader pattern where initialization may require credentials, and the load_data method accepts source-specific parameters such as URLs, file paths, or query filters.

from llama_index.readers.source_type import SourceReader

reader = SourceReader(credentials="your-credentials")
documents = reader.load_data(source="document-source")

Sources: llama-index-integrations/readers/llama-index-readers-remote-depth/README.md

Data Flow Architecture

The integration architecture enables a complete RAG pipeline where each component plays a specific role in transforming input data into actionable insights.

graph LR
    A[Document Sources] --> B[Readers]
    B --> C[Documents]
    C --> D[Node Parsers]
    D --> E[Nodes]
    E --> F[Vector Index]
    E --> G[Storage Context]
    F --> H[Retriever]
    G --> H
    H --> I[Query Engine]
    I --> J[LLM]
    J --> K[Response]

Installation and Dependency Management

Each integration package follows the naming convention llama-index-{category}-{provider} and can be installed independently via pip. This modular approach minimizes dependency overhead by allowing users to install only the packages required for their specific use case.

| Category | Package Naming Pattern | Installation Command |
|---|---|---|
| LLM | llama-index-llms-{provider} | pip install llama-index-llms-{provider} |
| Reader | llama-index-readers-{source} | pip install llama-index-readers-{source} |
| Embedding | llama-index-embeddings-{provider} | pip install llama-index-embeddings-{provider} |
| Index | llama-index-indices-{type}-{provider} | pip install llama-index-indices-{type}-{provider} |
| Tool | llama-index-tools-{provider} | pip install llama-index-tools-{provider} |
| Callback | llama-index-callbacks-{platform} | pip install llama-index-callbacks-{platform} |

Configuration Management

Integrations typically support configuration through both constructor parameters and environment variables. This dual approach accommodates both explicit configuration in code and secret management through environment-based configuration.

Environment Variable Pattern

Many integrations follow a pattern where API keys can be set as environment variables for security and convenience:

export PROVIDER_API_KEY="your-api-key"
export OPENAI_API_KEY="your-openai-key"

Constructor Parameter Pattern

Alternatively, credentials can be passed directly to the integration constructor:

llm = ProviderLLM(
    model="model-name",
    api_key="explicit-api-key",
    base_url="https://api.provider.com"
)

Sources: llama-index-integrations/llms/llama-index-llms-lmstudio/README.md

Extending the Architecture

The integration architecture is designed for extensibility. New integrations can be created by implementing the appropriate abstract base classes defined in llama_index.core. Each integration category has its own interface specification that ensures consistency across implementations.

Creating a New LLM Integration

To create a new LLM integration, implement the following interface contract (a minimal sketch follows the list):

  1. Inherit from the base LLM class
  2. Implement complete(), chat(), and streaming methods
  3. Handle provider-specific authentication and error handling
  4. Follow the naming convention for the package
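
A minimal sketch of such a subclass, based on the CustomLLM convenience base class in llama_index.core.llms (the EchoLLM name and its echo behavior are purely illustrative):

from typing import Any
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class EchoLLM(CustomLLM):
    """Toy provider that simply echoes the prompt."""

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(context_window=4096, num_output=256, model_name="echo-llm")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=prompt)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        response = ""
        for token in prompt.split():
            response += token + " "
            yield CompletionResponse(text=response, delta=token + " ")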

Creating a New Reader Integration

To create a new reader integration (see the sketch after this list):

  1. Implement a loader class with load_data() method
  2. Transform source data into Document objects
  3. Handle pagination, filtering, and error cases appropriately
  4. Document supported source formats and parameters
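
A minimal sketch under those constraints (the reader name and the .txt-only handling are illustrative, not an existing package):

from pathlib import Path
from typing import List

from llama_index.core import Document
from llama_index.core.readers.base import BaseReader


class PlainTextDirectoryReader(BaseReader):
    """Illustrative reader: turns every .txt file in a folder into a Document."""

    def load_data(self, input_dir: str) -> List[Document]:
        documents = []
        for path in sorted(Path(input_dir).glob("*.txt")):
            documents.append(
                Document(
                    text=path.read_text(encoding="utf-8"),
                    metadata={"file_name": path.name},
                )
            )
        return documents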

Integration Testing Considerations

Each integration package maintains its own test suite to verify compatibility with the external service. Integration tests typically require actual API credentials and network access, distinguishing them from unit tests that mock external dependencies.

Best Practices

When working with LlamaIndex integrations, consider the following best practices:

  1. Dependency Isolation: Install only required integration packages to minimize potential conflicts
  2. Credential Management: Use environment variables for sensitive credentials in production
  3. Error Handling: Implement appropriate retry logic and fallback strategies for external service calls
  4. Resource Management: Close connections and release resources properly when using streaming responses
  5. Version Compatibility: Check integration package versions against the core LlamaIndex version for compatibility

Deprecated Integrations

Some integration packages may be discontinued over time as external services evolve or change their offerings. When an integration is deprecated, it will receive no further updates or support. Users should migrate to alternative solutions before removing deprecated packages from their projects.

Sources: llama-index-integrations/readers/llama-index-readers-preprocess/README.md

Conclusion

The integration architecture provides a flexible, extensible framework for connecting LlamaIndex with the broader ecosystem of LLM providers, data sources, and tools. By maintaining standardized interfaces while allowing provider-specific implementations, the architecture enables developers to build sophisticated RAG applications without being locked into a single vendor or service. The modular design supports incremental adoption, allowing teams to integrate new capabilities as their requirements evolve.

Sources: [llama-index-integrations/llms/llama-index-llms-contextual/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)

Documents and Nodes

Related topics: Storage Systems, Query Engines

Overview

In LlamaIndex, Documents and Nodes are the fundamental data structures that represent information to be indexed, searched, and retrieved. Documents serve as the primary unit of input data, while Nodes are the granular chunks created during document processing for optimal embedding and retrieval.

Document Model

Purpose and Scope

A Document in LlamaIndex represents a single unit of data to be indexed. It encapsulates the content along with associated metadata that provides context about the source, type, and additional information useful for retrieval and processing.

Core Document Schema

The Document model is defined in llama-index-core/llama_index/core/schema.py and includes the following key attributes:

| Attribute | Type | Description |
|---|---|---|
| text | str | The main text content of the document |
| id_ | str | Unique identifier for the document |
| metadata | Dict[str, Any] | Additional metadata about the document |
| mimetype | str | MIME type of the document content |
| relationships | Dict[str, RelationshipType] | Relationships to other nodes/documents |

Document Construction

Documents can be created with varying levels of detail:

from llama_index.core import Document

# Basic document
doc = Document(text="Your content here")

# Document with metadata
doc = Document(
    text="Your content here",
    metadata={
        "source": "review.txt",
        "author": "John Doe",
        "date": "2024-01-15"
    }
)

Node Model

Purpose and Scope

Nodes are the result of parsing and chunking Documents into smaller, semantically coherent pieces. Each Node inherits document-like properties but adds relationship information linking back to its parent Document and sibling Nodes.

Node Structure

Nodes extend the Document schema with additional attributes defined in llama-index-core/llama_index/core/schema.py:

| Attribute | Type | Description |
|---|---|---|
| node_id | str | Unique identifier for the node |
| start_char_idx | int | Starting character index in parent document |
| end_char_idx | int | Ending character index in parent document |
| text_template | str | Template for rendering the node text |
| relationships | Dict[RelationshipType, RelatedNodeType] | Relationships including PARENT, PREVIOUS, NEXT |

Architecture Diagram

graph TD
    A[Raw Input Data] --> B[Document]
    B --> C[Node Parser]
    C --> D[Nodes]
    D --> E[Embedding Model]
    E --> F[Vector Index]
    
    G[Metadata] --> B
    H[Relationships] --> D
    
    B -->|PARENT| D
    D -->|CHILD| B

Readers and Loading

Base Reader Interface

Readers are responsible for loading data from various sources and converting them into Documents. The base reader interface is defined in llama-index-core/llama_index/core/readers/base.py.

| Method | Description |
|---|---|
| load_data() | Load documents from a data source |
| lazy_load_data() | Lazily load documents for memory efficiency |
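
For example, the built-in SimpleDirectoryReader exposes load_data() (a sketch assuming a local ./data directory of files):

from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader("./data")
documents = reader.load_data()
print(len(documents), documents[0].metadata.get("file_name"))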

Supported Reader Types

LlamaIndex provides numerous reader integrations for different data sources:

| Category | Reader | Description |
|---|---|---|
| Document | Docling Reader | PDF, DOCX, HTML extraction to Markdown or JSON |
| Document | MarkItDown Reader | Converts various formats to Markdown |
| Document | Docugami Loader | XML knowledge graph from PDF/DOCX |
| Web | NewsArticleReader | Parses news article URLs |
| Web | UnstructuredURLLoader | URL text extraction via Unstructured.io |
| Web | TrafilaturaWebReader | Web scraping with trafilatura |
| Web | MainContentExtractorReader | Main content extraction from websites |
| Web | ReadabilityWebPageReader | Readability-based web extraction |
| Web | RemoteDepthReader | Recursive URL loading with depth control |
| Web | WholeSiteReader | Full site scraping with prefix/depth |
| Academic | SemanticScholarReader | Scholarly articles and papers |
| Database | Chroma Reader | Loading from Chroma vector store |

Usage Example

from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
docs = reader.load_data(file_path="document.pdf")

Node Parsers

Purpose and Scope

Node Parsers transform Documents into Nodes by splitting content based on semantic boundaries. The interface is defined in llama-index-core/llama_index/core/node_parser/interface.py.

Core Interface Methods

| Method | Description |
|---|---|
| get_nodes_from_documents() | Parse documents into nodes |
| get_batch_nodes() | Process documents in batches |

Sentence Splitter Parser

The sentence-based node parser in llama-index-core/llama_index/core/node_parser/text/sentence.py provides configurable text chunking:

| Parameter | Type | Default | Description |
|---|---|---|---|
| separator | str | "\n\n" | Chunk separator |
| chunk_size | int | 1024 | Maximum characters per chunk |
| chunk_overlap | int | 0 | Overlap between chunks |
| chunking_tokenizer | callable | None | Custom tokenizer function |
| callback_manager | CallbackManager | None | Event callbacks |
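
A short sketch of the sentence splitter in use (chunk sizes here are illustrative):

from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(
    [Document(text="LlamaIndex splits long documents into overlapping chunks. " * 50)]
)
print(len(nodes), nodes[0].text[:60])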

Docling Node Parser

The Docling Node Parser (llama-index-integrations/node_parser/llama-index-node-parser-docling/README.md) parses Docling JSON output into LlamaIndex nodes with rich metadata:

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)

Document-Node Relationships

Relationship Types

Nodes maintain typed relationships to other components:

| Relationship | Description |
|---|---|
| PARENT | Link to parent Document |
| CHILD | Link to child elements |
| PREVIOUS | Previous sibling Node |
| NEXT | Next sibling Node |
| SOURCE | Source Document reference |

Metadata Preservation

Nodes automatically inherit and extend document metadata:

# Node metadata includes provenance information
{
    'doc_items': [{'self_ref': '#/main-text/21'}],
    'prov': [{'page_no': 2, 'bbox': {...}}],
    'headings': ['2 Getting Started']
}

Workflow

graph LR
    A[Load Data] --> B[Create Document]
    B --> C[Parse Document]
    C --> D[Generate Nodes]
    D --> E[Create Embeddings]
    E --> F[Build Index]
    
    A1[Readers] --> A
    C1[Node Parsers] --> C

Best Practices

Document Creation

  1. Always assign unique id_ attributes for tracking
  2. Include comprehensive metadata for filtering
  3. Specify mimetype when content type matters

Node Parsing

  1. Choose appropriate chunk_size for your embedding model
  2. Configure chunk_overlap for context continuity
  3. Use semantic-aware parsers (Docling) for complex documents

Memory Management

  1. Use lazy_load_data() for large document collections
  2. Consider batch processing for node parsing
  3. Leverage streaming for very large files

Related Integrations

| Integration | Use Case |
|---|---|
| VectaraIndex | Managed semantic search |
| ChromaReader | Vector database loading |
| AlibabaCloud AISearch | Cloud-based document parsing |
| Ollama Embeddings | Local embedding generation |

Summary

Documents serve as the primary data ingestion point in LlamaIndex, encapsulating raw content and metadata from various sources. Nodes are the processed, chunked representations optimized for embedding generation and retrieval. Together with Readers and Node Parsers, they form the foundation of the LlamaIndex data pipeline.

Sources: https://github.com/run-llama/llama_index

Storage Systems

Related topics: Documents and Nodes, Retrieval and Reranking

Overview

LlamaIndex provides a comprehensive storage system that allows users to persist indexes, documents, and chat histories to disk for later retrieval and reuse. The storage architecture is built around the StorageContext class, which serves as the central coordinator for managing various storage backends including document stores, index stores, and chat stores.

The storage system enables:

  • Persistence: Save index data to disk for long-term storage
  • Retrieval: Reload previously persisted indexes without recomputation
  • In-memory fallback: Default in-memory storage when persistence is not configured
  • Customizable backends: Pluggable storage implementations for different use cases

Architecture

graph TD
    A[StorageContext] --> B[VectorStore]
    A --> C[DocStore]
    A --> D[IndexStore]
    A --> E[ChatStore]
    A --> F[ImageStore]
    A --> G[GraphStore]
    
    C --> H[SimpleDocStore]
    C --> I[MongoDocStore]
    C --> J[KVDocStore]
    
    D --> K[SimpleIndexStore]
    D --> L[MongoIndexStore]
    D --> M[KVIndexStore]
    
    E --> N[SimpleChatStore]
    E --> O[MongoChatStore]

StorageContext

The StorageContext class is the main entry point for configuring storage in LlamaIndex. It aggregates all storage components and provides methods for persistence and retrieval.

Initialization

from llama_index.core import StorageContext, load_index_from_storage

# Create with default in-memory stores
storage_context = StorageContext.from_defaults()

# Create with persistence to disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load existing index from disk
index = load_index_from_storage(storage_context=storage_context)

Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| persist_dir | str | None | Directory path for persistence |
| vector_store | BaseVectorStore | SimpleVectorStore | Vector storage backend |
| docstore | BaseDocumentStore | SimpleDocumentStore | Document storage backend |
| index_store | BaseIndexStore | SimpleIndexStore | Index metadata storage |
| graph_store | BaseGraphStore | SimpleGraphStore | Knowledge graph storage |
| chat_store | BaseChatStore | SimpleChatStore | Chat history storage |
| image_store | BaseImageStore | None | Image storage backend |

Persistence Methods

| Method | Description |
|---|---|
| persist(persist_dir, ...) | Save all storage components to disk |
| from_defaults(**kwargs) | Create context with default or specified settings |
| load_index_from_storage(storage_context) | Standalone helper (imported from llama_index.core) that loads an index from a persisted StorageContext |

Document Store

The document store manages the storage and retrieval of BaseDocument objects. LlamaIndex provides several document store implementations.

SimpleDocumentStore

The default in-memory document store, which can be persisted to disk as JSON.

from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()

# Add documents, then persist to disk
docstore.persist(persist_path="./docstore.json")

# Reload a previously persisted store
docstore = SimpleDocumentStore.from_persist_path("./docstore.json")

Document Store API

| Method | Description |
|---|---|
| add_documents(documents, batch_size) | Add documents to the store |
| get_document(doc_id) | Retrieve a document by ID |
| delete_document(doc_id) | Remove a document by ID |
| get_nodes(node_ids) | Retrieve nodes by their IDs |
| docs (property) | Access all stored nodes/documents |
| persist(persist_path) | Persist the document store to disk |
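
A small usage sketch of the document store API (the document ID and persist path are arbitrary examples):

from llama_index.core import Document
from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
doc = Document(text="Stored content.", id_="doc-1")
docstore.add_documents([doc])

print(docstore.get_document("doc-1").text)
docstore.persist(persist_path="./docstore.json")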

Data Model

Documents are stored with the following structure:

class BaseDocument:
    id_: str              # Unique identifier
    embedding: List[float]  # Vector embedding
    metadata: Dict[str, Any]  # User-defined metadata
    text: str             # Document text content
    excluded_embed_metadata_keys: List[str]
    excluded_llm_metadata_keys: List[str]
    relationships: Dict[DocumentRelationship, str]
    hash: str             # Computed hash for caching
    __class__: type       # Document type (optional)

Index Store

The index store manages index metadata and structure, enabling efficient retrieval of index components.

SimpleIndexStore

The default index store implementation using JSON file storage.

from llama_index.core.storage.index_store import SimpleIndexStore

# Default in-memory index store
index_store = SimpleIndexStore()

# Reload a previously persisted store
index_store = SimpleIndexStore.from_persist_path("./index_store.json")

Index Store API

| Method | Description |
|---|---|
| add_index_struct(index_struct) | Store an index structure |
| get_index_struct(struct_id) | Retrieve index structure by ID |
| index_structs() | List all stored index structures |
| delete_index_struct(struct_id) | Remove an index structure |

Supported Index Types

| Index Type | Description |
|---|---|
| VectorStoreIndex | Dense vector-based retrieval |
| SummaryIndex | Summary-based indexing |
| KeywordTableIndex | Keyword-based retrieval |
| KnowledgeGraphIndex | Graph-based knowledge indexing |

Chat Store

The chat store manages conversation history for multi-turn interactions with language models.

SimpleChatStore

A persistent chat store implementation for storing and retrieving chat messages.

from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()

# Reload a previously persisted store
chat_store = SimpleChatStore.from_persist_path("./chat_store.json")

Chat Store API

| Method | Description |
|---|---|
| add_message(key, message) | Append a message to a chat session |
| get_messages(key) | Retrieve all messages for a chat session |
| set_messages(key, messages) | Replace the messages for a chat session |
| delete_messages(key) | Remove all messages for a chat session |
| persist(persist_path) | Save chat history to disk |
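
For example (a sketch; the session key and file path are arbitrary):

from llama_index.core.llms import ChatMessage
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
chat_store.add_message("session-1", ChatMessage(role="user", content="Hello!"))
chat_store.add_message("session-1", ChatMessage(role="assistant", content="Hi, how can I help?"))

print(chat_store.get_messages("session-1"))
chat_store.persist(persist_path="./chat_store.json")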

Message Structure

| Field | Type | Description |
|---|---|---|
| role | str | Message role (user/assistant/system) |
| content | str | Message text content |
| additional_kwargs | Dict | Extra metadata for the message |

Storage Workflow

graph LR
    A[Create Documents] --> B[Initialize StorageContext]
    B --> C{Configure Backends}
    C --> D[In-Memory]
    C --> E[Persistent]
    D --> F[Build Index]
    E --> F
    F --> G[Index Created]
    G --> H[Persist to Disk]
    H --> I[StorageContext.persist]
    
    J[Load Index] --> K[load_index_from_storage]
    K --> L[Index Ready]

Usage Examples

Basic Persistence

from llama_index.core import VectorStoreIndex, StorageContext

# Create documents
documents = [...]

# Create index with storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = VectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context
)

# Explicitly persist the index and all storage components to disk
index.storage_context.persist(persist_dir="./storage")

Loading Persisted Index

from llama_index.core import StorageContext, load_index_from_storage

# Rebuild storage context from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load existing index
index = load_index_from_storage(storage_context=storage_context)

# Query the loaded index
query_engine = index.as_query_engine()
response = query_engine.query("Your question here")

Custom Storage Configuration

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore

# Create custom stores
docstore = SimpleDocumentStore()
index_store = SimpleIndexStore()

# Configure storage context with custom stores
storage_context = StorageContext.from_defaults(
    docstore=docstore,
    index_store=index_store,
)

# Use with index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

Storage Backend Comparison

| Backend | Persistence | Performance | Scalability | Use Case |
|---|---|---|---|---|
| SimpleDocumentStore | JSON | Medium | Low-Medium | Development, small datasets |
| RedisDocumentStore | Redis | High | High | Production, distributed systems |
| MongoDocumentStore | MongoDB | High | Very High | Large-scale deployments |
| KVDocumentStore | Key-Value | High | Medium-High | General purpose |

Best Practices

  1. Always specify unique document IDs: Prevents duplicate entries and enables predictable retrieval (for example, Document(id_="unique_doc_1", text="content"))
  2. Configure persistence early: Set up storage context before building indexes to avoid data loss
  3. Use appropriate batch sizes: When adding many documents, use batch operations for better performance
  4. Handle persistence errors: Wrap persistence calls in try-except blocks for robustness
  5. Backup important data: Regularly backup persisted storage directories

Related Components

The storage system works alongside other LlamaIndex components:

  • Vector Stores: Manage embedding vectors for semantic search
  • Graph Stores: Handle knowledge graph data structures
  • Image Stores: Store image data for multimodal applications
  • Query Engines: Use storage to retrieve relevant documents for queries
  • Retrievers: Access stored data for retrieval-augmented generation

Sources: https://github.com/run-llama/llama_index

Query Engines

Related topics: Retrieval and Reranking, Documents and Nodes

Query Engines are the core components in LlamaIndex responsible for processing user queries and returning relevant responses by retrieving, synthesizing, and formatting information from indexed data.

Overview

Query Engines serve as the primary interface for querying indexed documents in LlamaIndex. They coordinate the retrieval of relevant context from the index and synthesize this information into coherent, helpful responses using Large Language Models (LLMs).

Key Responsibilities:

  • Receive user queries and transform them into retrieval operations
  • Coordinate with retrievers to fetch relevant documents or data chunks
  • Route queries to appropriate response synthesizers
  • Handle query-time configuration such as similarity thresholds and response modes

Sources: llama-index-core/llama_index/core/query_engine/__init__.py

Architecture

The query engine architecture follows a modular pipeline pattern where different components handle specific stages of query processing.

graph TD
    A[User Query] --> B[Query Engine]
    B --> C[Retriever]
    C --> D[Node Postprocessor]
    D --> E[Response Synthesizer]
    E --> F[LLM]
    F --> G[Response]
    
    H[Vector Store Index] --> C
    I[Summary Index] --> C
    J[Knowledge Graph Index] --> C

Core Components

| Component | Purpose | Location |
|-----------|---------|----------|
| BaseQueryEngine | Abstract base class defining the query interface | llama_index.core.query_engine |
| RetrieverQueryEngine | Default query engine using retrievers | retriever_query_engine.py |
| SubQuestionQueryEngine | Decomposes complex queries into sub-questions | sub_question_query_engine.py |
| ResponseSynthesizer | Generates responses from retrieved context | llama_index.core.response_synthesizers |

Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py

RetrieverQueryEngine

The RetrieverQueryEngine is the default query engine implementation that combines retrieval with response synthesis.

Initialization

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| retriever | BaseRetriever | Required | The retriever used to fetch relevant nodes |
| response_synthesizer | BaseSynthesizer | None | Synthesizer for generating responses |
| node_postprocessors | List[BaseNodePostprocessor] | [] | Post-processors applied after retrieval |
| callback_manager | CallbackManager | None | Manages callbacks for query events |

Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py:40-60
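
The same engine can also be assembled explicitly from the parts listed above instead of via as_query_engine(); a minimal sketch, assuming documents has already been loaded:

from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=5)
synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)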

Query Flow

sequenceDiagram
    participant User
    participant QueryEngine
    participant Retriever
    participant Postprocessor
    participant Synthesizer
    participant LLM
    
    User->>QueryEngine: query(question)
    QueryEngine->>Retriever: retrieve(query_str)
    Retriever-->>QueryEngine: nodes[]
    QueryEngine->>Postprocessor: postprocess(nodes)
    Postprocessor-->>QueryEngine: filtered_nodes[]
    QueryEngine->>Synthesizer: synthesize(query_str, nodes)
    Synthesizer->>LLM: generate(prompt)
    LLM-->>Synthesizer: response
    Synthesizer-->>QueryEngine: Response
    QueryEngine-->>User: Response

SubQuestionQueryEngine

The SubQuestionQueryEngine handles complex queries by decomposing them into simpler sub-questions that can be answered independently.

Use Cases

  • Queries requiring information from multiple data sources
  • Complex questions that benefit from step-by-step reasoning
  • Multi-hop questions requiring logical deduction

Configuration

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    callback_manager=CallbackManager([callback]),
    verbose=True
)
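
The query_engine_tools list wraps individual query engines with a name and description so sub-questions can be routed to the right source; a minimal sketch, assuming two indexes (docs_index and api_index) already exist:

from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=docs_index.as_query_engine(),
        metadata=ToolMetadata(
            name="docs",
            description="Answers questions about the documentation",
        ),
    ),
    QueryEngineTool(
        query_engine=api_index.as_query_engine(),
        metadata=ToolMetadata(
            name="api",
            description="Answers questions about the API reference",
        ),
    ),
]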

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query_engine_tools | List[QueryEngineTool] | Required | List of query engines and their descriptions |
| response_synthesizer | BaseSynthesizer | None | Response synthesizer to use |
| sub_question_name | str | "sub_question" | Name for sub-question events |
| parent_name | str | "parent_question" | Name for parent question events |
| callback_manager | CallbackManager | None | Callback manager for events |
| verbose | bool | False | Enable verbose output |

Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:50-80

Response Synthesizers

Response Synthesizers transform retrieved context into natural language responses.

Available Synthesizer Types

| Synthesizer | Description | Use Case |
|-------------|-------------|----------|
| CompactAndRefine | Compacts retrieved context before generating | Large retrieval results |
| TreeSummarize | Hierarchically summarizes retrieved nodes | Comprehensive responses |
| SimpleSummarize | Direct concatenation and summarization | Quick, simple responses |
| Refine | Iteratively improves response quality | High-quality refinement |
| Accumulate | Combines responses from multiple sources | Multi-source queries |
| Generation | Direct LLM generation from context | Simple generation tasks |
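
Synthesizers are typically built with the get_response_synthesizer factory and handed to a query engine; a minimal sketch (the response_mode strings correspond to the classes above, and index is assumed to exist):

from llama_index.core import get_response_synthesizer

# Hierarchical summarization over the retrieved nodes
synthesizer = get_response_synthesizer(response_mode="tree_summarize")

query_engine = index.as_query_engine(response_synthesizer=synthesizer)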

Base Interface

from abc import ABC, abstractmethod
from typing import Any, List

from llama_index.core.base.response.schema import Response
from llama_index.core.schema import NodeWithScore, QueryBundle


class BaseSynthesizer(ABC):
    @abstractmethod
    async def synthesize(
        self,
        query: QueryBundle,
        nodes: List[NodeWithScore],
        **kwargs: Any
    ) -> Response:
        pass

Sources: llama-index-core/llama_index/core/response_synthesizers/base.py:30-50

Vector Store Index Query Engine

The VectorStoreIndex provides built-in query engine creation through the as_query_engine() method.

Factory Method Parameters

index.as_query_engine(
    query_mode: str = "default",
    similarity_top_k: int = 10,
    vector_store_query_mode: str = "default",
    alpha: Optional[float] = None,
    **kwargs: Any
) -> BaseQueryEngine

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query_mode | str | "default" | Query execution mode |
| similarity_top_k | int | 10 | Number of top results to retrieve |
| vector_store_query_mode | str | "default" | Vector store specific query mode |
| alpha | float | None | Hybrid search weight (0-1, default 0.5) |

Sources: llama-index-core/llama_index/core/indices/vector_store/base.py

Query Modes

| Mode | Description |
|------|-------------|
| default | Standard retrieval based on similarity |
| mmr | Maximum Marginal Relevance for diverse results |
| hybrid | Combines sparse and dense retrieval |
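
A minimal sketch of switching modes; hybrid mode also requires a vector store that supports sparse retrieval, so treat the alpha value as illustrative:

# MMR for more diverse results
mmr_engine = index.as_query_engine(vector_store_query_mode="mmr", similarity_top_k=5)

# Hybrid sparse + dense retrieval, weighted by alpha
hybrid_engine = index.as_query_engine(vector_store_query_mode="hybrid", alpha=0.5)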

Query Engine Tool

For agent-based workflows, query engines can be wrapped as tools using the QueryEngineTool class.

from llama_index.core.tools import QueryEngineTool, ToolMetadata

tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="website_index",
        description="Useful for answering questions about text on websites",
    )
)

Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:100-120

Advanced Configuration

Node Post-processors

Post-processors filter and enhance retrieved nodes before synthesis.

from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = index.as_query_engine(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

Custom Query Engines

Create custom query engines by extending the base class:

from llama_index.core.base.response.schema import Response
from llama_index.core.query_engine import BaseQueryEngine
from llama_index.core.schema import QueryBundle


class CustomQueryEngine(BaseQueryEngine):
    def __init__(self, retriever, synthesizer):
        self._retriever = retriever
        self._synthesizer = synthesizer
        super().__init__(callback_manager=None)

    # BaseQueryEngine also expects a synchronous _query implementation;
    # only the async path is shown here.
    async def _aquery(self, query_bundle: QueryBundle) -> Response:
        nodes = await self._retriever.aretrieve(query_bundle)
        response = await self._synthesizer.synthesize(
            query_bundle, nodes
        )
        return response

Async Query Execution

Query engines support both sync and async execution patterns:

# Synchronous
response = query_engine.query("What is LlamaIndex?")

# Asynchronous
response = await query_engine.aquery("What is LlamaIndex?")

Integration with Vector Indices

Query engines integrate with various index types:

| Index Type | Default Query Engine | Features |
|------------|----------------------|----------|
| VectorStoreIndex | RetrieverQueryEngine | Semantic similarity search |
| SummaryIndex | RetrieverQueryEngine | Full document retrieval |
| KnowledgeGraphIndex | RetrieverQueryEngine | Graph-based traversal |
| ComposableGraph | SubQuestionQueryEngine | Multi-index queries |

Best Practices

  1. Choose appropriate top_k: Balance between response quality and speed (typically 3-10 for most use cases)
  2. Use the sub-question engine for complex queries: When queries require reasoning across multiple sources
  3. Configure similarity thresholds: Filter low-quality matches using post-processors
  4. Enable callbacks for debugging: Monitor query execution flow and performance
  5. Select appropriate synthesizers: Match the synthesizer type to your response quality requirements

Summary

Query Engines in LlamaIndex provide a flexible, extensible framework for retrieving and synthesizing information from indexed data. The modular architecture allows for customization at every stage of the query pipeline, from retrieval configuration to response generation.

Key Takeaways:

  • Query engines orchestrate the retrieval-synthesis pipeline
  • RetrieverQueryEngine handles standard query flows
  • SubQuestionQueryEngine decomposes complex queries
  • Response synthesizers generate final output from context
  • Extensive configuration options enable fine-tuned control

Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py, llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py, llama-index-core/llama_index/core/response_synthesizers/base.py, llama-index-core/llama_index/core/indices/vector_store/base.py

Sources: llama-index-core/llama_index/core/query_engine/__init__.py

Retrieval and Reranking

Related topics: Query Engines, Storage Systems

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Retriever Abstraction

Continue reading this section for the full explanation and source context.

Section Recursive Retriever

Continue reading this section for the full explanation and source context.

Section Property Graph Retriever

Continue reading this section for the full explanation and source context.

Related topics: Query Engines, Storage Systems

Retrieval and Reranking

Overview

Retrieval and Reranking are fundamental components in LlamaIndex's architecture for building effective Retrieval-Augmented Generation (RAG) systems. The retrieval system identifies relevant context from various data sources, while the reranking system reorders retrieved results to optimize relevance using advanced techniques like LLM-based scoring.

In LlamaIndex, retrieval is handled through a flexible retriever abstraction that supports multiple retrieval strategies including vector-based search, keyword search, and hybrid approaches. Reranking serves as a post-processing step that improves result quality by reordering retrieved nodes based on more sophisticated relevance criteria.

Architecture Overview

graph TD
    A[Query Input] --> B[Retrieval Phase]
    B --> C[Vector/Knowledge Graph Retrieval]
    C --> D[Initial Node Set]
    D --> E[Reranking Phase]
    E --> F[LLM Reranker]
    F --> G[Reordered Results]
    G --> H[Response Generation]
    
    I[Document Sources] --> J[Indexing]
    J --> K[Vector Store / Graph Store]
    K --> C

Retrieval Components

Retriever Abstraction

LlamaIndex provides a base BaseRetriever class that defines the interface for all retrieval implementations. Retrievers work in conjunction with indices to fetch relevant nodes from vector stores or knowledge graphs.

Core Retriever Classes:

| Component | File Path | Purpose |
|-----------|-----------|---------|
| BaseRetriever | llama-index-core/llama_index/core/retrievers/ | Abstract base for all retrievers |
| RecursiveRetriever | llama-index-core/llama_index/core/retrievers/recursive_retriever.py | Multi-level recursive retrieval |
| PropertyGraphRetriever | llama-index-core/llama_index/core/indices/property_graph/retriever.py | Graph-based retrieval |

Recursive Retriever

The RecursiveRetriever enables multi-level, hierarchical retrieval across different data sources and node types. It supports recursive traversal of indices and can fetch related nodes across different retrieval strategies.

Key Features:

  • Recursive node resolution across index hierarchies
  • Support for multiple retriever types in a chain
  • Handling of nested document structures

Source: llama-index-core/llama_index/core/retrievers/recursive_retriever.py

Property Graph Retriever

The Property Graph Retriever leverages knowledge graphs for retrieval, enabling structured queries over entity-relationship data. This retriever is particularly effective for complex queries requiring relationship-aware context.

Capabilities:

  • Graph traversal-based retrieval
  • Entity filtering and relationship queries
  • Support for hybrid graph + vector search

Source: llama-index-core/llama_index/core/indices/property_graph/retriever.py:1-100

Reranking System

Purpose and Role

Reranking improves retrieval quality by reordering initially retrieved candidates using more sophisticated relevance models. After an initial retrieval pass identifies candidate nodes, rerankers evaluate and reorder these results to maximize relevance to the query.

LLM Reranker

The LLMRerank post-processor uses a Language Model to score and reorder retrieved nodes based on semantic relevance. This approach provides higher quality ranking compared to simple vector similarity.

Key Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| top_n | int | None | Number of top results to return after reranking |
| choice_batch_size | int | 10 | Batch size for LLM ranking choices |
| llm | BaseLLM | None | Language model for scoring |
| verbose | bool | False | Enable verbose output |

Source: llama-index-core/llama_index/core/postprocessor/llm_rerank.py

Node Post-Processors

The NodePostprocessor class provides additional filtering and transformation capabilities for retrieved nodes. These processors operate on the node level and can apply various transformations before final output.

Common Post-Processing Operations:

  • Duplicate removal
  • Similarity threshold filtering
  • Metadata-based filtering

Source: llama-index-core/llama_index/core/postprocessor/node.py

Data Flow

graph LR
    A[User Query] --> B[Vector Search]
    B --> C[Top-K Nodes]
    C --> D[Post-Processors]
    D --> E[LLM Reranker]
    E --> F[Reranked Nodes]
    F --> G[Context for LLM]
    
    H[Documents] --> I[Indexing Pipeline]
    I --> J[Embedding Model]
    J --> K[Vector Store]
    K --> B

Integration with Data Loaders

LlamaIndex's retrieval system integrates seamlessly with various data loaders that prepare documents for indexing and retrieval.

Supported Data Sources

| Reader | Use Case | Integration |
|--------|----------|-------------|
| DoclingReader | PDF, DOCX, HTML | llama-index-readers-docling |
| SimpleWebPageReader | Static websites | llama-index-readers-web |
| RemoteDepthReader | Multi-level URL crawling | llama-index-readers-remote-depth |
| WikipediaReader | Wikipedia articles | llama-index-readers-wikipedia |
| SemanticScholarReader | Academic papers | llama-index-readers-semanticscholar |

Source: llama-index-integrations/readers/llama-index-readers-docling/README.md

Document Processing Pipeline

Documents loaded through readers undergo the following processing:

  1. Parsing - Extract text content from various formats (PDF, DOCX, HTML)
  2. Node Parsing - Split documents into semantic chunks (nodes)
  3. Embedding - Generate vector embeddings for each node
  4. Indexing - Store nodes and embeddings in appropriate stores
  5. Retrieval - Fetch relevant nodes based on queries

Source: llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/simple_web/README.md
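
A minimal sketch of steps 1-4 using the core reader and node parser (the directory path and chunk size are placeholders):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# 1. Parsing: load raw documents from disk
documents = SimpleDirectoryReader("./data").load_data()

# 2. Node parsing: split documents into chunks
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)

# 3-4. Embedding + indexing (uses the configured embedding model)
index = VectorStoreIndex(nodes)

# 5. Retrieval
retriever = index.as_retriever()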

Usage Patterns

Basic Retrieval with Reranking

from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import LLMRerank

# Load documents and create index
index = VectorStoreIndex.from_documents(documents)

# Configure reranking
reranker = LLMRerank(
    top_n=5,
    choice_batch_size=10
)

# Query with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker]
)

response = query_engine.query("Your question here")

Recursive Retrieval

from llama_index.core.retrievers import RecursiveRetriever

# Configure recursive retrieval across multiple levels
# (vector_retriever and document_retriever are assumed to be built already)
recursive_retriever = RecursiveRetriever(
    root_id="root",  # key in retriever_dict used as the entry point
    retriever_dict={
        "root": vector_retriever,
        "documents": document_retriever,
    },
)

Configuration Options

Retrieval Configuration

| Option | Description | Applies To |
|--------|-------------|------------|
| similarity_top_k | Number of initial candidates | Vector retrieval |
| retrieval_mode | Vector, keyword, or hybrid | Hybrid search |
| node_postprocessors | List of post-processing steps | All retrievers |

Reranking Configuration

| Option | Description | Default |
|--------|-------------|---------|
| top_n | Final number of results | 5 |
| score_threshold | Minimum relevance score | None |
| model | Reranking model | gpt-3.5-turbo |

Advanced Topics

Hybrid Retrieval with Reranking

Combining vector and keyword search with LLM reranking provides robust retrieval across diverse query types:

  1. Vector Search - Captures semantic similarity
  2. Keyword Search - Captures exact term matching
  3. LLM Reranking - Optimizes final ordering (a minimal sketch follows this list)
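
A minimal sketch combining the three stages, assuming the underlying vector store supports hybrid queries:

from llama_index.core.postprocessor import LLMRerank

query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",  # sparse + dense retrieval
    alpha=0.5,                         # weight between keyword and vector scores
    similarity_top_k=10,
    node_postprocessors=[LLMRerank(top_n=5)],  # LLM-based reordering
)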

Custom Retrievers

Developers can create custom retrievers by extending BaseRetriever:

from typing import List

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

class CustomRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Custom retrieval logic: must return nodes paired with scores
        return []

Summary

Retrieval and Reranking in LlamaIndex form a two-phase system where initial retrieval identifies candidate nodes and reranking optimizes their ordering. The architecture supports multiple retrieval strategies (vector, graph, recursive) and leverages LLM-based reranking for improved result quality. Integration with various data loaders enables seamless indexing from diverse sources, while the post-processor abstraction allows flexible pipeline customization.

Source: https://github.com/run-llama/llama_index / Human Manual

Agent Framework

Related topics: Memory Systems

Section Related Pages

Continue reading this section for the full explanation and source context.

Section ReAct Formatter

Continue reading this section for the full explanation and source context.

Section ReAct Output Parsing

Continue reading this section for the full explanation and source context.

Section Base Agent

Continue reading this section for the full explanation and source context.

Related topics: Memory Systems

Agent Framework

Overview

The LlamaIndex Agent Framework provides a flexible, extensible system for building AI agents that can reason, plan, and execute actions using tools. The framework enables the creation of both single-agent and multi-agent systems capable of interacting with external data sources, performing complex reasoning tasks, and orchestrating workflows.

Agents in LlamaIndex are designed to combine large language model (LLM) capabilities with structured tool usage, memory management, and workflow orchestration. The framework supports various agent types including ReAct (Reasoning + Acting) agents and workflow-based agents.

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50

Architecture Overview

graph TD
    A[User Query] --> B[Agent]
    B --> C[Reasoning Engine]
    C --> D[Tool System]
    D --> E[External Tools]
    C --> F[Memory]
    B --> G[Workflow Orchestrator]
    G --> H[Sub-Agents]
    H --> D

The framework is built on several key components that work together to enable sophisticated agent behaviors:

| Component | Purpose |
|-----------|---------|
| Agent | Core entity that processes queries and generates responses |
| Reasoning Engine | Handles thought processes and decision making |
| Tool System | Provides access to external functions and APIs |
| Memory | Stores conversation history and intermediate results |
| Workflow Orchestrator | Manages complex multi-step tasks |

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:50-100

ReAct Agent

The ReAct (Synergizing Reasoning and Acting) agent implements a reasoning loop that combines thought processes with tool actions. This agent type is particularly effective for tasks requiring logical deduction and external information retrieval.

ReAct Formatter

The ReAct formatter is responsible for constructing prompts that guide the agent through the reasoning-action-observation cycle. It defines the structure of thoughts, actions, and observations in the agent's prompt.

graph LR
    A[Thought] --> B[Action]
    B --> C[Observation]
    C --> A

#### Key Components

| Component | Description |
|-----------|-------------|
| system_prompt | Instructions for the agent's role and behavior |
| tool_prompt | Description of available tools |
| formatter | Defines the format for thoughts, actions, observations |
| examples | Few-shot examples for better performance |

Sources: llama-index-core/llama_index/core/agent/react/formatter.py:1-80

ReAct Output Parsing

The ReAct agent uses specialized output parsers to extract structured information from LLM responses:

class ReActOutputParser:
    def parse(self, output: str) -> ActionOutput:
        # Parse thought, action, and action input from output
        pass

This parsing enables the agent to:

  1. Extract the reasoning thought process
  2. Identify the tool to invoke
  3. Extract the tool's input parameters
  4. Process the tool's output as an observation

Sources: llama-index-core/llama_index/core/agent/react/formatter.py:80-150

Workflow-Based Agents

Workflow-based agents provide a more structured approach to agent execution, using state machines and defined steps to process queries.

Base Agent

The BaseAgent class provides the foundation for all agent implementations in the workflow system:

graph TD
    A[Input] --> B[State Machine]
    B --> C{Step Execution}
    C -->|Step 1| D[Process Step]
    D --> E[Update State]
    E --> C
    C -->|Complete| F[Generate Response]

#### Base Agent API

| Method | Description |
|--------|-------------|
| run() | Execute the agent with input |
| reset() | Reset agent state |
| get_state() | Retrieve current agent state |
| set_state() | Set agent state |

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:100-200

Agent State Management

Agents maintain state throughout their execution, which includes:

| State Component | Type | Purpose |
|-----------------|------|---------|
| input | str | Original user input |
| current_step | int | Current execution step |
| memory | Memory | Conversation history |
| context | dict | Additional context data |
| steps | List[Step] | Executed steps |
| output | Any | Final output |

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:200-300

Tool System

The Tool System enables agents to interact with external resources and perform actions beyond text generation.

Function Tool

FunctionTool provides a decorator-based interface for creating tools from Python functions:

from llama_index.core.tools import FunctionTool

@FunctionTool.from_defaults
def search_database(query: str) -> str:
    """Search the knowledge base for relevant information."""
    # Placeholder implementation; replace with the actual lookup
    return f"Results for: {query}"

#### FunctionTool Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| fn | Callable | Required | The function to wrap |
| name | str | Function name | Tool identifier |
| description | str | Function docstring | Tool description for LLM |
| fn_schema | BaseModel | Auto-generated | Input schema |
| return_direct | bool | False | Return raw output |

Sources: llama-index-core/llama_index/core/tools/function_tool.py:1-100

Tool Execution Flow

sequenceDiagram
    participant Agent
    participant ToolRegistry
    participant FunctionTool
    participant External

    Agent->>ToolRegistry: Request tool by name
    ToolRegistry->>FunctionTool: Get tool instance
    FunctionTool->>External: Execute function
    External-->>FunctionTool: Return result
    FunctionTool-->>Agent: Format response

Creating Custom Tools

Tools can be created using the @FunctionTool.from_defaults decorator:

@FunctionTool.from_defaults
def calculate(expression: str) -> str:
    """Perform mathematical calculations by evaluating an expression."""
    return str(eval(expression))  # note: eval is unsafe outside trusted input

Or programmatically:

from llama_index.core.tools import FunctionTool

def my_function(arg1: str, arg2: int) -> str:
    return f"{arg1} repeated {arg2} times"

tool = FunctionTool.from_defaults(
    fn=my_function,
    name="my_tool",
    description="Custom tool description"
)

Sources: llama-index-core/llama_index/core/tools/function_tool.py:100-200

Multi-Agent Workflows

Multi-agent systems enable complex task decomposition where different specialized agents collaborate to solve problems.

Multi-Agent Workflow Architecture

graph TD
    A[Coordinator Agent] --> B[Specialist Agent 1]
    A --> C[Specialist Agent 2]
    A --> D[Specialist Agent N]
    B --> E[Tool 1]
    C --> F[Tool 2]
    D --> G[Tool N]
    B --> A
    C --> A
    D --> A

Workflow Communication

Agents communicate through a shared state and message-passing mechanism:

| Message Type | Direction | Purpose |
|--------------|-----------|---------|
| task | Coordinator → Specialist | Assign task |
| result | Specialist → Coordinator | Return results |
| query | Any → Any | Request information |
| response | Any → Any | Provide information |

Sources: llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:1-100

Creating Multi-Agent Systems

from llama_index.core.agent.workflow import MultiAgentWorkflow

# Create specialized agents
research_agent = ReActAgent.from_tools(tools=[search_tool], name="researcher")
analysis_agent = ReActAgent.from_tools(tools=[analysis_tool], name="analyst")

# Create multi-agent workflow
workflow = MultiAgentWorkflow(agents=[research_agent, analysis_agent])

# Execute workflow
result = workflow.run(user_input="Analyze the latest research on AI")

Sources: llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:100-200

Tool Integration with LlamaIndex Readers

The Agent Framework integrates seamlessly with LlamaIndex's document readers, enabling agents to query and reason over loaded documents:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Load documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap the query engine as a tool the agent can call
query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="document_index",
    description="Answers questions about the loaded documents",
)

# Create agent with query tool
agent = ReActAgent.from_tools(tools=[query_tool])
response = agent.chat("What is the main topic of these documents?")

This integration allows agents to:

  • Query vector databases
  • Retrieve relevant context
  • Synthesize information from multiple sources
  • Perform RAG (Retrieval-Augmented Generation)

Best Practices

Designing Effective Tools

| Guideline | Rationale |
|-----------|-----------|
| Clear descriptions | Helps the LLM understand when to use the tool |
| Structured outputs | Easier for the agent to parse and use results |
| Error handling | Prevents agent crashes from tool failures |
| Idempotent operations | Enables safe retries |

Agent Configuration

| Parameter | Recommendation |
|-----------|----------------|
| max_iterations | Set based on task complexity (default: 10) |
| timeout | Allow sufficient time for tool execution |
| memory_type | Use conversation memory for multi-turn interactions |
| tool_retriever | Implement for large tool collections |

Debugging Agents

  1. Enable verbose mode to see the agent's reasoning traces (see the sketch after this list)
  2. Log tool inputs/outputs to verify correct tool usage
  3. Test tools independently before combining with agent
  4. Monitor token usage to prevent excessive spending
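
A minimal sketch of point 1, enabling verbose reasoning traces on a ReAct agent; the query_tool and llm objects are assumed to exist:

from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools=[query_tool],
    llm=llm,
    verbose=True,  # print thought / action / observation steps
)
response = agent.chat("Summarize the loaded documents")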

See Also

  • Memory Systems

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50

Memory Systems

Related topics: Agent Framework, Storage Systems

Section Related Pages

Continue reading this section for the full explanation and source context.

Section ChatMemoryBuffer

Continue reading this section for the full explanation and source context.

Section ChatSummaryMemoryBuffer

Continue reading this section for the full explanation and source context.

Section VectorMemory

Continue reading this section for the full explanation and source context.

Related topics: Agent Framework, Storage Systems

Memory Systems

Memory Systems in LlamaIndex provide persistent conversation history management for chat engines and agents. They enable AI applications to maintain context across multiple interactions, store user preferences, and retrieve relevant historical information during conversations.

Architecture Overview

Memory Systems follow a modular architecture that allows different memory implementations to be composed and used interchangeably. The core memory system supports multiple storage strategies including buffer-based, summary-based, and vector-based retrieval.

graph TD
    A[Chat Engine / Agent] --> B[Memory System]
    B --> C[ChatMemoryBuffer]
    B --> D[ChatSummaryMemoryBuffer]
    B --> E[VectorMemory]
    B --> F[Mem0Memory]
    C --> G[SimpleComposableMemory]
    D --> G
    E --> G
    F --> H[External Memory Services]
    
    G --> I[Storage Backend]
    H --> J[Mem0 Platform API]

Core Memory Components

ChatMemoryBuffer

ChatMemoryBuffer is the foundational memory component that stores conversation history in a simple buffer structure. It maintains a list of chat messages and provides methods for adding, retrieving, and managing conversation context.

| Parameter | Type | Description |
|-----------|------|-------------|
| chat_history | List[ChatMessage] | List of conversation messages |
| size | int | Maximum number of messages to retain |
| tokenizer | Callable | Function to count tokens |

Sources: llama-index-core/llama_index/core/memory/chat_memory_buffer.py
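
A minimal usage sketch; recent releases expose a from_defaults constructor with a token_limit argument, which may differ slightly from the field names listed above, and index is assumed to exist:

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
response = chat_engine.chat("What did we discuss earlier?")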

ChatSummaryMemoryBuffer

ChatSummaryMemoryBuffer extends the basic buffer with summarization capabilities. When the conversation exceeds the configured size, older messages are condensed into a summary rather than being discarded entirely.

| Parameter | Type | Description |
|-----------|------|-------------|
| llm | LLM | LLM instance for generating summaries |
| chat_history | List[ChatMessage] | Initial conversation history |
| size | int | Maximum buffer size before summarization |
| summary_exists | bool | Flag indicating if summary is generated |

Sources: llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py

VectorMemory

VectorMemory uses vector embeddings to store and retrieve conversation history. This enables semantic search within the conversation history, allowing the system to find relevant past messages based on meaning rather than exact matches.

| Parameter | Type | Description |
|-----------|------|-------------|
| vector_store | VectorStore | Storage backend for embeddings |
| embed_model | EmbeddingModel | Model for generating embeddings |
| index | VectorStoreIndex | Index for efficient retrieval |
| retriever | BaseRetriever | Retrieval mechanism |

Sources: llama-index-core/llama_index/core/memory/vector_memory.py

SimpleComposableMemory

SimpleComposableMemory provides a framework for combining multiple memory types into a unified interface. This allows different memory strategies to work together, leveraging the strengths of each approach.

| Feature | Description |
|---------|-------------|
| Memory Composition | Combine buffer, summary, and vector memories |
| Unified Interface | Single API for all memory operations |
| Flexible Retrieval | Query multiple memory sources simultaneously |

Sources: llama-index-core/llama_index/core/memory/simple_composable_memory.py
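
A minimal composition sketch, assuming a chat_buffer (e.g. a ChatMemoryBuffer) and a vector_memory (e.g. a VectorMemory) have already been constructed; the primary_memory and secondary_memory_sources argument names follow recent core releases and should be verified against your installed version:

from llama_index.core.memory import SimpleComposableMemory

composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_buffer,                # main working buffer
    secondary_memory_sources=[vector_memory],  # additional sources queried for context
)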

Mem0 Memory Integration

The Mem0Memory integration provides access to the Mem0 Platform for advanced memory management. Mem0 offers enhanced capabilities for semantic memory storage, user preference tracking, and cross-session persistence.

Configuration Options

#### Client-Based Initialization

from llama_index.memory.mem0 import Mem0Memory

context = {"user_id": "user_1"}
memory = Mem0Memory.from_client(
    context=context,
    api_key="<your-mem0-api-key>",
    search_msg_limit=4,
)

#### Config Dictionary Initialization

memory = Mem0Memory.from_config(
    context=context,
    config={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
        "version": "v1.1",
    },
    search_msg_limit=4,
)

Context Parameters

The Mem0 context identifies the entity for which memory is stored:

| Parameter | Description |
|-----------|-------------|
| user_id | Unique identifier for the user |
| agent_id | Unique identifier for the agent |
| run_id | Unique identifier for the conversation run |

Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md

Usage Patterns

Integration with SimpleChatEngine

from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.memory.mem0 import Mem0Memory

memory = Mem0Memory.from_client(
    context={"user_id": "user_1"},
    api_key="<your-api-key>",
)

chat_engine = SimpleChatEngine.from_defaults(
    llm=llm,
    memory=memory
)

response = chat_engine.chat("Hi, My name is Mayank")

Integration with FunctionAgent

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.memory.mem0 import Mem0Memory

memory = Mem0Memory.from_client(
    context={"user_id": "user_1"},
    api_key="<your-api-key>",
)

# Use memory with the agent for persistent context
# (call_tool and email_tool are FunctionTool instances defined elsewhere)
agent = FunctionAgent(
    llm=llm,
    tools=[call_tool, email_tool],
    memory=memory
)

Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md

Memory Workflow

sequenceDiagram
    participant User
    participant ChatEngine
    participant Memory
    participant Storage
    
    User->>ChatEngine: Send message
    ChatEngine->>Memory: Get context (search_msg_limit messages)
    Memory->>Storage: Query recent messages
    Storage-->>Memory: Return relevant messages
    Memory-->>ChatEngine: Context messages
    ChatEngine->>ChatEngine: Generate response
    ChatEngine->>Memory: Store new message
    Memory->>Storage: Persist message
    ChatEngine-->>User: Return response

Comparison of Memory Types

| Memory Type | Storage Method | Use Case | Scalability |
|-------------|----------------|----------|-------------|
| ChatMemoryBuffer | List/Buffer | Short conversations | Limited by token size |
| ChatSummaryMemoryBuffer | Condensed summaries | Long conversations | Better for extended chats |
| VectorMemory | Embeddings | Semantic search | Scales with vector store |
| Mem0Memory | External API | Production applications | Cloud-native scaling |

Environment Configuration

For Mem0 integration, set the API key as an environment variable:

export MEM0_API_KEY="<your-mem0-api-key>"

For LLM integration within memory operations:

export OPENAI_API_KEY="<your-openai-api-key>"

Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md

Best Practices

  1. Choose Appropriate Memory Type: Select based on conversation length and retrieval needs
  2. Configure Token Limits: Set appropriate search_msg_limit to balance context and performance
  3. Use Context Parameters: Always provide user_id, agent_id, or run_id for proper memory isolation
  4. Consider Composability: Use SimpleComposableMemory for complex memory requirements
  5. Monitor API Costs: When using Mem0, track API usage for cost optimization

Source: https://github.com/run-llama/llama_index / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

medium README/documentation is current enough for a first validation pass.

The project should not be treated as fully validated until this signal is reviewed.

medium Maintainer activity is unknown

Users cannot judge support quality until recent activity, releases, and issue response are checked.

medium no_demo

The project may affect permissions, credentials, data exposure, or host boundaries.

medium no_demo

The project may affect permissions, credentials, data exposure, or host boundaries.

Doramagic Pitfall Log

Doramagic extracted 6 source-linked risk signals. Review them before installing or handing real data to the project.

1. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:560704231 | https://github.com/run-llama/llama_index | README/documentation is current enough for a first validation pass.

2. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | last_activity_observed missing

3. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium

4. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium

5. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | issue_or_pr_quality=unknown

6. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12

Count of project-level external discussion links exposed on this manual page.

Use: Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using llama_index with real data or production workflows.

  • [[Feature Request]: Built-in LLM Failover for Reliability](https://github.com/run-llama/llama_index/issues/19631) - github / github_issue
  • [[Feature Request]: add (detailed) usage info to raw when using Structure](https://github.com/run-llama/llama_index/issues/19845) - github / github_issue
  • [[Bug]: thinking_delta not populated on AgentStream events when thinkin](https://github.com/run-llama/llama_index/issues/20349) - github / github_issue
  • [[Bug]: [llama-index-core] async_acquire() in TokenBucketRateLimiter and](https://github.com/run-llama/llama_index/issues/21603) - github / github_issue
  • [[Question]: how to add human-in-the-loop capability to ReActAgent?](https://github.com/run-llama/llama_index/issues/21599) - github / github_issue
  • Proposal: Agent Threat Rules detection integration for LlamaIndex - github / github_issue
  • Improve developer error message for unrecognized embedding names in `loa - github / github_issue
  • [[Bug]: Bedrock Converse streaming produces string tool_kwargs in `Tool](https://github.com/run-llama/llama_index/issues/21579) - github / github_issue
  • [[Bug]: Breaking Image/Index node fetching behavior after refactor](https://github.com/run-llama/llama_index/issues/19499) - github / github_issue
  • [[Bug]: PydanticUserError: The __modify_schema__ method is not supporte](https://github.com/run-llama/llama_index/issues/16540) - github / github_issue
  • [[Bug]: gemini-embedding-2 task instructions not implemented (task_type d](https://github.com/run-llama/llama_index/issues/21535) - github / github_issue
  • v0.14.21 - github / github_release

Source: Project Pack community evidence and pitfall evidence