Doramagic Project Pack · Human Manual

Introduction to LlamaIndex

Related topics: Core Architecture, Quick Start Guide

LlamaIndex is a comprehensive data framework designed for building LLM (Large Language Model) applications. It provides the essential tools, abstractions, and integrations needed to connect custom data sources to LLMs for retrieval-augmented generation (RAG), question-answering systems, and other AI-powered applications.

Overview

LlamaIndex serves as the foundational layer for building AI applications that require sophisticated data ingestion, indexing, and querying capabilities. The framework enables developers to:

  • Ingest data from various sources (PDFs, documents, websites, databases)
  • Process and chunk data into optimal segments for LLM consumption
  • Create vector indices for efficient semantic search
  • Build query engines and retrieval pipelines
  • Integrate with hundreds of external services and model providers

Sources: README.md:1-20

Core Architecture

The LlamaIndex framework follows a modular architecture with distinct components that work together to provide end-to-end data pipeline capabilities.

Package Structure

LlamaIndex offers two primary installation methods to accommodate different use cases:

| Package | Description | Use Case |
|---|---|---|
| llama-index | Starter package with core + selected integrations | Quick start, common setups |
| llama-index-core | Core package only | Custom, minimal deployments |

Sources: README.md:45-55

Import Patterns

The framework uses a namespaced import system that distinguishes between core modules and integration packages:

# Core modules (included in llama-index-core)
from llama_index.core.xxx import ClassABC

# Integration modules (from separate packages)
from llama_index.xxx.yyy import SubclassABC

# Concrete examples
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI

Sources: README.md:56-68

Data Flow Architecture

The following diagram illustrates the typical data flow in a LlamaIndex application:

graph TD
    A[Data Sources] --> B[Readers/Loaders]
    B --> C[Documents]
    C --> D[Node Parsers]
    D --> E[Nodes/Chunks]
    E --> F[Vector Index]
    F --> G[Retriever]
    G --> H[Query Engine]
    H --> I[LLM Response]
    
    A1[Web Pages] --> B
    A2[PDFs] --> B
    A3[Databases] --> B
    A4[APIs] --> B

Key Components

1. Document Loaders

Document loaders (Readers) are responsible for ingesting data from external sources. LlamaIndex provides a vast ecosystem of readers:

| Reader | Purpose | Source |
|---|---|---|
| WikipediaReader | Load Wikipedia pages | llama-index-readers-wikipedia |
| WholeSiteReader | Scrape entire websites | llama-index-readers-web |
| DoclingReader | Parse PDFs, DOCX, HTML | llama-index-readers-docling |
| RemoteDepthReader | Extract from URLs recursively | llama-index-readers-remote-depth |

#### Wikipedia Reader Example

from llama_index.readers.wikipedia import WikipediaReader

reader = WikipediaReader()
documents = reader.load_data(pages=["Page Title 1", "Page Title 2"])

Sources: llama-index-readers-wikipedia/README.md:1-25

#### Docling Reader Example

from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
docs = reader.load_data(file_path="https://arxiv.org/pdf/2408.09869")

Sources: llama-index-readers-docling/README.md:1-30

2. Indices

Indices organize documents for efficient retrieval. LlamaIndex supports both managed indices and customizable self-hosted options.

#### Managed Indices

Managed indices like VectaraIndex provide fully hosted solutions:

from llama_index.indices.managed.vectara import VectaraIndex
from llama_index.core.schema import Document, MediaResource

docs = [
    Document(
        id_="doc1",
        text_resource=MediaResource(
            text="This is test text for Vectara integration.",
        ),
    ),
]
index = VectaraIndex.from_documents(docs)

Sources: llama-index-indices-managed-vectara/README.md:30-50

3. LLM Integrations

LlamaIndex provides integrations with numerous LLM providers through a standardized interface:

# Example: Contextual LLM Integration
from llama_index.llms.contextual import Contextual

llm = Contextual(model="contextual-clm", api_key="your_api_key")
response = llm.complete("Explain the importance of Grounded Language Models.")

Sources: llama-index-llms-contextual/README.md:1-20

Usage Patterns

Building a Simple RAG Pipeline

The most common pattern involves loading documents, creating an index, and querying it:

from llama_index.core import VectorStoreIndex
from llama_index.readers.docling import DoclingReader

# Step 1: Load documents
reader = DoclingReader()
documents = reader.load_data(file_path="document.pdf")

# Step 2: Create index
index = VectorStoreIndex.from_documents(documents)

# Step 3: Query
query_engine = index.as_query_engine()
response = query_engine.query("Summarize this document")

Retrieval-Only Pattern

For applications requiring only retrieval without generation:

retriever = index.as_retriever(similarity_top_k=2)
results = retriever.retrieve("How will users feel about this new tool?")

Sources: llama-index-indices-managed-vectara/README.md:50-65

LangChain Integration

LlamaIndex components can be used as tools within LangChain agents:

from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import WholeSiteReader

# Initialize scraper
scraper = WholeSiteReader(prefix="https://docs.llamaindex.ai/en/stable/", max_depth=10)
documents = scraper.load_data(base_url="https://docs.llamaindex.ai/en/stable/")

# Create index
index = VectorStoreIndex.from_documents(documents)

# Define tools
tools = [
    Tool(
        name="Website Index",
        func=lambda q: index.query(q),
        description="Useful for answering questions about text on websites.",
    ),
]

Sources: llama-index-readers-web/llama_index/readers/web/whole_site/README.md:1-40

LlamaParse Platform

LlamaParse is a complementary platform (separate from the open-source LlamaIndex framework) focused on document agents and agentic OCR:

| Component | Function |
|---|---|
| Parse | Agentic OCR and document parsing (130+ formats) |
| Extract | Structured data extraction from documents |
| Index | Ingest, index, and RAG pipelines |
| Split | Split large documents into subcategories |

Sources: README.md:75-85

Ecosystem Overview

LlamaIndex maintains an extensive ecosystem with over 300 integration packages available through LlamaHub:

graph LR
    subgraph "Data Sources"
        Web[Web]
        PDFs[PDFs]
        DB[Databases]
        APIs[APIs]
    end
    
    subgraph "LlamaIndex Core"
        Docs[Documents]
        Nodes[Nodes]
        Indices[Indices]
    end
    
    subgraph "LLM Providers"
        OpenAI[OpenAI]
        HuggingFace[HF]
        Local[Local Models]
    end
    
    Web --> Docs
    PDFs --> Docs
    DB --> Docs
    APIs --> Docs
    Docs --> Indices
    Indices --> OpenAI
    Indices --> HuggingFace
    Indices --> Local

Configuration Options

Common Reader Configuration Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| file_path | str | Path to input file/URL | "document.pdf" |
| prefix | str | URL prefix for filtering | "https://example.com/" |
| max_depth | int | Maximum recursion depth | 10 |
| where | dict | Metadata filter condition | {"category": "AI"} |
| query | list | Search query text | ["search term"] |

Sources: llama-index-readers-chroma/README.md:1-20

Installation

pip install llama-index

Minimal Installation

pip install llama-index-core

Individual Integrations

pip install llama-index-readers-wikipedia
pip install llama-index-readers-docling
pip install llama-index-llms-openai

Citation

If you use LlamaIndex in academic work, cite as:

@software{Liu_LlamaIndex_2022,
    author = {Liu, Jerry},
    doi = {10.5281/zenodo.1234},
    month = {11},
    title = {{LlamaIndex}},
    url = {https://github.com/jerryjliu/llama_index},
    year = {2022}
}

Sources: README.md:95-105

Next Steps

To continue learning LlamaIndex:

  1. Getting Started - Follow the starter example
  2. Concepts - Understand core concepts like Documents, Nodes, and Indices
  3. LlamaHub - Browse 300+ integrations for various data sources and LLM providers
  4. Examples - Explore Jupyter notebooks for detailed use cases

Sources: [README.md:1-20](https://github.com/run-llama/llama_index/blob/main/README.md)

Quick Start Guide

Related topics: Introduction to LlamaIndex, Documents and Nodes

This guide provides a comprehensive introduction to getting started with LlamaIndex, covering environment setup, core installation methods, and essential development workflows.

Prerequisites

Before beginning, ensure your environment meets the following requirements:

| Requirement | Version/Details |
|---|---|
| Python | 3.8 or higher |
| Package Manager | uv (recommended) or pip |
| Operating System | Unix-like (Linux, macOS), Windows with WSL |
| Git | Latest stable version |

Environment Setup

Creating a Virtual Environment

LlamaIndex recommends using uv for dependency management. Create a virtual environment as follows:

uv venv
source .venv/bin/activate

Sources: llama-dev/README.md:11

Installing the Development CLI

The llama-dev CLI tool is the official command-line interface for development, testing, and automation in the LlamaIndex monorepo.

Install it in editable mode:

uv pip install -e .

After installation, verify the CLI is available:

llama-dev --help

Sources: llama-dev/README.md:12-18

Core Concepts

graph TD
    A[LlamaIndex Project] --> B[Core Package: llama-index-core]
    A --> C[LLM Integrations]
    A --> D[Reader Integrations]
    A --> E[Callback Integrations]
    B --> F[VectorStoreIndex]
    B --> G[ServiceContext]
    B --> H[Document Loading]

The LlamaIndex framework consists of several key components:

| Component | Purpose |
|---|---|
| llama-index-core | Core framework functionality including indexing and querying |
| LLM Integrations | Connectors for various language model providers |
| Reader Integrations | Data loaders for different document sources |
| Callback Integrations | Monitoring and logging capabilities |

Package Management

Querying Package Information

View information about specific packages in the monorepo:

# Get info for a specific package
llama-dev pkg info llama-index-core

# Get info for all packages
llama-dev pkg info --all

Executing Commands in Package Directories

Run commands within the context of specific packages:

# Run a command in a specific package
llama-dev pkg exec --cmd "uv sync" llama-index-core

# Run a command in all packages
llama-dev pkg exec --cmd "uv sync" --all

# Exit at first error
llama-dev pkg exec --cmd "uv" --all --fail-fast

Sources: llama-dev/README.md:26-41

Testing

Running Tests Across the Monorepo

Execute tests for specific packages or across all packages:

# Run tests for a specific package
llama-dev pkg test llama-index-core

# Run tests for all packages
llama-dev pkg test --all

Quick Test Verification

After making changes, verify core functionality:

llama-dev pkg exec --cmd "python -m pytest" llama-index-core

Basic LLM Integration Usage

Initializing an LLM

Different LLM providers follow similar initialization patterns:

from llama_index.llms.ollama import Ollama

# Initialize Ollama LLM
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)

from llama_index.llms.mistralai import MistralAI

llm = MistralAI(api_key="<your-api-key>")

Sources: llama-index-integrations/llms/llama-index-llms-ollama/README.md:30-35 Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:16-18

Generating Completions

# Simple completion
resp = llm.complete("Who is Paul Graham?")
print(resp)

# Chat completion with messages
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content="You are a helpful assistant."
    ),
    ChatMessage(role=MessageRole.USER, content="How to make cake?"),
]
resp = llm.chat(messages)
print(resp)

Sources: llama-index-integrations/llms/llama-index-llms-modelscope/README.md:24-37

Streaming Responses

# Stream completions
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")

# Stream chat responses
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")

Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:40-48

Building an Index from Documents

Basic Index Creation

from llama_index.core import VectorStoreIndex

# Create index from documents
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)

Loading Data from URLs

from llama_index.readers.web import WholeSiteReader

# Initialize the scraper
scraper = WholeSiteReader(
    prefix="https://docs.llamaindex.ai/en/stable/",
    max_depth=10,
)

# Start scraping from a base URL
documents = scraper.load_data(
    base_url="https://docs.llamaindex.ai/en/stable/"
)

# Create index
index = VectorStoreIndex.from_documents(documents)
index.query("What language is on this website?")

Sources: llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/whole_site/README.md:14-34

Configuration Options

Key Parameters

| Parameter | Description | Default Value |
|---|---|---|
| model | LLM model identifier | Required |
| api_key | API key for the provider | Required for cloud providers |
| request_timeout | Request timeout in seconds | 30.0 |
| temperature | Sampling temperature | 0.7 |
| max_tokens | Maximum tokens to generate | Provider-specific |
| context_window | Maximum context length | Provider-specific |
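
For illustration, the same knobs map onto most provider constructors; a sketch assuming the llama-index-llms-openai package is installed (model name and values are examples only):

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key="<your-api-key>",
    temperature=0.2,
    max_tokens=512,
)
print(llm.complete("Say hello in one word."))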

Environment Variables

Set API keys as environment variables before initialization:

export KONKO_API_KEY=<your-api-key>
export OPENAI_API_KEY=<your-api-key>

Or set them programmatically in Python:

import os

os.environ["KONKO_API_KEY"] = "<your-api-key>"

Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md:15-20

Common Workflows

graph LR
    A[Setup Environment] --> B[Install llama-dev]
    B --> C[Explore Packages]
    C --> D{Development Goal}
    D -->|Testing| E[Run Tests]
    D -->|Integration| F[Configure LLM]
    D -->|Data Loading| G[Set up Readers]
    E --> H[Modify Code]
    F --> H
    G --> H
    H --> I[Verify Changes]
    I --> E

Troubleshooting

Common Issues

| Issue | Solution |
|---|---|
| CLI not found | Ensure virtual environment is activated |
| API key errors | Verify environment variables are set |
| Package import errors | Run uv sync in the package directory |
| Timeout errors | Increase request_timeout parameter |

Verification Commands

# Check installation
llama-dev --version

# Verify package structure
llama-dev pkg info --all

# Test core imports
python -c "import llama_index; print(llama_index.__version__)"

Next Steps

After completing this quick start guide:

  1. Explore specific LLM integrations for your preferred provider
  2. Review reader integrations for your data sources
  3. Study the core API documentation for advanced indexing strategies
  4. Join the community for support and updates

Sources: llama-dev/README.md:11

Core Architecture

Related topics: Introduction to LlamaIndex, Integration Architecture

Overview

LlamaIndex is a data framework for building LLM-powered applications. The Core Architecture establishes the fundamental building blocks that enable developers to connect large language models with their custom data sources. This architectural foundation provides a layered, modular approach where each component—from language model interfaces to response handling—follows consistent patterns and abstractions.

The core architecture serves as the abstraction layer between raw data ingestion and sophisticated LLM-powered querying. It separates concerns by defining clear interfaces for language models (LLMs), embedding services, document processing, indexing, and response generation. This design allows developers to swap implementations, extend functionality, and maintain clean separation between components.

System Components

High-Level Architecture Diagram

graph TD
    subgraph "Data Layer"
        Documents[Documents]
        Nodes[Nodes]
        Index[Index]
    end
    
    subgraph "Core Abstractions"
        LLMs[LLM Base]
        Embeddings[Embedding Base]
        Response[Response Schema]
    end
    
    subgraph "Service Layer"
        VectorStore[Vector Store]
        StorageContext[Storage Context]
    end
    
    subgraph "Application Layer"
        Query[Query Engine]
        Chat[Chat Engine]
        Agent[Agent]
    end
    
    Documents --> NodeParser
    NodeParser --> Nodes
    Nodes --> Index
    Index --> Query
    Query --> Response
    LLMs --> Query
    Embeddings --> Index

Language Model (LLM) Abstraction

Purpose and Role

The LLM base abstraction (llama_index.core.base.llms.base) defines the contract that all language model implementations must follow. This abstraction enables LlamaIndex to support multiple LLM providers—including OpenAI, Anthropic, local models, and custom implementations—through a unified interface.

Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50

Base LLM Interface

The LLM base class provides the following core methods:

| Method | Purpose | Parameters |
|---|---|---|
| complete() | Synchronous text completion | prompt: str, formatted: bool = False, **kwargs |
| stream_complete() | Streaming text completion | prompt: str, formatted: bool = False, **kwargs |
| chat() | Synchronous chat completion | messages: List[ChatMessage], **kwargs |
| stream_chat() | Streaming chat completion | messages: List[ChatMessage], **kwargs |

LLM Class Hierarchy

classDiagram
    class LLM {
        <<abstract>>
        +complete()
        +stream_complete()
        +chat()
        +stream_chat()
        +metadata: LLMMetadata
    }
    
    class LLMMetadata {
        +model: str
        +temperature: float
        +top_p: float
        +max_tokens: Optional[int]
        +context_window: int
        +is_chat_model: bool
        +is_function_calling_model: bool
    }
    
    class ChatMessage {
        +role: MessageRole
        +content: str
        +additional_kwargs: Dict
    }
    
    LLM --> LLMMetadata
    LLM --> ChatMessage

Sources: llama-index-core/llama_index/core/base/llms/base.py:50-120

Message Roles

The MessageRole enum defines valid roles for chat messages:

| Role | Description |
|---|---|
| SYSTEM | System-level instructions |
| USER | User-generated content |
| ASSISTANT | Model-generated responses |
| FUNCTION | Function call results |
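
A brief sketch of constructing chat messages with explicit roles (the llm object is assumed to be any chat-capable LLM implementation):

from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a concise assistant."),
    ChatMessage(role=MessageRole.USER, content="Summarize LlamaIndex in one sentence."),
]
# Any LLM implementation can consume these messages via chat()
# response = llm.chat(messages)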

Embedding Abstraction

Purpose and Role

The embedding base (llama_index.core.base.embeddings.base) provides the interface for text vectorization. Embeddings transform textual content into numerical vectors that enable semantic similarity searches. This abstraction supports various embedding providers while maintaining a consistent API.

Sources: llama-index-core/llama_index/core/base/embeddings/base.py:1-60

Embedding Interface Methods

| Method | Purpose | Return Type |
|---|---|---|
| get_query_embedding() | Embed a single query string | List[float] |
| get_text_embedding() | Embed a single text string | List[float] |
| get_text_embedding_batch() | Embed multiple texts in batch | List[List[float]] |
| get_query_embedding_batch() | Embed multiple queries in batch | List[List[float]] |
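
For example, a sketch assuming the llama-index-embeddings-openai package and an OPENAI_API_KEY are available (any BaseEmbedding subclass exposes the same methods):

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()
query_vector = embed_model.get_query_embedding("What is LlamaIndex?")
text_vectors = embed_model.get_text_embedding_batch(["first passage", "second passage"])
print(len(query_vector))  # embedding dimension depends on the model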

Embedding Configuration

graph LR
    A[Text Input] --> B[Embedding Model]
    B --> C[Dimension: 384-1536]
    C --> D[Normalized Vector]

Sources: llama-index-core/llama_index/core/base/embeddings/base.py:60-100

Response Schema

Purpose and Role

The response schema (llama_index.core.base.response.schema) defines the data structures used throughout LlamaIndex for returning query results, streaming responses, and structured outputs. This ensures consistent response handling across different query types and engines.

Sources: llama-index-core/llama_index/core/base/response/schema.py:1-80

Core Response Models

| Class | Purpose |
|---|---|
| Response | Wraps text responses with sources |
| StreamingResponse | Handles streaming token outputs |
| ResponseMode | Enum for response generation modes |
| Sources | Container for source nodes and metadata |
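
In practice, a query engine returns a Response whose text and source nodes can be inspected directly (a sketch assuming an existing query_engine):

response = query_engine.query("What does the document cover?")

print(str(response))  # synthesized answer text
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])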

Response Mode Enumeration

graph TD
    A[Query] --> B{Response Mode}
    B --> C[default]
    B --> D[refine]
    B --> E[compact]
    B --> F[accumulate]
    B --> G[compact_accumulate]
    
    C --> H[Single pass response]
    D --> I[Iterative refinement]
    E --> J[Compact and respond]
    F --> K[Aggregate node responses]
    G --> L[Compact then accumulate]

Sources: llama-index-core/llama_index/core/base/response/schema.py:30-50

Core Types System

Type Definitions

The types module (llama_index.core.types) defines foundational enumerations and type aliases used throughout the framework:

| Type | Description |
|---|---|
| ModelType | Defines model categories (e.g., LLM, EMBEDDING) |
| PromptType | Categorizes prompts (e.g., SUMMARY, QUERY) |
| NodeType | Defines node kinds (e.g., TEXT, DOCUMENT) |

Sources: llama-index-core/llama_index/core/types.py:1-60

Node Parser Types

classDiagram
    class Node {
        <<abstract>>
        +id_: str
        +embedding: Optional[List[float]]
        +metadata: Dict[str, Any]
        +relationships: Dict[NodeRelationship, Node]
        +excluded_embed_metadata_keys: List[str]
        +excluded_llm_metadata_keys: List[str]
    }
    
    class TextNode {
        +text: str
        +start_char_idx: Optional[int]
        +end_char_idx: Optional[int]
    }
    
    class Document {
        +text: str
        +doc_id: str
        +embedding: Optional[List[float]]
    }
    
    Node <|-- TextNode
    Node <|-- Document

Document and Node Model

Document Structure

Documents represent the top-level container for source data. Each document contains metadata and can be broken down into smaller nodes for indexing:

| Field | Type | Description |
|---|---|---|
| doc_id | str | Unique document identifier |
| text | str | Full text content |
| metadata | Dict[str, Any] | Associated metadata |
| embedding | Optional[List[float]] | Pre-computed embedding |

Node Relationships

Nodes maintain relationships with other nodes through the NodeRelationship enum:

| Relationship | Description |
|---|---|
| SOURCE | Parent document relationship |
| PREVIOUS | Previous sibling node |
| NEXT | Next sibling node |
| PARENT | Parent node in hierarchy |
| CHILD | Child node in hierarchy |

Sources: llama-index-core/llama_index/core/node_parser/node.py:30-80
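
Node parsers populate these relationships automatically; the following is only a small hand-wired sketch for illustration:

from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

first = TextNode(text="First chunk of a document.", id_="node-1")
second = TextNode(text="Second chunk of a document.", id_="node-2")

# Link the two chunks as siblings
first.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=second.node_id)
second.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=first.node_id)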

Storage Architecture

Storage Context

The StorageContext manages persistence layers for various data components:

graph TD
    StorageContext --> VectorStore
    StorageContext --> DocStore
    StorageContext --> IndexStore
    StorageContext --> GraphStore
    
    VectorStore --> Milvus[Milvus]
    VectorStore --> Chroma[Chroma]
    VectorStore --> Pinecone[Pinecone]
    
    DocStore --> MongoDB[MongoDB]
    DocStore --> Redis[Redis]
    DocStore --> Simple[SimpleKVStore]

Sources: llama-index-core/llama_index/core/storage/storage_context.py:1-50

Storage Components

| Component | Purpose |
|---|---|
| vector_store | Stores embedding vectors for similarity search |
| doc_store | Stores serialized nodes and documents |
| index_store | Stores index metadata and configurations |
| graph_store | Stores knowledge graph relationships |

Index Architecture

Base Index Structure

Indexes provide the mechanism for organizing and querying documents. The base index class establishes the contract for all index implementations:

graph LR
    A[Documents] --> B[Index Construction]
    B --> C[Node Parsing]
    C --> D[Embedding Generation]
    D --> E[Vector Storage]
    E --> F[Queryable Index]

Index Types

| Index Type | Use Case |
|---|---|
| VectorStoreIndex | Semantic search over embeddings |
| SummaryIndex | Document summarization |
| KeywordTableIndex | Keyword-based retrieval |
| KnowledgeGraphIndex | Graph-based knowledge representation |

Sources: llama-index-core/llama_index/core/indices/base.py:1-80
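
As a sketch of how index types share the same construction and query surface (assuming documents are already loaded):

from llama_index.core import SummaryIndex, VectorStoreIndex

# Same documents, different retrieval strategies
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

response = summary_index.as_query_engine().query("Give a high-level summary.")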

Query Engine Architecture

Query Flow

sequenceDiagram
    participant User
    participant QueryEngine
    participant Retriever
    participant LLM
    participant Response
    
    User->>QueryEngine: Query Request
    QueryEngine->>Retriever: Retrieve Nodes
    Retriever-->>QueryEngine: Source Nodes
    QueryEngine->>LLM: Synthesize Response
    LLM-->>QueryEngine: Response
    QueryEngine->>Response: Format Output
    Response-->>User: Formatted Answer

Retriever Types

| Retriever | Description |
|---|---|
| VectorRetriever | Embedding-based similarity search |
| KeywordRetriever | BM25 or keyword matching |
| HybridRetriever | Combined vector and keyword search |
| SentenceWindowRetriever | Contextual window retrieval |

Configuration and Extensibility

Service Context

The ServiceContext bundles together the core service components:

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | LLM | OpenAI() | Language model instance |
| embed_model | Embedding | OpenAIEmbedding() | Embedding model instance |
| node_parser | NodeParser | SentenceSplitter() | Text chunking strategy |
| prompt_helper | PromptHelper | Auto-calculated | Prompt size optimization |

Customization Patterns

graph TD
    subgraph "Extension Points"
        CustomLLM[Custom LLM Implementation]
        CustomEmbed[Custom Embedding Model]
        CustomParser[Custom Node Parser]
        CustomStore[Custom Storage Backend]
    end
    
    CustomLLM -->|inherits| LLMBase[LLM Base]
    CustomEmbed -->|inherits| EmbedBase[Embedding Base]
    CustomParser -->|inherits| NodeParserBase[NodeParser Base]
    CustomStore -->|inherits| StorageContextBase[StorageContext Base]

Summary

The Core Architecture of LlamaIndex establishes a modular, extensible framework built on well-defined abstractions. The layered architecture—from base interfaces like LLM and Embedding through storage and indexing components to application-layer query engines—enables developers to:

  1. Swap implementations without changing application code
  2. Extend functionality through inheritance and composition
  3. Maintain clean separation between concerns
  4. Support multiple providers through unified interfaces

The architecture follows consistent patterns across components, making the framework predictable and learnable while supporting the diverse requirements of production LLM applications.

Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50

Integration Architecture

Related topics: Core Architecture, Retrieval and Reranking

Overview

LlamaIndex employs a modular integration architecture that extends the core framework's capabilities through a comprehensive ecosystem of pluggable components. The integration system allows developers to connect LlamaIndex with external services, APIs, local models, and specialized tools without modifying the core library. This architecture follows a provider-based pattern where each integration package implements standardized interfaces to ensure compatibility and consistent behavior across different external systems.

The integration architecture serves as the bridge between LlamaIndex's core data structures and the diverse landscape of LLM providers, embedding services, document loaders, and auxiliary tools. By maintaining well-defined contracts between components, the system enables seamless swapping of implementations while preserving the overall workflow of building retrieval-augmented generation (RAG) pipelines and query engines.

Integration Categories

LlamaIndex organizes its integrations into distinct categories, each addressing a specific aspect of the LLM application development workflow. The categorization ensures logical separation of concerns and simplifies dependency management for end users.

LLM Integrations

LLM (Large Language Model) integrations provide adapters for connecting to various language model providers. These integrations implement the unified LLM interface defined in llama_index.core.llms, allowing developers to switch between providers without changing application code. Each LLM integration handles provider-specific authentication, request formatting, response parsing, and streaming behavior.

| Integration Package | Provider | Key Features |
|---|---|---|
| llama-index-llms-contextual | Contextual | Contextual LLM wrapper |
| llama-index-llms-konko | Konko | Supports both Konko and OpenAI models |
| llama-index-llms-lmstudio | LM Studio | Local server integration |
| llama-index-llms-monsterapi | MonsterAPI | Private deployments and GA models |
| llama-index-llms-modelscope | ModelScope | Qwen and other ModelScope models |
| llama-index-llms-langchain | LangChain | LangChain LLM wrapper |
| llama-index-llms-optimum-intel | Intel Optimum | CPU-optimized inference |

Sources: llama-index-integrations/llms/llama-index-llms-contextual/README.md

Reader Integrations

Reader integrations enable data ingestion from various document sources and web content. These loaders transform external data formats into LlamaIndex's internal Document schema, providing a unified representation regardless of the source type.

| Reader Type | Source Format | Package |
|---|---|---|
| Document Readers | PDF, DOCX, HTML | llama-index-readers-docling |
| Web Readers | URLs, Articles | llama-index-readers-web |
| Wikipedia | Wikipedia pages | llama-index-readers-wikipedia |
| Remote Content | Deep link crawling | llama-index-readers-remote-depth |
| Cloud Storage | Box files | llama-index-readers-box |
| Preprocessed | Chunks from Preprocess API | llama-index-readers-preprocess |

Sources: llama-index-integrations/readers/llama-index-readers-wikipedia/README.md

Embedding Integrations

Embedding integrations provide vectorization capabilities through external embedding models. These components convert text into dense vector representations suitable for semantic search and similarity operations.

| Provider | Model Examples | Package |
|---|---|---|
| Ollama | nomic-embed-text, embeddinggemma, mxbai-embed-large | llama-index-embeddings-ollama |

Sources: llama-index-integrations/embeddings/llama-index-embeddings-ollama/README.md

Index Integrations

Index integrations connect to managed vector search services, providing fully-hosted indexing and retrieval capabilities. These integrations abstract the complexity of distributed vector databases behind LlamaIndex's retriever interface.

| Managed Service | Package | Features |
|---|---|---|
| Vectara | llama-index-indices-managed-vectara | RAG pipeline, retriever, query engine |

Sources: llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md

Tool Integrations

Tool integrations extend LlamaIndex's agent capabilities by providing access to external services that can be invoked during agent execution.

| Tool Provider | Features | Package |
|---|---|---|
| Moss | Hybrid search (keyword + semantic) | llama-index-tools-moss |

Callback Integrations

Callback integrations enable observability and feedback collection by integrating with external monitoring and evaluation platforms.

| Platform | Purpose | Package |
|---|---|---|
| Argilla | Feedback loop, LLM monitoring | llama-index-callbacks-argilla |

Sources: llama-index-integrations/callbacks/llama-index-callbacks-argilla/README.md

System Architecture

The integration architecture follows a layered approach where core abstractions define the contracts, and integration packages provide concrete implementations. This design enables horizontal scalability of integrations while maintaining vertical consistency with the core framework.

graph TD
    A[Application Layer] --> B[Core LlamaIndex]
    B --> C[Interface Abstractions]
    C --> D[LLM Abstraction]
    C --> E[Reader Abstraction]
    C --> F[Embedding Abstraction]
    C --> G[Retriever Abstraction]
    D --> H[LLM Integrations]
    E --> I[Reader Integrations]
    F --> J[Embedding Integrations]
    G --> K[Index Integrations]
    H --> L[Konko, LMStudio, MonsterAPI, etc.]
    I --> M[Docling, Wikipedia, Web, Box, etc.]
    J --> N[Ollama Embeddings]
    K --> O[Vectara]

Common Integration Patterns

LLM Integration Pattern

LLM integrations follow a consistent initialization pattern that accepts provider-specific configuration parameters. The typical constructor accepts a model identifier, base URL for API endpoints, and optional generation parameters such as temperature and maximum tokens.

from llama_index.llms.provider_name import ProviderLLM

llm = ProviderLLM(
    model="model-identifier",
    api_key="your-api-key",
    temperature=0.7,
    max_tokens=256
)

Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md

Reader Integration Pattern

Reader integrations follow a loader pattern where initialization may require credentials, and the load_data method accepts source-specific parameters such as URLs, file paths, or query filters.

from llama_index.readers.source_type import SourceReader

reader = SourceReader(credentials="your-credentials")
documents = reader.load_data(source="document-source")

Sources: llama-index-integrations/readers/llama-index-readers-remote-depth/README.md

Data Flow Architecture

The integration architecture enables a complete RAG pipeline where each component plays a specific role in transforming input data into actionable insights.

graph LR
    A[Document Sources] --> B[Readers]
    B --> C[Documents]
    C --> D[Node Parsers]
    D --> E[Nodes]
    E --> F[Vector Index]
    E --> G[Storage Context]
    F --> H[Retriever]
    G --> H
    H --> I[Query Engine]
    I --> J[LLM]
    J --> K[Response]

Installation and Dependency Management

Each integration package follows the naming convention llama-index-{category}-{provider} and can be installed independently via pip. This modular approach minimizes dependency overhead by allowing users to install only the packages required for their specific use case.

| Category | Package Naming Pattern | Installation Command |
|---|---|---|
| LLM | llama-index-llms-{provider} | pip install llama-index-llms-{provider} |
| Reader | llama-index-readers-{source} | pip install llama-index-readers-{source} |
| Embedding | llama-index-embeddings-{provider} | pip install llama-index-embeddings-{provider} |
| Index | llama-index-indices-{type}-{provider} | pip install llama-index-indices-{type}-{provider} |
| Tool | llama-index-tools-{provider} | pip install llama-index-tools-{provider} |
| Callback | llama-index-callbacks-{platform} | pip install llama-index-callbacks-{platform} |

Configuration Management

Integrations typically support configuration through both constructor parameters and environment variables. This dual approach accommodates both explicit configuration in code and secret management through environment-based configuration.

Environment Variable Pattern

Many integrations follow a pattern where API keys can be set as environment variables for security and convenience:

export PROVIDER_API_KEY="your-api-key"
export OPENAI_API_KEY="your-openai-key"

Constructor Parameter Pattern

Alternatively, credentials can be passed directly to the integration constructor:

llm = ProviderLLM(
    model="model-name",
    api_key="explicit-api-key",
    base_url="https://api.provider.com"
)

Sources: llama-index-integrations/llms/llama-index-llms-lmstudio/README.md

Extending the Architecture

The integration architecture is designed for extensibility. New integrations can be created by implementing the appropriate abstract base classes defined in llama_index.core. Each integration category has its own interface specification that ensures consistency across implementations.

Creating a New LLM Integration

To create a new LLM integration, implement the following interface contract (a minimal sketch follows the list):

  1. Inherit from the base LLM class
  2. Implement complete(), chat(), and streaming methods
  3. Handle provider-specific authentication and error handling
  4. Follow the naming convention for the package
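
A minimal sketch of such a subclass, based on the CustomLLM convenience base class in llama_index.core.llms (the EchoLLM name and its echo behavior are purely illustrative):

from typing import Any
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class EchoLLM(CustomLLM):
    """Toy provider that simply echoes the prompt."""

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(context_window=4096, num_output=256, model_name="echo-llm")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=prompt)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        response = ""
        for token in prompt.split():
            response += token + " "
            yield CompletionResponse(text=response, delta=token + " ")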

Creating a New Reader Integration

To create a new reader integration (see the sketch after this list):

  1. Implement a loader class with load_data() method
  2. Transform source data into Document objects
  3. Handle pagination, filtering, and error cases appropriately
  4. Document supported source formats and parameters
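
A minimal sketch under those constraints (the reader name and the .txt-only handling are illustrative, not an existing package):

from pathlib import Path
from typing import List

from llama_index.core import Document
from llama_index.core.readers.base import BaseReader


class PlainTextDirectoryReader(BaseReader):
    """Illustrative reader: turns every .txt file in a folder into a Document."""

    def load_data(self, input_dir: str) -> List[Document]:
        documents = []
        for path in sorted(Path(input_dir).glob("*.txt")):
            documents.append(
                Document(
                    text=path.read_text(encoding="utf-8"),
                    metadata={"file_name": path.name},
                )
            )
        return documents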

Integration Testing Considerations

Each integration package maintains its own test suite to verify compatibility with the external service. Integration tests typically require actual API credentials and network access, distinguishing them from unit tests that mock external dependencies.

Best Practices

When working with LlamaIndex integrations, consider the following best practices:

  1. Dependency Isolation: Install only required integration packages to minimize potential conflicts
  2. Credential Management: Use environment variables for sensitive credentials in production
  3. Error Handling: Implement appropriate retry logic and fallback strategies for external service calls
  4. Resource Management: Close connections and release resources properly when using streaming responses
  5. Version Compatibility: Check integration package versions against the core LlamaIndex version for compatibility

Deprecated Integrations

Some integration packages may be discontinued over time as external services evolve or change their offerings. When an integration is deprecated, it will receive no further updates or support. Users should migrate to alternative solutions before removing deprecated packages from their projects.

Sources: llama-index-integrations/readers/llama-index-readers-preprocess/README.md

Conclusion

The integration architecture provides a flexible, extensible framework for connecting LlamaIndex with the broader ecosystem of LLM providers, data sources, and tools. By maintaining standardized interfaces while allowing provider-specific implementations, the architecture enables developers to build sophisticated RAG applications without being locked into a single vendor or service. The modular design supports incremental adoption, allowing teams to integrate new capabilities as their requirements evolve.

Sources: [llama-index-integrations/llms/llama-index-llms-contextual/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)

Documents and Nodes

Related topics: Storage Systems, Query Engines

Overview

In LlamaIndex, Documents and Nodes are the fundamental data structures that represent information to be indexed, searched, and retrieved. Documents serve as the primary unit of input data, while Nodes are the granular chunks created during document processing for optimal embedding and retrieval.

Document Model

Purpose and Scope

A Document in LlamaIndex represents a single unit of data to be indexed. It encapsulates the content along with associated metadata that provides context about the source, type, and additional information useful for retrieval and processing.

Core Document Schema

The Document model is defined in llama-index-core/llama_index/core/schema.py and includes the following key attributes:

| Attribute | Type | Description |
|---|---|---|
| text | str | The main text content of the document |
| id_ | str | Unique identifier for the document |
| metadata | Dict[str, Any] | Additional metadata about the document |
| mimetype | str | MIME type of the document content |
| relationships | Dict[str, RelationshipType] | Relationships to other nodes/documents |

Document Construction

Documents can be created with varying levels of detail:

from llama_index.core import Document

# Basic document
doc = Document(text="Your content here")

# Document with metadata
doc = Document(
    text="Your content here",
    metadata={
        "source": "review.txt",
        "author": "John Doe",
        "date": "2024-01-15"
    }
)

Node Model

Purpose and Scope

Nodes are the result of parsing and chunking Documents into smaller, semantically coherent pieces. Each Node inherits document-like properties but adds relationship information linking back to its parent Document and sibling Nodes.

Node Structure

Nodes extend the Document schema with additional attributes defined in llama-index-core/llama_index/core/schema.py:

| Attribute | Type | Description |
|---|---|---|
| node_id | str | Unique identifier for the node |
| start_char_idx | int | Starting character index in parent document |
| end_char_idx | int | Ending character index in parent document |
| text_template | str | Template for rendering the node text |
| relationships | Dict[RelationshipType, RelatedNodeType] | Relationships including PARENT, PREVIOUS, NEXT |

Architecture Diagram

graph TD
    A[Raw Input Data] --> B[Document]
    B --> C[Node Parser]
    C --> D[Nodes]
    D --> E[Embedding Model]
    E --> F[Vector Index]
    
    G[Metadata] --> B
    H[Relationships] --> D
    
    B -->|PARENT| D
    D -->|CHILD| B

Readers and Loading

Base Reader Interface

Readers are responsible for loading data from various sources and converting them into Documents. The base reader interface is defined in llama-index-core/llama_index/core/readers/base.py.

| Method | Description |
|---|---|
| load_data() | Load documents from a data source |
| lazy_load_data() | Lazily load documents for memory efficiency |
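
For example, the built-in SimpleDirectoryReader exposes load_data() (a sketch assuming a local ./data directory of files):

from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader("./data")
documents = reader.load_data()
print(len(documents), documents[0].metadata.get("file_name"))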

Supported Reader Types

LlamaIndex provides numerous reader integrations for different data sources:

| Category | Reader | Description |
|---|---|---|
| Document | Docling Reader | PDF, DOCX, HTML extraction to Markdown or JSON |
| Document | MarkItDown Reader | Converts various formats to Markdown |
| Document | Docugami Loader | XML knowledge graph from PDF/DOCX |
| Web | NewsArticleReader | Parses news article URLs |
| Web | UnstructuredURLLoader | URL text extraction via Unstructured.io |
| Web | TrafilaturaWebReader | Web scraping with trafilatura |
| Web | MainContentExtractorReader | Main content extraction from websites |
| Web | ReadabilityWebPageReader | Readability-based web extraction |
| Web | RemoteDepthReader | Recursive URL loading with depth control |
| Web | WholeSiteReader | Full site scraping with prefix/depth |
| Academic | SemanticScholarReader | Scholarly articles and papers |
| Database | Chroma Reader | Loading from Chroma vector store |

Usage Example

from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
docs = reader.load_data(file_path="document.pdf")

Node Parsers

Purpose and Scope

Node Parsers transform Documents into Nodes by splitting content based on semantic boundaries. The interface is defined in llama-index-core/llama_index/core/node_parser/interface.py.

Core Interface Methods

| Method | Description |
|---|---|
| get_nodes_from_documents() | Parse documents into nodes |
| get_batch_nodes() | Process documents in batches |

Sentence Splitter Parser

The sentence-based node parser in llama-index-core/llama_index/core/node_parser/text/sentence.py provides configurable text chunking:

| Parameter | Type | Default | Description |
|---|---|---|---|
| separator | str | "\n\n" | Chunk separator |
| chunk_size | int | 1024 | Maximum characters per chunk |
| chunk_overlap | int | 0 | Overlap between chunks |
| chunking_tokenizer | callable | None | Custom tokenizer function |
| callback_manager | CallbackManager | None | Event callbacks |
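
A short sketch of the sentence splitter in use (chunk sizes here are illustrative):

from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(
    [Document(text="LlamaIndex splits long documents into overlapping chunks. " * 50)]
)
print(len(nodes), nodes[0].text[:60])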

Docling Node Parser

The Docling Node Parser (llama-index-integrations/node_parser/llama-index-node-parser-docling/README.md) parses Docling JSON output into LlamaIndex nodes with rich metadata:

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)

Document-Node Relationships

Relationship Types

Nodes maintain typed relationships to other components:

| Relationship | Description |
|---|---|
| PARENT | Link to parent Document |
| CHILD | Link to child elements |
| PREVIOUS | Previous sibling Node |
| NEXT | Next sibling Node |
| SOURCE | Source Document reference |

Metadata Preservation

Nodes automatically inherit and extend document metadata:

# Node metadata includes provenance information
{
    'doc_items': [{'self_ref': '#/main-text/21'}],
    'prov': [{'page_no': 2, 'bbox': {...}}],
    'headings': ['2 Getting Started']
}

Workflow

graph LR
    A[Load Data] --> B[Create Document]
    B --> C[Parse Document]
    C --> D[Generate Nodes]
    D --> E[Create Embeddings]
    E --> F[Build Index]
    
    A1[Readers] --> A
    C1[Node Parsers] --> C

Best Practices

Document Creation

  1. Always assign unique id_ attributes for tracking
  2. Include comprehensive metadata for filtering
  3. Specify mimetype when content type matters

Node Parsing

  1. Choose appropriate chunk_size for your embedding model
  2. Configure chunk_overlap for context continuity
  3. Use semantic-aware parsers (Docling) for complex documents

Memory Management

  1. Use lazy_load_data() for large document collections
  2. Consider batch processing for node parsing
  3. Leverage streaming for very large files

Related Integrations

| Integration | Use Case |
|---|---|
| VectaraIndex | Managed semantic search |
| ChromaReader | Vector database loading |
| AlibabaCloud AISearch | Cloud-based document parsing |
| Ollama Embeddings | Local embedding generation |

Summary

Documents serve as the primary data ingestion point in LlamaIndex, encapsulating raw content and metadata from various sources. Nodes are the processed, chunked representations optimized for embedding generation and retrieval. Together with Readers and Node Parsers, they form the foundation of the LlamaIndex data pipeline.

Sources: https://github.com/run-llama/llama_index

Storage Systems

Related topics: Documents and Nodes, Retrieval and Reranking

Overview

LlamaIndex provides a comprehensive storage system that allows users to persist indexes, documents, and chat histories to disk for later retrieval and reuse. The storage architecture is built around the StorageContext class, which serves as the central coordinator for managing various storage backends including document stores, index stores, and chat stores.

The storage system enables:

  • Persistence: Save index data to disk for long-term storage
  • Retrieval: Reload previously persisted indexes without recomputation
  • In-memory fallback: Default in-memory storage when persistence is not configured
  • Customizable backends: Pluggable storage implementations for different use cases

Architecture

graph TD
    A[StorageContext] --> B[VectorStore]
    A --> C[DocStore]
    A --> D[IndexStore]
    A --> E[ChatStore]
    A --> F[ImageStore]
    A --> G[GraphStore]
    
    C --> H[SimpleDocStore]
    C --> I[MongoDocStore]
    C --> J[KVDocStore]
    
    D --> K[SimpleIndexStore]
    D --> L[MongoIndexStore]
    D --> M[KVIndexStore]
    
    E --> N[SimpleChatStore]
    E --> O[MongoChatStore]

StorageContext

The StorageContext class is the main entry point for configuring storage in LlamaIndex. It aggregates all storage components and provides methods for persistence and retrieval.

Initialization

from llama_index.core import StorageContext, load_index_from_storage

# Create with default in-memory stores
storage_context = StorageContext.from_defaults()

# Create with persistence to disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load existing index from disk
index = load_index_from_storage(storage_context=storage_context)

Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| persist_dir | str | None | Directory path for persistence |
| vector_store | BaseVectorStore | SimpleVectorStore | Vector storage backend |
| docstore | BaseDocumentStore | SimpleDocumentStore | Document storage backend |
| index_store | BaseIndexStore | SimpleIndexStore | Index metadata storage |
| graph_store | BaseGraphStore | SimpleGraphStore | Knowledge graph storage |
| chat_store | BaseChatStore | SimpleChatStore | Chat history storage |
| image_store | BaseImageStore | None | Image storage backend |

Persistence Methods

| Method | Description |
|---|---|
| persist(persist_dir, ...) | Save all storage components to disk |
| from_defaults(**kwargs) | Create context with default or specified settings |
| load_index_from_storage(storage_context) | Standalone helper (imported from llama_index.core) that loads an index from a persisted StorageContext |

Document Store

The document store manages the storage and retrieval of BaseDocument objects. LlamaIndex provides several document store implementations.

SimpleDocumentStore

The default in-memory document store, which can be persisted to disk as JSON.

from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()

# Add documents, then persist to disk
docstore.persist(persist_path="./docstore.json")

# Reload a previously persisted store
docstore = SimpleDocumentStore.from_persist_path("./docstore.json")

Document Store API

| Method | Description |
|---|---|
| add_documents(documents, batch_size) | Add documents to the store |
| get_document(doc_id) | Retrieve a document by ID |
| delete_document(doc_id) | Remove a document by ID |
| get_nodes(node_ids) | Retrieve nodes by their IDs |
| docs (property) | Access all stored nodes/documents |
| persist(persist_path) | Persist the document store to disk |
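
A small usage sketch of the document store API (the document ID and persist path are arbitrary examples):

from llama_index.core import Document
from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
doc = Document(text="Stored content.", id_="doc-1")
docstore.add_documents([doc])

print(docstore.get_document("doc-1").text)
docstore.persist(persist_path="./docstore.json")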

Data Model

Documents are stored with the following structure:

class BaseDocument:
    id_: str              # Unique identifier
    embedding: List[float]  # Vector embedding
    metadata: Dict[str, Any]  # User-defined metadata
    text: str             # Document text content
    excluded_embed_metadata_keys: List[str]
    excluded_llm_metadata_keys: List[str]
    relationships: Dict[DocumentRelationship, str]
    hash: str             # Computed hash for caching
    __class__: type       # Document type (optional)

Index Store

The index store manages index metadata and structure, enabling efficient retrieval of index components.

SimpleIndexStore

The default index store implementation using JSON file storage.

from llama_index.core.storage.index_store import SimpleIndexStore

# Default in-memory index store
index_store = SimpleIndexStore()

# Reload a previously persisted store
index_store = SimpleIndexStore.from_persist_path("./index_store.json")

Index Store API

| Method | Description |
|---|---|
| add_index_struct(index_struct) | Store an index structure |
| get_index_struct(struct_id) | Retrieve index structure by ID |
| index_structs() | List all stored index structures |
| delete_index_struct(struct_id) | Remove an index structure |

Supported Index Types

| Index Type | Description |
|---|---|
| VectorStoreIndex | Dense vector-based retrieval |
| SummaryIndex | Summary-based indexing |
| KeywordTableIndex | Keyword-based retrieval |
| KnowledgeGraphIndex | Graph-based knowledge indexing |

Chat Store

The chat store manages conversation history for multi-turn interactions with language models.

SimpleChatStore

A persistent chat store implementation for storing and retrieving chat messages.

from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()

# Reload a previously persisted store
chat_store = SimpleChatStore.from_persist_path("./chat_store.json")

Chat Store API

| Method | Description |
|---|---|
| add_message(key, message) | Append a message to a chat session |
| get_messages(key) | Retrieve all messages for a chat session |
| set_messages(key, messages) | Replace the messages for a chat session |
| delete_messages(key) | Remove all messages for a chat session |
| persist(persist_path) | Save chat history to disk |
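
For example (a sketch; the session key and file path are arbitrary):

from llama_index.core.llms import ChatMessage
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
chat_store.add_message("session-1", ChatMessage(role="user", content="Hello!"))
chat_store.add_message("session-1", ChatMessage(role="assistant", content="Hi, how can I help?"))

print(chat_store.get_messages("session-1"))
chat_store.persist(persist_path="./chat_store.json")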

Message Structure

| Field | Type | Description |
|---|---|---|
| role | str | Message role (user/assistant/system) |
| content | str | Message text content |
| additional_kwargs | Dict | Extra metadata for the message |

Storage Workflow

graph LR
    A[Create Documents] --> B[Initialize StorageContext]
    B --> C{Configure Backends}
    C --> D[In-Memory]
    C --> E[Persistent]
    D --> F[Build Index]
    E --> F
    F --> G[Index Created]
    G --> H[Persist to Disk]
    H --> I[StorageContext.persist]
    
    J[Load Index] --> K[load_index_from_storage]
    K --> L[Index Ready]

Usage Examples

Basic Persistence

from llama_index.core import VectorStoreIndex, StorageContext

# Create documents
documents = [...]

# Create index with storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = VectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context
)

# Explicitly persist the index and all storage components to disk
index.storage_context.persist(persist_dir="./storage")

Loading Persisted Index

from llama_index.core import StorageContext, load_index_from_storage

# Rebuild storage context from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load existing index
index = load_index_from_storage(storage_context=storage_context)

# Query the loaded index
query_engine = index.as_query_engine()
response = query_engine.query("Your question here")

Custom Storage Configuration

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore

# Create custom stores
docstore = SimpleDocumentStore()
index_store = SimpleIndexStore()

# Configure storage context with custom stores
storage_context = StorageContext.from_defaults(
    docstore=docstore,
    index_store=index_store,
)

# Use with index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

Storage Backend Comparison

| Backend | Persistence | Performance | Scalability | Use Case |
|---|---|---|---|---|
| SimpleDocumentStore | JSON | Medium | Low-Medium | Development, small datasets |
| RedisDocumentStore | Redis | High | High | Production, distributed systems |
| MongoDocumentStore | MongoDB | High | Very High | Large-scale deployments |
| KVDocumentStore | Key-Value | High | Medium-High | General purpose |

Best Practices

  1. Always specify unique document IDs: Prevents duplicate entries and enables predictable retrieval (for example, Document(id_="unique_doc_1", text="content"))
  2. Configure persistence early: Set up storage context before building indexes to avoid data loss
  3. Use appropriate batch sizes: When adding many documents, use batch operations for better performance
  4. Handle persistence errors: Wrap persistence calls in try-except blocks for robustness
  5. Backup important data: Regularly backup persisted storage directories

Related Components

The storage system works alongside other LlamaIndex components:

  • Vector Stores: Manage embedding vectors for semantic search
  • Graph Stores: Handle knowledge graph data structures
  • Image Stores: Store image data for multimodal applications
  • Query Engines: Use storage to retrieve relevant documents for queries
  • Retrievers: Access stored data for retrieval-augmented generation

Sources: https://github.com/run-llama/llama_index

Query Engines

Related topics: Retrieval and Reranking, Documents and Nodes

Query Engines are the core components in LlamaIndex responsible for processing user queries and returning relevant responses by retrieving, synthesizing, and formatting information from indexed data.

Overview

Query Engines serve as the primary interface for querying indexed documents in LlamaIndex. They coordinate the retrieval of relevant context from the index and synthesize this information into coherent, helpful responses using Large Language Models (LLMs).

Key Responsibilities:

  • Receive user queries and transform them into retrieval operations
  • Coordinate with retrievers to fetch relevant documents or data chunks
  • Route queries to appropriate response synthesizers
  • Handle query-time configuration such as similarity thresholds and response modes

Sources: llama-index-core/llama_index/core/query_engine/__init__.py

Architecture

The query engine architecture follows a modular pipeline pattern where different components handle specific stages of query processing.

graph TD
    A[User Query] --> B[Query Engine]
    B --> C[Retriever]
    C --> D[Node Postprocessor]
    D --> E[Response Synthesizer]
    E --> F[LLM]
    F --> G[Response]
    
    H[Vector Store Index] --> C
    I[Summary Index] --> C
    J[Knowledge Graph Index] --> C

Core Components

| Component | Purpose | Location |
|-----------|---------|----------|
| BaseQueryEngine | Abstract base class defining the query interface | llama_index.core.query_engine |
| RetrieverQueryEngine | Default query engine using retrievers | retriever_query_engine.py |
| SubQuestionQueryEngine | Decomposes complex queries into sub-questions | sub_question_query_engine.py |
| ResponseSynthesizer | Generates responses from retrieved context | llama_index.core.response_synthesizers |

Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py

RetrieverQueryEngine

The RetrieverQueryEngine is the default query engine implementation that combines retrieval with response synthesis.

Initialization

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| retriever | BaseRetriever | Required | The retriever used to fetch relevant nodes |
| response_synthesizer | BaseSynthesizer | None | Synthesizer for generating responses |
| node_postprocessors | List[BaseNodePostprocessor] | [] | Post-processors applied after retrieval |
| callback_manager | CallbackManager | None | Manages callbacks for query events |

Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py:40-60
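
The same engine can also be assembled explicitly from the parts listed above instead of via as_query_engine(); a minimal sketch, assuming documents has already been loaded:

from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=5)
synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)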

Query Flow

sequenceDiagram
    participant User
    participant QueryEngine
    participant Retriever
    participant Postprocessor
    participant Synthesizer
    participant LLM
    
    User->>QueryEngine: query(question)
    QueryEngine->>Retriever: retrieve(query_str)
    Retriever-->>QueryEngine: nodes[]
    QueryEngine->>Postprocessor: postprocess(nodes)
    Postprocessor-->>QueryEngine: filtered_nodes[]
    QueryEngine->>Synthesizer: synthesize(query_str, nodes)
    Synthesizer->>LLM: generate(prompt)
    LLM-->>Synthesizer: response
    Synthesizer-->>QueryEngine: Response
    QueryEngine-->>User: Response

SubQuestionQueryEngine

The SubQuestionQueryEngine handles complex queries by decomposing them into simpler sub-questions that can be answered independently.

Use Cases

  • Queries requiring information from multiple data sources
  • Complex questions that benefit from step-by-step reasoning
  • Multi-hop questions requiring logical deduction

Configuration

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    callback_manager=CallbackManager([callback]),
    verbose=True
)
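
The query_engine_tools list wraps individual query engines with a name and description so sub-questions can be routed to the right source; a minimal sketch, assuming two indexes (docs_index and api_index) already exist:

from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=docs_index.as_query_engine(),
        metadata=ToolMetadata(
            name="docs",
            description="Answers questions about the documentation",
        ),
    ),
    QueryEngineTool(
        query_engine=api_index.as_query_engine(),
        metadata=ToolMetadata(
            name="api",
            description="Answers questions about the API reference",
        ),
    ),
]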

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query_engine_tools | List[QueryEngineTool] | Required | List of query engines and their descriptions |
| response_synthesizer | BaseSynthesizer | None | Response synthesizer to use |
| sub_question_name | str | "sub_question" | Name for sub-question events |
| parent_name | str | "parent_question" | Name for parent question events |
| callback_manager | CallbackManager | None | Callback manager for events |
| verbose | bool | False | Enable verbose output |

Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:50-80

Response Synthesizers

Response Synthesizers transform retrieved context into natural language responses.

Available Synthesizer Types

| Synthesizer | Description | Use Case |
|-------------|-------------|----------|
| CompactAndRefine | Compacts retrieved context before generating | Large retrieval results |
| TreeSummarize | Hierarchically summarizes retrieved nodes | Comprehensive responses |
| SimpleSummarize | Direct concatenation and summarization | Quick, simple responses |
| Refine | Iteratively improves response quality | High-quality refinement |
| Accumulate | Combines responses from multiple sources | Multi-source queries |
| Generation | Direct LLM generation from context | Simple generation tasks |
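
Synthesizers are typically built with the get_response_synthesizer factory and handed to a query engine; a minimal sketch (the response_mode strings correspond to the classes above, and index is assumed to exist):

from llama_index.core import get_response_synthesizer

# Hierarchical summarization over the retrieved nodes
synthesizer = get_response_synthesizer(response_mode="tree_summarize")

query_engine = index.as_query_engine(response_synthesizer=synthesizer)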

Base Interface

from abc import ABC, abstractmethod
from typing import Any, List

from llama_index.core.base.response.schema import Response
from llama_index.core.schema import NodeWithScore, QueryBundle


class BaseSynthesizer(ABC):
    @abstractmethod
    async def synthesize(
        self,
        query: QueryBundle,
        nodes: List[NodeWithScore],
        **kwargs: Any
    ) -> Response:
        pass

Sources: llama-index-core/llama_index/core/response_synthesizers/base.py:30-50

Vector Store Index Query Engine

The VectorStoreIndex provides built-in query engine creation through the as_query_engine() method.

Factory Method Parameters

index.as_query_engine(
    query_mode: str = "default",
    similarity_top_k: int = 10,
    vector_store_query_mode: str = "default",
    alpha: Optional[float] = None,
    **kwargs: Any
) -> BaseQueryEngine

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query_mode | str | "default" | Query execution mode |
| similarity_top_k | int | 10 | Number of top results to retrieve |
| vector_store_query_mode | str | "default" | Vector store specific query mode |
| alpha | float | None | Hybrid search weight (0-1, default 0.5) |

Sources: llama-index-core/llama_index/core/indices/vector_store/base.py

Query Modes

| Mode | Description |
|------|-------------|
| default | Standard retrieval based on similarity |
| mmr | Maximum Marginal Relevance for diverse results |
| hybrid | Combines sparse and dense retrieval |
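
A minimal sketch of switching modes; hybrid mode also requires a vector store that supports sparse retrieval, so treat the alpha value as illustrative:

# MMR for more diverse results
mmr_engine = index.as_query_engine(vector_store_query_mode="mmr", similarity_top_k=5)

# Hybrid sparse + dense retrieval, weighted by alpha
hybrid_engine = index.as_query_engine(vector_store_query_mode="hybrid", alpha=0.5)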

Query Engine Tool

For agent-based workflows, query engines can be wrapped as tools using the QueryEngineTool class.

from llama_index.core.tools import QueryEngineTool, ToolMetadata

tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="website_index",
        description="Useful for answering questions about text on websites",
    )
)

Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:100-120

Advanced Configuration

Node Post-processors

Post-processors filter and enhance retrieved nodes before synthesis.

from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = index.as_query_engine(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

Custom Query Engines

Create custom query engines by extending the base class:

from llama_index.core.base.response.schema import Response
from llama_index.core.query_engine import BaseQueryEngine
from llama_index.core.schema import QueryBundle


class CustomQueryEngine(BaseQueryEngine):
    def __init__(self, retriever, synthesizer):
        self._retriever = retriever
        self._synthesizer = synthesizer
        super().__init__(callback_manager=None)

    # BaseQueryEngine also expects a synchronous _query implementation;
    # only the async path is shown here.
    async def _aquery(self, query_bundle: QueryBundle) -> Response:
        nodes = await self._retriever.aretrieve(query_bundle)
        response = await self._synthesizer.synthesize(
            query_bundle, nodes
        )
        return response

Async Query Execution

Query engines support both sync and async execution patterns:

# Synchronous
response = query_engine.query("What is LlamaIndex?")

# Asynchronous
response = await query_engine.aquery("What is LlamaIndex?")

Integration with Vector Indices

Query engines integrate with various index types:

| Index Type | Default Query Engine | Features |
|------------|----------------------|----------|
| VectorStoreIndex | RetrieverQueryEngine | Semantic similarity search |
| SummaryIndex | RetrieverQueryEngine | Full document retrieval |
| KnowledgeGraphIndex | RetrieverQueryEngine | Graph-based traversal |
| ComposableGraph | SubQuestionQueryEngine | Multi-index queries |

Best Practices

  1. Choose appropriate top_k: Balance between response quality and speed (typically 3-10 for most use cases)
  2. Use the sub-question engine for complex queries: When queries require reasoning across multiple sources
  3. Configure similarity thresholds: Filter low-quality matches using post-processors
  4. Enable callbacks for debugging: Monitor query execution flow and performance
  5. Select appropriate synthesizers: Match the synthesizer type to your response quality requirements

Summary

Query Engines in LlamaIndex provide a flexible, extensible framework for retrieving and synthesizing information from indexed data. The modular architecture allows for customization at every stage of the query pipeline, from retrieval configuration to response generation.

Key Takeaways:

  • Query engines orchestrate the retrieval-synthesis pipeline
  • RetrieverQueryEngine handles standard query flows
  • SubQuestionQueryEngine decomposes complex queries
  • Response synthesizers generate final output from context
  • Extensive configuration options enable fine-tuned control

Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py, llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py, llama-index-core/llama_index/core/response_synthesizers/base.py, llama-index-core/llama_index/core/indices/vector_store/base.py

Sources: llama-index-core/llama_index/core/query_engine/__init__.py

Retrieval and Reranking

Related topics: Query Engines, Storage Systems

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Retriever Abstraction

Continue reading this section for the full explanation and source context.

Section Recursive Retriever

Continue reading this section for the full explanation and source context.

Section Property Graph Retriever

Continue reading this section for the full explanation and source context.

Related topics: Query Engines, Storage Systems

Retrieval and Reranking

Overview

Retrieval and Reranking are fundamental components in LlamaIndex's architecture for building effective Retrieval-Augmented Generation (RAG) systems. The retrieval system identifies relevant context from various data sources, while the reranking system reorders retrieved results to optimize relevance using advanced techniques like LLM-based scoring.

In LlamaIndex, retrieval is handled through a flexible retriever abstraction that supports multiple retrieval strategies including vector-based search, keyword search, and hybrid approaches. Reranking serves as a post-processing step that improves result quality by reordering retrieved nodes based on more sophisticated relevance criteria.

Architecture Overview

graph TD
    A[Query Input] --> B[Retrieval Phase]
    B --> C[Vector/Knowledge Graph Retrieval]
    C --> D[Initial Node Set]
    D --> E[Reranking Phase]
    E --> F[LLM Reranker]
    F --> G[Reordered Results]
    G --> H[Response Generation]
    
    I[Document Sources] --> J[Indexing]
    J --> K[Vector Store / Graph Store]
    K --> C

Retrieval Components

Retriever Abstraction

LlamaIndex provides a base BaseRetriever class that defines the interface for all retrieval implementations. Retrievers work in conjunction with indices to fetch relevant nodes from vector stores or knowledge graphs.

Core Retriever Classes:

| Component | File Path | Purpose |
|-----------|-----------|---------|
| BaseRetriever | llama-index-core/llama_index/core/retrievers/ | Abstract base for all retrievers |
| RecursiveRetriever | llama-index-core/llama_index/core/retrievers/recursive_retriever.py | Multi-level recursive retrieval |
| PropertyGraphRetriever | llama-index-core/llama_index/core/indices/property_graph/retriever.py | Graph-based retrieval |

Recursive Retriever

The RecursiveRetriever enables multi-level, hierarchical retrieval across different data sources and node types. It supports recursive traversal of indices and can fetch related nodes across different retrieval strategies.

Key Features:

  • Recursive node resolution across index hierarchies
  • Support for multiple retriever types in a chain
  • Handling of nested document structures

Source: llama-index-core/llama_index/core/retrievers/recursive_retriever.py

Property Graph Retriever

The Property Graph Retriever leverages knowledge graphs for retrieval, enabling structured queries over entity-relationship data. This retriever is particularly effective for complex queries requiring relationship-aware context.

Capabilities:

  • Graph traversal-based retrieval
  • Entity filtering and relationship queries
  • Support for hybrid graph + vector search

Source: llama-index-core/llama_index/core/indices/property_graph/retriever.py:1-100

Reranking System

Purpose and Role

Reranking improves retrieval quality by reordering initially retrieved candidates using more sophisticated relevance models. After an initial retrieval pass identifies candidate nodes, rerankers evaluate and reorder these results to maximize relevance to the query.

LLM Reranker

The LLMRerank post-processor uses a Language Model to score and reorder retrieved nodes based on semantic relevance. This approach provides higher quality ranking compared to simple vector similarity.

Key Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| top_n | int | None | Number of top results to return after reranking |
| choice_batch_size | int | 10 | Batch size for LLM ranking choices |
| llm | BaseLLM | None | Language model for scoring |
| verbose | bool | False | Enable verbose output |

Source: llama-index-core/llama_index/core/postprocessor/llm_rerank.py

Node Post-Processors

The NodePostprocessor class provides additional filtering and transformation capabilities for retrieved nodes. These processors operate on the node level and can apply various transformations before final output.

Common Post-Processing Operations:

  • Duplicate removal
  • Similarity threshold filtering
  • Metadata-based filtering

Source: llama-index-core/llama_index/core/postprocessor/node.py

Data Flow

graph LR
    A[User Query] --> B[Vector Search]
    B --> C[Top-K Nodes]
    C --> D[Post-Processors]
    D --> E[LLM Reranker]
    E --> F[Reranked Nodes]
    F --> G[Context for LLM]
    
    H[Documents] --> I[Indexing Pipeline]
    I --> J[Embedding Model]
    J --> K[Vector Store]
    K --> B

Integration with Data Loaders

LlamaIndex's retrieval system integrates seamlessly with various data loaders that prepare documents for indexing and retrieval.

Supported Data Sources

| Reader | Use Case | Integration |
|--------|----------|-------------|
| DoclingReader | PDF, DOCX, HTML | llama-index-readers-docling |
| SimpleWebPageReader | Static websites | llama-index-readers-web |
| RemoteDepthReader | Multi-level URL crawling | llama-index-readers-remote-depth |
| WikipediaReader | Wikipedia articles | llama-index-readers-wikipedia |
| SemanticScholarReader | Academic papers | llama-index-readers-semanticscholar |

Source: llama-index-integrations/readers/llama-index-readers-docling/README.md

Document Processing Pipeline

Documents loaded through readers undergo the following processing:

  1. Parsing - Extract text content from various formats (PDF, DOCX, HTML)
  2. Node Parsing - Split documents into semantic chunks (nodes)
  3. Embedding - Generate vector embeddings for each node
  4. Indexing - Store nodes and embeddings in appropriate stores
  5. Retrieval - Fetch relevant nodes based on queries

Source: llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/simple_web/README.md
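
A minimal sketch of steps 1-4 using the core reader and node parser (the directory path and chunk size are placeholders):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# 1. Parsing: load raw documents from disk
documents = SimpleDirectoryReader("./data").load_data()

# 2. Node parsing: split documents into chunks
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)

# 3-4. Embedding + indexing (uses the configured embedding model)
index = VectorStoreIndex(nodes)

# 5. Retrieval
retriever = index.as_retriever()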

Usage Patterns

Basic Retrieval with Reranking

from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import LLMRerank

# Load documents and create index
index = VectorStoreIndex.from_documents(documents)

# Configure reranking
reranker = LLMRerank(
    top_n=5,
    choice_batch_size=10
)

# Query with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker]
)

response = query_engine.query("Your question here")

Recursive Retrieval

from llama_index.core.retrievers import RecursiveRetriever

# Configure recursive retrieval across multiple levels
# (vector_retriever and document_retriever are assumed to be built already)
recursive_retriever = RecursiveRetriever(
    root_id="root",  # key in retriever_dict used as the entry point
    retriever_dict={
        "root": vector_retriever,
        "documents": document_retriever,
    },
)

Configuration Options

Retrieval Configuration

| Option | Description | Applies To |
|--------|-------------|------------|
| similarity_top_k | Number of initial candidates | Vector retrieval |
| retrieval_mode | Vector, keyword, or hybrid | Hybrid search |
| node_postprocessors | List of post-processing steps | All retrievers |

Reranking Configuration

| Option | Description | Default |
|--------|-------------|---------|
| top_n | Final number of results | 5 |
| score_threshold | Minimum relevance score | None |
| model | Reranking model | gpt-3.5-turbo |

Advanced Topics

Hybrid Retrieval with Reranking

Combining vector and keyword search with LLM reranking provides robust retrieval across diverse query types:

  1. Vector Search - Captures semantic similarity
  2. Keyword Search - Captures exact term matching
  3. LLM Reranking - Optimizes final ordering (a minimal sketch follows this list)
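
A minimal sketch combining the three stages, assuming the underlying vector store supports hybrid queries:

from llama_index.core.postprocessor import LLMRerank

query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",  # sparse + dense retrieval
    alpha=0.5,                         # weight between keyword and vector scores
    similarity_top_k=10,
    node_postprocessors=[LLMRerank(top_n=5)],  # LLM-based reordering
)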

Custom Retrievers

Developers can create custom retrievers by extending BaseRetriever:

from typing import List

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

class CustomRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Custom retrieval logic: must return nodes paired with scores
        return []

Summary

Retrieval and Reranking in LlamaIndex form a two-phase system where initial retrieval identifies candidate nodes and reranking optimizes their ordering. The architecture supports multiple retrieval strategies (vector, graph, recursive) and leverages LLM-based reranking for improved result quality. Integration with various data loaders enables seamless indexing from diverse sources, while the post-processor abstraction allows flexible pipeline customization.

Source: https://github.com/run-llama/llama_index / Human Manual

Agent Framework

Related topics: Memory Systems

Section Related Pages

Continue reading this section for the full explanation and source context.

Section ReAct Formatter

Continue reading this section for the full explanation and source context.

Section ReAct Output Parsing

Continue reading this section for the full explanation and source context.

Section Base Agent

Continue reading this section for the full explanation and source context.

Related topics: Memory Systems

Agent Framework

Overview

The LlamaIndex Agent Framework provides a flexible, extensible system for building AI agents that can reason, plan, and execute actions using tools. The framework enables the creation of both single-agent and multi-agent systems capable of interacting with external data sources, performing complex reasoning tasks, and orchestrating workflows.

Agents in LlamaIndex are designed to combine large language model (LLM) capabilities with structured tool usage, memory management, and workflow orchestration. The framework supports various agent types including ReAct (Reasoning + Acting) agents and workflow-based agents.

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50

Architecture Overview

graph TD
    A[User Query] --> B[Agent]
    B --> C[Reasoning Engine]
    C --> D[Tool System]
    D --> E[External Tools]
    C --> F[Memory]
    B --> G[Workflow Orchestrator]
    G --> H[Sub-Agents]
    H --> D

The framework is built on several key components that work together to enable sophisticated agent behaviors:

| Component | Purpose |
|-----------|---------|
| Agent | Core entity that processes queries and generates responses |
| Reasoning Engine | Handles thought processes and decision making |
| Tool System | Provides access to external functions and APIs |
| Memory | Stores conversation history and intermediate results |
| Workflow Orchestrator | Manages complex multi-step tasks |

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:50-100

ReAct Agent

The ReAct (Synergizing Reasoning and Acting) agent implements a reasoning loop that combines thought processes with tool actions. This agent type is particularly effective for tasks requiring logical deduction and external information retrieval.

ReAct Formatter

The ReAct formatter is responsible for constructing prompts that guide the agent through the reasoning-action-observation cycle. It defines the structure of thoughts, actions, and observations in the agent's prompt.

graph LR
    A[Thought] --> B[Action]
    B --> C[Observation]
    C --> A

#### Key Components

| Component | Description |
|-----------|-------------|
| system_prompt | Instructions for the agent's role and behavior |
| tool_prompt | Description of available tools |
| formatter | Defines the format for thoughts, actions, observations |
| examples | Few-shot examples for better performance |

Sources: llama-index-core/llama_index/core/agent/react/formatter.py:1-80

ReAct Output Parsing

The ReAct agent uses specialized output parsers to extract structured information from LLM responses:

class ReActOutputParser:
    def parse(self, output: str) -> ActionOutput:
        # Parse thought, action, and action input from output
        pass

This parsing enables the agent to:

  1. Extract the reasoning thought process
  2. Identify the tool to invoke
  3. Extract the tool's input parameters
  4. Process the tool's output as an observation

Sources: llama-index-core/llama_index/core/agent/react/formatter.py:80-150

Workflow-Based Agents

Workflow-based agents provide a more structured approach to agent execution, using state machines and defined steps to process queries.

Base Agent

The BaseAgent class provides the foundation for all agent implementations in the workflow system:

graph TD
    A[Input] --> B[State Machine]
    B --> C{Step Execution}
    C -->|Step 1| D[Process Step]
    D --> E[Update State]
    E --> C
    C -->|Complete| F[Generate Response]

#### Base Agent API

| Method | Description |
|--------|-------------|
| run() | Execute the agent with input |
| reset() | Reset agent state |
| get_state() | Retrieve current agent state |
| set_state() | Set agent state |

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:100-200

Agent State Management

Agents maintain state throughout their execution, which includes:

| State Component | Type | Purpose |
|-----------------|------|---------|
| input | str | Original user input |
| current_step | int | Current execution step |
| memory | Memory | Conversation history |
| context | dict | Additional context data |
| steps | List[Step] | Executed steps |
| output | Any | Final output |

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:200-300

Tool System

The Tool System enables agents to interact with external resources and perform actions beyond text generation.

Function Tool

FunctionTool provides a decorator-based interface for creating tools from Python functions:

from llama_index.core.tools import FunctionTool

@FunctionTool.from_defaults
def search_database(query: str) -> str:
    """Search the knowledge base for relevant information."""
    # Placeholder implementation; replace with the actual lookup
    return f"Results for: {query}"

#### FunctionTool Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| fn | Callable | Required | The function to wrap |
| name | str | Function name | Tool identifier |
| description | str | Function docstring | Tool description for LLM |
| fn_schema | BaseModel | Auto-generated | Input schema |
| return_direct | bool | False | Return raw output |

Sources: llama-index-core/llama_index/core/tools/function_tool.py:1-100

Tool Execution Flow

sequenceDiagram
    participant Agent
    participant ToolRegistry
    participant FunctionTool
    participant External

    Agent->>ToolRegistry: Request tool by name
    ToolRegistry->>FunctionTool: Get tool instance
    FunctionTool->>External: Execute function
    External-->>FunctionTool: Return result
    FunctionTool-->>Agent: Format response

Creating Custom Tools

Tools can be created using the @FunctionTool.from_defaults decorator:

@FunctionTool.from_defaults
def calculate(expression: str) -> str:
    """Perform mathematical calculations by evaluating an expression."""
    return str(eval(expression))  # note: eval is unsafe outside trusted input

Or programmatically:

from llama_index.core.tools import FunctionTool

def my_function(arg1: str, arg2: int) -> str:
    return f"{arg1} repeated {arg2} times"

tool = FunctionTool.from_defaults(
    fn=my_function,
    name="my_tool",
    description="Custom tool description"
)

Sources: llama-index-core/llama_index/core/tools/function_tool.py:100-200

Multi-Agent Workflows

Multi-agent systems enable complex task decomposition where different specialized agents collaborate to solve problems.

Multi-Agent Workflow Architecture

graph TD
    A[Coordinator Agent] --> B[Specialist Agent 1]
    A --> C[Specialist Agent 2]
    A --> D[Specialist Agent N]
    B --> E[Tool 1]
    C --> F[Tool 2]
    D --> G[Tool N]
    B --> A
    C --> A
    D --> A

Workflow Communication

Agents communicate through a shared state and message-passing mechanism:

| Message Type | Direction | Purpose |
|--------------|-----------|---------|
| task | Coordinator → Specialist | Assign task |
| result | Specialist → Coordinator | Return results |
| query | Any → Any | Request information |
| response | Any → Any | Provide information |

Sources: llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:1-100

Creating Multi-Agent Systems

from llama_index.core.agent.workflow import MultiAgentWorkflow

# Create specialized agents
research_agent = ReActAgent.from_tools(tools=[search_tool], name="researcher")
analysis_agent = ReActAgent.from_tools(tools=[analysis_tool], name="analyst")

# Create multi-agent workflow
workflow = MultiAgentWorkflow(agents=[research_agent, analysis_agent])

# Execute workflow
result = workflow.run(user_input="Analyze the latest research on AI")

Sources: llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:100-200

Tool Integration with LlamaIndex Readers

The Agent Framework integrates seamlessly with LlamaIndex's document readers, enabling agents to query and reason over loaded documents:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Load documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap the query engine as a tool the agent can call
query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="document_index",
    description="Answers questions about the loaded documents",
)

# Create agent with query tool
agent = ReActAgent.from_tools(tools=[query_tool])
response = agent.chat("What is the main topic of these documents?")

This integration allows agents to:

  • Query vector databases
  • Retrieve relevant context
  • Synthesize information from multiple sources
  • Perform RAG (Retrieval-Augmented Generation)

Best Practices

Designing Effective Tools

| Guideline | Rationale |
|-----------|-----------|
| Clear descriptions | Helps the LLM understand when to use the tool |
| Structured outputs | Easier for the agent to parse and use results |
| Error handling | Prevents agent crashes from tool failures |
| Idempotent operations | Enables safe retries |

Agent Configuration

| Parameter | Recommendation |
|-----------|----------------|
| max_iterations | Set based on task complexity (default: 10) |
| timeout | Allow sufficient time for tool execution |
| memory_type | Use conversation memory for multi-turn interactions |
| tool_retriever | Implement for large tool collections |

Debugging Agents

  1. Enable verbose mode to see the agent's reasoning traces (see the sketch after this list)
  2. Log tool inputs/outputs to verify correct tool usage
  3. Test tools independently before combining with agent
  4. Monitor token usage to prevent excessive spending
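
A minimal sketch of point 1, enabling verbose reasoning traces on a ReAct agent; the query_tool and llm objects are assumed to exist:

from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools=[query_tool],
    llm=llm,
    verbose=True,  # print thought / action / observation steps
)
response = agent.chat("Summarize the loaded documents")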

See Also

  • Memory Systems

Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50

Memory Systems

Related topics: Agent Framework, Storage Systems

Section Related Pages

Continue reading this section for the full explanation and source context.

Section ChatMemoryBuffer

Continue reading this section for the full explanation and source context.

Section ChatSummaryMemoryBuffer

Continue reading this section for the full explanation and source context.

Section VectorMemory

Continue reading this section for the full explanation and source context.

Related topics: Agent Framework, Storage Systems

Memory Systems

Memory Systems in LlamaIndex provide persistent conversation history management for chat engines and agents. They enable AI applications to maintain context across multiple interactions, store user preferences, and retrieve relevant historical information during conversations.

Architecture Overview

Memory Systems follow a modular architecture that allows different memory implementations to be composed and used interchangeably. The core memory system supports multiple storage strategies including buffer-based, summary-based, and vector-based retrieval.

graph TD
    A[Chat Engine / Agent] --> B[Memory System]
    B --> C[ChatMemoryBuffer]
    B --> D[ChatSummaryMemoryBuffer]
    B --> E[VectorMemory]
    B --> F[Mem0Memory]
    C --> G[SimpleComposableMemory]
    D --> G
    E --> G
    F --> H[External Memory Services]
    
    G --> I[Storage Backend]
    H --> J[Mem0 Platform API]

Core Memory Components

ChatMemoryBuffer

ChatMemoryBuffer is the foundational memory component that stores conversation history in a simple buffer structure. It maintains a list of chat messages and provides methods for adding, retrieving, and managing conversation context.

| Parameter | Type | Description |
|-----------|------|-------------|
| chat_history | List[ChatMessage] | List of conversation messages |
| size | int | Maximum number of messages to retain |
| tokenizer | Callable | Function to count tokens |

Sources: llama-index-core/llama_index/core/memory/chat_memory_buffer.py
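
A minimal usage sketch; recent releases expose a from_defaults constructor with a token_limit argument, which may differ slightly from the field names listed above, and index is assumed to exist:

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
response = chat_engine.chat("What did we discuss earlier?")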

ChatSummaryMemoryBuffer

ChatSummaryMemoryBuffer extends the basic buffer with summarization capabilities. When the conversation exceeds the configured size, older messages are condensed into a summary rather than being discarded entirely.

| Parameter | Type | Description |
|-----------|------|-------------|
| llm | LLM | LLM instance for generating summaries |
| chat_history | List[ChatMessage] | Initial conversation history |
| size | int | Maximum buffer size before summarization |
| summary_exists | bool | Flag indicating if summary is generated |

Sources: llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py

VectorMemory

VectorMemory uses vector embeddings to store and retrieve conversation history. This enables semantic search within the conversation history, allowing the system to find relevant past messages based on meaning rather than exact matches.

| Parameter | Type | Description |
|-----------|------|-------------|
| vector_store | VectorStore | Storage backend for embeddings |
| embed_model | EmbeddingModel | Model for generating embeddings |
| index | VectorStoreIndex | Index for efficient retrieval |
| retriever | BaseRetriever | Retrieval mechanism |

Sources: llama-index-core/llama_index/core/memory/vector_memory.py

SimpleComposableMemory

SimpleComposableMemory provides a framework for combining multiple memory types into a unified interface. This allows different memory strategies to work together, leveraging the strengths of each approach.

| Feature | Description |
|---------|-------------|
| Memory Composition | Combine buffer, summary, and vector memories |
| Unified Interface | Single API for all memory operations |
| Flexible Retrieval | Query multiple memory sources simultaneously |

Sources: llama-index-core/llama_index/core/memory/simple_composable_memory.py
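
A minimal composition sketch, assuming a chat_buffer (e.g. a ChatMemoryBuffer) and a vector_memory (e.g. a VectorMemory) have already been constructed; the primary_memory and secondary_memory_sources argument names follow recent core releases and should be verified against your installed version:

from llama_index.core.memory import SimpleComposableMemory

composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_buffer,                # main working buffer
    secondary_memory_sources=[vector_memory],  # additional sources queried for context
)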

Mem0 Memory Integration

The Mem0Memory integration provides access to the Mem0 Platform for advanced memory management. Mem0 offers enhanced capabilities for semantic memory storage, user preference tracking, and cross-session persistence.

Configuration Options

#### Client-Based Initialization

from llama_index.memory.mem0 import Mem0Memory

context = {"user_id": "user_1"}
memory = Mem0Memory.from_client(
    context=context,
    api_key="<your-mem0-api-key>",
    search_msg_limit=4,
)

#### Config Dictionary Initialization

memory = Mem0Memory.from_config(
    context=context,
    config={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
        "version": "v1.1",
    },
    search_msg_limit=4,
)

Context Parameters

The Mem0 context identifies the entity for which memory is stored:

| Parameter | Description |
|-----------|-------------|
| user_id | Unique identifier for the user |
| agent_id | Unique identifier for the agent |
| run_id | Unique identifier for the conversation run |

Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md

Usage Patterns

Integration with SimpleChatEngine

from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.memory.mem0 import Mem0Memory

memory = Mem0Memory.from_client(
    context={"user_id": "user_1"},
    api_key="<your-api-key>",
)

chat_engine = SimpleChatEngine.from_defaults(
    llm=llm,
    memory=memory
)

response = chat_engine.chat("Hi, My name is Mayank")

Integration with FunctionAgent

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.memory.mem0 import Mem0Memory

memory = Mem0Memory.from_client(
    context={"user_id": "user_1"},
    api_key="<your-api-key>",
)

# Use memory with the agent for persistent context
# (call_tool and email_tool are FunctionTool instances defined elsewhere)
agent = FunctionAgent(
    llm=llm,
    tools=[call_tool, email_tool],
    memory=memory
)

Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md

Memory Workflow

sequenceDiagram
    participant User
    participant ChatEngine
    participant Memory
    participant Storage
    
    User->>ChatEngine: Send message
    ChatEngine->>Memory: Get context (search_msg_limit messages)
    Memory->>Storage: Query recent messages
    Storage-->>Memory: Return relevant messages
    Memory-->>ChatEngine: Context messages
    ChatEngine->>ChatEngine: Generate response
    ChatEngine->>Memory: Store new message
    Memory->>Storage: Persist message
    ChatEngine-->>User: Return response

Comparison of Memory Types

| Memory Type | Storage Method | Use Case | Scalability |
|-------------|----------------|----------|-------------|
| ChatMemoryBuffer | List/Buffer | Short conversations | Limited by token size |
| ChatSummaryMemoryBuffer | Condensed summaries | Long conversations | Better for extended chats |
| VectorMemory | Embeddings | Semantic search | Scales with vector store |
| Mem0Memory | External API | Production applications | Cloud-native scaling |

Environment Configuration

For Mem0 integration, set the API key as an environment variable:

export MEM0_API_KEY="<your-mem0-api-key>"

For LLM integration within memory operations:

export OPENAI_API_KEY="<your-openai-api-key>"

Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md

Best Practices

  1. Choose Appropriate Memory Type: Select based on conversation length and retrieval needs
  2. Configure Token Limits: Set appropriate search_msg_limit to balance context and performance
  3. Use Context Parameters: Always provide user_id, agent_id, or run_id for proper memory isolation
  4. Consider Composability: Use SimpleComposableMemory for complex memory requirements
  5. Monitor API Costs: When using Mem0, track API usage for cost optimization

Source: https://github.com/run-llama/llama_index / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

medium README/documentation is current enough for a first validation pass.

The project should not be treated as fully validated until this signal is reviewed.

medium Maintainer activity is unknown

Users cannot judge support quality until recent activity, releases, and issue response are checked.

medium no_demo

The project may affect permissions, credentials, data exposure, or host boundaries.

medium no_demo

The project may affect permissions, credentials, data exposure, or host boundaries.

Doramagic Pitfall Log

Doramagic extracted 6 source-linked risk signals. Review them before installing or handing real data to the project.

1. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:560704231 | https://github.com/run-llama/llama_index | README/documentation is current enough for a first validation pass.

2. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | last_activity_observed missing

3. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium

4. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium

5. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | issue_or_pr_quality=unknown

6. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12

Count of project-level external discussion links exposed on this manual page.

Use: Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using llama_index with real data or production workflows.

  • [[Feature Request]: Built-in LLM Failover for Reliability](https://github.com/run-llama/llama_index/issues/19631) - github / github_issue
  • [[Feature Request]: add (detailed) usage info to raw when using Structure](https://github.com/run-llama/llama_index/issues/19845) - github / github_issue
  • [[Bug]: thinking_delta not populated on AgentStream events when thinkin](https://github.com/run-llama/llama_index/issues/20349) - github / github_issue
  • [[Bug]: [llama-index-core] async_acquire() in TokenBucketRateLimiter and](https://github.com/run-llama/llama_index/issues/21603) - github / github_issue
  • [[Question]: how to add human-in-the-loop capability to ReActAgent?](https://github.com/run-llama/llama_index/issues/21599) - github / github_issue
  • Proposal: Agent Threat Rules detection integration for LlamaIndex - github / github_issue
  • Improve developer error message for unrecognized embedding names in `loa - github / github_issue
  • [[Bug]: Bedrock Converse streaming produces string tool_kwargs in `Tool](https://github.com/run-llama/llama_index/issues/21579) - github / github_issue
  • [[Bug]: Breaking Image/Index node fetching behavior after refactor](https://github.com/run-llama/llama_index/issues/19499) - github / github_issue
  • [[Bug]: PydanticUserError: The __modify_schema__ method is not supporte](https://github.com/run-llama/llama_index/issues/16540) - github / github_issue
  • [[Bug]: gemini-embedding-2 task instructions not implemented (task_type d](https://github.com/run-llama/llama_index/issues/21535) - github / github_issue
  • v0.14.21 - github / github_release

Source: Project Pack community evidence and pitfall evidence