Doramagic Project Pack · Human Manual
llama_index
Introduction to LlamaIndex
Related topics: Core Architecture, Quick Start Guide
LlamaIndex is a comprehensive data framework designed for building LLM (Large Language Model) applications. It provides the essential tools, abstractions, and integrations needed to connect custom data sources to LLMs for retrieval-augmented generation (RAG), question-answering systems, and other AI-powered applications.
Overview
LlamaIndex serves as the foundational layer for building AI applications that require sophisticated data ingestion, indexing, and querying capabilities. The framework enables developers to:
- Ingest data from various sources (PDFs, documents, websites, databases)
- Process and chunk data into optimal segments for LLM consumption
- Create vector indices for efficient semantic search
- Build query engines and retrieval pipelines
- Integrate with hundreds of external services and model providers
Sources: README.md:1-20
Core Architecture
The LlamaIndex framework follows a modular architecture with distinct components that work together to provide end-to-end data pipeline capabilities.
Package Structure
LlamaIndex offers two primary installation methods to accommodate different use cases:
| Package | Description | Use Case |
|---|---|---|
llama-index | Starter package with core + selected integrations | Quick start, common setups |
llama-index-core | Core package only | Custom, minimal deployments |
Sources: README.md:45-55
Import Patterns
The framework uses a namespaced import system that distinguishes between core modules and integration packages:
# Core modules (included in llama-index-core)
from llama_index.core.xxx import ClassABC
# Integration modules (from separate packages)
from llama_index.xxx.yyy import SubclassABC
# Concrete examples
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI
Sources: README.md:56-68
Data Flow Architecture
The following diagram illustrates the typical data flow in a LlamaIndex application:
graph TD
A[Data Sources] --> B[Readers/Loaders]
B --> C[Documents]
C --> D[Node Parsers]
D --> E[Nodes/Chunks]
E --> F[Vector Index]
F --> G[Retriever]
G --> H[Query Engine]
H --> I[LLM Response]
A1[Web Pages] --> B
A2[PDFs] --> B
A3[Databases] --> B
A4[APIs] --> B
Key Components
1. Document Loaders
Document loaders (Readers) are responsible for ingesting data from external sources. LlamaIndex provides a vast ecosystem of readers:
| Reader | Purpose | Source |
|---|---|---|
WikipediaReader | Load Wikipedia pages | llama-index-readers-wikipedia |
WholeSiteReader | Scrape entire websites | llama-index-readers-web |
DoclingReader | Parse PDFs, DOCX, HTML | llama-index-readers-docling |
RemoteDepthReader | Extract from URLs recursively | llama-index-readers-remote-depth |
#### Wikipedia Reader Example
from llama_index.readers.wikipedia import WikipediaReader
reader = WikipediaReader()
documents = reader.load_data(pages=["Page Title 1", "Page Title 2"])
Sources: llama-index-readers-wikipedia/README.md:1-25
#### Docling Reader Example
from llama_index.readers.docling import DoclingReader
reader = DoclingReader()
docs = reader.load_data(file_path="https://arxiv.org/pdf/2408.09869")
Sources: llama-index-readers-docling/README.md:1-30
2. Indices
Indices organize documents for efficient retrieval. LlamaIndex supports both managed indices and customizable self-hosted options.
#### Managed Indices
Managed indices like VectaraIndex provide fully hosted solutions:
from llama_index.indices.managed.vectara import VectaraIndex
from llama_index.core.schema import Document, MediaResource
docs = [
Document(
id_="doc1",
text_resource=MediaResource(
text="This is test text for Vectara integration.",
),
),
]
index = VectaraIndex.from_documents(docs)
Sources: llama-index-indices-managed-vectara/README.md:30-50
3. LLM Integrations
LlamaIndex provides integrations with numerous LLM providers through a standardized interface:
# Example: Contextual LLM Integration
from llama_index.llms.contextual import Contextual
llm = Contextual(model="contextual-clm", api_key="your_api_key")
response = llm.complete("Explain the importance of Grounded Language Models.")
Sources: llama-index-llms-contextual/README.md:1-20
Usage Patterns
Building a Simple RAG Pipeline
The most common pattern involves loading documents, creating an index, and querying it:
from llama_index.core import VectorStoreIndex
from llama_index.readers.docling import DoclingReader
# Step 1: Load documents
reader = DoclingReader()
documents = reader.load_data(file_path="document.pdf")
# Step 2: Create index
index = VectorStoreIndex.from_documents(documents)
# Step 3: Query
query_engine = index.as_query_engine()
response = query_engine.query("Summarize this document")
Retrieval-Only Pattern
For applications requiring only retrieval without generation:
retriever = index.as_retriever(similarity_top_k=2)
results = retriever.retrieve("How will users feel about this new tool?")
Sources: llama-index-indices-managed-vectara/README.md:50-65
LangChain Integration
LlamaIndex components can be used as tools within LangChain agents:
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import WholeSiteReader
# Initialize scraper
scraper = WholeSiteReader(prefix="https://docs.llamaindex.ai/en/stable/", max_depth=10)
documents = scraper.load_data(base_url="https://docs.llamaindex.ai/en/stable/")
# Create index
index = VectorStoreIndex.from_documents(documents)
# Define tools
tools = [
Tool(
name="Website Index",
func=lambda q: str(index.as_query_engine().query(q)),
description="Useful for answering questions about text on websites.",
),
]
Sources: llama-index-readers-web/llama_index/readers/web/whole_site/README.md:1-40
LlamaParse Platform
LlamaParse is a complementary platform (separate from the open-source LlamaIndex framework) focused on document agents and agentic OCR:
| Component | Function |
|---|---|
| Parse | Agentic OCR and document parsing (130+ formats) |
| Extract | Structured data extraction from documents |
| Index | Ingest, index, and RAG pipelines |
| Split | Split large documents into subcategories |
Sources: README.md:75-85
Ecosystem Overview
LlamaIndex maintains an extensive ecosystem with over 300 integration packages available through LlamaHub:
graph LR
subgraph "Data Sources"
Web[Web]
PDFs[PDFs]
DB[Databases]
APIs[APIs]
end
subgraph "LlamaIndex Core"
Docs[Documents]
Nodes[Nodes]
Indices[Indices]
end
subgraph "LLM Providers"
OpenAI[OpenAI]
HuggingFace[HF]
Local[Local Models]
end
Web --> Docs
PDFs --> Docs
DB --> Docs
APIs --> Docs
Docs --> Indices
Indices --> OpenAI
Indices --> HuggingFace
Indices --> Local
Configuration Options
Common Reader Configuration Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
file_path | str | Path to input file/URL | "document.pdf" |
prefix | str | URL prefix for filtering | "https://example.com/" |
max_depth | int | Maximum recursion depth | 10 |
where | dict | Metadata filter condition | {"category": "AI"} |
query | list | Search query text | ["search term"] |
Sources: llama-index-readers-chroma/README.md:1-20
Installation
Quick Start (Recommended)
pip install llama-index
Minimal Installation
pip install llama-index-core
Individual Integrations
pip install llama-index-readers-wikipedia
pip install llama-index-readers-docling
pip install llama-index-llms-openai
Citation
If you use LlamaIndex in academic work, cite as:
@software{Liu_LlamaIndex_2022,
author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {https://github.com/jerryjliu/llama_index},
year = {2022}
}
Sources: README.md:95-105
Next Steps
To continue learning LlamaIndex:
- Getting Started - Follow the starter example
- Concepts - Understand core concepts like Documents, Nodes, and Indices
- LlamaHub - Browse 300+ integrations for various data sources and LLM providers
- Examples - Explore Jupyter notebooks for detailed use cases
Sources: [README.md:1-20](https://github.com/run-llama/llama_index/blob/main/README.md)
Quick Start Guide
Related topics: Introduction to LlamaIndex, Documents and Nodes
This guide provides a comprehensive introduction to getting started with LlamaIndex, covering environment setup, core installation methods, and essential development workflows.
Prerequisites
Before beginning, ensure your environment meets the following requirements:
| Requirement | Version/Details |
|---|---|
| Python | 3.8 or higher |
| Package Manager | uv (recommended) or pip |
| Operating System | Unix-like (Linux, macOS), Windows with WSL |
| Git | Latest stable version |
Environment Setup
Creating a Virtual Environment
LlamaIndex recommends using uv for dependency management. Create a virtual environment as follows:
uv venv
source .venv/bin/activate
Sources: llama-dev/README.md:11
Installing the Development CLI
The llama-dev CLI tool is the official command-line interface for development, testing, and automation in the LlamaIndex monorepo.
Install it in editable mode:
uv pip install -e .
After installation, verify the CLI is available:
llama-dev --help
Sources: llama-dev/README.md:12-18
Core Concepts
graph TD
A[LlamaIndex Project] --> B[Core Package: llama-index-core]
A --> C[LLM Integrations]
A --> D[Reader Integrations]
A --> E[Callback Integrations]
B --> F[VectorStoreIndex]
B --> G[ServiceContext]
B --> H[Document Loading]
The LlamaIndex framework consists of several key components:
| Component | Purpose |
|---|---|
llama-index-core | Core framework functionality including indexing and querying |
| LLM Integrations | Connectors for various language model providers |
| Reader Integrations | Data loaders for different document sources |
| Callback Integrations | Monitoring and logging capabilities |
Package Management
Querying Package Information
View information about specific packages in the monorepo:
# Get info for a specific package
llama-dev pkg info llama-index-core
# Get info for all packages
llama-dev pkg info --all
Executing Commands in Package Directories
Run commands within the context of specific packages:
# Run a command in a specific package
llama-dev pkg exec --cmd "uv sync" llama-index-core
# Run a command in all packages
llama-dev pkg exec --cmd "uv sync" --all
# Exit at first error
llama-dev pkg exec --cmd "uv" --all --fail-fast
Sources: llama-dev/README.md:26-41
Testing
Running Tests Across the Monorepo
Execute tests for specific packages or across all packages:
# Run tests for a specific package
llama-dev pkg test llama-index-core
# Run tests for all packages
llama-dev pkg test --all
Quick Test Verification
After making changes, verify core functionality:
llama-dev pkg exec --cmd "python -m pytest" llama-index-core
Basic LLM Integration Usage
Initializing an LLM
Different LLM providers follow similar initialization patterns:
from llama_index.llms.ollama import Ollama
# Initialize Ollama LLM
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
from llama_index.llms.mistralai import MistralAI
llm = MistralAI(api_key="<your-api-key>")
Sources: llama-index-integrations/llms/llama-index-llms-ollama/README.md:30-35
Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:16-18
Generating Completions
# Simple completion
resp = llm.complete("Who is Paul Graham?")
print(resp)
# Chat completion with messages
from llama_index.core.llms import ChatMessage, MessageRole
messages = [
ChatMessage(
role=MessageRole.SYSTEM,
content="You are a helpful assistant."
),
ChatMessage(role=MessageRole.USER, content="How to make cake?"),
]
resp = llm.chat(messages)
print(resp)
Sources: llama-index-integrations/llms/llama-index-llms-modelscope/README.md:24-37
Streaming Responses
# Stream completions
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
print(r.delta, end="")
# Stream chat responses
resp = llm.stream_chat(messages)
for r in resp:
print(r.delta, end="")
Sources: llama-index-integrations/llms/llama-index-llms-mistralai/README.md:40-48
Building an Index from Documents
Basic Index Creation
from llama_index.core import VectorStoreIndex
# Create index from documents
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)
Loading Data from URLs
from llama_index.readers.web import WholeSiteReader
# Initialize the scraper
scraper = WholeSiteReader(
prefix="https://docs.llamaindex.ai/en/stable/",
max_depth=10,
)
# Start scraping from a base URL
documents = scraper.load_data(
base_url="https://docs.llamaindex.ai/en/stable/"
)
# Create index
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What language is on this website?")
Configuration Options
Key Parameters
| Parameter | Description | Default Value |
|---|---|---|
model | LLM model identifier | Required |
api_key | API key for the provider | Required for cloud providers |
request_timeout | Request timeout in seconds | 30.0 |
temperature | Sampling temperature | 0.7 |
max_tokens | Maximum tokens to generate | Provider-specific |
context_window | Maximum context length | Provider-specific |
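As an illustration only, here is a hedged sketch of passing these parameters to an LLM constructor; the OpenAI integration and the exact parameter set shown are assumptions, so check your provider's README for what it actually supports:
# Sketch: common constructor parameters (names follow the table above)
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",       # model identifier (assumed example)
    api_key="<your-api-key>",  # or set OPENAI_API_KEY in the environment
    temperature=0.2,           # lower values give more deterministic output
    max_tokens=512,            # cap on generated tokens
)
print(llm.complete("Say hello in one word."))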
Environment Variables
Set API keys as environment variables before initialization:
export KONKO_API_KEY=<your-api-key>
export OPENAI_API_KEY=<your-api-key>
# Alternatively, set the key from Python before initializing the LLM
import os
os.environ["KONKO_API_KEY"] = "<your-api-key>"
Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md:15-20
Common Workflows
graph LR
A[Setup Environment] --> B[Install llama-dev]
B --> C[Explore Packages]
C --> D{Development Goal}
D -->|Testing| E[Run Tests]
D -->|Integration| F[Configure LLM]
D -->|Data Loading| G[Set up Readers]
E --> H[Modify Code]
F --> H
G --> H
H --> I[Verify Changes]
I --> E
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| CLI not found | Ensure virtual environment is activated |
| API key errors | Verify environment variables are set |
| Package import errors | Run uv sync in the package directory |
| Timeout errors | Increase request_timeout parameter |
Verification Commands
# Check installation
llama-dev --version
# Verify package structure
llama-dev pkg info --all
# Test core imports
python -c "import llama_index; print(llama_index.__version__)"
Next Steps
After completing this quick start guide:
- Explore specific LLM integrations for your preferred provider
- Review reader integrations for your data sources
- Study the core API documentation for advanced indexing strategies
- Join the community for support and updates
Sources: llama-dev/README.md:11
Core Architecture
Related topics: Introduction to LlamaIndex, Integration Architecture
Overview
LlamaIndex is a data framework for building LLM-powered applications. The Core Architecture establishes the fundamental building blocks that enable developers to connect large language models with their custom data sources. This architectural foundation provides a layered, modular approach where each component—from language model interfaces to response handling—follows consistent patterns and abstractions.
The core architecture serves as the abstraction layer between raw data ingestion and sophisticated LLM-powered querying. It separates concerns by defining clear interfaces for language models (LLMs), embedding services, document processing, indexing, and response generation. This design allows developers to swap implementations, extend functionality, and maintain clean separation between components.
System Components
High-Level Architecture Diagram
graph TD
subgraph "Data Layer"
Documents[Documents]
Nodes[Nodes]
Index[Index]
end
subgraph "Core Abstractions"
LLMs[LLM Base]
Embeddings[Embedding Base]
Response[Response Schema]
end
subgraph "Service Layer"
VectorStore[Vector Store]
StorageContext[Storage Context]
end
subgraph "Application Layer"
Query[Query Engine]
Chat[Chat Engine]
Agent[Agent]
end
Documents --> NodeParser
NodeParser --> Nodes
Nodes --> Index
Index --> Query
Query --> Response
LLMs --> Query
Embeddings --> Index
Language Model (LLM) Abstraction
Purpose and Role
The LLM base abstraction (llama_index.core.base.llms.base) defines the contract that all language model implementations must follow. This abstraction enables LlamaIndex to support multiple LLM providers—including OpenAI, Anthropic, local models, and custom implementations—through a unified interface.
Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50
Base LLM Interface
The LLM base class provides the following core methods:
| Method | Purpose | Parameters |
|---|---|---|
complete() | Synchronous text completion | prompt: str, formatted: bool = False, **kwargs |
stream_complete() | Streaming text completion | prompt: str, formatted: bool = False, **kwargs |
chat() | Synchronous chat completion | messages: List[ChatMessage], **kwargs |
stream_chat() | Streaming chat completion | messages: List[ChatMessage], **kwargs |
LLM Class Hierarchy
classDiagram
class LLM {
<<abstract>>
+complete()
+stream_complete()
+chat()
+stream_chat()
+metadata: LLMMetadata
}
class LLMMetadata {
+model: str
+temperature: float
+top_p: int
+max_tokens: Optional[int]
+context_window: int
+is_chat_model: bool
+is_function_calling_model: bool
}
class ChatMessage {
+role: MessageRole
+content: str
+additional_kwargs: Dict
}
LLM --> LLMMetadata
LLM --> ChatMessage
Sources: llama-index-core/llama_index/core/base/llms/base.py:50-120
Message Roles
The MessageRole enum defines valid roles for chat messages:
| Role | Description |
|---|---|
SYSTEM | System-level instructions |
USER | User-generated content |
ASSISTANT | Model-generated responses |
FUNCTION | Function call results |
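As a small illustration, the sketch below builds a chat history with explicit roles using the core ChatMessage class (the message texts are placeholders):
from llama_index.core.llms import ChatMessage, MessageRole

history = [
    ChatMessage(role=MessageRole.SYSTEM, content="You answer concisely."),
    ChatMessage(role=MessageRole.USER, content="What is a vector index?"),
]
# The assistant's turn would then be produced by llm.chat(history)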
Embedding Abstraction
Purpose and Role
The embedding base (llama_index.core.base.embeddings.base) provides the interface for text vectorization. Embeddings transform textual content into numerical vectors that enable semantic similarity searches. This abstraction supports various embedding providers while maintaining a consistent API.
Sources: llama-index-core/llama_index/core/base/embeddings/base.py:1-60
Embedding Interface Methods
| Method | Purpose | Return Type |
|---|---|---|
get_query_embedding() | Embed a single query string | List[float] |
get_text_embedding() | Embed a single text string | List[float] |
get_text_embedding_batch() | Embed multiple texts in batch | List[List[float]] |
get_query_embedding_batch() | Embed multiple queries in batch | List[List[float]]
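The sketch below exercises this interface through the Ollama embedding integration mentioned later in this manual; it assumes a local Ollama server with the nomic-embed-text model pulled:
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(model_name="nomic-embed-text")  # assumes a running Ollama server
vector = embed_model.get_text_embedding("LlamaIndex turns text into vectors.")
print(len(vector))  # dimensionality depends on the chosen model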
Embedding Configuration
graph LR
A[Text Input] --> B[Embedding Model]
B --> C[Dimension: 384-1536]
C --> D[Normalized Vector]
Sources: llama-index-core/llama_index/core/base/embeddings/base.py:60-100
Response Schema
Purpose and Role
The response schema (llama_index.core.base.response.schema) defines the data structures used throughout LlamaIndex for returning query results, streaming responses, and structured outputs. This ensures consistent response handling across different query types and engines.
Sources: llama-index-core/llama_index/core/base/response/schema.py:1-80
Core Response Models
| Class | Purpose |
|---|---|
Response | Wraps text responses with sources |
StreamingResponse | Handles streaming token outputs |
ResponseMode | Enum for response generation modes |
Sources | Container for source nodes and metadata |
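A hedged end-to-end sketch of inspecting a Response object; it assumes a configured default LLM and embedding model (e.g. via OPENAI_API_KEY):
from llama_index.core import Document, VectorStoreIndex

index = VectorStoreIndex.from_documents(
    [Document(text="LlamaIndex is a data framework for LLM applications.")]
)
query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")
print(response.response)                       # synthesized answer text
for node_with_score in response.source_nodes:  # source nodes backing the answer
    print(node_with_score.score, node_with_score.node.get_content())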
Response Mode Enumeration
graph TD
A[Query] --> B{Response Mode}
B --> C[default]
B --> D[refine]
B --> E[compact]
B --> F[accumulate]
B --> G[compact_accumulate]
C --> H[Single pass response]
D --> I[Iterative refinement]
E --> J[Compact and respond]
F --> K[Aggregate node responses]
G --> L[Compact then accumulate]
Sources: llama-index-core/llama_index/core/base/response/schema.py:30-50
Core Types System
Type Definitions
The types module (llama_index.core.types) defines foundational enumerations and type aliases used throughout the framework:
| Type | Description |
|---|---|
ModelType | Defines model categories (e.g., LLM, EMBEDDING) |
PromptType | Categorizes prompts (e.g., SUMMARY, QUERY) |
NodeType | Defines node kinds (e.g., TEXT, DOCUMENT) |
Sources: llama-index-core/llama_index/core/types.py:1-60
Node Parser Types
classDiagram
class Node {
<<abstract>>
+id_: str
+embedding: Optional[List[float]]
+metadata: Dict[str, Any]
+relationships: Dict[NodeRelationship, Node]
+excluded_embed_metadata_keys: List[str]
+excluded_llm_metadata_keys: List[str]
}
class TextNode {
+text: str
+start_char_idx: Optional[int]
+end_char_idx: Optional[int]
}
class Document {
+text: str
+doc_id: str
+embedding: Optional[List[float]]
}
Node <|-- TextNode
Node <|-- Document
Document and Node Model
Document Structure
Documents represent the top-level container for source data. Each document contains metadata and can be broken down into smaller nodes for indexing:
| Field | Type | Description |
|---|---|---|
doc_id | str | Unique document identifier |
text | str | Full text content |
metadata | Dict[str, Any] | Associated metadata |
embedding | Optional[List[float]] | Pre-computed embedding |
Node Relationships
Nodes maintain relationships with other nodes through the NodeRelationship enum:
| Relationship | Description |
|---|---|
SOURCE | Parent document relationship |
PREVIOUS | Previous sibling node |
NEXT | Next sibling node |
PARENT | Parent node in hierarchy |
CHILD | Child node in hierarchy |
Sources: llama-index-core/llama_index/core/node_parser/node.py:30-80
Storage Architecture
Storage Context
The StorageContext manages persistence layers for various data components:
graph TD
StorageContext --> VectorStore
StorageContext --> DocStore
StorageContext --> IndexStore
StorageContext --> GraphStore
VectorStore --> Milvus[Milvus]
VectorStore --> Chroma[Chroma]
VectorStore --> Pinecone[Pinecone]
DocStore --> MongoDB[MongoDB]
DocStore --> Redis[Redis]
DocStore --> Simple[SimpleKVStore]
Sources: llama-index-core/llama_index/core/storage/storage_context.py:1-50
Storage Components
| Component | Purpose |
|---|---|
vector_store | Stores embedding vectors for similarity search |
doc_store | Stores serialized nodes and documents |
index_store | Stores index metadata and configurations |
graph_store | Stores knowledge graph relationships |
Index Architecture
Base Index Structure
Indexes provide the mechanism for organizing and querying documents. The base index class establishes the contract for all index implementations:
graph LR
A[Documents] --> B[Index Construction]
B --> C[Node Parsing]
C --> D[Embedding Generation]
D --> E[Vector Storage]
E --> F[Queryable Index]
Index Types
| Index Type | Use Case |
|---|---|
VectorStoreIndex | Semantic search over embeddings |
SummaryIndex | Document summarization |
KeywordTableIndex | Keyword-based retrieval |
KnowledgeGraphIndex | Graph-based knowledge representation |
Sources: llama-index-core/llama_index/core/indices/base.py:1-80
Query Engine Architecture
Query Flow
sequenceDiagram
participant User
participant QueryEngine
participant Retriever
participant LLM
participant Response
User->>QueryEngine: Query Request
QueryEngine->>Retriever: Retrieve Nodes
Retriever-->>QueryEngine: Source Nodes
QueryEngine->>LLM: Synthesize Response
LLM-->>QueryEngine: Response
QueryEngine->>Response: Format Output
Response-->>User: Formatted Answer
Retriever Types
| Retriever | Description |
|---|---|
VectorRetriever | Embedding-based similarity search |
KeywordRetriever | BM25 or keyword matching |
HybridRetriever | Combined vector and keyword search |
SentenceWindowRetriever | Contextual window retrieval |
Configuration and Extensibility
Service Context
The ServiceContext bundles together the core service components:
| Parameter | Type | Default | Description |
|---|---|---|---|
llm | LLM | OpenAI() | Language model instance |
embed_model | Embedding | OpenAIEmbedding() | Embedding model instance |
node_parser | NodeParser | SentenceSplitter() | Text chunking strategy |
prompt_helper | PromptHelper | Auto-calculated | Prompt size optimization |
Customization Patterns
graph TD
subgraph "Extension Points"
CustomLLM[Custom LLM Implementation]
CustomEmbed[Custom Embedding Model]
CustomParser[Custom Node Parser]
CustomStore[Custom Storage Backend]
end
CustomLLM -->|inherits| LLMBase[LLM Base]
CustomEmbed -->|inherits| EmbedBase[Embedding Base]
CustomParser -->|inherits| NodeParserBase[NodeParser Base]
CustomStore -->|inherits| StorageContextBase[StorageContext Base]
Summary
The Core Architecture of LlamaIndex establishes a modular, extensible framework built on well-defined abstractions. The layered architecture—from base interfaces like LLM and Embedding through storage and indexing components to application-layer query engines—enables developers to:
- Swap implementations without changing application code
- Extend functionality through inheritance and composition
- Maintain clean separation between concerns
- Support multiple providers through unified interfaces
The architecture follows consistent patterns across components, making the framework predictable and learnable while supporting the diverse requirements of production LLM applications.
Sources: llama-index-core/llama_index/core/base/llms/base.py:1-50
Integration Architecture
Related topics: Core Architecture, Retrieval and Reranking
Overview
LlamaIndex employs a modular integration architecture that extends the core framework's capabilities through a comprehensive ecosystem of pluggable components. The integration system allows developers to connect LlamaIndex with external services, APIs, local models, and specialized tools without modifying the core library. This architecture follows a provider-based pattern where each integration package implements standardized interfaces to ensure compatibility and consistent behavior across different external systems.
The integration architecture serves as the bridge between LlamaIndex's core data structures and the diverse landscape of LLM providers, embedding services, document loaders, and auxiliary tools. By maintaining well-defined contracts between components, the system enables seamless swapping of implementations while preserving the overall workflow of building retrieval-augmented generation (RAG) pipelines and query engines.
Integration Categories
LlamaIndex organizes its integrations into distinct categories, each addressing a specific aspect of the LLM application development workflow. The categorization ensures logical separation of concerns and simplifies dependency management for end users.
LLM Integrations
LLM (Large Language Model) integrations provide adapters for connecting to various language model providers. These integrations implement the unified LLM interface defined in llama_index.core.llms, allowing developers to switch between providers without changing application code. Each LLM integration handles provider-specific authentication, request formatting, response parsing, and streaming behavior.
| Integration Package | Provider | Key Features |
|---|---|---|
llama-index-llms-contextual | Contextual | Contextual LLM wrapper |
llama-index-llms-konko | Konko | Supports both Konko and OpenAI models |
llama-index-llms-lmstudio | LM Studio | Local server integration |
llama-index-llms-monsterapi | MonsterAPI | Private deployments and GA models |
llama-index-llms-modelscope | ModelScope | Qwen and other ModelScope models |
llama-index-llms-langchain | LangChain | LangChain LLM wrapper |
llama-index-llms-optimum-intel | Intel Optimum | CPU-optimized inference |
Sources: llama-index-integrations/llms/llama-index-llms-contextual/README.md
Reader Integrations
Reader integrations enable data ingestion from various document sources and web content. These loaders transform external data formats into LlamaIndex's internal Document schema, providing a unified representation regardless of the source type.
| Reader Type | Source Format | Package |
|---|---|---|
| Document Readers | PDF, DOCX, HTML | llama-index-readers-docling |
| Web Readers | URLs, Articles | llama-index-readers-web |
| Wikipedia | Wikipedia pages | llama-index-readers-wikipedia |
| Remote Content | Deep link crawling | llama-index-readers-remote-depth |
| Cloud Storage | Box files | llama-index-readers-box |
| Preprocessed | Chunks from Preprocess API | llama-index-readers-preprocess |
Sources: llama-index-integrations/readers/llama-index-readers-wikipedia/README.md
Embedding Integrations
Embedding integrations provide vectorization capabilities through external embedding models. These components convert text into dense vector representations suitable for semantic search and similarity operations.
| Provider | Model Examples | Package |
|---|---|---|
| Ollama | nomic-embed-text, embeddinggemma, mxbai-embed-large | llama-index-embeddings-ollama |
Sources: llama-index-integrations/embeddings/llama-index-embeddings-ollama/README.md
Index Integrations
Index integrations connect to managed vector search services, providing fully-hosted indexing and retrieval capabilities. These integrations abstract the complexity of distributed vector databases behind LlamaIndex's retriever interface.
| Managed Service | Package | Features |
|---|---|---|
| Vectara | llama-index-indices-managed-vectara | RAG pipeline, retriever, query engine |
Sources: llama-index-integrations/indices/llama-index-indices-managed-vectara/README.md
Tool Integrations
Tool integrations extend LlamaIndex's agent capabilities by providing access to external services that can be invoked during agent execution.
| Tool Provider | Features | Package |
|---|---|---|
| Moss | Hybrid search (keyword + semantic) | llama-index-tools-moss |
Callback Integrations
Callback integrations enable observability and feedback collection by integrating with external monitoring and evaluation platforms.
| Platform | Purpose | Package |
|---|---|---|
| Argilla | Feedback loop, LLM monitoring | llama-index-callbacks-argilla |
Sources: llama-index-integrations/callbacks/llama-index-callbacks-argilla/README.md
System Architecture
The integration architecture follows a layered approach where core abstractions define the contracts, and integration packages provide concrete implementations. This design enables horizontal scalability of integrations while maintaining vertical consistency with the core framework.
graph TD
A[Application Layer] --> B[Core LlamaIndex]
B --> C[Interface Abstractions]
C --> D[LLM Abstraction]
C --> E[Reader Abstraction]
C --> F[Embedding Abstraction]
C --> G[Retriever Abstraction]
D --> H[LLM Integrations]
E --> I[Reader Integrations]
F --> J[Embedding Integrations]
G --> K[Index Integrations]
H --> L[Konko, LMStudio, MonsterAPI, etc.]
I --> M[Docling, Wikipedia, Web, Box, etc.]
J --> N[Ollama Embeddings]
K --> O[Vectara]
Common Integration Patterns
LLM Integration Pattern
LLM integrations follow a consistent initialization pattern that accepts provider-specific configuration parameters. The typical constructor accepts a model identifier, base URL for API endpoints, and optional generation parameters such as temperature and maximum tokens.
from llama_index.llms.provider_name import ProviderLLM
llm = ProviderLLM(
model="model-identifier",
api_key="your-api-key",
temperature=0.7,
max_tokens=256
)
Sources: llama-index-integrations/llms/llama-index-llms-konko/README.md
Reader Integration Pattern
Reader integrations follow a loader pattern where initialization may require credentials, and the load_data method accepts source-specific parameters such as URLs, file paths, or query filters.
from llama_index.readers.source_type import SourceReader
reader = SourceReader(credentials="your-credentials")
documents = reader.load_data(source="document-source")
Sources: llama-index-integrations/readers/llama-index-readers-remote-depth/README.md
Data Flow Architecture
The integration architecture enables a complete RAG pipeline where each component plays a specific role in transforming input data into actionable insights.
graph LR
A[Document Sources] --> B[Readers]
B --> C[Documents]
C --> D[Node Parsers]
D --> E[Nodes]
E --> F[Vector Index]
E --> G[Storage Context]
F --> H[Retriever]
G --> H
H --> I[Query Engine]
I --> J[LLM]
J --> K[Response]
Installation and Dependency Management
Each integration package follows the naming convention llama-index-{category}-{provider} and can be installed independently via pip. This modular approach minimizes dependency overhead by allowing users to install only the packages required for their specific use case.
| Category | Package Naming Pattern | Installation Command |
|---|---|---|
| LLM | llama-index-llms-{provider} | pip install llama-index-llms-{provider} |
| Reader | llama-index-readers-{source} | pip install llama-index-readers-{source} |
| Embedding | llama-index-embeddings-{provider} | pip install llama-index-embeddings-{provider} |
| Index | llama-index-indices-{type}-{provider} | pip install llama-index-indices-{type}-{provider} |
| Tool | llama-index-tools-{provider} | pip install llama-index-tools-{provider} |
| Callback | llama-index-callbacks-{platform} | pip install llama-index-callbacks-{platform} |
Configuration Management
Integrations typically support configuration through both constructor parameters and environment variables. This dual approach accommodates both explicit configuration in code and secret management through environment-based configuration.
Environment Variable Pattern
Many integrations follow a pattern where API keys can be set as environment variables for security and convenience:
export PROVIDER_API_KEY="your-api-key"
export OPENAI_API_KEY="your-openai-key"
Constructor Parameter Pattern
Alternatively, credentials can be passed directly to the integration constructor:
llm = ProviderLLM(
model="model-name",
api_key="explicit-api-key",
base_url="https://api.provider.com"
)
Sources: llama-index-integrations/llms/llama-index-llms-lmstudio/README.md
Extending the Architecture
The integration architecture is designed for extensibility. New integrations can be created by implementing the appropriate abstract base classes defined in llama_index.core. Each integration category has its own interface specification that ensures consistency across implementations.
Creating a New LLM Integration
To create a new LLM integration, implement the following interface contract:
- Inherit from the base LLM class
- Implement complete(), chat(), and the streaming methods
- Handle provider-specific authentication and error handling
- Follow the package naming convention (a minimal sketch follows below)
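A minimal sketch of such an integration, assuming the CustomLLM helper from llama_index.core.llms (which supplies chat behavior on top of the completion methods); the echo behaviour stands in for a real provider call:
from typing import Any
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class EchoLLM(CustomLLM):
    """Toy LLM that echoes the prompt; replace the method bodies with provider API calls."""

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="echo-llm")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=prompt)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        text = ""
        for token in prompt.split():
            text += token + " "
            yield CompletionResponse(text=text, delta=token + " ")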
Creating a New Reader Integration
To create a new reader integration:
- Implement a loader class with a load_data() method
- Transform source data into Document objects
- Handle pagination, filtering, and error cases appropriately
- Document supported source formats and parameters (see the sketch below)
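A minimal sketch of a custom reader, assuming the BaseReader base class from llama_index.core.readers.base; the in-memory source is purely illustrative:
from typing import List
from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class InMemoryListReader(BaseReader):
    """Wraps a plain list of strings as LlamaIndex Documents."""

    def load_data(self, texts: List[str]) -> List[Document]:
        return [
            Document(text=t, metadata={"position": i})
            for i, t in enumerate(texts)
        ]


docs = InMemoryListReader().load_data(["first record", "second record"])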
Integration Testing Considerations
Each integration package maintains its own test suite to verify compatibility with the external service. Integration tests typically require actual API credentials and network access, distinguishing them from unit tests that mock external dependencies.
Best Practices
When working with LlamaIndex integrations, consider the following best practices:
- Dependency Isolation: Install only required integration packages to minimize potential conflicts
- Credential Management: Use environment variables for sensitive credentials in production
- Error Handling: Implement appropriate retry logic and fallback strategies for external service calls
- Resource Management: Close connections and release resources properly when using streaming responses
- Version Compatibility: Check integration package versions against the core LlamaIndex version for compatibility
Deprecated Integrations
Some integration packages may be discontinued over time as external services evolve or change their offerings. When an integration is deprecated, it will receive no further updates or support. Users should migrate to alternative solutions before removing deprecated packages from their projects.
Sources: llama-index-integrations/readers/llama-index-readers-preprocess/README.md
Conclusion
The integration architecture provides a flexible, extensible framework for connecting LlamaIndex with the broader ecosystem of LLM providers, data sources, and tools. By maintaining standardized interfaces while allowing provider-specific implementations, the architecture enables developers to build sophisticated RAG applications without being locked into a single vendor or service. The modular design supports incremental adoption, allowing teams to integrate new capabilities as their requirements evolve.
Sources: [llama-index-integrations/llms/llama-index-llms-contextual/README.md](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-contextual/README.md)
Documents and Nodes
Related topics: Storage Systems, Query Engines
Overview
In LlamaIndex, Documents and Nodes are the fundamental data structures that represent information to be indexed, searched, and retrieved. Documents serve as the primary unit of input data, while Nodes are the granular chunks created during document processing for optimal embedding and retrieval.
Document Model
Purpose and Scope
A Document in LlamaIndex represents a single unit of data to be indexed. It encapsulates the content along with associated metadata that provides context about the source, type, and additional information useful for retrieval and processing.
Core Document Schema
The Document model is defined in llama-index-core/llama_index/core/schema.py and includes the following key attributes:
| Attribute | Type | Description |
|---|---|---|
text | str | The main text content of the document |
id_ | str | Unique identifier for the document |
metadata | Dict[str, Any] | Additional metadata about the document |
mimetype | str | MIME type of the document content |
relationships | Dict[str, RelationshipType] | Relationships to other nodes/documents |
Document Construction
Documents can be created with varying levels of detail:
from llama_index.core import Document
# Basic document
doc = Document(text="Your content here")
# Document with metadata
doc = Document(
text="Your content here",
metadata={
"source": "review.txt",
"author": "John Doe",
"date": "2024-01-15"
}
)
Node Model
Purpose and Scope
Nodes are the result of parsing and chunking Documents into smaller, semantically coherent pieces. Each Node inherits document-like properties but adds relationship information linking back to its parent Document and sibling Nodes.
Node Structure
Nodes extend the Document schema with additional attributes defined in llama-index-core/llama_index/core/schema.py:
| Attribute | Type | Description |
|---|---|---|
node_id | str | Unique identifier for the node |
start_char_idx | int | Starting character index in parent document |
end_char_idx | int | Ending character index in parent document |
text_template | str | Template for rendering the node text |
relationships | Dict[RelationshipType, RelatedNodeType] | Relationships including PARENT, PREVIOUS, NEXT |
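A short sketch of building nodes and wiring sibling relationships by hand, using the schema classes from llama_index.core.schema (node parsers normally do this for you):
from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

first = TextNode(id_="node-1", text="First chunk.", start_char_idx=0, end_char_idx=12)
second = TextNode(id_="node-2", text="Second chunk.")
first.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=second.node_id)
second.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=first.node_id)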
Architecture Diagram
graph TD
A[Raw Input Data] --> B[Document]
B --> C[Node Parser]
C --> D[Nodes]
D --> E[Embedding Model]
E --> F[Vector Index]
G[Metadata] --> B
H[Relationships] --> D
B -->|PARENT| D
D -->|CHILD| B
Readers and Loading
Base Reader Interface
Readers are responsible for loading data from various sources and converting them into Documents. The base reader interface is defined in llama-index-core/llama_index/core/readers/base.py.
| Method | Description |
|---|---|
load_data() | Load documents from a data source |
lazy_load_data() | Lazily load documents for memory efficiency |
Supported Reader Types
LlamaIndex provides numerous reader integrations for different data sources:
| Category | Reader | Description |
|---|---|---|
| Document | Docling Reader | PDF, DOCX, HTML extraction to Markdown or JSON |
| Document | MarkItDown Reader | Converts various formats to Markdown |
| Document | Docugami Loader | XML knowledge graph from PDF/DOCX |
| Web | NewsArticleReader | Parses news article URLs |
| Web | UnstructuredURLLoader | URL text extraction via Unstructured.io |
| Web | TrafilaturaWebReader | Web scraping with trafilatura |
| Web | MainContentExtractorReader | Main content extraction from websites |
| Web | ReadabilityWebPageReader | Readability-based web extraction |
| Web | RemoteDepthReader | Recursive URL loading with depth control |
| Web | WholeSiteReader | Full site scraping with prefix/depth |
| Academic | SemanticScholarReader | Scholarly articles and papers |
| Database | Chroma Reader | Loading from Chroma vector store |
Usage Example
from llama_index.readers.docling import DoclingReader
reader = DoclingReader()
docs = reader.load_data(file_path="document.pdf")
Node Parsers
Purpose and Scope
Node Parsers transform Documents into Nodes by splitting content based on semantic boundaries. The interface is defined in llama-index-core/llama_index/core/node_parser/interface.py.
Core Interface Methods
| Method | Description |
|---|---|
get_nodes_from_documents() | Parse documents into nodes |
get_batch_nodes() | Process documents in batches |
Sentence Splitter Parser
The sentence-based node parser in llama-index-core/llama_index/core/node_parser/text/sentence.py provides configurable text chunking:
| Parameter | Type | Default | Description |
|---|---|---|---|
separator | str | "\n\n" | Chunk separator |
chunk_size | int | 1024 | Maximum characters per chunk |
chunk_overlap | int | 0 | Overlap between chunks |
chunking_tokenizer | callable | None | Custom tokenizer function |
callback_manager | CallbackManager | None | Event callbacks |
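A hedged usage sketch of the sentence splitter (parameter defaults can differ between LlamaIndex versions, so the values here are illustrative):
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)  # illustrative values
nodes = parser.get_nodes_from_documents(
    [Document(text="LlamaIndex splits long documents into overlapping chunks. Each chunk becomes a node.")]
)
print(len(nodes), nodes[0].text[:40])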
Docling Node Parser
The Docling Node Parser (llama-index-integrations/node_parser/llama-index-node-parser-docling/README.md) parses Docling JSON output into LlamaIndex nodes with rich metadata:
from llama_index.node_parser.docling import DoclingNodeParser
node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
Document-Node Relationships
Relationship Types
Nodes maintain typed relationships to other components:
| Relationship | Description |
|---|---|
PARENT | Link to parent Document |
CHILD | Link to child elements |
PREVIOUS | Previous sibling Node |
NEXT | Next sibling Node |
SOURCE | Source Document reference |
Metadata Preservation
Nodes automatically inherit and extend document metadata:
# Node metadata includes provenance information
{
'doc_items': [{'self_ref': '#/main-text/21'}],
'prov': [{'page_no': 2, 'bbox': {...}}],
'headings': ['2 Getting Started']
}
Workflow
graph LR
A[Load Data] --> B[Create Document]
B --> C[Parse Document]
C --> D[Generate Nodes]
D --> E[Create Embeddings]
E --> F[Build Index]
A1[Readers] --> A
C1[Node Parsers] --> C
Best Practices
Document Creation
- Always assign unique id_ attributes for tracking
- Include comprehensive metadata for filtering
- Specify mimetype when the content type matters
Node Parsing
- Choose an appropriate chunk_size for your embedding model
- Configure chunk_overlap for context continuity
- Use semantic-aware parsers (Docling) for complex documents
Memory Management
- Use lazy_load_data() for large document collections
- Consider batch processing for node parsing
- Leverage streaming for very large files
Related Integrations
| Integration | Use Case |
|---|---|
| VectaraIndex | Managed semantic search |
| ChromaReader | Vector database loading |
| AlibabaCloud AISearch | Cloud-based document parsing |
| Ollama Embeddings | Local embedding generation |
Summary
Documents serve as the primary data ingestion point in LlamaIndex, encapsulating raw content and metadata from various sources. Nodes are the processed, chunked representations optimized for embedding generation and retrieval. Together with Readers and Node Parsers, they form the foundation of the LlamaIndex data pipeline.
Source: https://github.com/run-llama/llama_index / Human Manual
Storage Systems
Related topics: Documents and Nodes, Retrieval and Reranking
Overview
LlamaIndex provides a comprehensive storage system that allows users to persist indexes, documents, and chat histories to disk for later retrieval and reuse. The storage architecture is built around the StorageContext class, which serves as the central coordinator for managing various storage backends including document stores, index stores, and chat stores.
The storage system enables:
- Persistence: Save index data to disk for long-term storage
- Retrieval: Reload previously persisted indexes without recomputation
- In-memory fallback: Default in-memory storage when persistence is not configured
- Customizable backends: Pluggable storage implementations for different use cases
Architecture
graph TD
A[StorageContext] --> B[VectorStore]
A --> C[DocStore]
A --> D[IndexStore]
A --> E[ChatStore]
A --> F[ImageStore]
A --> G[GraphStore]
C --> H[SimpleDocStore]
C --> I[MongoDocStore]
C --> J[KVDocStore]
D --> K[SimpleIndexStore]
D --> L[MongoIndexStore]
D --> M[KVIndexStore]
E --> N[SimpleChatStore]
E --> O[MongoChatStore]
StorageContext
The StorageContext class is the main entry point for configuring storage in LlamaIndex. It aggregates all storage components and provides methods for persistence and retrieval.
Initialization
from llama_index.core import StorageContext, load_index_from_storage
# Create with default in-memory stores
storage_context = StorageContext.from_defaults()
# Recreate a storage context from a previously persisted directory
storage_context = StorageContext.from_defaults(persist_dir="./storage")
# Load existing index from disk
index = load_index_from_storage(storage_context=storage_context)
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
persist_dir | str | None | Directory path for persistence |
vector_store | BaseVectorStore | InMemoryVectorStore | Vector storage backend |
docstore | BaseDocstore | SimpleDocumentStore | Document storage backend |
index_store | BaseIndexStore | SimpleIndexStore | Index metadata storage |
graph_store | BaseGraphStore | None | Knowledge graph storage |
chat_store | BaseChatStore | SimpleChatStore | Chat history storage |
image_store | BaseImageStore | None | Image storage backend |
Persistence Methods
| Method | Description |
|---|---|
persist(persist_dir, ...) | Save all storage components to disk |
from_defaults(**kwargs) | Create context with default or specified settings |
load_index_from_storage() | Class method to load index from persisted storage |
Document Store
The document store manages the storage and retrieval of BaseDocument objects. LlamaIndex provides several document store implementations.
SimpleDocumentStore
The default document store keeps nodes in memory and can be persisted to a JSON file.
from llama_index.core.storage.docstore import SimpleDocumentStore
# Create an in-memory docstore, then persist it to disk explicitly
docstore = SimpleDocumentStore()
docstore.persist(persist_path="./docstore.json")

# Reload a previously persisted docstore
docstore = SimpleDocumentStore.from_persist_path("./docstore.json")
Document Store API
| Method | Description |
|---|---|
add_documents(documents, batch_size) | Add documents to the store |
get_document(doc_id) | Retrieve a document by ID |
delete(doc_id) | Remove a document by ID |
get_nodes(node_ids) | Retrieve nodes by their IDs |
get_all_nodes() | Retrieve all nodes from the store |
persist(persist_path) | Persist the document store to disk |
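A short sketch of the add/get round trip on the default document store (persistence is shown separately above):
from llama_index.core import Document
from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
doc = Document(id_="doc-1", text="Store me for later retrieval.")
docstore.add_documents([doc])
print(docstore.get_document("doc-1").text)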
Data Model
Documents are stored with the following structure:
class BaseDocument:
id_: str # Unique identifier
embedding: List[float] # Vector embedding
metadata: Dict[str, Any] # User-defined metadata
text: str # Document text content
excluded_embed_metadata_keys: List[str]
excluded_llm_metadata_keys: List[str]
relationships: Dict[DocumentRelationship, str]
hash: str # Computed hash for caching
__class__: type # Document type (optional)
Index Store
The index store manages index metadata and structure, enabling efficient retrieval of index components.
SimpleIndexStore
The default index store implementation using JSON file storage.
from llama_index.core.storage.index_store import SimpleIndexStore
# Create an in-memory index store and persist it to disk explicitly
index_store = SimpleIndexStore()
index_store.persist(persist_path="./index_store.json")
Index Store API
| Method | Description |
|---|---|
add_index_struct(index_struct) | Store an index structure |
get_index_struct(struct_id) | Retrieve index structure by ID |
get_index_structs() | List all stored index structures |
delete_index_struct(struct_id) | Remove an index structure |
Supported Index Types
| Index Type | Description |
|---|---|
VectorStoreIndex | Dense vector-based retrieval |
SummaryIndex | Summary-based indexing |
KeywordTableIndex | Keyword-based retrieval |
KnowledgeGraphIndex | Graph-based knowledge indexing |
Chat Store
The chat store manages conversation history for multi-turn interactions with language models.
SimpleChatStore
A persistent chat store implementation for storing and retrieving chat messages.
from llama_index.core.storage.chat_store import SimpleChatStore
# Create an in-memory chat store; call persist() to write it to disk
chat_store = SimpleChatStore()
chat_store.persist(persist_path="./chat_store.json")
Chat Store API
| Method | Description |
|---|---|
add_message(chat_id, message, role) | Append a message to a chat session |
get_messages(chat_id) | Retrieve all messages for a chat |
get_chat(chat_id) | Get chat session details |
delete_chat(chat_id) | Remove a chat session |
persist(persist_path) | Save chat history to disk |
Message Structure
| Field | Type | Description |
|---|---|---|
role | str | Message role (user/assistant/system) |
content | str | Message text content |
additional_kwargs | Dict | Extra metadata for the message |
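A hedged sketch of the chat store round trip; the string key plays the role of the chat/session identifier referenced in the API table above:
from llama_index.core.llms import ChatMessage
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
chat_store.add_message("session-1", ChatMessage(role="user", content="Hello!"))
chat_store.add_message("session-1", ChatMessage(role="assistant", content="Hi, how can I help?"))
print(chat_store.get_messages("session-1"))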
Storage Workflow
graph LR
A[Create Documents] --> B[Initialize StorageContext]
B --> C{Configure Backends}
C --> D[In-Memory]
C --> E[Persistent]
D --> F[Build Index]
E --> F
F --> G[Index Created]
G --> H[Persist to Disk]
H --> I[StorageContext.persist]
J[Load Index] --> K[load_index_from_storage]
K --> L[Index Ready]
Usage Examples
Basic Persistence
from llama_index.core import VectorStoreIndex, StorageContext
# Create documents
documents = [...]
# Create index with an explicit storage context (in-memory by default)
storage_context = StorageContext.from_defaults()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)
# Persist to disk (persistence only happens when persist() is called)
index.storage_context.persist(persist_dir="./storage")
Loading Persisted Index
from llama_index.core import StorageContext, load_index_from_storage
# Rebuild storage context from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")
# Load existing index
index = load_index_from_storage(storage_context=storage_context)
# Query the loaded index
query_engine = index.as_query_engine()
response = query_engine.query("Your question here")
Custom Storage Configuration
from llama_index.core import StorageContext
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
# Create custom stores (use from_persist_path to reload previously saved stores)
docstore = SimpleDocumentStore.from_persist_path("./custom_docstore.json")
index_store = SimpleIndexStore.from_persist_path("./custom_index_store.json")
# Configure storage context with custom stores
storage_context = StorageContext.from_defaults(
    docstore=docstore,
    index_store=index_store,
)
# Use with index
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
Storage Backend Comparison
| Backend | Persistence | Performance | Scalability | Use Case |
|---|---|---|---|---|
SimpleDocumentStore | JSON/SQLite | Medium | Low-Medium | Development, small datasets |
RedisDocumentStore | Redis | High | High | Production, distributed systems |
MongoDocumentStore | MongoDB | High | Very High | Large-scale deployments |
KVDocumentStore | Key-Value | High | Medium-High | General purpose |
Best Practices
- Always specify unique document IDs (e.g., Document(id_="unique_doc_1", text="content")): prevents duplicate entries and enables predictable retrieval
- Configure persistence early: Set up storage context before building indexes to avoid data loss
- Use appropriate batch sizes: When adding many documents, use batch operations for better performance
- Handle persistence errors: Wrap persistence calls in try-except blocks for robustness
- Backup important data: Regularly backup persisted storage directories
Related Components
- Vector Stores: Manage embedding vectors for semantic search
- Graph Stores: Handle knowledge graph data structures
- Image Stores: Store image data for multimodal applications
- Query Engines: Use storage to retrieve relevant documents for queries
- Retrievers: Access stored data for retrieval-augmented generation
Source: https://github.com/run-llama/llama_index / Human Manual
Query Engines
Related topics: Retrieval and Reranking, Documents and Nodes
Query Engines are the core components in LlamaIndex responsible for processing user queries and returning relevant responses by retrieving, synthesizing, and formatting information from indexed data.
Overview
Query Engines serve as the primary interface for querying indexed documents in LlamaIndex. They coordinate the retrieval of relevant context from the index and synthesize this information into coherent, helpful responses using Large Language Models (LLMs).
Key Responsibilities:
- Receive user queries and transform them into retrieval operations
- Coordinate with retrievers to fetch relevant documents or data chunks
- Route queries to appropriate response synthesizers
- Handle query-time configuration such as similarity thresholds and response modes
Sources: llama-index-core/llama_index/core/query_engine/__init__.py
Architecture
The query engine architecture follows a modular pipeline pattern where different components handle specific stages of query processing.
graph TD
A[User Query] --> B[Query Engine]
B --> C[Retriever]
C --> D[Node Postprocessor]
D --> E[Response Synthesizer]
E --> F[LLM]
F --> G[Response]
H[Vector Store Index] --> C
I[Summary Index] --> C
J[Knowledge Graph Index] --> C
Core Components
| Component | Purpose | Location |
|---|---|---|
| BaseQueryEngine | Abstract base class defining the query interface | llama_index.core.query_engine |
| RetrieverQueryEngine | Default query engine using retrievers | retriever_query_engine.py |
| SubQuestionQueryEngine | Decomposes complex queries into sub-questions | sub_question_query_engine.py |
| ResponseSynthesizer | Generates responses from retrieved context | llama_index.core.response_synthesizers |
Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py
RetrieverQueryEngine
The RetrieverQueryEngine is the default query engine implementation that combines retrieval with response synthesis.
Initialization
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| retriever | BaseRetriever | Required | The retriever used to fetch relevant nodes |
| response_synthesizer | BaseSynthesizer | None | Synthesizer for generating responses |
| node_postprocessors | List[BaseNodePostprocessor] | [] | Post-processors applied after retrieval |
| callback_manager | CallbackManager | None | Manages callbacks for query events |
Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py:40-60
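For cases where the as_query_engine() shortcut is not flexible enough, the engine can also be assembled explicitly. A minimal sketch, assuming documents has already been loaded; the similarity_cutoff and top-k values are illustrative:
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.query_engine import RetrieverQueryEngine
index = VectorStoreIndex.from_documents(documents)
# Build the retriever and query engine by hand instead of via as_query_engine()
retriever = index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = query_engine.query("What does the manual cover?")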
Query Flow
sequenceDiagram
participant User
participant QueryEngine
participant Retriever
participant Postprocessor
participant Synthesizer
participant LLM
User->>QueryEngine: query(question)
QueryEngine->>Retriever: retrieve(query_str)
Retriever-->>QueryEngine: nodes[]
QueryEngine->>Postprocessor: postprocess(nodes)
Postprocessor-->>QueryEngine: filtered_nodes[]
QueryEngine->>Synthesizer: synthesize(query_str, nodes)
Synthesizer->>LLM: generate(prompt)
LLM-->>Synthesizer: response
Synthesizer-->>QueryEngine: Response
    QueryEngine-->>User: Response
SubQuestionQueryEngine
The SubQuestionQueryEngine handles complex queries by decomposing them into simpler sub-questions that can be answered independently.
Use Cases
- Queries requiring information from multiple data sources
- Complex questions that benefit from step-by-step reasoning
- Multi-hop questions requiring logical deduction
Configuration
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager
query_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
callback_manager=CallbackManager([callback]),
verbose=True
)
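The query_engine_tools list pairs each query engine with a name and description that the sub-question planner uses for routing. A minimal sketch, assuming two previously built indices named docs_index and api_index (both hypothetical):
from llama_index.core.tools import QueryEngineTool, ToolMetadata
query_engine_tools = [
    QueryEngineTool(
        query_engine=docs_index.as_query_engine(),
        metadata=ToolMetadata(
            name="docs",
            description="Answers questions about the product documentation",
        ),
    ),
    QueryEngineTool(
        query_engine=api_index.as_query_engine(),
        metadata=ToolMetadata(
            name="api",
            description="Answers questions about the API reference",
        ),
    ),
]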
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_engine_tools | List[QueryEngineTool] | Required | List of query engines and their descriptions |
| response_synthesizer | BaseSynthesizer | None | Response synthesizer to use |
| sub_question_name | str | "sub_question" | Name for sub-question events |
| parent_name | str | "parent_question" | Name for parent question events |
| callback_manager | CallbackManager | None | Callback manager for events |
| verbose | bool | False | Enable verbose output |
Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:50-80
Response Synthesizers
Response Synthesizers transform retrieved context into natural language responses.
Available Synthesizer Types
| Synthesizer | Description | Use Case |
|---|---|---|
| CompactAndRefine | Compacts retrieved context before generating | Large retrieval results |
| TreeSummarize | Hierarchically summarizes retrieved nodes | Comprehensive responses |
| SimpleSummarize | Direct concatenation and summarization | Quick, simple responses |
| Refine | Iteratively improves response quality | High-quality refinement |
| Accumulate | Combines responses from multiple sources | Multi-source queries |
| Generation | Direct LLM generation from context | Simple generation tasks |
Base Interface
class BaseSynthesizer(ABC):
@abstractmethod
async def synthesize(
self,
query: QueryBundle,
nodes: List[NodeWithScore],
**kwargs: Any
) -> Response:
pass
Sources: llama-index-core/llama_index/core/response_synthesizers/base.py:30-50
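In practice a synthesizer is usually obtained from the get_response_synthesizer factory rather than implemented directly. A minimal sketch, assuming an existing index; the mode values correspond to the synthesizer types in the table above:
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode
# Pick a synthesizer explicitly and pass it to the query engine...
synthesizer = get_response_synthesizer(response_mode=ResponseMode.TREE_SUMMARIZE)
query_engine = index.as_query_engine(response_synthesizer=synthesizer)
# ...or let the factory build one from a mode string
query_engine = index.as_query_engine(response_mode="compact")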
Vector Store Index Query Engine
The VectorStoreIndex provides built-in query engine creation through the as_query_engine() method.
Factory Method Parameters
index.as_query_engine(
query_mode: str = "default",
similarity_top_k: int = 10,
vector_store_query_mode: str = "default",
alpha: Optional[float] = None,
**kwargs: Any
) -> BaseQueryEngine
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_mode | str | "default" | Query execution mode |
| similarity_top_k | int | 10 | Number of top results to retrieve |
| vector_store_query_mode | str | "default" | Vector store specific query mode |
| alpha | float | None | Weight between sparse and dense retrieval; only used with hybrid query modes |
Sources: llama-index-core/llama_index/core/indices/vector_store/base.py
Query Modes
| Mode | Description |
|---|---|
default | Standard retrieval based on similarity |
mmr | Maximum Marginal Relevance for diverse results |
hybrid | Combines sparse and dense retrieval |
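A short sketch of switching query modes through as_query_engine. Note that hybrid mode only works with vector stores that support sparse/keyword search, and the parameter values below are illustrative:
# MMR trades a little raw similarity for diversity in the result set
query_engine = index.as_query_engine(
    vector_store_query_mode="mmr",
    similarity_top_k=5,
)
# Hybrid retrieval (requires a vector store with sparse/keyword support)
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    alpha=0.5,  # relative weight between sparse and dense retrieval
)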
Query Engine Tool
For agent-based workflows, query engines can be wrapped as tools using the QueryEngineTool class.
from llama_index.core.tools import QueryEngineTool, ToolMetadata
tool = QueryEngineTool(
query_engine=query_engine,
metadata=ToolMetadata(
name="website_index",
description="Useful for answering questions about text on websites",
)
)
Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py:100-120
Advanced Configuration
Node Post-processors
Post-processors filter and enhance retrieved nodes before synthesis.
from llama_index.core.postprocessor import SimilarityPostprocessor
query_engine = index.as_query_engine(
node_postprocessors=[
SimilarityPostprocessor(similarity_cutoff=0.7)
]
)
Custom Query Engines
Create custom query engines by extending the base class:
from llama_index.core.query_engine import BaseQueryEngine
from llama_index.core.schema import QueryBundle
from llama_index.core.base.response.schema import Response
class CustomQueryEngine(BaseQueryEngine):
    def __init__(self, retriever, synthesizer):
        super().__init__(callback_manager=None)
        self._retriever = retriever
        self._synthesizer = synthesizer
    async def _aquery(self, query_bundle: QueryBundle) -> Response:
        # Retrieve candidate nodes, then hand them to the synthesizer
        nodes = await self._retriever.aretrieve(query_bundle)
        response = await self._synthesizer.synthesize(
            query_bundle, nodes
        )
        return response
Async Query Execution
Query engines support both sync and async execution patterns:
# Synchronous
response = query_engine.query("What is LlamaIndex?")
# Asynchronous
response = await query_engine.aquery("What is LlamaIndex?")
Integration with Vector Indices
Query engines integrate with various index types:
| Index Type | Default Query Engine | Features |
|---|---|---|
| VectorStoreIndex | RetrieverQueryEngine | Semantic similarity search |
| SummaryIndex | RetrieverQueryEngine | Full document retrieval |
| KnowledgeGraphIndex | RetrieverQueryEngine | Graph-based traversal |
| ComposableGraph | SubQuestionQueryEngine | Multi-index queries |
Best Practices
- Choose appropriate top_k: Balance between response quality and speed (typically 3-10 for most use cases)
- Use sub-question engine for complex queries: When queries require reasoning across multiple sources
- Configure similarity thresholds: Filter low-quality matches using post-processors
- Enable callbacks for debugging: Monitor query execution flow and performance
- Select appropriate synthesizers: Match the synthesizer type to your response quality requirements
Summary
Query Engines in LlamaIndex provide a flexible, extensible framework for retrieving and synthesizing information from indexed data. The modular architecture allows for customization at every stage of the query pipeline, from retrieval configuration to response generation.
Key Takeaways:
- Query engines orchestrate the retrieval-synthesis pipeline
- RetrieverQueryEngine handles standard query flows
- SubQuestionQueryEngine decomposes complex queries
- Response synthesizers generate final output from context
- Extensive configuration options enable fine-tuned control
Sources: llama-index-core/llama_index/core/query_engine/retriever_query_engine.py
Sources: llama-index-core/llama_index/core/query_engine/sub_question_query_engine.py
Sources: llama-index-core/llama_index/core/response_synthesizers/base.py
Sources: llama-index-core/llama_index/core/indices/vector_store/base.py
Sources: llama-index-core/llama_index/core/query_engine/__init__.py
Retrieval and Reranking
Related topics: Query Engines, Storage Systems
Overview
Retrieval and Reranking are fundamental components in LlamaIndex's architecture for building effective Retrieval-Augmented Generation (RAG) systems. The retrieval system identifies relevant context from various data sources, while the reranking system reorders retrieved results to optimize relevance using advanced techniques like LLM-based scoring.
In LlamaIndex, retrieval is handled through a flexible retriever abstraction that supports multiple retrieval strategies including vector-based search, keyword search, and hybrid approaches. Reranking serves as a post-processing step that improves result quality by reordering retrieved nodes based on more sophisticated relevance criteria.
Architecture Overview
graph TD
A[Query Input] --> B[Retrieval Phase]
B --> C[Vector/Knowledge Graph Retrieval]
C --> D[Initial Node Set]
D --> E[Reranking Phase]
E --> F[LLM Reranker]
F --> G[Reordered Results]
G --> H[Response Generation]
I[Document Sources] --> J[Indexing]
J --> K[Vector Store / Graph Store]
    K --> C
Retrieval Components
Retriever Abstraction
LlamaIndex provides a base BaseRetriever class that defines the interface for all retrieval implementations. Retrievers work in conjunction with indices to fetch relevant nodes from vector stores or knowledge graphs.
Core Retriever Classes:
| Component | File Path | Purpose |
|---|---|---|
BaseRetriever | llama-index-core/llama_index/core/retrievers/ | Abstract base for all retrievers |
RecursiveRetriever | llama-index-core/llama_index/core/retrievers/recursive_retriever.py | Multi-level recursive retrieval |
PropertyGraphRetriever | llama-index-core/llama_index/core/indices/property_graph/retriever.py | Graph-based retrieval |
Recursive Retriever
The RecursiveRetriever enables multi-level, hierarchical retrieval across different data sources and node types. It supports recursive traversal of indices and can fetch related nodes across different retrieval strategies.
Key Features:
- Recursive node resolution across index hierarchies
- Support for multiple retriever types in a chain
- Handling of nested document structures
Source: llama-index-core/llama_index/core/retrievers/recursive_retriever.py
Property Graph Retriever
The Property Graph Retriever leverages knowledge graphs for retrieval, enabling structured queries over entity-relationship data. This retriever is particularly effective for complex queries requiring relationship-aware context.
Capabilities:
- Graph traversal-based retrieval
- Entity filtering and relationship queries
- Support for hybrid graph + vector search
Source: llama-index-core/llama_index/core/indices/property_graph/retriever.py:1-100
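A minimal usage sketch, assuming documents are loaded and an LLM and embedding model are configured (property graph construction relies on them for entity and relation extraction); the query string is illustrative:
from llama_index.core import PropertyGraphIndex
# Build a property graph index and retrieve relationship-aware context
pg_index = PropertyGraphIndex.from_documents(documents)
pg_retriever = pg_index.as_retriever(
    include_text=True,  # attach the source text of matched graph nodes
)
nodes = pg_retriever.retrieve("How are component A and component B related?")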
Reranking System
Purpose and Role
Reranking improves retrieval quality by reordering initially retrieved candidates using more sophisticated relevance models. After an initial retrieval pass identifies candidate nodes, rerankers evaluate and reorder these results to maximize relevance to the query.
LLM Reranker
The LLMRerank post-processor uses a Language Model to score and reorder retrieved nodes based on semantic relevance. This approach provides higher quality ranking compared to simple vector similarity.
Key Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
top_n | int | None | Number of top results to return after reranking |
choice_batch_size | int | 10 | Batch size for LLM ranking choices |
llm | BaseLLM | None | Language model for scoring |
verbose | bool | False | Enable verbose output |
Source: llama-index-core/llama_index/core/postprocessor/llm_rerank.py
Node Post-Processors
The NodePostprocessor class provides additional filtering and transformation capabilities for retrieved nodes. These processors operate on the node level and can apply various transformations before final output.
Common Post-Processing Operations:
- Duplicate removal
- Similarity threshold filtering
- Metadata-based filtering
Source: llama-index-core/llama_index/core/postprocessor/node.py
Data Flow
graph LR
A[User Query] --> B[Vector Search]
B --> C[Top-K Nodes]
C --> D[Post-Processors]
D --> E[LLM Reranker]
E --> F[Reranked Nodes]
F --> G[Context for LLM]
H[Documents] --> I[Indexing Pipeline]
I --> J[Embedding Model]
J --> K[Vector Store]
    K --> B
Integration with Data Loaders
LlamaIndex's retrieval system integrates seamlessly with various data loaders that prepare documents for indexing and retrieval.
Supported Data Sources
| Reader | Use Case | Integration |
|---|---|---|
DoclingReader | PDF, DOCX, HTML | llama-index-readers-docling |
SimpleWebPageReader | Static websites | llama-index-readers-web |
RemoteDepthReader | Multi-level URL crawling | llama-index-readers-remote-depth |
WikipediaReader | Wikipedia articles | llama-index-readers-wikipedia |
SemanticScholarReader | Academic papers | llama-index-readers-semanticscholar |
Source: llama-index-integrations/readers/llama-index-readers-docling/README.md
Document Processing Pipeline
Documents loaded through readers undergo the following processing (a code sketch follows the list):
- Parsing - Extract text content from various formats (PDF, DOCX, HTML)
- Node Parsing - Split documents into semantic chunks (nodes)
- Embedding - Generate vector embeddings for each node
- Indexing - Store nodes and embeddings in appropriate stores
- Retrieval - Fetch relevant nodes based on queries
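The steps above can be composed explicitly when more control is needed than VectorStoreIndex.from_documents provides. A minimal sketch, with chunk sizes and the query string as illustrative values:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
# Steps 1-2: parse and chunk documents into nodes
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)
# Steps 3-4: embed and index the nodes (embeddings come from the configured embed model)
index = VectorStoreIndex(nodes)
# Step 5: retrieve relevant nodes for a query
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("example query")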
Usage Patterns
Basic Retrieval with Reranking
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import LLMRerank
# Load documents and create index
index = VectorStoreIndex.from_documents(documents)
# Configure reranking
reranker = LLMRerank(
top_n=5,
choice_batch_size=10
)
# Query with reranking
query_engine = index.as_query_engine(
node_postprocessors=[reranker]
)
response = query_engine.query("Your question here")
Recursive Retrieval
from llama_index.core.retrievers import RecursiveRetriever
# Configure recursive retrieval across multiple levels
recursive_retriever = RecursiveRetriever(
    root_id="root",  # key of the retriever that queries start from
    retriever_dict={
        "root": vector_retriever,
        "documents": document_retriever,
    },
)
Configuration Options
Retrieval Configuration
| Option | Description | Applies To |
|---|---|---|
similarity_top_k | Number of initial candidates | Vector retrieval |
retrieval_mode | Vector, keyword, or hybrid | Hybrid search |
node_postprocessors | List of post-processing steps | All retrievers |
Reranking Configuration
| Option | Description | Default |
|---|---|---|
top_n | Final number of results | 5 |
score_threshold | Minimum relevance score | None |
model | Reranking model | gpt-3.5-turbo |
Advanced Topics
Hybrid Retrieval with Reranking
Combining vector and keyword search with LLM reranking provides robust retrieval across diverse query types (a sketch follows this list):
- Vector Search - Captures semantic similarity
- Keyword Search - Captures exact term matching
- LLM Reranking - Optimizes final ordering
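A rough sketch of this combination, assuming an existing index and its nodes; BM25Retriever comes from the separate llama-index-retrievers-bm25 package, and the top-k values are illustrative:
from llama_index.core.postprocessor import LLMRerank
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever  # separate package: llama-index-retrievers-bm25
# Dense (vector) and sparse (keyword) retrievers over the same data
vector_retriever = index.as_retriever(similarity_top_k=10)
keyword_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)
fusion_retriever = QueryFusionRetriever(
    [vector_retriever, keyword_retriever],
    similarity_top_k=10,
    num_queries=1,  # disable query expansion; just fuse the two result sets
)
# LLM reranking reorders the fused candidates before synthesis
query_engine = RetrieverQueryEngine.from_args(
    retriever=fusion_retriever,
    node_postprocessors=[LLMRerank(top_n=5)],
)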
Custom Retrievers
Developers can create custom retrievers by extending BaseRetriever:
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
class CustomRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        # Custom retrieval logic; must return nodes paired with relevance scores
        return []
Summary
Retrieval and Reranking in LlamaIndex form a two-phase system where initial retrieval identifies candidate nodes and reranking optimizes their ordering. The architecture supports multiple retrieval strategies (vector, graph, recursive) and leverages LLM-based reranking for improved result quality. Integration with various data loaders enables seamless indexing from diverse sources, while the post-processor abstraction allows flexible pipeline customization.
Source: https://github.com/run-llama/llama_index / Human Manual
Agent Framework
Related topics: Memory Systems
Overview
The LlamaIndex Agent Framework provides a flexible, extensible system for building AI agents that can reason, plan, and execute actions using tools. The framework enables the creation of both single-agent and multi-agent systems capable of interacting with external data sources, performing complex reasoning tasks, and orchestrating workflows.
Agents in LlamaIndex are designed to combine large language model (LLM) capabilities with structured tool usage, memory management, and workflow orchestration. The framework supports various agent types including ReAct (Reasoning + Acting) agents and workflow-based agents.
Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50
Architecture Overview
graph TD
A[User Query] --> B[Agent]
B --> C[Reasoning Engine]
C --> D[Tool System]
D --> E[External Tools]
C --> F[Memory]
B --> G[Workflow Orchestrator]
G --> H[Sub-Agents]
    H --> D
The framework is built on several key components that work together to enable sophisticated agent behaviors:
| Component | Purpose |
|---|---|
| Agent | Core entity that processes queries and generates responses |
| Reasoning Engine | Handles thought processes and decision making |
| Tool System | Provides access to external functions and APIs |
| Memory | Stores conversation history and intermediate results |
| Workflow Orchestrator | Manages complex multi-step tasks |
Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:50-100
ReAct Agent
The ReAct (Synergizing Reasoning and Acting) agent implements a reasoning loop that combines thought processes with tool actions. This agent type is particularly effective for tasks requiring logical deduction and external information retrieval.
ReAct Formatter
The ReAct formatter is responsible for constructing prompts that guide the agent through the reasoning-action-observation cycle. It defines the structure of thoughts, actions, and observations in the agent's prompt.
graph LR
A[Thought] --> B[Action]
B --> C[Observation]
    C --> A
#### Key Components
| Component | Description |
|---|---|
system_prompt | Instructions for the agent's role and behavior |
tool_prompt | Description of available tools |
formatter | Defines the format for thoughts, actions, observations |
examples | Few-shot examples for better performance |
Sources: llama-index-core/llama_index/core/agent/react/formatter.py:1-80
ReAct Output Parsing
The ReAct agent uses specialized output parsers to extract structured information from LLM responses:
class ReActOutputParser:
def parse(self, output: str) -> ActionOutput:
# Parse thought, action, and action input from output
pass
This parsing enables the agent to:
- Extract the reasoning thought process
- Identify the tool to invoke
- Extract the tool's input parameters
- Process the tool's output as an observation
Sources: llama-index-core/llama_index/core/agent/react/formatter.py:80-150
Workflow-Based Agents
Workflow-based agents provide a more structured approach to agent execution, using state machines and defined steps to process queries.
Base Agent
The BaseAgent class provides the foundation for all agent implementations in the workflow system:
graph TD
A[Input] --> B[State Machine]
B --> C{Step Execution}
C -->|Step 1| D[Process Step]
D --> E[Update State]
E --> C
    C -->|Complete| F[Generate Response]
#### Base Agent API
| Method | Description |
|---|---|
run() | Execute the agent with input |
reset() | Reset agent state |
get_state() | Retrieve current agent state |
set_state() | Set agent state |
Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:100-200
Agent State Management
Agents maintain state throughout their execution, which includes:
| State Component | Type | Purpose |
|---|---|---|
input | str | Original user input |
current_step | int | Current execution step |
memory | Memory | Conversation history |
context | dict | Additional context data |
steps | List[Step] | Executed steps |
output | Any | Final output |
Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:200-300
Tool System
The Tool System enables agents to interact with external resources and perform actions beyond text generation.
Function Tool
FunctionTool provides a decorator-based interface for creating tools from Python functions:
from llama_index.core.tools import FunctionTool
@FunctionTool.from_defaults
def search_database(query: str) -> str:
"""Search the knowledge base for relevant information."""
# Implementation here
return results
#### FunctionTool Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
fn | Callable | Required | The function to wrap |
name | str | Function name | Tool identifier |
description | str | Function docstring | Tool description for LLM |
fn_schema | BaseModel | Auto-generated | Input schema |
return_direct | bool | False | Return raw output |
Sources: llama-index-core/llama_index/core/tools/function_tool.py:1-100
Tool Execution Flow
sequenceDiagram
participant Agent
participant ToolRegistry
participant FunctionTool
participant External
Agent->>ToolRegistry: Request tool by name
ToolRegistry->>FunctionTool: Get tool instance
FunctionTool->>External: Execute function
External-->>FunctionTool: Return result
    FunctionTool-->>Agent: Format response
Creating Custom Tools
Tools can be created using the @FunctionTool.from_defaults decorator:
@FunctionTool.from_defaults(name="calculator", description="Perform mathematical calculations")
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression."""
return str(eval(expression))
Or programmatically:
from llama_index.core.tools import FunctionTool
def my_function(arg1: str, arg2: int) -> str:
return f"{arg1} repeated {arg2} times"
tool = FunctionTool.from_defaults(
fn=my_function,
name="my_tool",
description="Custom tool description"
)
Sources: llama-index-core/llama_index/core/tools/function_tool.py:100-200
Multi-Agent Workflows
Multi-agent systems enable complex task decomposition where different specialized agents collaborate to solve problems.
Multi-Agent Workflow Architecture
graph TD
A[Coordinator Agent] --> B[Specialist Agent 1]
A --> C[Specialist Agent 2]
A --> D[Specialist Agent N]
B --> E[Tool 1]
C --> F[Tool 2]
D --> G[Tool N]
B --> A
C --> A
    D --> A
Workflow Communication
Agents communicate through a shared state and message-passing mechanism:
| Message Type | Direction | Purpose |
|---|---|---|
task | Coordinator → Specialist | Assign task |
result | Specialist → Coordinator | Return results |
query | Any → Any | Request information |
response | Any → Any | Provide information |
Sources: llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:1-100
Creating Multi-Agent Systems
from llama_index.core.agent.workflow import MultiAgentWorkflow
# Create specialized agents
research_agent = ReActAgent.from_tools(tools=[search_tool], name="researcher")
analysis_agent = ReActAgent.from_tools(tools=[analysis_tool], name="analyst")
# Create multi-agent workflow
workflow = MultiAgentWorkflow(agents=[research_agent, analysis_agent])
# Execute workflow
result = workflow.run(user_input="Analyze the latest research on AI")
Sources: llama-index-core/llama_index/core/agent/workflow/multi_agent_workflow.py:100-200
Tool Integration with LlamaIndex Readers
The Agent Framework integrates seamlessly with LlamaIndex's document readers, enabling agents to query and reason over loaded documents:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
# Load documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Wrap the index's query engine as a tool the agent can call
query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="document_index",
    description="Answers questions about the loaded documents",
)
# Create agent with query tool
agent = ReActAgent.from_tools(tools=[query_tool])
response = agent.chat("What is the main topic of these documents?")
This integration allows agents to:
- Query vector databases
- Retrieve relevant context
- Synthesize information from multiple sources
- Perform RAG (Retrieval-Augmented Generation)
Best Practices
Designing Effective Tools
| Guideline | Rationale |
|---|---|
| Clear descriptions | Helps LLM understand when to use the tool |
| Structured outputs | Easier for agent to parse and use results |
| Error handling | Prevents agent crashes from tool failures |
| Idempotent operations | Enables safe retries |
Agent Configuration
| Parameter | Recommendation |
|---|---|
max_iterations | Set based on task complexity (default: 10) |
timeout | Allow sufficient time for tool execution |
memory_type | Use conversation memory for multi-turn interactions |
tool_retriever | Implement for large tool collections |
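A configuration sketch tying these recommendations together, assuming tools and llm are already defined; it uses the classic ReActAgent.from_tools entry point shown elsewhere in this section (newer releases favor the workflow-based agents), and the numeric values are illustrative:
from llama_index.core.agent import ReActAgent
from llama_index.core.memory import ChatMemoryBuffer
agent = ReActAgent.from_tools(
    tools=tools,  # list of FunctionTool / QueryEngineTool instances
    llm=llm,
    max_iterations=10,  # cap the reasoning loop for complex tasks
    memory=ChatMemoryBuffer.from_defaults(token_limit=4000),
    verbose=True,  # print reasoning traces for debugging
)
response = agent.chat("Summarize the loaded documents")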
Debugging Agents
- Enable verbose mode to see agent's reasoning traces
- Log tool inputs/outputs to verify correct tool usage
- Test tools independently before combining with agent
- Monitor token usage to prevent excessive spending
Sources: llama-index-core/llama_index/core/agent/workflow/base_agent.py:1-50
Memory Systems
Related topics: Agent Framework, Storage Systems
Memory Systems in LlamaIndex provide persistent conversation history management for chat engines and agents. They enable AI applications to maintain context across multiple interactions, store user preferences, and retrieve relevant historical information during conversations.
Architecture Overview
Memory Systems follow a modular architecture that allows different memory implementations to be composed and used interchangeably. The core memory system supports multiple storage strategies including buffer-based, summary-based, and vector-based retrieval.
graph TD
A[Chat Engine / Agent] --> B[Memory System]
B --> C[ChatMemoryBuffer]
B --> D[ChatSummaryMemoryBuffer]
B --> E[VectorMemory]
B --> F[Mem0Memory]
C --> G[SimpleComposableMemory]
D --> G
E --> G
F --> H[External Memory Services]
G --> I[Storage Backend]
    H --> J[Mem0 Platform API]
Core Memory Components
ChatMemoryBuffer
ChatMemoryBuffer is the foundational memory component that stores conversation history in a simple buffer structure. It maintains a list of chat messages and provides methods for adding, retrieving, and managing conversation context.
| Parameter | Type | Description |
|---|---|---|
chat_history | List[ChatMessage] | List of conversation messages |
size | int | Maximum number of messages to retain |
tokenizer | Callable | Function to count tokens |
Sources: llama-index-core/llama_index/core/memory/chat_memory_buffer.py
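A minimal usage sketch, assuming an existing index; the token_limit value is illustrative (the buffer trims history to stay under this token budget):
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
# Attach the buffer to a chat engine so multi-turn context is retained
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
)
response = chat_engine.chat("What did we discuss earlier?")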
ChatSummaryMemoryBuffer
ChatSummaryMemoryBuffer extends the basic buffer with summarization capabilities. When the conversation exceeds the configured size, older messages are condensed into a summary rather than being discarded entirely.
| Parameter | Type | Description |
|---|---|---|
llm | LLM | LLM instance for generating summaries |
chat_history | List[ChatMessage] | Initial conversation history |
size | int | Maximum buffer size before summarization |
summary_exists | bool | Flag indicating if summary is generated |
Sources: llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py
VectorMemory
VectorMemory uses vector embeddings to store and retrieve conversation history. This enables semantic search within the conversation history, allowing the system to find relevant past messages based on meaning rather than exact matches.
| Parameter | Type | Description |
|---|---|---|
vector_store | VectorStore | Storage backend for embeddings |
embed_model | EmbeddingModel | Model for generating embeddings |
index | VectorStoreIndex | Index for efficient retrieval |
retriever | BaseRetriever | Retrieval mechanism |
Sources: llama-index-core/llama_index/core/memory/vector_memory.py
SimpleComposableMemory
SimpleComposableMemory provides a framework for combining multiple memory types into a unified interface. This allows different memory strategies to work together, leveraging the strengths of each approach.
| Feature | Description |
|---|---|
| Memory Composition | Combine buffer, summary, and vector memories |
| Unified Interface | Single API for all memory operations |
| Flexible Retrieval | Query multiple memory sources simultaneously |
Sources: llama-index-core/llama_index/core/memory/simple_composable_memory.py
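A composition sketch, assuming an embedding model (embed_model) is already configured; the primary buffer answers from recent turns while the secondary vector memory surfaces semantically related older messages:
from llama_index.core.memory import (
    ChatMemoryBuffer,
    SimpleComposableMemory,
    VectorMemory,
)
vector_memory = VectorMemory.from_defaults(
    vector_store=None,  # None uses an in-memory vector store
    embed_model=embed_model,
    retriever_kwargs={"similarity_top_k": 2},
)
composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
    secondary_memory_sources=[vector_memory],
)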
Mem0 Memory Integration
The Mem0Memory integration provides access to the Mem0 Platform for advanced memory management. Mem0 offers enhanced capabilities for semantic memory storage, user preference tracking, and cross-session persistence.
Configuration Options
#### Client-Based Initialization
from llama_index.memory.mem0 import Mem0Memory
context = {"user_id": "user_1"}
memory = Mem0Memory.from_client(
context=context,
api_key="<your-mem0-api-key>",
search_msg_limit=4,
)
#### Config Dictionary Initialization
memory = Mem0Memory.from_config(
context=context,
config={
"provider": "openai",
"config": {"model": "text-embedding-3-small"},
"version": "v1.1",
},
search_msg_limit=4,
)
Context Parameters
The Mem0 context identifies the entity for which memory is stored:
| Parameter | Description |
|---|---|
user_id | Unique identifier for the user |
agent_id | Unique identifier for the agent |
run_id | Unique identifier for the conversation run |
Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md
Usage Patterns
Integration with SimpleChatEngine
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.memory.mem0 import Mem0Memory
memory = Mem0Memory.from_client(
context={"user_id": "user_1"},
api_key="<your-api-key>",
)
chat_engine = SimpleChatEngine.from_defaults(
llm=llm,
memory=memory
)
response = chat_engine.chat("Hi, My name is Mayank")
Integration with FunctionAgent
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.memory.mem0 import Mem0Memory
memory = Mem0Memory.from_client(
context={"user_id": "user_1"},
api_key="<your-api-key>",
)
# Use memory with agent for persistent context
agent = FunctionAgent(
llm=llm,
tools=[call_tool, email_tool],
memory=memory
)
Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md
Memory Workflow
sequenceDiagram
participant User
participant ChatEngine
participant Memory
participant Storage
User->>ChatEngine: Send message
ChatEngine->>Memory: Get context (search_msg_limit messages)
Memory->>Storage: Query recent messages
Storage-->>Memory: Return relevant messages
Memory-->>ChatEngine: Context messages
ChatEngine->>ChatEngine: Generate response
ChatEngine->>Memory: Store new message
Memory->>Storage: Persist message
    ChatEngine-->>User: Return response
Comparison of Memory Types
| Memory Type | Storage Method | Use Case | Scalability |
|---|---|---|---|
| ChatMemoryBuffer | List/Buffer | Short conversations | Limited by token size |
| ChatSummaryMemoryBuffer | Condensed summaries | Long conversations | Better for extended chats |
| VectorMemory | Embeddings | Semantic search | Scales with vector store |
| Mem0Memory | External API | Production applications | Cloud-native scaling |
Environment Configuration
For Mem0 integration, set the API key as an environment variable:
export MEM0_API_KEY="<your-mem0-api-key>"
For LLM integration within memory operations:
export OPENAI_API_KEY="<your-openai-api-key>"
Sources: llama-index-integrations/memory/llama-index-memory-mem0/README.md
Best Practices
- Choose Appropriate Memory Type: Select based on conversation length and retrieval needs
- Configure Token Limits: Set an appropriate search_msg_limit to balance context and performance
- Use Context Parameters: Always provide user_id, agent_id, or run_id for proper memory isolation
- Consider Composability: Use SimpleComposableMemory for complex memory requirements
- Monitor API Costs: When using Mem0, track API usage for cost optimization
Source: https://github.com/run-llama/llama_index / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 6 source-linked risk signals. Review them before installing or handing real data to the project.
1. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:560704231 | https://github.com/run-llama/llama_index | README/documentation is current enough for a first validation pass.
2. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | last_activity_observed missing
3. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium
4. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:560704231 | https://github.com/run-llama/llama_index | no_demo; severity=medium
5. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | issue_or_pr_quality=unknown
6. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:560704231 | https://github.com/run-llama/llama_index | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using llama_index with real data or production workflows.
- [[Feature Request]: Built-in LLM Failover for Reliability](https://github.com/run-llama/llama_index/issues/19631) - github / github_issue
- [[Feature Request]: add (detailed) usage info to raw when using Structure](https://github.com/run-llama/llama_index/issues/19845) - github / github_issue
- [[Bug]: thinking_delta not populated on AgentStream events when thinkin](https://github.com/run-llama/llama_index/issues/20349) - github / github_issue
- [[Bug]: [llama-index-core] async_acquire() in TokenBucketRateLimiter and](https://github.com/run-llama/llama_index/issues/21603) - github / github_issue
- [[Question]: how to add human-in-the-loop capability to ReActAgent?](https://github.com/run-llama/llama_index/issues/21599) - github / github_issue
- Proposal: Agent Threat Rules detection integration for LlamaIndex - github / github_issue
- Improve developer error message for unrecognized embedding names in `loa - github / github_issue
- [[Bug]: Bedrock Converse streaming produces string tool_kwargs in Tool](https://github.com/run-llama/llama_index/issues/21579) - github / github_issue
- [[Bug]: Breaking Image/Index node fetching behavior after refactor](https://github.com/run-llama/llama_index/issues/19499) - github / github_issue
- [[Bug]: PydanticUserError: The __modify_schema__ method is not supporte](https://github.com/run-llama/llama_index/issues/16540) - github / github_issue
- [[Bug]: gemini-embedding-2 task instructions not implemented (task_type d](https://github.com/run-llama/llama_index/issues/21535) - github / github_issue
- v0.14.21 - github / github_release
Source: Project Pack community evidence and pitfall evidence